FAQ
Hi,
How can I efficiently pick a random element from a sorted-set?

If I try rand-nth I get this:
user=> (rand-nth (sorted-set 1 2 3))
java.lang.UnsupportedOperationException: nth not supported on this
type: PersistentTreeSet (NO_SOURCE_FILE:0)

I can get this expression to work if I naively apply seq:
user=> (rand-nth (seq (sorted-set 1 2 3)))
1

However this performs very badly on large sets. Is there a more
efficient way to do this?

(I need to keep my elements in a sorted-set for other parts of the
application, where I rely on subseq.)

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Search Discussions

  • Michael Gardner at Sep 26, 2011 at 4:18 pm

    On Sep 26, 2011, at 8:12 AM, Paul Richards wrote:

    How can I efficiently pick a random element from a sorted-set?
    If your sorted set is densely packed (if most possible values do appear in the set), then you could just pick a random value, see if it's in the set, and repeat until you find a match.

    Otherwise, you could use a sorted-vector. I don't know if there's a good implementation out there, though, so you might have to roll your own.

    Or you could use a sorted-set that can do a "random binary search", where it chooses between the first and second halves at random rather than by comparison with a particular value. Maybe you could subclass the Clojure sorted-set to do this, though I'm guessing it's not made for subclassing.

    --
    You received this message because you are subscribed to the Google
    Groups "Clojure" group.
    To post to this group, send email to clojure@googlegroups.com
    Note that posts from new members are moderated - please be patient with your first post.
    To unsubscribe from this group, send email to
    clojure+unsubscribe@googlegroups.com
    For more options, visit this group at
    http://groups.google.com/group/clojure?hl=en
  • Benny Tsai at Sep 26, 2011 at 5:04 pm
    The reason that (rand-nth (seq (sorted-set 1 2 3))) performs badly on large
    sets is probably because nth is O(n) on sequences. nth is much much faster
    on vectors, so I would suggest trying out (rand-nth (vec (sorted-set 1 2
    3))) and see if that works for your application.

    --
    You received this message because you are subscribed to the Google
    Groups "Clojure" group.
    To post to this group, send email to clojure@googlegroups.com
    Note that posts from new members are moderated - please be patient with your first post.
    To unsubscribe from this group, send email to
    clojure+unsubscribe@googlegroups.com
    For more options, visit this group at
    http://groups.google.com/group/clojure?hl=en
  • Paul Richards at Sep 27, 2011 at 1:46 am

    On Sep 26, 6:04 pm, Benny Tsai wrote:
    The reason that (rand-nth (seq (sorted-set 1 2 3))) performs badly on large
    sets is probably because nth is O(n) on sequences.  nth is much much faster
    on vectors, so I would suggest trying out (rand-nth (vec (sorted-set 1 2
    3))) and see if that works for your application.
    This will replace an O(n) call to "nth" with an O(n) call to copy into
    a vector, so still leaving me with O(n).

    --
    You received this message because you are subscribed to the Google
    Groups "Clojure" group.
    To post to this group, send email to clojure@googlegroups.com
    Note that posts from new members are moderated - please be patient with your first post.
    To unsubscribe from this group, send email to
    clojure+unsubscribe@googlegroups.com
    For more options, visit this group at
    http://groups.google.com/group/clojure?hl=en
  • Benny Tsai at Sep 27, 2011 at 4:04 am

    On Monday, September 26, 2011 2:58:59 PM UTC-6, Paul Richards wrote:
    This will replace an O(n) call to "nth" with an O(n) call to copy into
    a vector, so still leaving me with O(n).
    Oops, right :) If you're getting random elements out of the same sorted set
    multiple times, then it might be still be worth it to pay the cost of
    vectorizing the set once in exchange for faster subsequent random
    selections. But if not, then I guess not so much.

    --
    You received this message because you are subscribed to the Google
    Groups "Clojure" group.
    To post to this group, send email to clojure@googlegroups.com
    Note that posts from new members are moderated - please be patient with your first post.
    To unsubscribe from this group, send email to
    clojure+unsubscribe@googlegroups.com
    For more options, visit this group at
    http://groups.google.com/group/clojure?hl=en
  • Jeremy Heiler at Sep 26, 2011 at 5:13 pm

    On Mon, Sep 26, 2011 at 9:12 AM, Paul Richards wrote:
    Hi,
    How can I efficiently pick a random element from a sorted-set?

    If I try rand-nth I get this:
    user=> (rand-nth (sorted-set 1 2 3))
    java.lang.UnsupportedOperationException: nth not supported on this
    type: PersistentTreeSet (NO_SOURCE_FILE:0)

    I can get this expression to work if I naively apply seq:
    user=> (rand-nth (seq (sorted-set 1 2 3)))
    1

    However this performs very badly on large sets.  Is there a more
    efficient way to do this?

    (I need to keep my elements in a sorted-set for other parts of the
    application, where I rely on subseq.)
    Try just getting the value with rand-int directly. The sorted-set uses
    a tree map underneath, so look up time is consistent with a map. Also,
    count is O(1).

    (get foo (rand-int (count foo)))

    --
    You received this message because you are subscribed to the Google
    Groups "Clojure" group.
    To post to this group, send email to clojure@googlegroups.com
    Note that posts from new members are moderated - please be patient with your first post.
    To unsubscribe from this group, send email to
    clojure+unsubscribe@googlegroups.com
    For more options, visit this group at
    http://groups.google.com/group/clojure?hl=en
  • Paul Richards at Sep 27, 2011 at 1:46 am

    On Sep 26, 6:13 pm, Jeremy Heiler wrote:
    On Mon, Sep 26, 2011 at 9:12 AM, Paul Richards wrote:
    Hi,
    How can I efficiently pick a random element from a sorted-set?
    If I try rand-nth I get this:
    user=> (rand-nth (sorted-set 1 2 3))
    java.lang.UnsupportedOperationException: nth not supported on this
    type: PersistentTreeSet (NO_SOURCE_FILE:0)
    I can get this expression to work if I naively apply seq:
    user=> (rand-nth (seq (sorted-set 1 2 3)))
    1
    However this performs very badly on large sets.  Is there a more
    efficient way to do this?
    (I need to keep my elements in a sorted-set for other parts of the
    application, where I rely on subseq.)
    Try just getting the value with rand-int directly. The sorted-set uses
    a tree map underneath, so look up time is consistent with a map. Also,
    count is O(1).

    (get foo (rand-int (count foo)))
    I feel I picked a bad example. My sorted-set does not contain
    integers, the elements are (collections of) strings. Trying this
    approach leads to a different failure:

    user=> (get (sorted-set "a" "b" "c") 1)
    java.lang.ClassCastException: java.lang.String cannot be cast to
    java.lang.Number (NO_SOURCE_FILE:0)

    --
    You received this message because you are subscribed to the Google
    Groups "Clojure" group.
    To post to this group, send email to clojure@googlegroups.com
    Note that posts from new members are moderated - please be patient with your first post.
    To unsubscribe from this group, send email to
    clojure+unsubscribe@googlegroups.com
    For more options, visit this group at
    http://groups.google.com/group/clojure?hl=en
  • Paul Richards at Sep 27, 2011 at 1:46 am

    On Sep 26, 2:12 pm, Paul Richards wrote:
    Hi,
    How can I efficiently pick a random element from a sorted-set?
    I've come up with a bit of a hack, which relies on me not caring about
    non-uniform distributions. If I create a custom comparator with a
    "random" backdoor, I can select random elements from the sorted set:

    (defn my-comp [a b]
    (if (or (= :random a) (= :random b))
    (- (* 2 (rand-int 2)) 1)
    (compare a b)))

    ; Create test collection (of strings) using the special comparator
    (def coll (apply sorted-set-by my-comp (map str (range 1 1000))))

    (map #(first (subseq coll > %)) ["500" "200" "700" :random :random])
    ; Result: ("501" "201" "701" "626" "286")



    Bonus question.. What is the distribution of the random selection?

    (I suspect elements will be chosen with some probability like (1/2)^H
    (where H is the height of the element within the red-black tree). I
    should probably generate a histogram to find out..)

    --
    You received this message because you are subscribed to the Google
    Groups "Clojure" group.
    To post to this group, send email to clojure@googlegroups.com
    Note that posts from new members are moderated - please be patient with your first post.
    To unsubscribe from this group, send email to
    clojure+unsubscribe@googlegroups.com
    For more options, visit this group at
    http://groups.google.com/group/clojure?hl=en
  • Alan Malloy at Sep 27, 2011 at 4:07 am

    On Sep 26, 2:12 pm, Paul Richards wrote:
    On Sep 26, 2:12 pm, Paul Richards wrote:

    Hi,
    How can I efficiently pick a random element from a sorted-set?
    I've come up with a bit of a hack, which relies on me not caring about
    non-uniform distributions.  If I create a custom comparator with a
    "random" backdoor, I can select random elements from the sorted set:

    (defn my-comp [a b]
    (if (or (= :random a) (= :random b))
    (- (* 2 (rand-int 2)) 1)
    (compare a b)))

    ; Create test collection (of strings) using the special comparator
    (def coll (apply sorted-set-by my-comp (map str (range 1 1000))))

    (map #(first (subseq coll > %)) ["500" "200" "700" :random :random])
    ; Result: ("501" "201" "701" "626" "286")

    Bonus question..  What is the distribution of the random selection?

    (I suspect elements will be chosen with some probability like (1/2)^H
    (where H is the height of the element within the red-black tree).  I
    should probably generate a histogram to find out..)
    I think it's complicated by what you expect "chosen" to mean. If you
    were randomly selecting a single element, you would only ever choose
    one at the bottom of the tree; the internal nodes would never be
    selected, right? Of course, you can't do that, since `get` does an
    equality check before returning the item. And if you're selecting a
    subseq, it depends on what test(s) you supply, and how many items you
    take, and...

    Anyway, this is sorta fun to generalize to wrap any other comparator:

    (defn hackable-comparator [f]
    (fn [& xs]
    (if (some #{:random} xs)
    (- (* 2 (rand-int 2)) 1)
    (apply f xs))))

    (def my-comp (hackable-comparator compare))

    --
    You received this message because you are subscribed to the Google
    Groups "Clojure" group.
    To post to this group, send email to clojure@googlegroups.com
    Note that posts from new members are moderated - please be patient with your first post.
    To unsubscribe from this group, send email to
    clojure+unsubscribe@googlegroups.com
    For more options, visit this group at
    http://groups.google.com/group/clojure?hl=en

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupclojure @
categoriesclojure
postedSep 26, '11 at 1:41p
activeSep 27, '11 at 4:07a
posts9
users5
websiteclojure.org
irc#clojure

People

Translate

site design / logo © 2022 Grokbase