Simple algorithm question - how to reorder a sequence economically
What is the easiest way to reorder a sequence pseudo-randomly?


That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
2,1,4,3) that is different each time.


I'm writing a simulation and would like to visit all the nodes in a
different order at each iteration of the simulation to remove the risk
of a fixed order introducing spurious evidence of correlation.


  • Chris Angelico at May 24, 2013 at 8:37 am

    On Fri, May 24, 2013 at 6:14 PM, Peter Brooks wrote:
    What is the easiest way to reorder a sequence pseudo-randomly?

    That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
    2,1,4,3) that is different each time.

    I'm writing a simulation and would like to visit all the nodes in a
    different order at each iteration of the simulation to remove the risk
    of a fixed order introducing spurious evidence of correlation.

    Permuting a sequence iteratively to cover every possibility? Good fun.


    Here's one way of looking at it. Imagine the indices of all elements
    are in some special "base" like so:


    [a, b, c, d] --> a*4+b*3+c*2+d*1


    Then iterate up to the highest possible value (ie 4*3*2*1), picking
    indices for each accordingly. I don't know how efficient this will be,
    but here's a simple piece of code to do it:

    def permute(lst, pos):
        lst = lst[:]                # take a copy; the caller's list is untouched
        ret = [None] * len(lst)
        for i in range(len(lst)):
            pos, idx = divmod(pos, len(lst))   # mixed-radix digit: base 4, then 3, 2, 1
            ret[i] = lst[idx]
            del lst[idx]
        return ret

    for i in range(4*3*2*1):
        print(permute([10, 20, 30, 40], i))




    [10, 20, 30, 40]
    [20, 10, 30, 40]
    [30, 10, 20, 40]
    [40, 10, 20, 30]
    [10, 30, 20, 40]
    [20, 30, 10, 40]
    [30, 20, 10, 40]
    [40, 20, 10, 30]
    [10, 40, 20, 30]
    [20, 40, 10, 30]
    [30, 40, 10, 20]
    [40, 30, 10, 20]
    [10, 20, 40, 30]
    [20, 10, 40, 30]
    [30, 10, 40, 20]
    [40, 10, 30, 20]
    [10, 30, 40, 20]
    [20, 30, 40, 10]
    [30, 20, 40, 10]
    [40, 20, 30, 10]
    [10, 40, 30, 20]
    [20, 40, 30, 10]
    [30, 40, 20, 10]
    [40, 30, 20, 10]


    It works, it produces a unique list for any given index provided, but
    it's not the cleanest or most efficient. But I know someone'll improve
    on it... or tell me I'm an idiot for not taking a more obvious
    approach :)


    ChrisA
  • Fábio Santos at May 24, 2013 at 8:47 am

    On 24 May 2013 09:41, "Chris Angelico" wrote:
    On Fri, May 24, 2013 at 6:14 PM, Peter Brooks
    wrote:
    What is the easiest way to reorder a sequence pseudo-randomly?

    That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
    2,1,4,3) that is different each time.
    ...
    It works, it produces a unique list for any given index provided, but
    it's not the cleanest or most efficient. But I know someone'll improve
    on it... or tell me I'm an idiot for not taking a more obvious
    approach :)

    ChrisA

    I think that is pretty much itertools.permutations from the standard
    library. The OP should check it out.
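    A minimal sketch of that approach, using the 1,2,3,4 sequence from the question:

    from itertools import permutations

    # All 4! = 24 orderings, generated lazily in lexicographic order.
    for p in permutations([1, 2, 3, 4]):
        print(p)    # (1, 2, 3, 4), (1, 2, 4, 3), ..., (4, 3, 2, 1)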
  • Chris Angelico at May 24, 2013 at 9:11 am

    On Fri, May 24, 2013 at 6:47 PM, Fábio Santos wrote:
    On 24 May 2013 09:41, "Chris Angelico" wrote:

    On Fri, May 24, 2013 at 6:14 PM, Peter Brooks
    wrote:
    What is the easiest way to reorder a sequence pseudo-randomly?

    That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
    2,1,4,3) that is different each time.
    ...
    It works, it produces a unique list for any given index provided, but
    it's not the cleanest or most efficient. But I know someone'll improve
    on it... or tell me I'm an idiot for not taking a more obvious
    approach :)

    ChrisA
    I think that is pretty much itertools.permutations from the standard
    library. The OP should check it out.

    That works if all the permutations are wanted at once. Is there a way,
    short of iterating over it N times, to request permutation #N? Or
    maybe I'm misreading the OP and that's not a requirement.


    ChrisA
  • Duncan smith at May 24, 2013 at 2:33 pm

    On 24/05/13 10:11, Chris Angelico wrote:
    On Fri, May 24, 2013 at 6:47 PM, Fábio Santos wrote:
    On 24 May 2013 09:41, "Chris Angelico" wrote:

    On Fri, May 24, 2013 at 6:14 PM, Peter Brooks
    wrote:
    What is the easiest way to reorder a sequence pseudo-randomly?

    That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
    2,1,4,3) that is different each time.
    ...
    It works, it produces a unique list for any given index provided, but
    it's not the cleanest or most efficient. But I know someone'll improve
    on it... or tell me I'm an idiot for not taking a more obvious
    approach :)

    ChrisA
    I think that is pretty much itertools.permutations from the standard
    library. The OP should check it out.
    That works if all the permutations are wanted at once. Is there a way,
    short of iterating over it N times, to request permutation #N? Or
    maybe I'm misreading the OP and that's not a requirement.

    ChrisA

    A long time ago I wrote some code to do that.




    import gmpy

    def LexPermFromIndex(items, index):
        n = len(items)
        inds = list(range(n))
        perm = []
        for i in range(1, n+1):
            r, index = divmod(index, gmpy.fac(n-i))
            r = int(r)
            perm.append(inds[r])
            inds = inds[:r] + inds[r+1:]
        return [items[i] for i in perm]

    >>> LexPermFromIndex([1,2,3,4], 0)
    [1, 2, 3, 4]
    >>> LexPermFromIndex([1,2,3,4], 1)
    [1, 2, 4, 3]
    >>> LexPermFromIndex([1,2,3,4], 10)
    [2, 4, 1, 3]




    I can't remember exactly why I wrote it. But I also have something for
    generating a permutation's index and similar functions for combinations.
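    For reference, a minimal sketch of that reverse mapping (from a permutation back to its lexicographic index), using math.factorial in place of gmpy.fac; the function name is illustrative:

    import math

    def lex_index_from_perm(items, perm):
        """Index of perm among all orderings of items (index 0 = items in the given order)."""
        pool = list(items)
        index = 0
        for i, x in enumerate(perm):
            r = pool.index(x)
            index += r * math.factorial(len(perm) - 1 - i)
            pool.pop(r)
        return index

    >>> lex_index_from_perm([1, 2, 3, 4], [2, 4, 1, 3])
    10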


    Duncan
  • Terry Jan Reedy at May 24, 2013 at 10:10 am

    On 5/24/2013 4:14 AM, Peter Brooks wrote:
    What is the easiest way to reorder a sequence pseudo-randomly?

    That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
    2,1,4,3) that is different each time.

    I'm writing a simulation and would like to visit all the nodes in a
    different order at each iteration of the simulation to remove the risk
    of a fixed order introducing spurious evidence of correlation.

    The random module has a shuffle function, I believe.
  • Steven D'Aprano at May 24, 2013 at 10:52 am

    On Fri, 24 May 2013 01:14:45 -0700, Peter Brooks wrote:


    What is the easiest way to reorder a sequence pseudo-randomly?

    import random
    random.shuffle(sequence)




    The sequence is modified in place, so it must be mutable. Lists are okay,
    tuples are not.
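    As a side note, if you need a shuffled copy rather than an in-place shuffle (for example, when starting from a tuple), random.sample with the full length returns a new list:

    import random

    seq = (1, 2, 3, 4)                        # immutable: random.shuffle(seq) would raise TypeError
    shuffled = random.sample(seq, len(seq))   # new list in a random order; seq is left untouched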



    That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
    2,1,4,3) that is different each time.

    You can't *guarantee* that it will be different each time. With a four-
    item list, there are only 4! = 24 possible orderings, so on average you'll get
    the original order one time in 24. For a ten-item list, that is once
    every 3628800 times, and for a twenty-item list, once in
    2432902008176640000 times. But of course these are *random*, and there's
    always a chance of this:


    http://dilbert.com/strips/comic/2001-10-25




    --
    Steven
  • Ned Batchelder at May 24, 2013 at 11:26 am

    On 5/24/2013 6:52 AM, Steven D'Aprano wrote:
    On Fri, 24 May 2013 01:14:45 -0700, Peter Brooks wrote:

    What is the easiest way to reorder a sequence pseudo-randomly?
    import random
    random.shuffle(sequence)


    The sequence is modified in place, so it must be mutable. Lists are okay,
    tuples are not.

    That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
    2,1,4,3) that is different each time.
    You can't *guarantee* that it will be different each time. With a four-
    item list, there are only 4! = 24 possible orderings, so on average you'll get
    the original order one time in 24. For a ten-item list, that is once
    every 3628800 times, and for a twenty-item list, once in
    2432902008176640000 times. But of course these are *random*, and there's
    always a chance of this:

    http://dilbert.com/strips/comic/2001-10-25

    Also, heed the note in the docs: "Note that for even rather small
    len(x), the total number of permutations of /x/ is larger than the
    period of most random number generators; this implies that most
    permutations of a long sequence can never be generated." The default
    random number generator has a period of 2**19937-1, which means that
    lists longer than 2080 have more permutations than the period of the
    generator (because factorial(2081) > 2**19937). Most shuffles involve
    lists far shorter than 2080, but it's still good to keep it in mind.
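    A quick check of that 2080 threshold with the standard library:

    import math

    period = 2**19937 - 1                   # period of CPython's Mersenne Twister
    print(math.factorial(2080) < period)    # True: all permutations of 2080 items fit within the period
    print(math.factorial(2081) > period)    # True: 2081 items already exceed it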


    --Ned.
  • Peter Brooks at May 24, 2013 at 1:23 pm
    Thank you all for those most helpful suggestions! random.shuffle does
    precisely the job that I need quickly. Thank you for introducing me to
    itertools, though, I should have remembered APL did this in a symbol
    or two and I'm sure that itertools will come in handy in future.


    Thanks for the warnings about random numbers too - I hope my lists
    will be short enough for the permutations of the function to be
    irrelevant. I don't need every single sequence to be unique, only that
    the same sequence only occurs occasionally. My lists might get up to
    the ~2k length one day, and I'll keep in mind the need, at that stage,
    to use a different pseudo-random generator. Actually, thinking about
    it, there is probably a source of non-algorithmically-derived 'random'
    numbers somewhere on the net that would do the job nicely.
  • Steven D'Aprano at May 24, 2013 at 1:57 pm

    On Fri, 24 May 2013 06:23:14 -0700, Peter Brooks wrote:




    Thanks for the warnings about random numbers too - I hope my lists will
    be short enough for the permutations of the function to be irrelevant. I
    don't need every single sequence to be unique, only that the same
    sequence only occurs occasionally. My lists might get up to the ~2k
    length one day, and I'll keep in mind the need, at that stage, to use a
    different pseudo-random generator. Actually, thinking about it, there is
    probably a source of non-algorithmically-derived 'random' numbers
    somewhere on the net that would do the job nicely.

    That's massive overkill.


    Take a closer look at what Ned wrote:


    "The default random number generator has a period of 2**19937-1"


    and consider the numbers.


    For a list with 3000 items, there are 3000! possible permutations. That
    is approximately 10**9130. That is, a number with 9131 digits, or
    somewhere around a trillion trillion trillion ... trillion trillion,
    where there are about 760 "trillions".


    If you could generate a million permutations a second, it would take you
    on average around 10**9115 centuries before getting the original
    permutation again. That's the expected result you would get with a source
    of true randomness.


    Instead, with Python's default pseudo-random number generator, you cannot
    get the full 3000! permutations, but only 2**19937-1 of them. Using the
    same calculation as above, that means that you will have to generate a
    million permutations per second for "only" about 10**5986 centuries
    before getting the original permutation again.
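    For scale, the magnitudes can be checked directly:

    import math

    print(len(str(math.factorial(3000))))   # 9131 digits, so 3000! is roughly 10**9130
    print(len(str(2**19937 - 1)))           # 6002 digits for the Mersenne Twister's period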


    There are uses where being able to generate any possible permutation is
    important, and the default PRNG is not sufficient. But merely shuffling
    your data around to avoid spurious correlations is not one of them. Save
    yourself a lot of development effort, and debugging, and just use
    random.shuffle.




    --
    Steven
  • Robert Kern at Jun 12, 2013 at 1:14 pm

    On 2013-05-24 14:43, Chris Angelico wrote:
    On Fri, May 24, 2013 at 11:23 PM, Peter Brooks
    wrote:
    Actually, thinking about
    it, there is probably a source of non-algorithmically-derived 'random'
    numbers somewhere on the net that would do the job nicely.
    True entropy is usually provided by a source such as /dev/random (on
    Unix systems). It's sometimes referred to as "cryptographic"
    randomness, due to its necessity in secure encryption work. There are
    various ways to get this in a cross-platform way.

    os.urandom() and random.SystemRandom(), particularly.
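    random.SystemRandom is documented to use os.urandom() underneath and offers the same shuffle interface; a minimal sketch:

    import random

    rng = random.SystemRandom()   # draws from the OS entropy pool via os.urandom()
    nodes = [1, 2, 3, 4]
    rng.shuffle(nodes)            # same call as random.shuffle, different randomness source
    print(nodes)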


    --
    Robert Kern


    "I have come to believe that the whole world is an enigma, a harmless enigma
       that is made terrible by our own mad attempt to interpret it as though it had
       an underlying truth."
        -- Umberto Eco
  • John Ladasky at May 24, 2013 at 5:33 pm

    On Friday, May 24, 2013 3:52:18 AM UTC-7, Steven D'Aprano wrote:
    On Fri, 24 May 2013 01:14:45 -0700, Peter Brooks wrote:

    That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
    2,1,4,3) that is different each time.
    You can't *guarantee* that it will be different each time.

    Well, within limits, you can guarantee a LONG TIME between repeats. And that may be good enough for Peter's purpose.


    When the number of elements in your sequence is short enough (the right value of "short enough" is at least partially a matter of opinion, and the amount of RAM available, but I would guess n < 10 myself), use itertools.permutations to generate all the possibilities at once -- call this list p. Store p in memory. Then shuffle p, and use its elements one at a time. If you get to the end of p and need to keep working, it's up to you whether to shuffle p a second time. If you don't reshuffle p, it guarantees the maximum interval between reusing the same permutation.
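    A minimal sketch of this strategy, with illustrative values:

    import random
    from itertools import permutations

    nodes = [1, 2, 3, 4]               # small n, so holding all n! orderings in memory is feasible
    p = list(permutations(nodes))
    random.shuffle(p)                  # randomise the order in which the orderings are used

    for ordering in p:                 # one pass uses every permutation exactly once
        pass                           # run one simulation iteration over `ordering` here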


    Use random.shuffle on your sequence directly, when your sequence has a larger number of elements. This doesn't guarantee that two successive permutations will not be identical, but the probability is of course very low.
  • John Ladasky at May 26, 2013 at 1:25 am

    On Friday, May 24, 2013 10:33:47 AM UTC-7, Yours Truly wrote:
    If you don't reshuffle p, it guarantees the maximum interval between reusing
    the same permutation.

    Of course, that comes at a certain price. Given two permutations p[x] and p[x+1], they will ALWAYS be adjacent, in every repetition of the list, unless you reshuffle. So the "spurious correlation" problem that Peter is worrying about would still exist, albeit at a meta-level.


    You could play clever games, such as splitting p in half once you get to the end of it, shuffling each half independently, and then concatenating the halves. This algorithm scrambles the p[x]'s and p[x+1]'s pretty well, at the cost of cutting the average interval between repeats of a given p[x] from len(p) to something closer to len(p)/2.
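    A sketch of that reshuffling game, under the same assumptions as above:

    import random
    from itertools import permutations

    p = list(permutations([1, 2, 3, 4]))
    random.shuffle(p)

    # After consuming all of p once, scramble it without a full reshuffle:
    # shuffle each half independently, then concatenate.
    half = len(p) // 2
    first, second = p[:half], p[half:]
    random.shuffle(first)
    random.shuffle(second)
    p = first + second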


    Because someone's got to say it... "The generation of random numbers is too important to be left to chance." -- Robert R. Coveyou
  • Roy Smith at May 26, 2013 at 1:49 am
    In article <15a1bb3a-514c-454e-a966-243c84123be9@googlegroups.com>,
      John Ladasky wrote:

    Because someone's got to say it... "The generation of random numbers is too
    important to be left to chance." -- Robert R. Coveyou

    Absolutely. I know just enough about random number generation to
    understand that I don't really know anything about it :-)


    That being said, people who really care about random numbers tend to
    rely on some sort of physical process instead of computer algorithms. A
    classic example would be /dev/random. A somewhat more fun example is
    http://www.youtube.com/watch?v=7n8LNxGbZbs. Something radioactive and a
    Geiger counter are a good source of randomness (time intervals between
    decay events).
  • Carlos Nepomuceno at May 24, 2013 at 3:00 pm
    ----------------------------------------
    Date: Fri, 24 May 2013 01:14:45 -0700
    Subject: Simple algorithm question - how to reorder a sequence economically
    From: peter.h.m.brooks at gmail.com
    To: python-list at python.org

    What is the easiest way to reorder a sequence pseudo-randomly?

    That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
    2,1,4,3) that is different each time.

    I'm writing a simulation and would like to visit all the nodes in a
    different order at each iteration of the simulation to remove the risk
    of a fixed order introducing spurious evidence of correlation.
    --
    http://mail.python.org/mailman/listinfo/python-list

    I don't know what "spurious evidence of correlation" is. Can you give a mathematical definition?


    Here's a snippet for creating a random shuffle by the Fisher-Yates algorithm:


    def FY_shuffle(l):
        from random import randint
        for i in range(len(l)-1, 0, -1):
            j = randint(0, i)
            l[j], l[i] = l[i], l[j]


    It looks just like random.shuffle() mentioned before, but you can change it as you see fit.


    If you can afford to test all permutations you can iterate over it by doing:

    >>> from itertools import permutations
    >>> l = range(4)
    >>> l
    [0, 1, 2, 3]
    >>> for i in permutations(l): print i
    ...
    (0, 1, 2, 3)
    (0, 1, 3, 2)
    (0, 2, 1, 3)
    [...]
    (3, 1, 2, 0)
    (3, 2, 0, 1)
    (3, 2, 1, 0)


    Note that 'i' is a tuple.


    If you need a list of all permutations to make a selection:

    >>> l = range(4)
    >>> l
    [0, 1, 2, 3]
    >>> [list(i) for i in permutations(l)]
    [[0, 1, 2, 3], [0, 1, 3, 2], [0, 2, 1, 3], [0, 2, 3, 1], [0, 3, 1, 2], [0, 3, 2, 1], [1, 0, 2, 3], [1, 0, 3, 2], [1, 2, 0, 3], [1, 2, 3, 0], [1, 3, 0, 2], [1, 3, 2, 0], [2, 0, 1, 3], [2, 0, 3, 1], [2, 1, 0, 3], [2, 1, 3, 0], [2, 3, 0, 1], [2, 3, 1, 0], [3, 0, 1, 2], [3, 0, 2, 1], [3, 1, 0, 2], [3, 1, 2, 0], [3, 2, 0, 1], [3, 2, 1, 0]]


    This will produce big lists:
    -for 10 elements (l=range(10)) the size of the list is about 30MB (on Windows).
    -for 11 elements (l=range(11)) the size of the list is about 335MB (on Windows). It took more than 7GB for CPython 2.7.5 to create that list.


    Didn't try after that.
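    The counts behind those sizes:

    import math

    print(math.factorial(10))   # 3628800 permutations of range(10)
    print(math.factorial(11))   # 39916800 -- eleven times as many lists to hold at once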
  • Peter Brooks at May 24, 2013 at 7:01 pm

    On May 24, 5:00 pm, Carlos Nepomuceno wrote:

    I don't know what "spurious evidence of correlation" is. Can you give a mathematical definition?
    If I run the simulation with the same sequence, then, because event E1
    always comes before event E2, somebody might believe that there is a
    causative connection between them in the world that's being simulated,
    when, in fact, they only correlate in this way because the sequence is
    not being shuffled. That's what it means.


    Actually it'll be a bit more subtle than that. Because each iteration
    of the simulation updates all nodes within one time interval, the events
    will not usually show the order of iteration - but where there are
    any secondary effects related to the order in which the nodes are
    updated, these will always happen the same way, which is my concern.
  • Carlos Nepomuceno at May 24, 2013 at 9:33 pm
    ----------------------------------------
    Date: Fri, 24 May 2013 12:01:35 -0700
    Subject: Re: Simple algorithm question - how to reorder a sequence economically
    From: peter.h.m.brooks at gmail.com
    To: python-list at python.org
    On May 24, 5:00 pm, Carlos Nepomuceno wrote:


    I don't know what "spurious evidence of correlation" is. Can you give a mathematical definition?
    If I run the simulation with the same sequence, then, because event E1
    always comes before event E2, somebody might believe that there is a
    causative connection between them in the world that's being simulated,
    when, in fact, they only correlate in this way because the sequence is
    not being shuffled. That's what it means.

    Correlation does not imply causation. If "somebody" is an expert system and you want to avoid its recognition and/or triggering of some kind, and you can't or don't want to change its behavior, you may take the random way because it's cheaper.

    Actually it'll be a bit more subtle than that, because each iteration
    of the simulation updates all nodes in one time interval, the events
    will not usually show the order of iteration - but, where there are
    any secondary effects, that are related to the order in which the
    nodes are updated, these will always happen the same way, which is my
    concern.

    You should have a more precise understanding of the dependence of the variables you are taking into consideration before randomizing the series of events you are using for tests.


    I suggest you start using the PPMCC (Pearson product-moment correlation coefficient). If it's close to zero or negative you wouldn't have to worry about it! ;)
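    A minimal sketch of computing that coefficient for two equal-length series, with purely illustrative data:

    import math

    def pearson_r(xs, ys):
        """Pearson product-moment correlation coefficient of two equal-length series."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    print(pearson_r([1, 2, 3, 4], [2, 1, 4, 3]))   # approximately 0.6 for this toy pair of orderings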
  • Peter Brooks at May 25, 2013 at 12:28 am

    On May 24, 11:33 pm, Carlos Nepomuceno wrote:
    ----------------------------------------
    Date: Fri, 24 May 2013 12:01:35 -0700
    Subject: Re: Simple algorithm question - how to reorder a sequence economically
    From: peter.h.m.bro... at gmail.com
    To: python-l... at python.org
    On May 24, 5:00 pm, Carlos Nepomuceno <carlosnepomuc...@outlook.com>
    wrote:
    I don't know what "spurious evidence of correlation" is. Can you give a mathematical definition?
    If I run the simulation with the same sequence, then, because event E1
    always comes before event E2, somebody might believe that there is a
    causative connection between them in the world that's being simulated,
    when, in fact, they only correlate in this way because the sequence is
    not being shuffled. That's what it means.
    Correlation does not imply causation. If "somebody" is an expert system and you want to avoid its recognition and/or triggering of some kind, and you can't or don't want to change its behavior, you may take the random way because it's cheaper.
    Actually it'll be a bit more subtle than that, because each iteration
    of the simulation updates all nodes in one time interval, the events
    will not usually show the order of iteration - but, where there are
    any secondary effects, that are related to the order in which the
    nodes are updated, these will always happen the same way, which is my
    concern.
    You should have a more precise understanding of the dependence of the variables you are taking into consideration before randomizing the series of events you are using for tests.
    If the scenario could be modelled mathematically, then there'd be no
    point in writing the simulation.
