FAQ
I need to simulate scenarios like the following: "You have a deck of
3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
replace it, and repeat N times."

So, I wrote the following code, which works, but it seems quite slow
to me. Can anyone point out some obvious thing that I'm doing
inefficiently? Or, is this more or less as good as it gets?

For reference, my typical numbers look like this:

2 <= len(population) <= 7
4 <= len(mapping) <= 50
10 <= count <= 1000000

B.

<code>
#!/usr/bin/env python

import random

def randomDrawing(count, population):
"""Simulates drawing <count> items from <population>, with
replacement.
population is a list of lists: [[count1, type1], [count2,
type2], ...]

Typical examples:
randomDrawing(100, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])
[[28, 'orange'], [57, 'yellow'], [15, 'blue']]
randomDrawing(100000, [[3, 'orange'], [5, 'yellow'], [2,
'blue']])
[[29923, 'orange'], [50208, 'yellow'], [19869, 'blue']]

"""
res = [[0, item[1]] for item in population]
mapping = []
for i in xrange(len(population)):
mapping.extend([i]*population[i][0])
for i in xrange(count):
index = random.choice(mapping)
res[index][0] += 1
return res

</code>

--
Brendon Towle, PhD
Cognitive Scientist
+1-412-690-2442x127
Carnegie Learning, Inc.
The Cognitive Tutor Company ?
Helping over 375,000 students in 1000 school districts succeed in math.

Search Discussions

  • David J. Braden at Sep 12, 2006 at 8:53 pm

    Brendon Towle wrote:
    I need to simulate scenarios like the following: "You have a deck of 3
    orange cards, 5 yellow cards, and 2 blue cards. You draw a card, replace
    it, and repeat N times."

    So, I wrote the following code, which works, but it seems quite slow to
    me. Can anyone point out some obvious thing that I'm doing
    inefficiently? Or, is this more or less as good as it gets?

    For reference, my typical numbers look like this:

    2 <= len(population) <= 7
    4 <= len(mapping) <= 50
    10 <= count <= 1000000

    B.

    <code>
    #!/usr/bin/env python

    import random

    def randomDrawing(count, population):
    """Simulates drawing <count> items from <population>, with replacement.
    population is a list of lists: [[count1, type1], [count2, type2], ...]

    Typical examples:
    randomDrawing(100, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])
    [[28, 'orange'], [57, 'yellow'], [15, 'blue']]
    randomDrawing(100000, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])
    [[29923, 'orange'], [50208, 'yellow'], [19869, 'blue']]

    """
    res = [[0, item[1]] for item in population]
    mapping = []
    for i in xrange(len(population)):
    mapping.extend([i]*population[i][0])
    for i in xrange(count):
    index = random.choice(mapping)
    res[index][0] += 1
    return res

    </code>

    --Brendon Towle, PhD
    Cognitive Scientist
    +1-412-690-2442x127
    Carnegie Learning, Inc.
    The Cognitive Tutor Company ?
    Helping over 375,000 students in 1000 school districts succeed in math.
    Given the data structure, might it not be faster to generate binomial
    variates for n-1 types, and set nth count = #draws - sum(other
    outcomes)? And for a "large" count, could you get by with a normal
    approximation? If you *do* feel the need for exact binomial, there are
    short, somewhat sluggish routines in Devroye (1986 - on the web!, pg
    525), and Random_binomial1, compiled by Alan Miller(AMrandom.zip0,
    available, xlated, at http://www.esbconsult.com/. Even though they are
    relatively slow, generating a few of these would seem to be a lot faster
    than your current approach.

    Sorry I can't help more - I'm just starting to learn Python, have yet to
    even get IDLE going.

    Regards,
    Dave Braden
  • Simon Forman at Sep 12, 2006 at 10:23 pm

    Brendon Towle wrote:
    I need to simulate scenarios like the following: "You have a deck of
    3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    replace it, and repeat N times."

    So, I wrote the following code, which works, but it seems quite slow
    to me. Can anyone point out some obvious thing that I'm doing
    inefficiently? Or, is this more or less as good as it gets?

    For reference, my typical numbers look like this:

    2 <= len(population) <= 7
    4 <= len(mapping) <= 50
    10 <= count <= 1000000

    B.

    <code>
    #!/usr/bin/env python

    import random

    def randomDrawing(count, population):
    """Simulates drawing <count> items from <population>, with
    replacement.
    population is a list of lists: [[count1, type1], [count2,
    type2], ...]

    Typical examples:
    randomDrawing(100, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])
    [[28, 'orange'], [57, 'yellow'], [15, 'blue']]
    randomDrawing(100000, [[3, 'orange'], [5, 'yellow'], [2,
    'blue']])
    [[29923, 'orange'], [50208, 'yellow'], [19869, 'blue']]

    """
    res = [[0, item[1]] for item in population]
    mapping = []
    for i in xrange(len(population)):
    mapping.extend([i]*population[i][0])
    for i in xrange(count):
    index = random.choice(mapping)
    res[index][0] += 1
    return res

    </code>

    --
    Brendon Towle, PhD
    Cognitive Scientist
    +1-412-690-2442x127
    Carnegie Learning, Inc.
    The Cognitive Tutor Company ?
    Helping over 375,000 students in 1000 school districts succeed in math.
    I got nearly a 2x speed up with this variant:

    def randomDrawing3(count, population):
    res = [[0, item[1]] for item in population]
    mapping = []
    for i in xrange(len(population)):
    mapping.extend([i]*population[i][0])

    n = len(mapping)
    for i in xrange(count):
    index = int(n * random.random())
    res[mapping[index]][0] += 1

    return res


    Apparently "int(n * random.random())" is faster than random.choice()
    or random.randint()

    sforman at garbage:~ $ python -mtimeit -s'import random; n' 'int(n *
    random.random())'
    100000 loops, best of 3: 3.22 usec per loop

    sforman at garbage:~ $ python -mtimeit -s'import random; n'
    'random.randint(1, n)'
    100000 loops, best of 3: 13 usec per loop

    sforman at garbage:~ $ python -mtimeit -s'import random; n=range(10)'
    'random.choice(n)'
    100000 loops, best of 3: 6.07 usec per loop

    (Note that the first and last examples produce values 0..9 while the
    middle one produces 1..10)


    I don't know for sure, but I think the random, uh, spread or whatever
    will be the same for random() as for choice(). If it's important, you
    should verify that. ;-)

    Peace,
    ~Simon
  • Travis E. Oliphant at Sep 13, 2006 at 2:43 am

    Brendon Towle wrote:
    I need to simulate scenarios like the following: "You have a deck of
    3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    replace it, and repeat N times."
    Thinking about the problem as drawing sample froms a discrete
    distribution defined by the population might help.

    For example, in SciPy you can define your own discrete random variable:

    var = scipy.stats.rv_discrete(name='sample',
    values=([0,1,2],[3/10.,5/10.,2/10.]))

    Then

    var.rvs(size000)

    would quickly return an array of "draws"

    If you didn't want to install SciPy, but were willing to install NumPy,
    then the crux of the algorithm is to generate an entire array of uniform
    random variables: numpy.random.rand(count) and then map them through
    the inverse cumulative distribution function to generate your samples.
    The inverse cumulative distribution function is just a series of steps
    whose width depends on the probablity distribution.

    Thus, the population defines the draw boundaries. To make it easy, just
    multiply the uniform random number by the total number of cards and then
    the boundaries are on the integers of the number of each kind of card.

    Here is an implementation. I find this version to be 2x - 5x faster
    depending on how many draws are used.


    import numpy as N

    def randomDrawing_N(count, population):
    probs = [x[0] for x in population]
    res = [[0, item[1]] for item in population]
    nums = N.random.rand(count)
    cprobs = N.cumsum([0]+probs)
    nums *= cprobs[-1]
    for k, val in enumerate(probs):
    res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
    return res


    -Travis Oliphant
  • Robert Kern at Sep 13, 2006 at 3:46 am

    Paul Rubin wrote:
    "Travis E. Oliphant" <oliphant.travis at ieee.org> writes:
    I need to simulate scenarios like the following: "You have a deck of
    3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    replace it, and repeat N times."
    Thinking about the problem as drawing sample froms a discrete
    distribution defined by the population might help.
    Is there some important reason you want to do this as a simulation?
    And is the real problem more complicated? If you draw from the
    distribution 100,000 times with replacement and sum the results, per
    the Central Limit Theorem you'll get something very close to a normal
    distribution whose parameters you can determine analytically. There
    is probably also some statistics formula to find the precise error.
    So you can replace the 100,000 draws with a single draw.
    Along the lines of what you're trying to get at, the problem that the OP is
    describing is one of sampling from a multinomial distribution.

    http://en.wikipedia.org/wiki/Multinomial_distribution

    numpy has a function that will do the sampling for you:


    In [4]: numpy.random.multinomial?
    Type: builtin_function_or_method
    Base Class: <type 'builtin_function_or_method'>
    String Form: <built-in method multinomial of mtrand.RandomState object at
    0x3e140>
    Namespace: Interactive
    Docstring:
    Multinomial distribution.

    multinomial(n, pvals, size=None) -> random values

    pvals is a sequence of probabilities that should sum to 1 (however, the
    last element is always assumed to account for the remaining probability
    as long as sum(pvals[:-1]) <= 1).


    Sampling from the multinomial distribution is quite simply implemented given a
    binomial sampler. Unfortunately, the standard library's random module does not
    have one. If the number of samples is high enough, then one might be able to
    approximate the binomial distribution with a normal one, but you'd be better off
    just installing numpy.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
  • Brendon Towle at Sep 13, 2006 at 1:37 pm

    On 12 Sep 2006, at 6:33 PM, python-list-request at python.org wrote:

    Date: 12 Sep 2006 15:23:51 -0700
    From: "Simon Forman" <rogue_pedro at yahoo.com>
    Subject: Re: Random Drawing Simulation -- performance issue

    Brendon Towle wrote:
    I need to simulate scenarios like the following: "You have a deck of
    3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    replace it, and repeat N times."
    [my original code snipped]
    I got nearly a 2x speed up with this variant:

    def randomDrawing3(count, population):
    res = [[0, item[1]] for item in population]
    mapping = []
    for i in xrange(len(population)):
    mapping.extend([i]*population[i][0])

    n = len(mapping)
    for i in xrange(count):
    index = int(n * random.random())
    res[mapping[index]][0] += 1

    return res
    Excellent! For some reason, the speedup I get is only ~1.5x, but
    that's still non-trivial.

    Thanks much for the pointer-

    B.


    --
    Brendon Towle, PhD
    Cognitive Scientist
    +1-412-690-2442x127
    Carnegie Learning, Inc.
    The Cognitive Tutor Company ?
    Helping over 375,000 students in 1000 school districts succeed in math.
  • Brendon Towle at Sep 13, 2006 at 1:44 pm

    On 13 Sep 2006, at 1:01 AM, python-list-request at python.org wrote:

    Date: 12 Sep 2006 20:17:47 -0700
    From: Paul Rubin <http://phr.cx at NOSPAM.invalid>
    Subject: Re: Random Drawing Simulation -- performance issue
    To: python-list at python.org

    "Travis E. Oliphant" <oliphant.travis at ieee.org> writes:
    I need to simulate scenarios like the following: "You have a deck of
    3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    replace it, and repeat N times."
    Thinking about the problem as drawing sample froms a discrete
    distribution defined by the population might help.
    Is there some important reason you want to do this as a simulation?
    And is the real problem more complicated? If you draw from the
    distribution 100,000 times with replacement and sum the results, per
    the Central Limit Theorem you'll get something very close to a normal
    distribution whose parameters you can determine analytically. There
    is probably also some statistics formula to find the precise error.
    So you can replace the 100,000 draws with a single draw.
    The real problem is not substantially more complicated. (The real
    code is, because it's embedded in a bunch of other stuff, but that's
    not the point.)

    I guess the essential reason that I want to do it as a simulation,
    and not as a statistics formula, is that I'd like the code to be
    readable (and modifiable) by a programmer who doesn't have a
    statistics background. I could dredge up enough of my college stats
    to do as you suggest (although I might not enjoy it), but I don't
    think I want to make that a requirement.

    On the other hand (quote somewhat snipped):
    Date: Tue, 12 Sep 2006 22:46:04 -0500
    From: Robert Kern <robert.kern at gmail.com>
    Subject: Re: Random Drawing Simulation -- performance issue
    To: python-list at python.org

    Along the lines of what you're trying to get at, the problem that
    the OP is
    describing is one of sampling from a multinomial distribution.

    numpy has a function that will do the sampling for you:

    In [4]: numpy.random.multinomial?
    Docstring:
    Multinomial distribution.

    multinomial(n, pvals, size=None) -> random values

    pvals is a sequence of probabilities that should sum to 1
    (however, the
    last element is always assumed to account for the remaining
    probability
    as long as sum(pvals[:-1]) <= 1).
    Here, I'm torn. I do want the code to be accessible to non-stats
    people, but this just might do the trick. Must ponder.

    Thanks, everyone, for your helpful suggestions!

    B.

    --
    Brendon Towle, PhD
    Cognitive Scientist
    +1-412-690-2442x127
    Carnegie Learning, Inc.
    The Cognitive Tutor Company ?
    Helping over 375,000 students in 1000 school districts succeed in math.




    From http Wed Sep 13 15:49:44 2006
    From: http (Paul Rubin)
    Date: 13 Sep 2006 06:49:44 -0700
    Subject: Help me use my Dual Core CPU!
    References: <1158063943.007271.86000@m73g2000cwd.googlegroups.com>
    <45071c6d$0$576$ed2619ec@ptn-nntp-reader03.plus.net>
    <1158128695.301849.148210@p79g2000cwp.googlegroups.com>
    <7x8xkol0n9.fsf@ruckus.brouhaha.com>
    <1158132381.716706.243040@e63g2000cwd.googlegroups.com>
    <mailman.14.1158140507.10491.python-list@python.org>
    <7xbqpk5bnc.fsf@ruckus.brouhaha.com>
    <mailman.17.1158145066.10491.python-list@python.org>
    <7xodtkt48l.fsf@ruckus.brouhaha.com>
    <mailman.25.1158154691.10491.python-list@python.org>
    Message-ID: <7xr6yf3mh3.fsf@ruckus.brouhaha.com>

    Robin Becker <robin at reportlab.com> writes:
    well these guys seem to think there are, perhaps it's a joke
    http://anandtech.com/mac/showdoc.aspx?i(32&p=6
    Wow! Yes, it seems to be real ("Clovertown" 4-core cpu). See:

    http://www.tgdaily.com/2006/03/07/idf_keynotes_welcome_to_intel_3-point-0/

    There's even a version that goes out to 32 cores ("Dunnington") on the
    drawing boards.

    We better get rid of that GIL soon ;-)
  • David J. Braden at Sep 13, 2006 at 4:12 pm

    Travis E. Oliphant wrote:
    Brendon Towle wrote:
    I need to simulate scenarios like the following: "You have a deck of
    3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    replace it, and repeat N times."
    Thinking about the problem as drawing sample froms a discrete
    distribution defined by the population might help.

    For example, in SciPy you can define your own discrete random variable:

    var = scipy.stats.rv_discrete(name='sample',
    values=([0,1,2],[3/10.,5/10.,2/10.]))

    Then

    var.rvs(size000)

    would quickly return an array of "draws"

    If you didn't want to install SciPy, but were willing to install NumPy,
    then the crux of the algorithm is to generate an entire array of uniform
    random variables: numpy.random.rand(count) and then map them through
    the inverse cumulative distribution function to generate your samples.
    The inverse cumulative distribution function is just a series of steps
    whose width depends on the probablity distribution.

    Thus, the population defines the draw boundaries. To make it easy, just
    multiply the uniform random number by the total number of cards and then
    the boundaries are on the integers of the number of each kind of card.

    Here is an implementation. I find this version to be 2x - 5x faster
    depending on how many draws are used.


    import numpy as N

    def randomDrawing_N(count, population):
    probs = [x[0] for x in population]
    res = [[0, item[1]] for item in population]
    nums = N.random.rand(count)
    cprobs = N.cumsum([0]+probs)
    nums *= cprobs[-1]
    for k, val in enumerate(probs):
    res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
    return res


    -Travis Oliphant
    In response to what Travis and Simon wrote -
    (1) Where the heck can I find a description of scipy's stat functions?
    Documentation on these seems sparse.

    (2) How does one set up a good timer for Python as implemented for
    Windows? (i.e., how can I make API calls to Windows from Python?)

    Please bear with me, or not; I am /just/ starting off with Python.

    Thanks,
    Dave Braden
  • David J. Braden at Sep 13, 2006 at 4:52 pm

    David J. Braden wrote:
    Travis E. Oliphant wrote:
    Brendon Towle wrote:
    I need to simulate scenarios like the following: "You have a deck of
    3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    replace it, and repeat N times."
    Thinking about the problem as drawing sample froms a discrete
    distribution defined by the population might help.

    For example, in SciPy you can define your own discrete random variable:

    var = scipy.stats.rv_discrete(name='sample',
    values=([0,1,2],[3/10.,5/10.,2/10.]))

    Then

    var.rvs(size000)

    would quickly return an array of "draws"

    If you didn't want to install SciPy, but were willing to install
    NumPy, then the crux of the algorithm is to generate an entire array
    of uniform random variables: numpy.random.rand(count) and then map
    them through the inverse cumulative distribution function to generate
    your samples. The inverse cumulative distribution function is just a
    series of steps whose width depends on the probablity distribution.

    Thus, the population defines the draw boundaries. To make it easy,
    just multiply the uniform random number by the total number of cards
    and then the boundaries are on the integers of the number of each kind
    of card.

    Here is an implementation. I find this version to be 2x - 5x faster
    depending on how many draws are used.


    import numpy as N

    def randomDrawing_N(count, population):
    probs = [x[0] for x in population]
    res = [[0, item[1]] for item in population]
    nums = N.random.rand(count)
    cprobs = N.cumsum([0]+probs)
    nums *= cprobs[-1]
    for k, val in enumerate(probs):
    res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
    return res


    -Travis Oliphant
    In response to what Travis and Simon wrote -
    (1) Where the heck can I find a description of scipy's stat functions?
    Documentation on these seems sparse.

    (2) How does one set up a good timer for Python as implemented for
    Windows? (i.e., how can I make API calls to Windows from Python?)

    Please bear with me, or not; I am /just/ starting off with Python.

    Thanks,
    Dave Braden
    Oops, or rather, duh.
    Rereading Robert's post clarified for me where multinomial is, and that
    to get help I need to specify more than simply help(multinomial).
    Sheesh, I'm dim today.

    My question re timing code stands.

    TIA,
    DaveB

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedSep 12, '06 at 7:08p
activeSep 13, '06 at 4:52p
posts9
users5
websitepython.org

People

Translate

site design / logo © 2021 Grokbase