FAQ
I need to simulate scenarios like the following: "You have a deck of
3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
replace it, and repeat N times."

So, I wrote the following code, which works, but it seems quite slow
to me. Can anyone point out some obvious thing that I'm doing
inefficiently? Or, is this more or less as good as it gets?

For reference, my typical numbers look like this:

2 <= len(population) <= 7
4 <= len(mapping) <= 50
10 <= count <= 1000000

B.

<code>
#!/usr/bin/env python

import random

def randomDrawing(count, population):
"""Simulates drawing <count> items from <population>, with
replacement.
population is a list of lists: [[count1, type1], [count2,
type2], ...]

Typical examples:
randomDrawing(100, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])
[[28, 'orange'], [57, 'yellow'], [15, 'blue']]
randomDrawing(100000, [[3, 'orange'], [5, 'yellow'], [2,
'blue']])
[[29923, 'orange'], [50208, 'yellow'], [19869, 'blue']]

"""
res = [[0, item[1]] for item in population]
mapping = []
for i in xrange(len(population)):
mapping.extend([i]*population[i][0])
for i in xrange(count):
index = random.choice(mapping)
res[index][0] += 1
return res

</code>

--
Brendon Towle, PhD
Cognitive Scientist
+1-412-690-2442x127
Carnegie Learning, Inc.
The Cognitive Tutor Company ?
Helping over 375,000 students in 1000 school districts succeed in math.

Search Discussions

•  at Sep 12, 2006 at 8:53 pm ⇧

Brendon Towle wrote:
I need to simulate scenarios like the following: "You have a deck of 3
orange cards, 5 yellow cards, and 2 blue cards. You draw a card, replace
it, and repeat N times."

So, I wrote the following code, which works, but it seems quite slow to
me. Can anyone point out some obvious thing that I'm doing
inefficiently? Or, is this more or less as good as it gets?

For reference, my typical numbers look like this:

2 <= len(population) <= 7
4 <= len(mapping) <= 50
10 <= count <= 1000000

B.

<code>
#!/usr/bin/env python

import random

def randomDrawing(count, population):
"""Simulates drawing <count> items from <population>, with replacement.
population is a list of lists: [[count1, type1], [count2, type2], ...]

Typical examples:
randomDrawing(100, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])
[[28, 'orange'], [57, 'yellow'], [15, 'blue']]
randomDrawing(100000, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])
[[29923, 'orange'], [50208, 'yellow'], [19869, 'blue']]

"""
res = [[0, item[1]] for item in population]
mapping = []
for i in xrange(len(population)):
mapping.extend([i]*population[i][0])
for i in xrange(count):
index = random.choice(mapping)
res[index][0] += 1
return res

</code>

--Brendon Towle, PhD
Cognitive Scientist
+1-412-690-2442x127
Carnegie Learning, Inc.
The Cognitive Tutor Company ?
Helping over 375,000 students in 1000 school districts succeed in math.
Given the data structure, might it not be faster to generate binomial
variates for n-1 types, and set nth count = #draws - sum(other
outcomes)? And for a "large" count, could you get by with a normal
approximation? If you *do* feel the need for exact binomial, there are
short, somewhat sluggish routines in Devroye (1986 - on the web!, pg
525), and Random_binomial1, compiled by Alan Miller(AMrandom.zip0,
available, xlated, at http://www.esbconsult.com/. Even though they are
relatively slow, generating a few of these would seem to be a lot faster
than your current approach.

Sorry I can't help more - I'm just starting to learn Python, have yet to
even get IDLE going.

Regards,
•  at Sep 12, 2006 at 10:23 pm ⇧

Brendon Towle wrote:
I need to simulate scenarios like the following: "You have a deck of
3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
replace it, and repeat N times."

So, I wrote the following code, which works, but it seems quite slow
to me. Can anyone point out some obvious thing that I'm doing
inefficiently? Or, is this more or less as good as it gets?

For reference, my typical numbers look like this:

2 <= len(population) <= 7
4 <= len(mapping) <= 50
10 <= count <= 1000000

B.

<code>
#!/usr/bin/env python

import random

def randomDrawing(count, population):
"""Simulates drawing <count> items from <population>, with
replacement.
population is a list of lists: [[count1, type1], [count2,
type2], ...]

Typical examples:
randomDrawing(100, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])
[[28, 'orange'], [57, 'yellow'], [15, 'blue']]
randomDrawing(100000, [[3, 'orange'], [5, 'yellow'], [2,
'blue']])
[[29923, 'orange'], [50208, 'yellow'], [19869, 'blue']]

"""
res = [[0, item[1]] for item in population]
mapping = []
for i in xrange(len(population)):
mapping.extend([i]*population[i][0])
for i in xrange(count):
index = random.choice(mapping)
res[index][0] += 1
return res

</code>

--
Brendon Towle, PhD
Cognitive Scientist
+1-412-690-2442x127
Carnegie Learning, Inc.
The Cognitive Tutor Company ?
Helping over 375,000 students in 1000 school districts succeed in math.
I got nearly a 2x speed up with this variant:

def randomDrawing3(count, population):
res = [[0, item[1]] for item in population]
mapping = []
for i in xrange(len(population)):
mapping.extend([i]*population[i][0])

n = len(mapping)
for i in xrange(count):
index = int(n * random.random())
res[mapping[index]][0] += 1

return res

Apparently "int(n * random.random())" is faster than random.choice()
or random.randint()

sforman at garbage:~ \$ python -mtimeit -s'import random; n' 'int(n *
random.random())'
100000 loops, best of 3: 3.22 usec per loop

sforman at garbage:~ \$ python -mtimeit -s'import random; n'
'random.randint(1, n)'
100000 loops, best of 3: 13 usec per loop

sforman at garbage:~ \$ python -mtimeit -s'import random; n=range(10)'
'random.choice(n)'
100000 loops, best of 3: 6.07 usec per loop

(Note that the first and last examples produce values 0..9 while the
middle one produces 1..10)

I don't know for sure, but I think the random, uh, spread or whatever
will be the same for random() as for choice(). If it's important, you
should verify that. ;-)

Peace,
~Simon
•  at Sep 13, 2006 at 2:43 am ⇧

Brendon Towle wrote:
I need to simulate scenarios like the following: "You have a deck of
3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
replace it, and repeat N times."
Thinking about the problem as drawing sample froms a discrete
distribution defined by the population might help.

For example, in SciPy you can define your own discrete random variable:

var = scipy.stats.rv_discrete(name='sample',
values=([0,1,2],[3/10.,5/10.,2/10.]))

Then

var.rvs(size000)

would quickly return an array of "draws"

If you didn't want to install SciPy, but were willing to install NumPy,
then the crux of the algorithm is to generate an entire array of uniform
random variables: numpy.random.rand(count) and then map them through
the inverse cumulative distribution function to generate your samples.
The inverse cumulative distribution function is just a series of steps
whose width depends on the probablity distribution.

Thus, the population defines the draw boundaries. To make it easy, just
multiply the uniform random number by the total number of cards and then
the boundaries are on the integers of the number of each kind of card.

Here is an implementation. I find this version to be 2x - 5x faster
depending on how many draws are used.

import numpy as N

def randomDrawing_N(count, population):
probs = [x[0] for x in population]
res = [[0, item[1]] for item in population]
nums = N.random.rand(count)
cprobs = N.cumsum([0]+probs)
nums *= cprobs[-1]
for k, val in enumerate(probs):
res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
return res

-Travis Oliphant
•  at Sep 13, 2006 at 3:46 am ⇧

Paul Rubin wrote:
"Travis E. Oliphant" <oliphant.travis at ieee.org> writes:
I need to simulate scenarios like the following: "You have a deck of
3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
replace it, and repeat N times."
Thinking about the problem as drawing sample froms a discrete
distribution defined by the population might help.
Is there some important reason you want to do this as a simulation?
And is the real problem more complicated? If you draw from the
distribution 100,000 times with replacement and sum the results, per
the Central Limit Theorem you'll get something very close to a normal
distribution whose parameters you can determine analytically. There
is probably also some statistics formula to find the precise error.
So you can replace the 100,000 draws with a single draw.
Along the lines of what you're trying to get at, the problem that the OP is
describing is one of sampling from a multinomial distribution.

http://en.wikipedia.org/wiki/Multinomial_distribution

numpy has a function that will do the sampling for you:

In [4]: numpy.random.multinomial?
Type: builtin_function_or_method
Base Class: <type 'builtin_function_or_method'>
String Form: <built-in method multinomial of mtrand.RandomState object at
0x3e140>
Namespace: Interactive
Docstring:
Multinomial distribution.

multinomial(n, pvals, size=None) -> random values

pvals is a sequence of probabilities that should sum to 1 (however, the
last element is always assumed to account for the remaining probability
as long as sum(pvals[:-1]) <= 1).

Sampling from the multinomial distribution is quite simply implemented given a
binomial sampler. Unfortunately, the standard library's random module does not
have one. If the number of samples is high enough, then one might be able to
approximate the binomial distribution with a normal one, but you'd be better off
just installing numpy.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
•  at Sep 13, 2006 at 1:37 pm ⇧

On 12 Sep 2006, at 6:33 PM, python-list-request at python.org wrote:

Date: 12 Sep 2006 15:23:51 -0700
From: "Simon Forman" <rogue_pedro at yahoo.com>
Subject: Re: Random Drawing Simulation -- performance issue

Brendon Towle wrote:
I need to simulate scenarios like the following: "You have a deck of
3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
replace it, and repeat N times."
[my original code snipped]
I got nearly a 2x speed up with this variant:

def randomDrawing3(count, population):
res = [[0, item[1]] for item in population]
mapping = []
for i in xrange(len(population)):
mapping.extend([i]*population[i][0])

n = len(mapping)
for i in xrange(count):
index = int(n * random.random())
res[mapping[index]][0] += 1

return res
Excellent! For some reason, the speedup I get is only ~1.5x, but
that's still non-trivial.

Thanks much for the pointer-

B.

--
Brendon Towle, PhD
Cognitive Scientist
+1-412-690-2442x127
Carnegie Learning, Inc.
The Cognitive Tutor Company ?
Helping over 375,000 students in 1000 school districts succeed in math.
•  at Sep 13, 2006 at 1:44 pm ⇧

On 13 Sep 2006, at 1:01 AM, python-list-request at python.org wrote:

Date: 12 Sep 2006 20:17:47 -0700
From: Paul Rubin <http://phr.cx at NOSPAM.invalid>
Subject: Re: Random Drawing Simulation -- performance issue
To: python-list at python.org

"Travis E. Oliphant" <oliphant.travis at ieee.org> writes:
I need to simulate scenarios like the following: "You have a deck of
3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
replace it, and repeat N times."
Thinking about the problem as drawing sample froms a discrete
distribution defined by the population might help.
Is there some important reason you want to do this as a simulation?
And is the real problem more complicated? If you draw from the
distribution 100,000 times with replacement and sum the results, per
the Central Limit Theorem you'll get something very close to a normal
distribution whose parameters you can determine analytically. There
is probably also some statistics formula to find the precise error.
So you can replace the 100,000 draws with a single draw.
The real problem is not substantially more complicated. (The real
code is, because it's embedded in a bunch of other stuff, but that's
not the point.)

I guess the essential reason that I want to do it as a simulation,
and not as a statistics formula, is that I'd like the code to be
readable (and modifiable) by a programmer who doesn't have a
statistics background. I could dredge up enough of my college stats
to do as you suggest (although I might not enjoy it), but I don't
think I want to make that a requirement.

On the other hand (quote somewhat snipped):
Date: Tue, 12 Sep 2006 22:46:04 -0500
From: Robert Kern <robert.kern at gmail.com>
Subject: Re: Random Drawing Simulation -- performance issue
To: python-list at python.org

Along the lines of what you're trying to get at, the problem that
the OP is
describing is one of sampling from a multinomial distribution.

numpy has a function that will do the sampling for you:

In [4]: numpy.random.multinomial?
Docstring:
Multinomial distribution.

multinomial(n, pvals, size=None) -> random values

pvals is a sequence of probabilities that should sum to 1
(however, the
last element is always assumed to account for the remaining
probability
as long as sum(pvals[:-1]) <= 1).
Here, I'm torn. I do want the code to be accessible to non-stats
people, but this just might do the trick. Must ponder.

B.

--
Brendon Towle, PhD
Cognitive Scientist
+1-412-690-2442x127
Carnegie Learning, Inc.
The Cognitive Tutor Company ?
Helping over 375,000 students in 1000 school districts succeed in math.

From http Wed Sep 13 15:49:44 2006
From: http (Paul Rubin)
Date: 13 Sep 2006 06:49:44 -0700
Subject: Help me use my Dual Core CPU!
<7x8xkol0n9.fsf@ruckus.brouhaha.com>
<mailman.14.1158140507.10491.python-list@python.org>
<7xbqpk5bnc.fsf@ruckus.brouhaha.com>
<mailman.17.1158145066.10491.python-list@python.org>
<7xodtkt48l.fsf@ruckus.brouhaha.com>
<mailman.25.1158154691.10491.python-list@python.org>
Message-ID: <7xr6yf3mh3.fsf@ruckus.brouhaha.com>

Robin Becker <robin at reportlab.com> writes:
well these guys seem to think there are, perhaps it's a joke
http://anandtech.com/mac/showdoc.aspx?i(32&p=6
Wow! Yes, it seems to be real ("Clovertown" 4-core cpu). See:

http://www.tgdaily.com/2006/03/07/idf_keynotes_welcome_to_intel_3-point-0/

There's even a version that goes out to 32 cores ("Dunnington") on the
drawing boards.

We better get rid of that GIL soon ;-)
•  at Sep 13, 2006 at 4:12 pm ⇧

Travis E. Oliphant wrote:
Brendon Towle wrote:
I need to simulate scenarios like the following: "You have a deck of
3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
replace it, and repeat N times."
Thinking about the problem as drawing sample froms a discrete
distribution defined by the population might help.

For example, in SciPy you can define your own discrete random variable:

var = scipy.stats.rv_discrete(name='sample',
values=([0,1,2],[3/10.,5/10.,2/10.]))

Then

var.rvs(size000)

would quickly return an array of "draws"

If you didn't want to install SciPy, but were willing to install NumPy,
then the crux of the algorithm is to generate an entire array of uniform
random variables: numpy.random.rand(count) and then map them through
the inverse cumulative distribution function to generate your samples.
The inverse cumulative distribution function is just a series of steps
whose width depends on the probablity distribution.

Thus, the population defines the draw boundaries. To make it easy, just
multiply the uniform random number by the total number of cards and then
the boundaries are on the integers of the number of each kind of card.

Here is an implementation. I find this version to be 2x - 5x faster
depending on how many draws are used.

import numpy as N

def randomDrawing_N(count, population):
probs = [x[0] for x in population]
res = [[0, item[1]] for item in population]
nums = N.random.rand(count)
cprobs = N.cumsum([0]+probs)
nums *= cprobs[-1]
for k, val in enumerate(probs):
res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
return res

-Travis Oliphant
In response to what Travis and Simon wrote -
(1) Where the heck can I find a description of scipy's stat functions?
Documentation on these seems sparse.

(2) How does one set up a good timer for Python as implemented for
Windows? (i.e., how can I make API calls to Windows from Python?)

Please bear with me, or not; I am /just/ starting off with Python.

Thanks,
•  at Sep 13, 2006 at 4:52 pm ⇧

David J. Braden wrote:
Travis E. Oliphant wrote:
Brendon Towle wrote:
I need to simulate scenarios like the following: "You have a deck of
3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
replace it, and repeat N times."
Thinking about the problem as drawing sample froms a discrete
distribution defined by the population might help.

For example, in SciPy you can define your own discrete random variable:

var = scipy.stats.rv_discrete(name='sample',
values=([0,1,2],[3/10.,5/10.,2/10.]))

Then

var.rvs(size000)

would quickly return an array of "draws"

If you didn't want to install SciPy, but were willing to install
NumPy, then the crux of the algorithm is to generate an entire array
of uniform random variables: numpy.random.rand(count) and then map
them through the inverse cumulative distribution function to generate
your samples. The inverse cumulative distribution function is just a
series of steps whose width depends on the probablity distribution.

Thus, the population defines the draw boundaries. To make it easy,
just multiply the uniform random number by the total number of cards
and then the boundaries are on the integers of the number of each kind
of card.

Here is an implementation. I find this version to be 2x - 5x faster
depending on how many draws are used.

import numpy as N

def randomDrawing_N(count, population):
probs = [x[0] for x in population]
res = [[0, item[1]] for item in population]
nums = N.random.rand(count)
cprobs = N.cumsum([0]+probs)
nums *= cprobs[-1]
for k, val in enumerate(probs):
res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
return res

-Travis Oliphant
In response to what Travis and Simon wrote -
(1) Where the heck can I find a description of scipy's stat functions?
Documentation on these seems sparse.

(2) How does one set up a good timer for Python as implemented for
Windows? (i.e., how can I make API calls to Windows from Python?)

Please bear with me, or not; I am /just/ starting off with Python.

Thanks,
Oops, or rather, duh.
Rereading Robert's post clarified for me where multinomial is, and that
to get help I need to specify more than simply help(multinomial).
Sheesh, I'm dim today.

My question re timing code stands.

TIA,
DaveB

Related Discussions

 view thread | post
Discussion Overview
 group python-list categories python posted Sep 12, '06 at 7:08p active Sep 13, '06 at 4:52p posts 9 users 5 website python.org

5 users in discussion

Content

People

Support

Translate

site design / logo © 2021 Grokbase