I've received several long emails from Theo de Raadt (OpenBSD founder)
about Python's default random number generator. This is the random module,
and it defaults to a Mersenne Twister (MT) seeded by 2500 bytes of entropy
taken from os.urandom().


Theo's worry is that while the starting seed is fine, MT is not good when
random numbers are used for crypto and other security purposes. I've
countered that it's not meant for that (you should use
random.SystemRandom() or os.urandom() for that) but he counters that people
don't necessarily know that and are using the default random.random() setup
for security purposes without realizing how wrong that is.


There is already a warning in the docs for the random module that it's not
suitable for security, but -- as the meme goes -- nobody reads the docs.
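For concreteness, the distinction at issue looks like this in current Python (variable names here are illustrative only):

```python
import os
import random

# Module-level functions use a shared Mersenne Twister: statistically good
# and reproducible, but predictable, so unsuitable for security purposes.
predictable = random.random()

# For security-sensitive values, draw from the OS entropy source instead.
secure = random.SystemRandom()   # backed by os.urandom()
token = secure.getrandbits(128)  # e.g. a session token
raw = os.urandom(16)             # 16 bytes straight from the OS
```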


Theo then went into technicalities that went straight over my head,
concluding with a strongly worded recommendation of the OpenBSD version of
arc4random() (which IIUC is based on something called "chacha", not on
"RC4" despite that being in the name). He says it is very fast (but I don't
know what that means).


I've invited Theo to join this list but he's too busy. The two core Python
experts on the random module have given me opinions suggesting that there's
not much wrong with MT, so here I am. Who is right? What should we do? Is
there anything we need to do?


--
--Guido van Rossum (python.org/~guido)


  • Donald Stufft at Sep 9, 2015 at 4:53 pm

    On September 9, 2015 at 12:36:16 PM, Guido van Rossum (guido at python.org) wrote:
    I've invited Theo to join this list but he's too busy. The two core Python
    experts on the random module have given me opinions suggesting that there's
    not much wrong with MT, so here I am. Who is right? What should we do? Is
    there anything we need to do?

    Everyone is right :)


    MT is a fine algorithm for random numbers when you don't need them to be
    cryptographically safe; it is a disastrous algorithm if you do need them to
    be safe. As long as you only use MT (and the default ``random``
    implementation) for things where it doesn't matter that the numbers you get
    aren't really random (e.g. they are actually predictable), and you use
    os.urandom/random.SystemRandom for everything where you need actual
    randomness, then everything is fine.


    The problem boils down to: are people going to accidentally use the default
    random module when they really should use os.urandom or random.SystemRandom?
    It is my opinion (and I believe Theo's) that they are going to use the
    MT-backed random functions in random.py when they shouldn't. However, I
    don't have a great solution for what we should do about it.


    One option is to add a new random.FastInsecureRandom class, and switch the
    "bare" random functions in that module over to using random.SystemRandom by
    default. Then people who want a faster random that isn't cryptographically
    secure can opt into that class. This would essentially invert the
    relationship today, where the default is insecure and you have to opt in
    to secure.


    -----------------
    Donald Stufft
    PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
  • Sven R. Kunze at Sep 9, 2015 at 7:09 pm

    On 09.09.2015 18:53, Donald Stufft wrote:
    This would
    essentially be inverting the relationship today, where it defaults to insecure
    and you have to opt in to secure.

    Not being an expert on this but I agree with this assessment.


    You can determine easily whether your program runs fast enough. If not,
    you can fix it.
    You cannot determine easily whether something you made is
    cryptographically secure.


    The default should be as secure as possible.




    Best,
    Sven
  • Guido van Rossum at Sep 9, 2015 at 5:02 pm
    I'm just going to forward further missives by Theo.


    ---------- Forwarded message ----------
    From: Theo de Raadt
    Date: Wed, Sep 9, 2015 at 9:59 AM
    Subject: Re: getentropy, getrandom, arc4random()
    To: guido at python.org



    Thanks. And one last thing: unlike Go and Swift, Python has no significant
    corporate resources behind it -- it's mostly volunteers begging their
    employers to let them work on Python for a few hours per week.

    i understand because I find myself in the same situation.


    however i think you overstate the difficulty involved.


    high-availability random is kind of a new issue.


    so final advice from me; feel free to forward as you like.


    i think arc4random would be a better API to call on the back side than
    getentropy/getrandom.


    arc4random can seed initialize with a single getentropy/getrandom call
    at startup. that is done automatically. you can then use
    arc4random's results to initialize the MT. in a system call trace,
    this will show up as one getentropy/getrandom at process startup,
    which gets both subsystems going. really cheap.


    in the case of longer run times, the userland arc4random PRNG folding
    reduces the system calls required. this helps older kernels with
    slower entropy creation, taking pressure off their subsystem. driving
    everyone towards this one API which is so high performance is the
    right goal.


    chacha arc4random is really fast.


    if you were to create such an API in python, maybe this is how it will
    go:


    say it becomes arc4random in the back end. i am unsure what advice to
    give you regarding a python API name. in swift, they chose to use the
    same prefix "arc4random" (id = arc4random(), id = arc4random_uniform(1..n));
    it is a little bit different than the C API. google has tended to choose
    other prefixes. we admit the name is a bit strange, but we can't touch
    the previous attempts like drand48....


    I do suggest you have the _uniform and _buf versions. Maybe apple
    chose to stick to arc4random as a name simply because search engines
    tend to give above average advice for this search string?


    so arc4random is natively available in freebsd, macos, solaris, and
    other systems like android libc (bionic). some systems lack it:
    win32, glibc, hpux, aix, so we wrote replacements for libressl:


    https://github.com/libressl-portable/openbsd/tree/master/src/lib/libcrypto/crypto
    https://github.com/libressl-portable/portable/tree/master/crypto/compat


    the first is the base openbsd tree where we maintain/develop this code
    for other systems, the 2nd part is scaffold in libressl that makes this
    available to others.


    it contains arc4random for those systems, and supplies getentropy()
    stubs for challenged systems.


    we'll admit we haven't got solutions for every system known to man.
    we are trying to handle fork issues, and systems with very bad
    entropy feeding.


    that's free code. the heavy lifting is done, and we'll keep maintaining
    that until the end of days. i hope it helps.






    --
    --Guido van Rossum (python.org/~guido)
  • Tim Peters at Sep 9, 2015 at 5:10 pm
    [Guido]
    ...
    The two core Python experts on the random module
    have given me opinions suggesting that there's not
    much wrong with MT, so here I am.

    There is nothing _right_ about MT in a crypto context - it's entirely
    unsuitable for any such purpose, and always was. Just to be clear
    about that ;-) But it's an excellent generator for almost all other
    purposes.


    So the real question is: whose use cases do you want to cater to by default?


    If you answer "crypto", then realize the Python generator will have to
    change every time the crypto community changes its mind about what's
    _currently_ "good enough". There's a long history of that already.


    Indeed, there are already numerous "chacha" variants. For a brief
    overview, scroll down to the ChaCha20 section of this exceptionally
    readable page listing pros and cons of various generators:


         http://www.pcg-random.org/other-rngs.html


    There are no answers to vital pragmatic questions (like "is it
    possible to supply a seed to get reproducible results?") without
    specifying whose implementation of which chacha variant you're asking
    about.


    I've always thought Python should be a follower rather than a leader
    in this specific area. For example, I didn't push for the Twister
    before it was well on its way to becoming a de facto standard.


    Anyway, it's all moot until someone supplies a patch - and that sure
    ain't gonna be me ;-)



    On Wed, Sep 9, 2015 at 11:35 AM, Guido van Rossum wrote:
    I've received several long emails from Theo de Raadt (OpenBSD founder) about
    Python's default random number generator. This is the random module, and it
    defaults to a Mersenne Twister (MT) seeded by 2500 bytes of entropy taken
    from os.urandom().

    Theo's worry is that while the starting seed is fine, MT is not good when
    random numbers are used for crypto and other security purposes. I've
    countered that it's not meant for that (you should use random.SystemRandom()
    or os.urandom() for that) but he counters that people don't necessarily know
    that and are using the default random.random() setup for security purposes
    without realizing how wrong that is.

    There is already a warning in the docs for the random module that it's not
    suitable for security, but -- as the meme goes -- nobody reads the docs.

    Theo then went into technicalities that went straight over my head,
    concluding with a strongly worded recommendation of the OpenBSD version of
    arc4random() (which IIUC is based on something called "chacha", not on "RC4"
    despite that being in the name). He says it is very fast (but I don't know
    what that means).

    I've invited Theo to join this list but he's too busy. The two core Python
    experts on the random module have given me opinions suggesting that there's
    not much wrong with MT, so here I am. Who is right? What should we do? Is
    there anything we need to do?

    --
    --Guido van Rossum (python.org/~guido)

    _______________________________________________
    Python-ideas mailing list
    Python-ideas at python.org
    https://mail.python.org/mailman/listinfo/python-ideas
    Code of Conduct: http://python.org/psf/codeofconduct/
  • Donald Stufft at Sep 9, 2015 at 5:20 pm

    On September 9, 2015 at 1:11:22 PM, Tim Peters (tim.peters at gmail.com) wrote:
    So the real question is: whose use cases do you want to cater to
    by default?

    If you answer "crypto", then realize the Python generator will
    have to change every time the crypto community changes its mind
    about what's _currently_ "good enough". There's a long history
    of that already.



    This is not really true, in the sense that Python wouldn't need to do
    anything if the blessed generator changed. We'd use /dev/urandom, one of
    the syscalls that do the same thing, or the CryptGenRandom API on Windows.
    Python should not have its own userland CSPRNG. It's then up to the
    platform which generator they provide.


    -----------------
    Donald Stufft
    PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
  • Tim Peters at Sep 9, 2015 at 5:28 pm
    [Tim]
    So the real question is: whose use cases do you want to cater to
    by default?

    If you answer "crypto", then realize the Python generator will
    have to change every time the crypto community changes its mind
    about what's _currently_ "good enough". There's a long history of
    that already.

    [Donald Stufft <donald@stufft.io>]
    This is not really true in that sense that Python would need to do anything if
    the blessed generator changed.

    I read Guido's message as specifically asking about Theo's "strongly
    worded recommendation of [Python switching to] the OpenBSD version of
    arc4random()" as its default generator. In which case, yes, when that
    specific implementation falls out of favor, Python would need to
    change.



    We'd use /dev/urandom, one of the syscalls that
    do the same thing, or the CryptGenRandom API on Windows. Python should not
    have its own userland CSPRNG.

    I read Guido's message as asking whether Python should indeed do just that.
  • Donald Stufft at Sep 9, 2015 at 5:43 pm

    On September 9, 2015 at 1:28:46 PM, Tim Peters (tim.peters at gmail.com) wrote:
    I read Guido's message as specifically asking about Theo's "strongly
    worded recommendation of [Python switching to] the OpenBSD version of
    arc4random()" as its default generator. In which case, yes, when that
    specific implementation falls out of favor, Python would need to
    change.

    arc4random changes as the underlying implementation changes too; the name
    is really a historical accident. arc4random no longer uses arc4, it uses
    chacha, and when/if chacha needs to be replaced, arc4random will still be
    the name.


    -----------------
    Donald Stufft
    PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
  • Random832 at Sep 9, 2015 at 5:54 pm

    On Wed, Sep 9, 2015, at 13:43, Donald Stufft wrote:
    arc4random changes as the underlying implementation changes too; the name
    is really a historical accident. arc4random no longer uses arc4, it uses
    chacha, and when/if chacha needs to be replaced, arc4random will still be
    the name.

    The issue is, what should Python do, if the decision is made to not
    provide its own RNG [presumably would be a forked copy of OpenBSD's
    current arc4random] on systems that do not provide a function named
    arc4random? Use /dev/urandom (or CryptGenRandom) every time [more
    expensive, performs I/O]? rand48? random? rand?


    I don't see the issue with Python providing its own implementation. If
    the state of the art changes, we can have another discussion then.
  • Tim Peters at Sep 9, 2015 at 6:31 pm
    [random832 at fastmail.us]
    I don't see the issue with Python providing its own implementation. If
    the state of the art changes,

    It will. Over & over again. That's why it's called "art" ;-)

    we can have another discussion then.

    Also over & over again. If you volunteer to own responsibility for
    updating all versions of Python each time it changes (in a crypto
    context, an advance in the state of the art implies the prior state
    becomes "a bug"), and post a performance bond sufficient to pay
    someone else to do it if you vanish, then a major pragmatic objection
    would go away ;-)
  • Stefan Krah at Sep 9, 2015 at 6:43 pm

    Tim Peters <tim.peters@...> writes:
    we can have another discussion then.
    Also over & over again. If you volunteer to own responsibility for
    updating all versions of Python each time it changes (in a crypto
    context, an advance in the state of the art implies the prior state
    becomes "a bug"), and post a performance bond sufficient to pay
    someone else to do it if you vanish, then a major pragmatic objection
    would go away

    The OpenBSD devs could also publish arc4random as a library that
    works everywhere (like OpenSSH). That would be a nicer solution
    for everyone (except for the devs perhaps :).




    Stefan Krah
  • Tim Peters at Sep 9, 2015 at 6:47 pm
    [Stefan Krah <skrah@bytereef.org>]
    ...
    The OpenBSD devs could also publish arc4random as a library that
    works everywhere (like OpenSSH). That would be a nicer solution
    for everyone (except for the devs perhaps :).

    Telling Python devs "hey, it will be as easy as dealing with OpenSSH
    has been!" is indeed a good way to kill the idea at once ;-)
  • Stefan Krah at Sep 9, 2015 at 8:07 pm

    Stefan Krah <skrah@...> writes:
    The OpenBSD devs could also publish arc4random as a library that
    works everywhere (like OpenSSH). That would be a nicer solution
    for everyone (except for the devs perhaps :).

    And naturally they're already doing that. I missed this in Theo's first
    mail:


    https://github.com/libressl-portable/openbsd/tree/master/src/lib/libcrypto/crypto
    https://github.com/libressl-portable/portable/tree/master/crypto/compat




    So I guess the whole thing also depends on how popular these
    libraries will be.




    Stefan Krah
  • Random832 at Sep 9, 2015 at 6:55 pm

    On Wed, Sep 9, 2015, at 14:31, Tim Peters wrote:
    Also over & over again. If you volunteer to own responsibility for
    updating all versions of Python each time it changes (in a crypto
    context, an advance in the state of the art implies the prior state
    becomes "a bug"), and post a performance bond sufficient to pay
    someone else to do it if you vanish, then a major pragmatic objection
    would go away ;-)

    I don't see how "Changing Python's RNG implementation today to
    arc4random as it exists now" necessarily implies "Making a commitment to
    guarantee the cryptographic suitability of Python's RNG for all time".
    Those are two separate things.
  • Tim Peters at Sep 9, 2015 at 7:03 pm
    [<random832@fastmail.us>]
    I don't see how "Changing Python's RNG implementation today to
    arc4random as it exists now" necessarily implies "Making a commitment to
    guarantee the cryptographic suitability of Python's RNG for all time".
    Those are two separate things.

    Disagree. The _only_ point of switching today is "to guarantee the
    cryptographic suitability of Python's RNG" today. It misses the
    intent of the switch entirely to give a "but tomorrow? eh, that's a
    different issue" dodge.


    No, no rules of formal logic would be violated by separating the two -
    it would be a violation of the only _sense_ in making a switch at all.
    If you don't believe me, try asking Theo ;-)
  • Steven D'Aprano at Sep 9, 2015 at 7:07 pm

    On Wed, Sep 09, 2015 at 02:55:01PM -0400, random832 at fastmail.us wrote:
    On Wed, Sep 9, 2015, at 14:31, Tim Peters wrote:
    Also over & over again. If you volunteer to own responsibility for
    updating all versions of Python each time it changes (in a crypto
    context, an advance in the state of the art implies the prior state
    becomes "a bug"), and post a performance bond sufficient to pay
    someone else to do it if you vanish, then a major pragmatic objection
    would go away ;-)
    I don't see how "Changing Python's RNG implementation today to
    arc4random as it exists now" necessarily implies "Making a commitment to
    guarantee the cryptographic suitability of Python's RNG for all time".
    Those are two separate things.

    Not really. Look at the subject line. It doesn't say "should we change
    from MT to arc4random?", it asks if the default random number generator
    should be secure. The only reason we are considering the change from MT
    to arc4random is to make the PRNG cryptographically secure. "Secure" is
    a moving target, what is secure today will not be secure tomorrow.


    Yes, in principle, we could make the change once, then never again. But
    why bother? We don't gain anything from changing to arc4random if there
    is no promise to be secure into the future.


    Question, aimed at anyone, not necessarily random832 -- one desirable
    property of PRNGs is that you can repeat a sequence of values if you
    re-seed with a known value. Does arc4random keep that property? I think
    that it is important that the default RNG be deterministic when given a
    known seed. (I'm happy for the default seed to be unpredictable.)
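    The property in question, as it stands today: re-seeding MT replays an identical stream, while SystemRandom accepts a seed argument and silently ignores it:

```python
import random

# Mersenne Twister: the same seed reproduces the same sequence.
a = random.Random(42)
b = random.Random(42)
assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]

# SystemRandom takes a seed argument but ignores it, so two "identically
# seeded" instances still produce unrelated, non-reproducible output.
c = random.SystemRandom(42)
d = random.SystemRandom(42)
assert [c.random() for _ in range(5)] != [d.random() for _ in range(5)]
```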




    --
    Steve
  • Tim Peters at Sep 9, 2015 at 7:20 pm
    [Steven D'Aprano <steve@pearwood.info>]
    ...
    Question, aimed at anyone, not necessarily random832 -- one desirable
    property of PRNGs is that you can repeat a sequence of values if you
    re-seed with a known value. Does arc4random keep that property? I think
    that it is important that the default RNG be deterministic when given a
    known seed. (I'm happy for the default seed to be unpredictable.)

    "arc4random" is ill-defined. From what I gathered, it's the case that
    "pure chacha" variants can be seeded to get a reproducible sequence
    "in theory", but that not all implementations support that.


    Specifically, the OpenBSD implementation being "sold" here does not and cannot:


         http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/arc4random.3


    "Does not" because there is no API to either request or set a seed.


    "Cannot" because:


         The subsystem is re-seeded from the kernel random number
         subsystem using getentropy(2) on a regular basis


    Other variants skip that last part.
  • Nathaniel Smith at Sep 9, 2015 at 9:02 pm

    On Sep 9, 2015 12:21 PM, "Tim Peters" wrote:
    [Steven D'Aprano <steve@pearwood.info>]
    ...
    Question, aimed at anyone, not necessarily random832 -- one desirable
    property of PRNGs is that you can repeat a sequence of values if you
    re-seed with a known value. Does arc4random keep that property? I think
    that it is important that the default RNG be deterministic when given a
    known seed. (I'm happy for the default seed to be unpredictable.)
    "arc4random" is ill-defined. From what I gathered, it's the case that
    "pure chacha" variants can be seeded to get a reproducible sequence
    "in theory", but that not all implementations support that.

    Specifically, the OpenBSD implementation being "sold" here does not and
    cannot:
    http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/arc4random.3
    "Does not" because there is no API to either request or set a seed.

    "Cannot" because:

    The subsystem is re-seeded from the kernel random number
    subsystem using getentropy(2) on a regular basis

    Another reason why it is important *not* to provide a seeding api for a
    crypto rng is that this means you can later swap out the underlying
    algorithms easily as the state of the art improves. By contrast, if you
    have a deterministic seeded mode, then swapping out the algorithm becomes a
    compatibility break.


    (You can provide a "mix this extra entropy into the pool" api, which looks
    rather similar to seeding, but has fundamentally different semantics.)


    The only real problem that I see with switching the random module to use a
    crypto rng is exactly this backwards compatibility issue. For scientific
    users, reproducibility of output streams is really important.


    (Ironically, this is a variety of "important" that crypto people are very
    familiar with: universally acknowledged to be the right thing by everyone
    who's thought about it, a minority do religiously and rely on, and most
    people ignore out of ignorance. Education is ongoing...)


    OTOH python has never made strong guarantees of output stream
    reproducibility -- 3.2 broke all seeds by default (you have to add
    'version=1' to your seed call to get the same results on post-3.2 pythons
    -- which of course gives an error on older versions). And 99% of the
    methods are documented to be unstable across versions -- the only method
    that's guaranteed to produce reproducible results across versions is
    random.random(). In practice the other methods usually don't change, so
    people get away with it. See:


         https://docs.python.org/3/library/random.html#notes-on-reproducibility


    So in practice the stdlib random module is not super suitable for
    scientific work anyway. Not that this stops anyone from using it for this
    purpose... see above. (And to be fair even this limited determinism is
    still enough to be somewhat useful -- not all projects require
    reproducibility across years of different python versions.) Plus even a lot
    of people who know about the importance of seeding don't realize that the
    stdlib's support has these gotchas.


    (Numpy unsurprisingly puts higher priority on these issues -- our random
    module guarantees exact reproducibility of seeded outputs modulo rounding,
    across versions and systems, except for bugfixes necessary for correctness.
    This means that we carry around a bunch of old inefficient implementations
    of the distribution methods, but so be it...)


    So, all that considered: I can actually see an argument for removing the
    seeding methods from the stdlib entirely, and directing those who need
    reproducibility to a third-party library like numpy (/ pygsl / ...). This
    would be pretty annoying for some cases where someone really does have
    simple needs and wants just a little determinism without a binary
    extension, but on net it might even be an improvement, given how easy it is
    to misread the current api as guaranteeing more than it actually promises.
    OTOH this would actually break the current promise, weak as it is.


    Keeping that promise in mind, an alternative would be to keep both
    generators around, use the cryptographically secure one by default, and
    switch to MT when someone calls


       seed(1234, generator="INSECURE LEGACY MT")


    But this would justifiably get us crucified by the security community,
    because the above call would flip the insecure switch for your entire
    program, including possibly other modules that were depending on random to
    provide secure bits.


    So if we were going to do this then I think it would have to be by
    switching the global RNG over unconditionally, and to fulfill the promise,
    provide the MT option as a separate class that the user would have to
    instantiate explicitly if they wanted it for backcompat. Document that you
    should replace


    import random
    random.seed(12345)
    if random.whatever(): ...


    with


    from random import MTRandom
    random = MTRandom(12345)
    if random.whatever(): ...


    As part of this transition I would also suggest making the seed method on
    non-seedable RNGs raise an error when given an explicit seed, instead of
    silently doing nothing like the current SystemRandom.
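    The current silent no-op, plus a hypothetical sketch of the stricter behavior proposed (``StrictSystemRandom`` is illustrative only, not stdlib):

```python
import random

# Current behavior: SystemRandom.seed() is silently a no-op.
sr = random.SystemRandom()
sr.seed(12345)  # accepted, but has no effect on the output stream

# Hypothetical sketch of the proposed behavior: an explicit seed errors out.
class StrictSystemRandom(random.SystemRandom):
    def seed(self, a=None, version=2):
        # a is None when called implicitly from __init__; only reject
        # explicit seeds.
        if a is not None:
            raise NotImplementedError("SystemRandom cannot be seeded")
```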


    -n
  • Random832 at Sep 9, 2015 at 9:15 pm

    On Wed, Sep 9, 2015, at 17:02, Nathaniel Smith wrote:
    Keeping that promise in mind, an alternative would be to keep both
    generators around, use the cryptographically secure one by default, and
    switch to MT when someone calls

    seed(1234, generator="INSECURE LEGACY MT")

    But this would justifiably get us crucified by the security community,
    because the above call would flip the insecure switch for your entire
    program, including possibly other modules that were depending on random
    to
    provide secure bits.

    Ideally, neither the crypto bits nor the science bits of a big program
    should be using the module-level functions. A small program either
    hasn't got both kinds of bits, or won't be using them at the same time.
    And if you've got non-science bits doing stuff with your RNG then your
    results probably aren't going to be reproducible anyway.


    Which suggests a solution: How about exposing a way to switch out the
    Random instance used by the module-level functions? The instance itself
    exists now as random._inst, but the module just spews its bound methods
    all over its namespace. (Long-term, it might make sense to deprecate the
    module-level functions)
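    A sketch of that idea: because the module-level functions are bound methods of one private instance (``random._inst``), switching the backing generator means rebinding those names. ``_inst`` is private, so this is an illustration, not a supported API:

```python
import random

def use_instance(inst):
    # Rebind the module-level convenience functions to a new generator.
    random._inst = inst
    for name in ("random", "uniform", "randint", "choice", "shuffle",
                 "seed", "getstate", "setstate"):
        setattr(random, name, getattr(inst, name))

use_instance(random.SystemRandom())  # module-level functions now secure
secure_draw = random.random()

use_instance(random.Random(42))      # back to a seedable, reproducible MT
mt_draw = random.random()
```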
  • Random832 at Sep 11, 2015 at 9:12 pm

    On Wed, Sep 9, 2015, at 17:02, Nathaniel Smith wrote:
    Keeping that promise in mind, an alternative would be to keep both
    generators around, use the cryptographically secure one by default, and
    switch to MT when someone calls

    seed(1234, generator="INSECURE LEGACY MT")

    But this would justifiably get us crucified by the security community,
    because the above call would flip the insecure switch for your entire
    program, including possibly other modules that were depending on random
    to
    provide secure bits.

    I just realized, OpenBSD has precisely this functionality, for the
    rand/random/rand48 functions, in the "_deterministic" versions of their
    respective seed functions. So that's probably not a terrible path to go
    down:


    Make a Random class that uses a CSPRNG and/or os.urandom until/unless it
    is explicitly seeded. Use that class for the global instance. We could
    probably skip the "make a separate function name to show you really mean
    it" because unlike C, Python has never encouraged explicitly seeding
    with the {time, pid, four bytes from /dev/random} when one doesn't
    actually want determinism. (The default seed in C for rand/random is
    *1*; for rand48 it is an implementation-defined, but specified to be
    constant, value).


    For completeness, have getstate return a tuple of a boolean (for which
    mode it is in) and whatever state Random returns. setstate can accept
    either this tuple, or for compatibility whatever Random uses.
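    A hypothetical sketch of this "switching" generator (not stdlib code): secure by default, deterministic MT only after an explicit seed, with getstate/setstate carrying the mode flag:

```python
import random

class SwitchingRandom:
    """Secure until explicitly seeded; then a deterministic MT."""

    def __init__(self):
        self._det = False
        self._rng = random.SystemRandom()

    def seed(self, a):
        # An explicit seed is the opt-in to deterministic mode.
        self._det, self._rng = True, random.Random(a)

    def random(self):
        return self._rng.random()

    def getstate(self):
        # (mode flag, inner state); secure mode has no replayable state.
        return (self._det, self._rng.getstate() if self._det else None)

    def setstate(self, state):
        det, inner = state
        if det:
            self._det, self._rng = True, random.Random()
            self._rng.setstate(inner)
        else:
            self._det, self._rng = False, random.SystemRandom()
```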
  • Petr Viktorin at Sep 11, 2015 at 9:48 pm

    On Fri, Sep 11, 2015 at 11:12 PM, wrote:
    On Wed, Sep 9, 2015, at 17:02, Nathaniel Smith wrote:
    Keeping that promise in mind, an alternative would be to keep both
    generators around, use the cryptographically secure one by default, and
    switch to MT when someone calls

    seed(1234, generator="INSECURE LEGACY MT")

    But this would justifiably get us crucified by the security community,
    because the above call would flip the insecure switch for your entire
    program, including possibly other modules that were depending on random
    to
    provide secure bits.
    I just realized, OpenBSD has precisely this functionality, for the
    rand/random/rand48 functions, in the "_deterministic" versions of their
    respective seed functions. So that's probably not a terrible path to go
    down:

    Make a Random class that uses a CSPRNG and/or os.urandom until/unless it
    is explicitly seeded. Use that class for the global instance. We could
    probably skip the "make a separate function name to show you really mean
    it" because unlike C, Python has never encouraged explicitly seeding
    with the {time, pid, four bytes from /dev/random} when one doesn't
    actually want determinism. (The default seed in C for rand/random is
    *1*; for rand48 it is an implementation-defined, but specified to be
    constant, value).

    For completeness, have getstate return a tuple of a boolean (for which
    mode it is in) and whatever state Random returns. setstate can accept
    either this tuple, or for compatibility whatever Random uses.

    Calling getstate() means you want to call setstate() at some point in
    the future, and have deterministic results. Getting the CSRNG state is
    dangerous (since it would allow replaying), and it's not even useful
    (since system entropy gets mixed in occasionally).
    Instead, in this scheme, getstate() should activate the deterministic
    RNG (seeding it if it's the first use), and return its state.
    setstate() would then also switch to the Twister, and seed it.
  • Random832 at Sep 11, 2015 at 11:12 pm

    On Fri, Sep 11, 2015, at 17:48, Petr Viktorin wrote:
    Calling getstate() means you want to call setstate() at some point in
    the future, and have deterministic results. Getting the CSRNG state is
    dangerous (since it would allow replaying), and it's not even useful
    (since system entropy gets mixed in occasionally).
    Instead, in this scheme, getstate() should activate the deterministic
    RNG (seeding it if it's the first use), and return its state.
    setstate() would then also switch to the Twister, and seed it.

    My thinking was that "CSRNG is enabled" should be regarded as a single
    state of the "magic switching RNG". The alternative would be that
    calling getstate on a magic switching RNG that is not already in
    deterministic mode is an error.
  • Stefan Krah at Sep 9, 2015 at 7:33 pm

    Steven D'Aprano <steve@...> writes:
    Question, aimed at anyone, not necessarily random832 -- one desirable
    property of PRNGs is that you can repeat a sequence of values if you
    re-seed with a known value. Does arc4random keep that property? I think
    that it is important that the default RNG be deterministic when given a
    known seed. (I'm happy for the default seed to be unpredictable.)

    I think the removal of MT wasn't proposed (at least not by Theo).
    So we'd still have deterministic sequences in addition to
    arc4random.






    Stefan Krah
  • Paul Moore at Sep 9, 2015 at 8:04 pm

    On 9 September 2015 at 20:33, Stefan Krah wrote:
    Steven D'Aprano <steve@...> writes:
    Question, aimed at anyone, not necessarily random832 -- one desirable
    property of PRNGs is that you can repeat a sequence of values if you
    re-seed with a known value. Does arc4random keep that property? I think
    that it is important that the default RNG be deterministic when given a
    known seed. (I'm happy for the default seed to be unpredictable.)
    I think the removal of MT wasn't proposed (at least not by Theo).
    So we'd still have deterministic sequences in addition to
    arc4random.

    I use a RNG quite often. Typically for simulations (games, dierolls,
    card draws, that sort of thing). Sometimes for many millions of
    results (Monte Carlo simulations, for example). I would always just
    use the default RNG supplied by the stdlib - I view my use case as
    "normal use" and wouldn't go looking for specialist answers. I'd
    occasionally look for reproducibility, although it's not often a key
    requirement for me (I would expect it as an option from the stdlib
    RNG, though).


    Anyone doing crypto who doesn't fully appreciate that it's a
    specialist subject and that they should be looking for a dedicated RNG
    suitable for crypto, is probably going to make a lot of *other*
    mistakes as well. Leading them away from this one probably isn't going
    to be enough to make their code something I'd want to use...


    So as a user, I'm against making a change like this. Let the default
    RNG in the stdlib be something suitable for simulations, "pick a
    random question", and similar situations, and provide a crypto-capable
    RNG for those who need it, but not as the default. (I am, of course,
    assuming that it's not possible to have a single RNG that is the best
    option for both uses - nobody on this thread seems to have suggested
    that I'm wrong in this assumption).
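    A sketch of the kind of simulation use described above, relying on the
    reproducibility the stdlib RNG already offers:

```python
import random

# A reproducible Monte Carlo-style run: give each simulation its own
# seeded Random instance, so the same seed replays the same die rolls
# (the seed value 42 is arbitrary).
def roll_dice(seed, n=1000):
    rng = random.Random(seed)           # private instance, fixed seed
    return [rng.randint(1, 6) for _ in range(n)]

run1 = roll_dice(42)
run2 = roll_dice(42)
assert run1 == run2                     # same seed, identical rerun
assert roll_dice(43) != run1            # different seed, different stream
```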


    Paul
  • Petr Viktorin at Sep 9, 2015 at 8:37 pm

    On Wed, Sep 9, 2015 at 9:33 PM, Stefan Krah wrote:
    Steven D'Aprano <steve@...> writes:
    Question, aimed at anyone, not necessarily random832 -- one desirable
    property of PRNGs is that you can repeat a sequence of values if you
    re-seed with a known value. Does arc4random keep that property? I think
    that it is important that the default RNG be deterministic when given a
    known seed. (I'm happy for the default seed to be unpredictable.)

    The OpenBSD implementation does not allow any kind of reproducible results.
    Reading http://www.pcg-random.org/other-rngs.html, I see that what
    arc4random is not built for is statistical quality and k-dimensional
    equidistribution, which are also properties you might not need for
    crypto, but do want for simulations.
    So there are two quite different use cases (plus a lot of grey area
    where any solution is okay).


    The current situation may be surprising to people who didn't read the
    docs. Switching away from MT might be a disservice to users that did
    read and understand them.
  • Stefan Krah at Sep 9, 2015 at 9:36 pm

    Petr Viktorin <encukou@...> writes:
    The OpenBSD implementation does not allow any kind of reproducible results.
    Reading http://www.pcg-random.org/other-rngs.html, I see that what
    arc4random is not built for is statistical quality and k-dimensional
    equidistribution, which are also properties you might not need for
    crypto, but do want for simulations.
    So there are two quite different use cases (plus a lot of grey area
    where any solution is okay).

    I can't find much at all when searching for "chacha20 equidistribution".
    Contrast that with "mersenne twister equidistribution" and it seems that
    chacha20 hasn't been studied very much in that respect (except for
    the pcg-random site).




    So I also think this should preclude us from replacing the current
    random() functions.




    Adding an arc4random module, with the caveat that its quality will only
    be as good as the current OpenBSD libcrypto/libressl(?) implementation,
    would be okay.




    Stefan Krah
  • Tim Peters at Sep 9, 2015 at 10:19 pm
    [Stefan Krah <skrah@bytereef.org>]
    I can't find much at all when searching for "chacha20 equidistribution".
    Contrast that with "mersenne twister equidistribution" and it seems that
    chacha20 hasn't been studied very much in that respect (except for
    the pcg-random site).

    So I also think this should preclude us from replacing the current
    random() functions.

    Well, most arguments about random functions rely on fantasies ;-) For
    example, yes, the Twister is provably equidistributed to 32 bits
    across 623 dimensions, but ... does it make a difference to anything?
    That's across the Twister's _entire period_, which couldn't actually
    be generated across the age of the universe.


    What may really matter to an application is whether it will see rough
    equidistribution across the infinitesimally small slice (of the
    Twister's full period) it actually generates. And you'll find very
    little about _that_ (last time I searched, I found nothing). For
    assurances about that, people rely on test suites developed to test
    generators.


    The Twister's provably perfect equidistribution across its whole
    period also has its scary sides. For example, run random.random()
    often enough, and it's _guaranteed_ you'll eventually reach a state
    where the output is exactly 0.0 hundreds of times in a row. That will
    happen as often as it "should happen" by chance, but that's scant
    comfort if you happen to hit such a state. Indeed, the Twister was
    patched relatively early in its life to try to prevent it from
    _starting_ in such miserable states. Such states are nevertheless
    still reachable from every starting state.


    But few people know any of that, so they take "equidistribution" as
    meaning a necessary thing rather than as an absolute guarantee of
    eventual disaster ;-)


    What may really matter for most simulations is that the Twister never
    reaches a state where, in low dimensions, k-tuples fall on "just a
    few" regularly-spaced hyperplanes forever after. That's a systematic
    problem with old flavors of linear congruential generators. But that
    problem is _so_ old that no new generator proposed over the last few
    decades suffers it either.



    Adding an arc4random module, with the caveat that its quality will only
    be as good as the current OpenBSD libcrypto/libressl(?) implementation,
    would be okay.

    Anyone is welcome to supply such a module today, and distribute it via
    the usual channels.


    Python already supplies the platform spelling of `urandom`, and a very
    capable random.SystemRandom class based on `urandom`, for those
    needing crypto-strength randomness (relying on what their OS believed
    that means, and supplied via their `urandom`).
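    A quick sketch of those existing facilities:

```python
import os
import random

# Crypto-strength randomness the stdlib already ships: both draw from
# the operating system's entropy source (e.g. /dev/urandom).
token = os.urandom(16)                # 16 unpredictable bytes
srand = random.SystemRandom()         # random.Random API, urandom-backed
pin = srand.randrange(10**6)          # e.g. an unguessable 6-digit PIN
flip = srand.choice(["heads", "tails"])
```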


    Good enough for me. But, no, I'm not a security wonk at heart.
  • Nathaniel Smith at Sep 9, 2015 at 11:15 pm

    On Wed, Sep 9, 2015 at 3:19 PM, Tim Peters wrote:
    Well, most arguments about random functions rely on fantasies ;-) For
    example, yes, the Twister is provably equidistributed to 32 bits
    across 623 dimensions, but ... does it make a difference to anything?
    That's across the Twister's _entire period_, which couldn't actually
    be generated across the age of the universe.

    What may really matter to an application is whether it will see rough
    equidistribution across the infinitesimally small slice (of the
    Twister's full period) it actually generates. And you'll find very
    little about _that_ (last time I searched, I found nothing). For
    assurances about that, people rely on test suites developed to test
    generators.

    Yeah, equidistribution is not a guarantee of anything on its own. For
    example, an integer counter modulo 2**(623*32) is equidistributed to
    32 bits across 623 dimensions, just like the Mersenne Twister. Mostly
    people talk about equidistribution because for a deterministic RNG,
    (a) being non-equidistributed would be bad, (b) equidistribution is
    something you can reasonably hope to prove for simple
    non-cryptographic generators, and mathematicians like writing proofs.


    OTOH equidistribution is not even well-defined for the OpenBSD
    "arc4random" generator, because it is genuinely non-deterministic --
    it regularly mixes new entropy into its state as it goes -- and
    equidistribution by definition requires determinism. So it "fails"
    this test of "randomness" because it is too random...


    In practice, the chances that your Monte Carlo simulation is going to
    give bad results because of systematic biases in arc4random are much,
    much lower than the chances that it will give bad results because of
    subtle hardware failures that corrupt your simulation. And hey, if
    arc4random *does* mess up your simulation, then congratulations, your
    simulation is publishable as a cryptographic attack and will probably
    get written up in the NYTimes :-).


    The real reasons to prefer non-cryptographic RNGs are the auxiliary
    features like determinism, speed, jumpahead, multi-thread
    friendliness, etc. But the stdlib random module doesn't really provide
    any of these (except determinism in strictly limited cases), so I'm
    not sure it matters much.

    The Twister's provably perfect equidistribution across its whole
    period also has its scary sides. For example, run random.random()
    often enough, and it's _guaranteed_ you'll eventually reach a state
    where the output is exactly 0.0 hundreds of times in a row. That will
    happen as often as it "should happen" by chance, but that's scant
    comfort if you happen to hit such a state. Indeed, the Twister was
    patched relatively early in its life to try to prevent it from
    _starting_ in such miserable states. Such states are nevertheless
    still reachable from every starting state.

    This criticism seems a bit unfair though -- even a true stream of
    random bits (e.g. from a true unbiased quantum source) has this
    property, and trying to avoid this happening would introduce bias that
    really could cause problems in practice. A good probabilistic program
    is one that has a high probability of returning some useful result,
    but they always have some low probability of returning something
    weird. So this is just saying that most people don't understand
    probability. Which is true, but there isn't much that the random
    module can do about it :-)


    -n


    --
    Nathaniel J. Smith -- http://vorpus.org
  • Steven D'Aprano at Sep 10, 2015 at 1:55 am

    On Wed, Sep 09, 2015 at 04:15:31PM -0700, Nathaniel Smith wrote:


    The real reasons to prefer non-cryptographic RNGs are the auxiliary
    features like determinism, speed, jumpahead, multi-thread
    friendliness, etc. But the stdlib random module doesn't really provide
    any of these (except determinism in strictly limited cases), so I'm
    not sure it matters much.

    The default MT is certainly deterministic, and although only the output
    of random() itself is guaranteed to be reproducible, the other methods
    are *usually* stable in practice.


    There's a jumpahead method too, and for use with multiple threads,
    you can (and should) create your own instances that don't share
    state. I call that "multi-thread friendliness" :-)


    I think Paul Moore's position is a good one. Anyone writing crypto code
    without reading the docs and understanding what they are doing are
    surely making more mistakes than just using the wrong PRNG. There may be
    a good argument for adding arc4random support to the stdlib, but making
    it the default (with the disadvantages discussed, breaking backwards
    compatibility, surprising non-crypto users, etc.) won't fix the broken
    crypto code. It will just give people a false sense of security and
    encourage them to ignore the docs and write broken crypto code.






    --
    Steve
  • Tim Peters at Sep 10, 2015 at 2:23 am
    [Nathaniel Smith]
    The real reasons to prefer non-cryptographic RNGs are the auxiliary
    features like determinism, speed, jumpahead, multi-thread
    friendliness, etc. But the stdlib random module doesn't really provide
    any of these (except determinism in strictly limited cases), so I'm
    not sure it matters much.

    [Steven D'Aprano]
    The default MT is certainly deterministic, and although only the output
    of random() itself is guaranteed to be reproducible, the other methods
    are *usually* stable in practice.

    There's a jumpahead method too,

    Not in Python. There was for the ancient Wichmann-Hill generator, but
    not for MT. A detailed sketch of ways to implement efficient
    jumpahead for MT is given here:


         A Fast Jump Ahead Algorithm for Linear Recurrences
             in a Polynomial Space
         http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/jump-seta-lfsr.pdf


    But because MT isn't a _simple_ recurrence, they're all depressingly
    complex :-( For Wichmann-Hill it was just a few integer modular
    exponentiations.



    and for use with multiple threads, you can (and should) create your
    own instances that don't share state. I call that "multi-thread friendliness" :-)

    That's what people do, but MT's creators don't recommend it anymore
    (IIRC, their FAQ did recommend it some years ago). Then they switched
    to recommending using jumpahead with a large value (to guarantee
    different instances' states would never overlap). Now (well, last I
    saw) they recommend a parameterized scheme creating a distinct variant
    of MT per thread (not just different state, but a different (albeit
    related) algorithm):


         http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf


    So I'd say it's clear as mud ;-)

    ...
  • Nathaniel Smith at Sep 10, 2015 at 3:32 am
    [Sorry to Tim and Steven if they get multiple copies of this... Gmail
    recently broke their Android app's handling of from addresses, so
    resending, sigh]


    On Sep 9, 2015 7:24 PM, "Tim Peters" wrote:
    [...]
    [Steven D'Aprano]
    [...]
    and for use with multiple threads, you can (and should) create your
    own instances that don't share state. I call that "multi-thread
    friendliness" :-)
    That's what people do, but MT's creators don't recommend it anymore
    (IIRC, their FAQ did recommend it some years ago). Then they switched
    to recommending using jumpahead with a large value (to guarantee
    different instances' states would never overlap). Now (well, last I
    saw) they recommend a parameterized scheme creating a distinct variant
    of MT per thread (not just different state, but a different (albeit
    related) algorithm):

         http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf
    So I'd say it's clear as mud ;-)

    Yeah, the independent-seed-for-each-thread approach is an option with any
    RNG, but just like people feel better if they have a 100% certified
    guarantee that the RNG output in a single thread will pass through every
    combination of possible values (if you wait some cosmological time), they
    also feel better if there is some 100% certified guarantee that the RNG
    values in two threads will also be uncorrelated with each other.


    With something like MT, if two threads did end up with nearby seeds, then
    that would be bad: each thread individually would see values that looked
    like high quality randomness, but if you compared across the two threads,
    they would be identical modulo some lag. So all the nice theoretical
    analysis of the single threaded stream falls apart.


    However, for two independently seeded threads to end up anywhere near each
    other in the MT state space requires that you have picked two numbers
    between 0 and 2**19937 and gotten values that were "close". Assuming your
    seeding procedure is functional at all, then this is not a thing that will
    ever actually happen in this universe. So AFAICT the rise of explicitly
    multi-threaded RNG designs is one of those fake problems that exists only
    so people can write papers about solving it. (Maybe this is uncharitable.)


    So there exist RNG designs that handle multi-threading explicitly, and it
    shows up on feature comparison checklists. I don't think it should really
    affect Python's decisions at all though.


    -n
  • Tim Peters at Sep 10, 2015 at 3:35 am
    [Nathaniel Smith <njs@vorpus.org>]
    Yeah, the independent-seed-for-each-thread approach works for any RNG, but
    just like people feel better if they have a 100% certified guarantee that
    the RNG output in a single thread will pass through every combination of
    possible values (if you wait some cosmological time), they also feel better
    if there is some 100% certified guarantee that the RNG values in two threads
    will also be uncorrelated with each other.

    With something like MT, if two threads did end up with nearby seeds, then
    that would be bad: each thread individually would see values that looked
    like high quality randomness, but if you compared across the two threads,
    they would be identical modulo some lag. So all the nice theoretical
    analysis of the single threaded stream falls apart.

    However, for two independently seeded threads to end up anywhere near each
    other in the MT state space requires that you have picked two numbers
    between 0 and 2**19937 and gotten values that were "close". Assuming your
    seeding procedure is functional at all, then this is not a thing that will
    ever actually happen in this universe.

    I think it's worse than that. MT is based on a linear recurrence.
    Take two streams "far apart" in MT space, and their sum also satisfies
    the recurrence. So a possible worry about a third stream isn't _just_
    about correlation or overlap with the first two streams, but,
    depending on the app, also about correlation/overlap with the sum of
    the first two streams. Move to N streams, and there are O(N**2)
    direct sums to worry about, and then sums of sums, and ...


    Still won't cause a problem in _my_ statistical life expectancy, but I
    only have 4 cores ;-)



    So AFAICT the rise of explicitly multi-threaded RNG designs is one of
    those fake problems that exists only so people can write papers about
    solving it. (Maybe this is uncharitable.)

    Uncharitable, but fair :-)



    So there exist RNG designs that handle multi-threading explicitly, and it
    shows up on feature comparison checklists. I don't think it should really
    affect Python's decisions at all though.

    There are some clean and easy approaches to this based on
    crypto-inspired schemes, but giving up crypto strength for speed. If
    you haven't read it, this paper is delightful:


         http://www.thesalmons.org/john/random123/papers/random123sc11.pdf
  • Nathaniel Smith at Sep 10, 2015 at 3:56 am

    On Wed, Sep 9, 2015 at 8:35 PM, Tim Peters wrote:
    There are some clean and easy approaches to this based on
    crypto-inspired schemes, but giving up crypto strength for speed. If
    you haven't read it, this paper is delightful:

    http://www.thesalmons.org/john/random123/papers/random123sc11.pdf

    It really is! As AES acceleration instructions become more common
    (they're now standard IIUC on x86, x86-64, and even recent ARM?), even
    just using AES in CTR mode becomes pretty compelling -- it's fast,
    deterministic, provably equidistributed, *and* cryptographically
    secure enough for many purposes.


    (Compared to a true state-of-the-art CPRNG the naive version fails due
    to lack of incremental mixing, and the use of a reversible transition
    function. But even these are mostly only important to protect against
    attackers who have access to your memory -- which is not trivial as
    heartbleed shows, but still, it's *waaay* ahead of something like MT
    on basically every important axis.)
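    A toy sketch of the counter-mode idea, with SHA-256 standing in for
    AES since the stdlib has no AES primitive (function names are
    illustrative, and this gives up the hardware speed advantage):

```python
import hashlib
import struct

# Counter-mode construction: the i-th random block is hash(key || i),
# so the stream is deterministic given the key, trivially parallel,
# and "jumpahead" is just picking a different counter value.
def ctr_block(key: bytes, counter: int) -> bytes:
    return hashlib.sha256(key + struct.pack(">Q", counter)).digest()

def ctr_random(key: bytes, counter: int) -> float:
    # Map the top 53 bits of the block to [0.0, 1.0), like random.random().
    word = int.from_bytes(ctr_block(key, counter)[:8], "big")
    return (word >> 11) / (1 << 53)
```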


    -n


    --
    Nathaniel J. Smith -- http://vorpus.org
  • Robert Kern at Sep 10, 2015 at 5:41 pm

    On 2015-09-10 04:56, Nathaniel Smith wrote:
    On Wed, Sep 9, 2015 at 8:35 PM, Tim Peters wrote:
    There are some clean and easy approaches to this based on
    crypto-inspired schemes, but giving up crypto strength for speed. If
    you haven't read it, this paper is delightful:

    http://www.thesalmons.org/john/random123/papers/random123sc11.pdf
    It really is! As AES acceleration instructions become more common
    (they're now standard IIUC on x86, x86-64, and even recent ARM?), even
    just using AES in CTR mode becomes pretty compelling -- it's fast,
    deterministic, provably equidistributed, *and* cryptographically
    secure enough for many purposes.

    I'll also recommend the PCG paper (and algorithm) as the author's cross-PRNGs
    comparisons have been bandied about in this thread already. The paper lays out a
    lot of the relevant issues and balances the various qualities that are
    important: statistical quality, speed, and security (of various flavors).


        http://www.pcg-random.org/paper.html


    I'm actually not that impressed with Random123. The core idea is nice and clean,
    but the implementation is hideously complex.


    --
    Robert Kern


    "I have come to believe that the whole world is an enigma, a harmless enigma
       that is made terrible by our own mad attempt to interpret it as though it had
       an underlying truth."
        -- Umberto Eco
  • Tim Peters at Sep 10, 2015 at 4:47 am
    [Tim, on parallel PRNGs]
    There are some clean and easy approaches to this based on
    crypto-inspired schemes, but giving up crypto strength for speed. If
    you haven't read it, this paper is delightful:

    http://www.thesalmons.org/john/random123/papers/random123sc11.pdf

    [Nathaniel Smith]
    It really is! As AES acceleration instructions become more common
    (they're now standard IIUC on x86, x86-64, and even recent ARM?), even
    just using AES in CTR mode becomes pretty compelling -- it's fast,
    deterministic, provably equidistributed, *and* cryptographically
    secure enough for many purposes.

    Excellent - we're going to have a hard time finding something real to
    disagree about :-)



    (Compared to a true state-of-the-art CPRNG the naive version fails due
    to lack of incremental mixing, and the use of a reversible transition
    function. But even these are mostly only important to protect against
    attackers who have access to your memory -- which is not trivial as
    heartbleed shows, but still, it's *waaay* ahead of something like MT
    on basically every important axis.)

    Except for wide adoption. Most people I bump into never even heard of
    this kind of approach. Nobody ever got fired for buying IBM, and
    nobody ever got fired for recommending MT - it's darned near a
    checklist item when shopping for a language. I may have to sneak the
    code in while you distract Guido with a provocative rant about the
    inherent perfidy of the Dutch character ;-)
  • Nathaniel Smith at Sep 10, 2015 at 5:55 am

    On Wed, Sep 9, 2015 at 9:47 PM, Tim Peters wrote:
    [Tim, on parallel PRNGs]
    There are some clean and easy approaches to this based on
    crypto-inspired schemes, but giving up crypto strength for speed. If
    you haven't read it, this paper is delightful:

    http://www.thesalmons.org/john/random123/papers/random123sc11.pdf
    [Nathaniel Smith]
    It really is! As AES acceleration instructions become more common
    (they're now standard IIUC on x86, x86-64, and even recent ARM?), even
    just using AES in CTR mode becomes pretty compelling -- it's fast,
    deterministic, provably equidistributed, *and* cryptographically
    secure enough for many purposes.
    Excellent - we're going to have a hard time finding something real to
    disagree about :-)

    (Compared to a true state-of-the-art CPRNG the naive version fails due
    to lack of incremental mixing, and the use of a reversible transition
    function. But even these are mostly only important to protect against
    attackers who have access to your memory -- which is not trivial as
    heartbleed shows, but still, it's *waaay* ahead of something like MT
    on basically every important axis.)
    Except for wide adoption. Most people I bump into never even heard of
    this kind of approach. Nobody ever got fired for buying IBM, and
    nobody ever got fired for recommending MT - it's darned near a
    checklist item when shopping for a language. I may have to sneak the
    code in while you distract Guido with a provocative rant about the
    inherent perfidy of the Dutch character ;-)

    :-)


    Srsly though, we've talked about switching to some kind of CTR-mode
    RNG as the default in NumPy (where speed differences are pretty
    visible, b/c we support generating big blocks of random numbers at
    once), and would probably accept a patch. (Just in case Guido is
    undisturbed by scurrilous allegations.)


    --
    Nathaniel J. Smith -- http://vorpus.org
  • Steven D'Aprano at Sep 11, 2015 at 4:08 pm

    On Wed, Sep 09, 2015 at 09:23:23PM -0500, Tim Peters wrote:


    [Steven D'Aprano]
    The default MT is certainly deterministic, and although only the output
    of random() itself is guaranteed to be reproducible, the other methods
    are *usually* stable in practice.

    There's a jumpahead method too,
    Not in Python.

    It is there, up to Python 2.7. I hadn't noticed it was gone in Python 3.




    --
    Steve
  • Tim Peters at Sep 11, 2015 at 5:16 pm
    [Steven D'Aprano]
    The default MT is certainly deterministic, and although only the output
    of random() itself is guaranteed to be reproducible, the other methods
    are *usually* stable in practice.

    There's a jumpahead method too,

    [Tim]
    Not in Python.

    [Steve]
    It is there, up to Python 2.7. I hadn't noticed it was gone in Python 3.

    Yes, there's something _called_ `jumpahead()`, for backward
    compatibility with the old Wichmann-Hill generator. But what it does
    for MT is "eh - no idea what to do, so let's just make stuff up":


         def jumpahead(self, n):
             """Change the internal state to one that is likely far away
             from the current state. This method will not be in Py3.x,
             so it is better to simply reseed.
             """
             # The super.jumpahead() method uses shuffling to change state,
             # so it needs a large and "interesting" n to work with. Here,
             # we use hashing to create a large n for the shuffle.
             s = repr(n) + repr(self.getstate())
             n = int(_hashlib.new('sha512', s).hexdigest(), 16)
             super(Random, self).jumpahead(n)


    I doubt there's anything that can be proved about the result of doing
    that - except that it's almost certain it won't bear any relationship
    to what calling the generator `n` times instead would have done ;-)
  • Tim Peters at Sep 10, 2015 at 3:23 am
    [Nathaniel Smith <njs@pobox.com>]
    Yeah, equidistribution is not a guarantee of anything on its own. For
    example, an integer counter modulo 2**(623*32) is equidistributed to
    32 bits across 623 dimensions, just like the Mersenne Twister.

    The analogy is almost exact. If you view MT's state as a single
    19937-bit integer (623*32 + 1 bits), then MT's state "simply" cycles
    through a specific permutation of


         range(1, 2**19937)


    (with a different orbit consisting solely of 0). That was the "hard
    part" to prove. Everything about equidistribution was more of an
    observation following from that. Doesn't say anything about
    distribution "in the small" (across small slices), alas.



    ... And hey, if
    arc4random *does* mess up your simulation, then congratulations, your
    simulation is publishable as a cryptographic attack and will probably
    get written up in the NYTimes :-).

    Heh. In the NYT or a security wonk's blog, maybe. But why would a
    reputable journal believe me? By design, the results of using the
    OpenBSD arc4random can't be reproduced ;-)



    The real reasons to prefer non-cryptographic RNGs are the auxiliary
    features like determinism, speed, jumpahead, multi-thread
    friendliness, etc. But the stdlib random module doesn't really provide
    any of these (except determinism in strictly limited cases), so I'm
    not sure it matters much.

    Python's implementation of MT has never changed anything about the
    sequence produced from a given seed state, and indeed gives the same
    sequence from the same seed state as every other correct
    implementation of the same flavor of MT. That is "strictly limited",
    to perfection ;-) At a higher level, depends on the app. People are
    quite creative at defeating efforts to be helpful ;-)



    The Twister's provably perfect equidistribution across its whole
    period also has its scary sides. For example, run random.random()
    often enough, and it's _guaranteed_ you'll eventually reach a state
    where the output is exactly 0.0 hundreds of times in a row. That will
    happen as often as it "should happen" by chance, but that's scant
    comfort if you happen to hit such a state. Indeed, the Twister was
    patched relatively early in its life to try to prevent it from
    _starting_ in such miserable states. Such states are nevertheless
    still reachable from every starting state.
    This criticism seems a bit unfair though

    Those are facts, not criticisms. I like the Twister very much. But
    those who have no fear of it are dreaming - while those who have
    significant fear of it are also dreaming. It's my job to ensure
    nobody is either frightened or complacent ;-)



    -- even a true stream of random bits (e.g. from a true unbiased
    quantum source) has this property,

    But good generators with astronomically smaller periods do not. In a
    sense, MT made it possible to get results "far more random" than any
    widely used deterministic generator before it. The patch I mentioned
    above was to fix real problems in real life, where people using simple
    seeding schemes systematically ended up with results so transparently
    ludicrous nobody could possibly accept them for any purpose.


    "The fix" consisted of using scrambling functions to spray the bits in
    the user-*supplied* seed all over the place, in a pseudo-random way,
    to probabilistically ensure "the real" state wouldn't end up with "too
    many" zero bits. "A lot of zero bits" tends to persist across MT
    state transitions for a long time.
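
    For the curious, the scrambled seeding Tim describes can be sketched in
    a few lines of Python. This follows the reference mt19937ar.c
    initializer (not CPython's actual C source), and `mt_init` is just a
    name for the sketch:

```python
def mt_init(seed):
    """Sketch of MT19937's scrambled seed initialization (after the
    reference mt19937ar.c): each state word mixes in the high bits of
    the previous word, so even a tiny seed like 0 spreads set bits
    across all 624 words instead of leaving "a lot of zero bits"."""
    state = [seed & 0xFFFFFFFF]
    for i in range(1, 624):
        prev = state[-1]
        # 1812433253 is the multiplier used by the reference initializer
        state.append((1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF)
    return state

ones = sum(bin(w).count("1") for w in mt_init(0))
print(ones / (624 * 32))  # roughly 0.5, even though the seed was zero
```

    Note how the all-zero seed still ends up with about half the state
    bits set - exactly the property the patch was after.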



    and trying to avoid this happening would introduce bias that really
    could cause problems in practice.

    For example, nobody _needs_ a generator capable of producing hundreds
    of 0.0 in a row. Use a variant even of MT with a much smaller period,
    and that problem goes away, with no bad effects on any app.


    Push that more, and what many Monte Carlo applications _really_ want
    is "low discrepancy": some randomness is essential, but becomes a
    waste of cycles and assaults "common sense" if it gets too far from
    covering the domain more-or-less uniformly. So there are many ways
    known of generating "quasi-random" sequences instead, melding a notion
    of randomness with guarantees of relatively low discrepancy (low
    deviation from uniformity). Nothing works best for all purposes -
    except Python.



    A good probabilistic program is one that has a high probability
    of returning some useful result, but they always have some
    low probability of returning something weird. So this is just
    saying that most people don't understand probability.

    Or nobody does. Probability really is insufferably subtle. Guido
    should ban it.

    Which is true, but there isn't much that the random module can do
    about it :-)

    Python should just remove it. It's an "attractive nuisance" to everyone ;-)
  • Robert Kern at Sep 10, 2015 at 5:29 pm

    On 2015-09-10 00:15, Nathaniel Smith wrote:
    On Wed, Sep 9, 2015 at 3:19 PM, Tim Peters wrote:
    The Twister's provably perfect equidistribution across its whole
    period also has its scary sides. For example, run random.random()
    often enough, and it's _guaranteed_ you'll eventually reach a state
    where the output is exactly 0.0 hundreds of times in a row. That will
    happen as often as it "should happen" by chance, but that's scant
    comfort if you happen to hit such a state. Indeed, the Twister was
    patched relatively early in its life to try to prevent it from
    _starting_ in such miserable states. Such states are nevertheless
    still reachable from every starting state.
    This criticism seems a bit unfair though -- even a true stream of
    random bits (e.g. from a true unbiased quantum source) has this
    property, and trying to avoid this happening would introduce bias that
    really could cause problems in practice. A good probabilistic program
    is one that has a high probability of returning some useful result,
    but they always have some low probability of returning something
    weird. So this is just saying that most people don't understand
    probability. Which is true, but there isn't much that the random
    module can do about it :-)

    The MT actually does have a problem unique to it (or at least to its family of
    Generalized Feedback Shift Registers) where a state with a high proportion of 0
    bits will get stuck in a region of successive states with high proportions of 0
    bits. Other 623-dimensional equidistributed PRNGs will indeed come across the
    same states with high 0-bit sequences with the frequency that you expect from
    probability, but they will be surrounded by states with dissimilar 0-bit
    proportions. This problem isn't *due* to equidistribution per se, but I think
    Tim's point is that you are inevitably due to hit one such patch if you sample
    long enough.


    --
    Robert Kern


    "I have come to believe that the whole world is an enigma, a harmless enigma
       that is made terrible by our own mad attempt to interpret it as though it had
       an underlying truth."
        -- Umberto Eco
  • Tim Peters at Sep 10, 2015 at 5:48 pm
    [Robert Kern <robert.kern@gmail.com>]
    The MT actually does have a problem unique to it (or at least to its family
    of Generalized Feedback Shift Registers) where a state with a high
    proportion of 0 bits will get stuck in a region of successive states with
    high proportions of 0 bits. Other 623-dimensional equidistributed PRNGs will
    indeed come across the same states with high 0-bit sequences with the
    frequency that you expect from probability, but they will be surrounded by
    states with dissimilar 0-bit proportions. This problem isn't *due* to
    equidistribution per se, but I think Tim's point is that you are inevitably
    due to hit one such patch if you sample long enough.

    Thank you for explaining it better than I did. I implied MT's "stuck
    in zero-land" problems were _due_ to perfect equidistribution, but of
    course they're not. It's just that MT's specific permutation of the
    equidistributed-regardless-of-order range(1, 2**19937) systematically
    puts integers with "lots of 0 bits" next to each other.


    And there are many such patches. But 2**19937 is so large you need to
    contrive the state "by hand" to get there at once. For example,

    >>> random.setstate((2, (0,)*600 + (1,)*24 + (624,), None))
    >>> random.random()
    0.0
    >>> random.random()
    0.0
    >>> random.random()
    0.0
    >>> random.random()
    0.0
    >>> random.random()
    0.0
    >>> random.random()
    0.0
    >>> random.random()
    0.0
    >>> random.random()
    0.0
    >>> random.random()
    0.0
    >>> random.random()
    0.0
    >>> random.random()
    0.0
    >>> random.random()
    0.0


    That's "impossible" ;-) (1 chance in 2**(53*12) of seeing 0.0
    twelve times in a row).
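
    The session above can also be reproduced programmatically; a small
    sketch counting how long the zero run actually lasts (the run may well
    extend past the twelve calls shown):

```python
import random

# Rebuild Tim's hand-crafted "miserable" state from the session above.
# setstate() still accepts the old version-2 state tuple (version, 625
# ints, gauss_next) and converts it internally.
rng = random.Random()
rng.setstate((2, (0,) * 600 + (1,) * 24 + (624,), None))

# Count the consecutive exact-0.0 outputs before the state escapes.
zero_run = 0
while rng.random() == 0.0:
    zero_run += 1
print(zero_run)  # at least the twelve shown interactively
```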
  • Random832 at Sep 9, 2015 at 8:09 pm

    On Wed, Sep 9, 2015, at 15:07, Steven D'Aprano wrote:
    Not really. Look at the subject line. It doesn't say "should we change
    from MT to arc4random?", it asks if the default random number generator
    should be secure. The only reason we are considering the change from MT
    to arc4random is to make the PRNG cryptographically secure. "Secure" is
    a moving target, what is secure today will not be secure tomorrow.

    Right, but we are discussing making it secure today.
  • Steven D'Aprano at Sep 10, 2015 at 1:27 am

    On Wed, Sep 09, 2015 at 04:09:21PM -0400, random832 at fastmail.us wrote:
    On Wed, Sep 9, 2015, at 15:07, Steven D'Aprano wrote:
    Not really. Look at the subject line. It doesn't say "should we change
    from MT to arc4random?", it asks if the default random number generator
    should be secure. The only reason we are considering the change from MT
    to arc4random is to make the PRNG cryptographically secure. "Secure" is
    a moving target, what is secure today will not be secure tomorrow.
    Right, but we are discussing making it secure today.

    No, *you* are discussing making it secure today. The rest of us are
    discussing making it secure for all time.




    --
    Steve
  • Greg Ewing at Sep 9, 2015 at 11:23 pm

    Steven D'Aprano wrote:
    one desirable
    property of PRNGs is that you can repeat a sequence of values if you
    re-seed with a known value. Does arc4random keep that property?

    Another property that's important for some applications is
    to be able to efficiently "jump ahead" some number of steps
    in the sequence, to produce multiple independent streams of
    numbers. It would be good to know if that is possible with
    arc4random.


    --
    Greg
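
    (Python 3's random module no longer offers jumpahead(). A common
    substitute for producing independent streams - a sketch, not a stdlib
    API; `make_stream` is a hypothetical helper - is to derive a
    well-separated seed per stream from a master seed:)

```python
import hashlib
import random

def make_stream(master_seed, stream_id):
    """Return a reproducible random.Random instance for one stream,
    seeded by hashing (master_seed, stream_id) so different stream
    ids land in unrelated regions of the seed space."""
    material = f"{master_seed}:{stream_id}".encode()
    seed = int.from_bytes(hashlib.sha256(material).digest(), "big")
    return random.Random(seed)

# Same (seed, id) pair reproduces the same sequence...
print(make_stream(42, 0).random() == make_stream(42, 0).random())  # True
# ...while different stream ids give distinct sequences.
print(make_stream(42, 0).random() == make_stream(42, 1).random())  # False
```

    This gives determinism and multi-stream friendliness, though unlike a
    true jumpahead it offers only probabilistic (not proven)
    non-overlap of the streams.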
  • Chris Angelico at Sep 10, 2015 at 4:57 am

    On Thu, Sep 10, 2015 at 9:23 AM, Greg Ewing wrote:
    Steven D'Aprano wrote:
    one desirable property of PRNGs is that you can repeat a sequence of
    values if you re-seed with a known value. Does arc4random keep that
    property?

    Another property that's important for some applications is
    to be able to efficiently "jump ahead" some number of steps
    in the sequence, to produce multiple independent streams of
    numbers. It would be good to know if that is possible with
    arc4random.

    If arc4random reseeds with entropy periodically, then jumping ahead
    past such a reseed is simply a matter of performing a reseed, isn't
    it?


    ChrisA
  • Tim Peters at Sep 10, 2015 at 5:00 am
    [Chris Angelico]
    If arc4random reseeds with entropy periodically, then jumping ahead
    past such a reseed is simply a matter of performing a reseed, isn't
    it?

    The OpenBSD version supplies no functionality related to seeds (you
    can't set one, and you can't ask for one).
  • Tim Peters at Sep 10, 2015 at 4:58 am
    [Steven D'Aprano]
    one desirable property of PRNGs is that you can repeat a sequence of
    values if you re-seed with a known value. Does arc4random keep that
    property?

    [Greg Ewing]
    Another property that's important for some applications is
    to be able to efficiently "jump ahead" some number of steps
    in the sequence, to produce multiple independent streams of
    numbers. It would be good to know if that is possible with
    arc4random.

    No for "arc4random" based on RC4, yes for "arc4random" based on
    ChaCha20, "mostly yes" for "arc4random" in the OpenBSD implementation,
    wholly unknown for whatever functions may be _called_ "arc4random"
    in the future.


    The fly in the ointment for the OpenBSD version is that it
    periodically fiddles its internal state with "entropy" obtained from
    the kernel. It's completely unreproducible for that reason. However,
    you can still jump ahead in the state. It's just impossible to say
    that it's the same state you would have arrived at had you invoked the
    function that many times instead (the kernel could change the state in
    unpredictable ways any number of times while you were doing that).
  • Tim Peters at Sep 10, 2015 at 5:10 am
    [Tim]
    ...
    The fly in the ointment for the OpenBSD version is that it
    periodically fiddles its internal state with "entropy" obtained from
    the kernel. It's completely unreproducible for that reason. However,
    you can still jump ahead in the state.

    I should add: but only if they supplied a jumpahead function. Which
    they don't.
  • M.-A. Lemburg at Sep 10, 2015 at 8:26 am
    Reading this thread is fun, but it doesn't seem to be getting
    anywhere - perhaps that's part of the fun ;-)


    Realistically, I see two options:


      1. Someone goes and implements the OpenBSD random function in C
         and puts a package up on PyPI, updating it whenever OpenBSD
         thinks that a new algorithm is needed or a security issue
         has to be fixed (from my experience with other crypto software
         like OpenSSL, this should be on the order of every 2-6 months ;-))


      2. Ditto, but we put the module in the stdlib and then run around
         issuing patch level security releases every 2-6 months.


    Replacing our deterministic default PRNG with a non-deterministic
    one doesn't really fly, since we'd break an important feature
    of random.random(). You may remember that we already ran a similar
    stunt with the string hash function, with very mixed results.


    Calling the result of such a switch-over "secure" is even
    worse, since it's a promise we cannot keep (probably not even
    fully define). Better leave the promise at "insecure" - that's
    something we can promise forever and don't have to define :-)


    Regardless of what we end up with, I think Python land can do
    better than name it "arc4random". We're great at bike shedding,
    so how about we start the fun with "randomYMMV" :-)


    Overall, I think having more options for good PRNGs is great.
    Whether this "arc4random" is any good remains to be seen, but
    given that OpenBSD developed it, chances are higher than
    usual.


    --
    Marc-Andre Lemburg
    eGenix.com


    Professional Python Services directly from the Source (#1, Sep 10 2015)
    Python Projects, Coaching and Consulting ... http://www.egenix.com/
    mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/
    mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
    ________________________________________________________________________
    2015-09-18: PyCon UK 2015 ... 8 days to go


    ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::


        eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
         D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
                Registered at Amtsgericht Duesseldorf: HRB 46611
                    http://www.egenix.com/company/contact/
  • Stefan Krah at Sep 10, 2015 at 1:39 pm

    M.-A. Lemburg <mal@...> writes:
    Reading this thread is fun, but it doesn't seem to be getting
    anywhere - perhaps that's part of the fun

    Realistically, I see two options:

    1. Someone goes and implements the OpenBSD random function in C
    and put a package up on PyPI, updating it whenever OpenBSD
    thinks that a new algorithm is needed or a security issue
    has to be fixed (from my experience with other crypto software
    like OpenSSL, this should be on the order of every 2-6 months )

    The sane option would be to use the OpenBSD libcrypto, which seems to
    be part of their OpenSSL fork (libressl), just like libcrypto is part
    of OpenSSL.


    Then the crypto maintenance would be delegated to the distributions.


    I would even be interested in writing such a package, but it would
    be external and non-redistributable for well-known reasons. :)




    Stefan Krah
  • M.-A. Lemburg at Sep 10, 2015 at 3:59 pm

    On 10.09.2015 15:39, Stefan Krah wrote:
    M.-A. Lemburg <mal@...> writes:
    Reading this thread is fun, but it doesn't seem to be getting
    anywhere - perhaps that's part of the fun

    Realistically, I see two options:

    1. Someone goes and implements the OpenBSD random function in C
    and put a package up on PyPI, updating it whenever OpenBSD
    thinks that a new algorithm is needed or a security issue
    has to be fixed (from my experience with other crypto software
    like OpenSSL, this should be on the order of every 2-6 months )
    The sane option would be to use the OpenBSD libcrypto, which seems to
    be part of their OpenSSL fork (libressl), just like libcrypto is part
    of OpenSSL.

    Well, we already link to OpenSSL for SSL and hashes. I guess exposing
    the OpenSSL RAND interface in a module would be the easiest way
    to go about this.


    pyOpenSSL already does this:


    http://www.egenix.com/products/python/pyOpenSSL/doc/pyopenssl.html/#document-api/rand


    More pointers:
    https://wiki.openssl.org/index.php/Random_Numbers
    https://www.openssl.org/docs/manmaster/crypto/rand.html


    What's nice about the API is that you can add entropy as you
    find it.
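
    (Indeed, the stdlib already exposes that OpenSSL RAND interface in
    the ssl module - a minimal sketch; RAND_add()'s second argument is the
    caller's estimate of the entropy, in bytes, carried by the data:)

```python
import os
import ssl

# Mix caller-gathered entropy into OpenSSL's pool, then draw from its
# CSPRNG. The 0.0 is a deliberately conservative entropy estimate for
# the mixed-in data; this pool is separate from the random module's MT.
ssl.RAND_add(os.urandom(32), 0.0)
token = ssl.RAND_bytes(16)
print(len(token))  # 16
```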

    Then the crypto maintenance would be delegated to the distributions.

    I would even be interested in writing such a package, but it would
    be external and non-redistributable for well-known reasons. :)

    --
    Marc-Andre Lemburg
    eGenix.com


