FAQ
In my experience it is far more expensive to allocate a lock in python then
it is the types that use them. Here are some examples:
timeit.timeit('Lock()', 'from threading import Lock')
1.4449114807669048
timeit.timeit('dict()')
0.2821554294221187
timeit.timeit('list()')
0.17358153222312467


Granted, I know these types do not allocate a lock - but a similarly defined
user type would need to allocate a lock to guarantee thread-safety.

I have not done a great deal of research into lockless algorithms thusfar,
but would it be worthwhile to investigate implementing some of them in
python to optimize performance critical sections of code?

Many lockless algorithms that I have looked at thusfar require a "Compare
and Swap" operation. Does python have an equivalent to this? Is python high
level enough that it has something better than this or it simply doesn't
need it?

--
Zachary Burns
(407)590-4814
Aim - Zac256FL
Production Engineer
Zindagi Games
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20100628/d1a0216b/attachment.html>

Search Discussions

  • Ryan Kelly at Jun 29, 2010 at 12:30 am

    On Mon, 2010-06-28 at 16:46 -0700, Zac Burns wrote:
    In my experience it is far more expensive to allocate a lock in python
    then it is the types that use them. Here are some examples:
    timeit.timeit('Lock()', 'from threading import Lock')
    1.4449114807669048
    timeit.timeit('dict()')
    0.2821554294221187
    timeit.timeit('list()')
    0.17358153222312467


    Granted, I know these types do not allocate a lock - but a similarly
    defined user type would need to allocate a lock to guarantee
    thread-safety.
    Sure, but I think you're timing the wrong thing here. You would only
    allocate the lock relatively rarely - it's the overhead of *acquiring*
    the lock that's the real problem.

    rfk at durian:~$ python -m timeit -s "from threading import Lock; l = Lock()" "l.acquire(); l.release()"
    1000000 loops, best of 3: 0.378 usec per loop

    Compare to one of my favourite dict methods, setdefault:

    rfk at durian:~$ python -m timeit -s "d = dict()" "d.setdefault('hello','world')"
    10000000 loops, best of 3: 0.189 usec per loop


    In the same ballpark really.
    I have not done a great deal of research into lockless algorithms
    thusfar, but would it be worthwhile to investigate implementing some
    of them in python to optimize performance critical sections of code?

    Many lockless algorithms that I have looked at thusfar require a
    "Compare and Swap" operation. Does python have an equivalent to this?
    Is python high level enough that it has something better than this or
    it simply doesn't need it?
    I've managed to avoid locking in some cases by using dict.setdefault()
    as a kind of atomic test-and-set operation. It's not a full
    compare-and-swap but you could implement a simple locking scheme using
    it like this:


    class PretendLock:
    def __init__(self):
    self.d = dict()
    def acquire(self):
    myid = threading.currentThread().ident
    while self.d.setdefault("owner",myid) != myid:
    pass
    def release(self):
    del self.d["owner"]


    This depends on some peculiarities of the CPython implementation that
    aren't guaranteed by the language spec - namely that thread switches
    can't happen while executing C code, that dict.setdefault is implemented
    in C, and that is can calculate the hash of a string key without calling
    back out to python code.

    Usually, it's not worth it. The one time I use this regularly is for
    lazy object initialisation, e.g. something like this:

    def get_my_object(key):
    try:
    return objects[key]
    except KeyError:
    return objects.setdefault(key,create_my_object(key))

    If two threads happen to try initialising the object at the same time,
    only one will wind up in the dict and being used; the other will be
    discarded and garbage-collected.



    Cheers,

    Ryan




    --
    Ryan Kelly
    http://www.rfk.id.au | This message is digitally signed. Please visit
    ryan at rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details

    -------------- next part --------------
    A non-text attachment was scrubbed...
    Name: signature.asc
    Type: application/pgp-signature
    Size: 198 bytes
    Desc: This is a digitally signed message part
    URL: <http://mail.python.org/pipermail/python-list/attachments/20100629/423143bf/attachment-0001.pgp>
  • Zac Burns at Jun 29, 2010 at 1:27 am

    Sure, but I think you're timing the wrong thing here. You would only
    allocate the lock relatively rarely - it's the overhead of *acquiring*
    the lock that's the real problem.

    rfk at durian:~$ python -m timeit -s "from threading import Lock; l =
    Lock()" "l.acquire(); l.release()"
    1000000 loops, best of 3: 0.378 usec per loop

    Compare to one of my favourite dict methods, setdefault:

    rfk at durian:~$ python -m timeit -s "d = dict()"
    "d.setdefault('hello','world')"
    10000000 loops, best of 3: 0.189 usec per loop


    In the same ballpark really.

    I've managed to avoid locking in some cases by using dict.setdefault()
    as a kind of atomic test-and-set operation. It's not a full
    compare-and-swap but you could implement a simple locking scheme using
    it like this:


    class PretendLock:
    def __init__(self):
    self.d = dict()
    def acquire(self):
    myid = threading.currentThread().ident
    while self.d.setdefault("owner",myid) != myid:
    pass
    def release(self):
    del self.d["owner"]


    This depends on some peculiarities of the CPython implementation that
    aren't guaranteed by the language spec - namely that thread switches
    can't happen while executing C code, that dict.setdefault is implemented
    in C, and that is can calculate the hash of a string key without calling
    back out to python code.

    Usually, it's not worth it. The one time I use this regularly is for
    lazy object initialisation, e.g. something like this:

    def get_my_object(key):
    try:
    return objects[key]
    except KeyError:
    return objects.setdefault(key,create_my_object(key))

    If two threads happen to try initialising the object at the same time,
    only one will wind up in the dict and being used; the other will be
    discarded and garbage-collected.
    Thanks Ryan,

    Agreed, I was timing the wrong thing - but largely because I wasn't sure
    what to time against. setdefault certainly seems to be a candidate for
    something to time. Even so, setdefault wins.

    Though, I wouldn't want to use PretendLock in this case... it doesn't seem
    that having the spin loop would justify reducing reducing the lock
    allocation overhead in a no-contention scenario.

    Yes, get_my_object is a good trick. I did just that to avoid instantiating
    extra locks in one scenario.

    --
    Zachary Burns
    (407)590-4814
    Aim - Zac256FL
    Production Engineer
    Zindagi Games
    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: <http://mail.python.org/pipermail/python-list/attachments/20100628/4682ff10/attachment-0001.html>
  • Ryan Kelly at Jun 29, 2010 at 1:35 am

    On Mon, 2010-06-28 at 18:27 -0700, Zac Burns wrote:


    I've managed to avoid locking in some cases by using
    dict.setdefault() as a kind of atomic test-and-set operation.
    It's not a full compare-and-swap but you could implement a
    simple locking scheme using it like this:


    class PretendLock:
    def __init__(self):
    self.d = dict()
    def acquire(self):
    myid = threading.currentThread().ident
    while self.d.setdefault("owner",myid) != myid:
    pass
    def release(self):
    del self.d["owner"]


    This depends on some peculiarities of the CPython
    implementation that aren't guaranteed by the language spec -
    namely that thread switches can't happen while executing C
    code, that dict.setdefault is implemented in C, and that is
    can calculate the hash of a string key without calling back
    out to python code.


    Thanks Ryan,

    Agreed, I was timing the wrong thing - but largely because I wasn't
    sure what to time against. setdefault certainly seems to be a
    candidate for something to time. Even so, setdefault wins.

    Though, I wouldn't want to use PretendLock in this case... it doesn't
    seem that having the spin loop would justify reducing reducing the
    lock allocation overhead in a no-contention scenario.

    Oh totally - nobody reading this should even consider using that
    PretendLock class for anything. It's just an example of how to use
    setdefault as a test-and-set kind of operator.


    Ryan

    --
    Ryan Kelly
    http://www.rfk.id.au | This message is digitally signed. Please visit
    ryan at rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details

    -------------- next part --------------
    A non-text attachment was scrubbed...
    Name: signature.asc
    Type: application/pgp-signature
    Size: 198 bytes
    Desc: This is a digitally signed message part
    URL: <http://mail.python.org/pipermail/python-list/attachments/20100629/4d07b468/attachment.pgp>
  • Sturlamolden at Jun 29, 2010 at 2:44 am

    Many lockless algorithms that I have looked at thusfar require a
    "Compare and Swap" operation. Does python have an equivalent to this?
    Is python high level enough that it has something better than this or
    it simply doesn't need it?
    Python does have a GIL, and contrary to the title, this has a lot to
    do with the GIL. Specifically, any set of operations protected by the
    GIL are atomic in the CPython interpreter.

    We can therefore get global synchronization 'for free', allowing 'lock
    free' data structures that avoid using threading.Lock, and replaces it
    with, well, nothing. :)

    How to do it (it takes tiny amounts of C or Cython):

    Take at look at page 14:

    http://www.dabeaz.com/python/UnderstandingGIL.pdf

    The critical part is the variable _Py_Ticker in Python's source file
    ceval.c, which is declared volatile int. As it is not declared static
    it is a global variable, and we can actually manipulate its value from
    a C extension:

    extern volatile int _Py_Ticker; // evil!!!

    Now imagine temporarily setting _Py_Ticker to some ridiculous high
    value like 0x7fffffff. That locks out all other threads, practically
    forever, until we restore _Py_Ticker back to it's original value.

    What this gives us is global synchronization for free! The GIL is
    there anyway, so it adds no additional overhead.

    By messing with _Py_Ticker we can force blocks of Python code to
    become atomic in the interpreter. With this hack we can therefore
    create data structures that are lock free relative to the CPython
    interpreter. It is evil, but very cheap compared to using
    threading.Lock. In fact it has exactly zero overhead: the GIL is there
    anyway, and it is already acquired!

    The safest placement of a _Py_Ticker hack is in a context manager. We
    have to make sure it gets restored to a sane value, even in presence
    of an exception.

    Enjoy :)



    P.S. If you get smart and think sys.setcheckinterval(sys.maxint) would
    do the trick, you are actually not that smart. It will lock our own
    thread out with a Poisson rate of 1./sys.getcheckinterval(). Thus we
    have to attack _Py_Ticker from C.

    P.P.S. And when we have taken control over the GIL, don't do anything
    that might release it from C (like reading from a file). We cannot
    afford to loose it :)
  • Sturlamolden at Jun 29, 2010 at 2:45 am

    Many lockless algorithms that I have looked at thusfar require a
    "Compare and Swap" operation. Does python have an equivalent to this?
    Is python high level enough that it has something better than this or
    it simply doesn't need it?
    Python does have a GIL, and contrary to the title, this has a lot to
    do with the GIL. Specifically, any set of operations protected by the
    GIL are atomic in the CPython interpreter.

    We can therefore get global synchronization 'for free', allowing 'lock
    free' data structures that avoid using threading.Lock, and replaces it
    with, well, nothing. :)

    How to do it (it takes tiny amounts of C or Cython):

    Take at look at page 14:

    http://www.dabeaz.com/python/UnderstandingGIL.pdf

    The critical part is the variable _Py_Ticker in Python's source file
    ceval.c, which is declared volatile int. As it is not declared static
    it is a global variable, and we can actually manipulate its value from
    a C extension:

    extern volatile int _Py_Ticker; // evil!!!

    Now imagine temporarily setting _Py_Ticker to some ridiculous high
    value like 0x7fffffff. That locks out all other threads, practically
    forever, until we restore _Py_Ticker back to it's original value.

    What this gives us is global synchronization for free! The GIL is
    there anyway, so it adds no additional overhead.

    By messing with _Py_Ticker we can force blocks of Python code to
    become atomic in the interpreter. With this hack we can therefore
    create data structures that are lock free relative to the CPython
    interpreter. It is evil, but very cheap compared to using
    threading.Lock. In fact it has exactly zero overhead: the GIL is there
    anyway, and it is already acquired!

    The safest placement of a _Py_Ticker hack is in a context manager. We
    have to make sure it gets restored to a sane value, even in presence
    of an exception.

    Enjoy :)



    P.S. If you get smart and think sys.setcheckinterval(sys.maxint) would
    do the trick, you are actually not that smart. It will lock our own
    thread out with a Poisson rate of 1./sys.getcheckinterval(). Thus we
    have to attack _Py_Ticker from C.

    P.P.S. And when we have taken control over the GIL, don't do anything
    that might release it from C (like reading from a file). We cannot
    afford to loose it :)
  • Ryan Kelly at Jun 29, 2010 at 3:11 am

    On Mon, 2010-06-28 at 19:45 -0700, sturlamolden wrote:
    Many lockless algorithms that I have looked at thusfar require a
    "Compare and Swap" operation. Does python have an equivalent to this?
    Is python high level enough that it has something better than this or
    it simply doesn't need it?
    Python does have a GIL, and contrary to the title, this has a lot to
    do with the GIL. Specifically, any set of operations protected by the
    GIL are atomic in the CPython interpreter.

    We can therefore get global synchronization 'for free', allowing 'lock
    free' data structures that avoid using threading.Lock, and replaces it
    with, well, nothing. :)

    How to do it (it takes tiny amounts of C or Cython):

    Take at look at page 14:

    http://www.dabeaz.com/python/UnderstandingGIL.pdf

    The critical part is the variable _Py_Ticker in Python's source file
    ceval.c, which is declared volatile int. As it is not declared static
    it is a global variable, and we can actually manipulate its value from
    a C extension:

    extern volatile int _Py_Ticker; // evil!!!

    Now imagine temporarily setting _Py_Ticker to some ridiculous high
    value like 0x7fffffff. That locks out all other threads, practically
    forever, until we restore _Py_Ticker back to it's original value.

    What this gives us is global synchronization for free! The GIL is
    there anyway, so it adds no additional overhead.
    Very interesting idea. Will it work if accessed through ctypes?

    ticker = ctypes.c_int.in_dll(ctypes.pythonapi,"_Py_Ticker")
    ticker.value = 0x7fffffff

    Or does ctypes muck with the GIL in a way that would break this idea?


    Ryan


    --
    Ryan Kelly
    http://www.rfk.id.au | This message is digitally signed. Please visit
    ryan at rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details

    -------------- next part --------------
    A non-text attachment was scrubbed...
    Name: signature.asc
    Type: application/pgp-signature
    Size: 198 bytes
    Desc: This is a digitally signed message part
    URL: <http://mail.python.org/pipermail/python-list/attachments/20100629/4ef6d18b/attachment.pgp>
  • Sturlamolden at Jun 29, 2010 at 3:32 am

    On 29 Jun, 05:11, Ryan Kelly wrote:

    Very interesting idea. ?Will it work if accessed through ctypes?

    ? ?ticker = ctypes.c_int.in_dll(ctypes.pythonapi,"_Py_Ticker")
    ? ?ticker.value = 0x7fffffff

    Or does ctypes muck with the GIL in a way that would break this idea?
    ctypes.pythonapi
    <PyDLL 'python dll', handle 1e000000 at 1ffead0>

    It's a PyDLL, so it should not mock with the GIL.


    from contextlib import contextmanager
    import ctypes
    _Py_Ticker = ctypes.c_int.in_dll(ctypes.pythonapi,"_Py_Ticker")

    @contextmanager
    def atomic():
    tmp = _Py_Ticker.value
    _Py_Ticker.value = 0x7fffffff
    yield
    _Py_Ticker.value = tmp - 1


    Now we can do

    with atomic():
    # whatever
    pass

    Just make sure we don't call extension code that releases the GIL :)
  • Antoine Pitrou at Jul 2, 2010 at 8:26 pm

    On Mon, 28 Jun 2010 16:46:41 -0700 Zac Burns wrote:
    In my experience it is far more expensive to allocate a lock in python then
    it is the types that use them. Here are some examples:
    timeit.timeit('Lock()', 'from threading import Lock')
    1.4449114807669048
    timeit.timeit('dict()')
    0.2821554294221187
    timeit.timeit('list()')
    0.17358153222312467
    I'm not sure what Python version on what machine you are using, but
    here (Python 2.7):
    timeit.timeit('Lock()', 'from threading import Lock')
    0.09944796562194824
    timeit.timeit('dict()')
    0.17817902565002441
    timeit.timeit('list()')
    0.19633007049560547
    timeit.timeit('{}')
    0.03823709487915039
    timeit.timeit('[]')
    0.05156302452087402
  • Zac Burns at Jul 6, 2010 at 5:06 pm
    I'm not an expert on this, but wouldn't it be more dependent on the platform
    than python version? Perhaps it is Windows 7 that is very slow. Perhaps my
    processor architecture. Not sure...

    Here are some for 3.1.2x64
    import timeit
    timeit.timeit('Lock()', 'from threading import Lock')
    1.4162585386092708
    timeit.timeit('dict()', 'from threading import Lock')
    0.2730348901369162
    timeit.timeit('list()', 'from threading import Lock')
    0.1719480219512306

    -Zac
    On Fri, Jul 2, 2010 at 1:26 PM, Antoine Pitrou wrote:

    On Mon, 28 Jun 2010 16:46:41 -0700
    Zac Burns wrote:
    In my experience it is far more expensive to allocate a lock in python then
    it is the types that use them. Here are some examples:
    timeit.timeit('Lock()', 'from threading import Lock')
    1.4449114807669048
    timeit.timeit('dict()')
    0.2821554294221187
    timeit.timeit('list()')
    0.17358153222312467
    I'm not sure what Python version on what machine you are using, but
    here (Python 2.7):
    timeit.timeit('Lock()', 'from threading import Lock')
    0.09944796562194824
    timeit.timeit('dict()')
    0.17817902565002441
    timeit.timeit('list()')
    0.19633007049560547
    timeit.timeit('{}')
    0.03823709487915039
    timeit.timeit('[]')
    0.05156302452087402



    --
    http://mail.python.org/mailman/listinfo/python-list
    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: <http://mail.python.org/pipermail/python-list/attachments/20100706/e2445b2f/attachment.html>

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedJun 28, '10 at 11:46p
activeJul 6, '10 at 5:06p
posts10
users4
websitepython.org

People

Translate

site design / logo © 2022 Grokbase