Hi!

I've been debugging a very nasty bug in a multithreaded program embedding
Python for several days now.

After quite some work I found the following:

One of the threads occasionally blocks in
PyThread_acquire_lock/pthread_cond_wait when trying to get the
interpreter_lock or the import_lock. This thread will block there forever.

But other threads can apparently get the same lock without any problem at all.
And when I look at it in gdb, it looks even more astonishing:

(gdb) bt
#0 0x4026cea9 in sigsuspend () from /lib/libc.so.6
#1 0x4003bd48 in __pthread_wait_for_restart_signal () from
/lib/libpthread.so.0
#2 0x4003814b in pthread_cond_wait () from /lib/libpthread.so.0
#3 0x080fc44a in PyThread_acquire_lock (lock=0x81ac940, waitflag=1)
at Python/thread_pthread.h:374
#4 0x080d3187 in PyEval_RestoreThread (tstate=0x815a358) at
Python/ceval.c:342
#5 0x0806ac04 in capisuite_disconnect (args=0x8208bdc) at
capisuitemodule.cpp:422
#6 0x080ab897 in PyCFunction_Call (func=0x8172fa0, arg=0x8208bdc,
kw=0x4035bc90) at Objects/methodobject.c:90
#7 0x080d0290 in eval_frame (f=0x81e42dc) at Python/ceval.c:2004
#8 0x080d0c9c in PyEval_EvalCodeEx (co=0x826c318, globals=0xfffffffc,
locals=0x0, args=0x81fbda4, argcount=5, kws=0x81fbdb8, kwcount=0,
defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#9 0x080d1c4d in fast_function (func=0xfffffffc, pp_stack=0xfffffffc, n=-4,
na=5, nk=6297892) at Python/ceval.c:3161
#10 0x080d01e7 in eval_frame (f=0x81fbbec) at Python/ceval.c:2024
#11 0x080d0c9c in PyEval_EvalCodeEx (co=0x82ae2a8, globals=0xfffffffc,
locals=0x0,args=0x81f77b8, argcount=1, kws=0x0, kwcount=0, defs=0x0,
defcount=0, closure=0x0) at Python/ceval.c:2585
#12 0x08115f84 in function_call (func=0x8265f0c, arg=0x81f77ac, kw=0x0)
at Objects/funcobject.c:374
#13 0x080936ae in PyObject_Call (func=0x8, arg=0x81f77ac, kw=0x0) at
Objects/abstract.c:1684
#14 0x080d1a49 in PyEval_CallObjectWithKeywords (func=0x8265f0c,
arg=0x81f77ac, kw=0x0) at Python/ceval.c:3049
#15 0x08093685 in PyObject_CallObject (o=0x8265f0c, a=0x81f77ac) at
Objects/abstract.c:1675
#16 0x08066386 in PythonScript::run() (this=0x8174ee0) at
pythonscript.cpp:80
#17 0x0806822e in IdleScript::run() (this=0x8174ee0) at idlescript.cpp:59
#18 0x40090ee1 in ost::ThreadImpl::ThreadExecHandler(ost::Thread*) ()
from /usr/lib/libccgnu2-0.99.so.0
#19 0x4009025c in ccxx_exec_handler.1 () from /usr/lib/libccgnu2-0.99.so.0
#20 0x400391b0 in pthread_start_thread () from /lib/libpthread.so.0

Ok, it tries to get the global lock but:

(gdb) print *((pthread_lock*) interpreter_lock)
$7 = {locked = 0 '\0', lock_released = {__c_lock = {__status = 0,
__spinlock = 0}, __c_waiting = 0x0}, mut = {__m_reserved = 0, __m_count = 0,
__m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}

So the lock is indeed not locked!! I don't understand this at all.

I saw the same phenomenon once when trying to get the import_lock.

pthread_cond_wait () was blocked but import_lock_level was 0 and
import_lock_thread was -1.

Anybody seen anything like this?

It occurs on GNU/Linux using pthreads from glibc 2.2.5. I'm using Python
2.2.1 but can't see that 2.2.2 will improve this somehow...

This is very important for me, as the program will become my diploma thesis, and
if I can't get this problem solved in the next week, I'll be in real
trouble. :-((

So please, please if any of you has some idea which could be helpful, please
tell me!! TIA!!!

--
Ciao,

Gernot

  • Jeremy Hylton at Feb 6, 2003 at 6:18 pm
    Gernot Hillier <ghillie at suse.de> wrote in message news:<b1tqqs$q4e$1 at Fourier.suse.de>...
    > pthread_cond_wait () was blocked but import_lock_level was 0 and
    > import_lock_thread was -1.
    >
    > Anybody seen anything like this?
    >
    > It occurs on GNU/Linux using pthreads from glibc 2.2.5. I'm using Python
    > 2.2.1 but can't see that 2.2.2 will improve this somehow...
    >
    > This is very important for me, as the program will become my diploma thesis, and
    > if I can't get this problem solved in the next week, I'll be in real
    > trouble. :-((
    >
    > So please, please if any of you has some idea which could be helpful, please
    > tell me!! TIA!!!

    A condition variable that is waiting does not hold the lock. You can
    only call wait with the lock acquired, and the first thing wait does
    is release the lock. So there shouldn't be any surprise that you are
    blocked in wait without holding the lock.

    If you have a multi-threaded application, you should look at what all
    the other threads are doing. One of the other threads has the lock
    and hasn't released it. There's a good chance that there is inconsistent
    use of app-level locking with the GIL.

    Jeremy
  • Gernot Hillier at Feb 6, 2003 at 9:13 pm
    Hi!

    Jeremy Hylton wrote:
    > A condition variable that is waiting does not hold the lock. You can

    I think you misunderstood me. I'm not talking about a Python-level condition
    variable. I mean the condition variable which is used internally to
    implement the lock:

    I'll quote Python/thread_pthread.h (shortened):

    PyThread_acquire_lock(PyThread_type_lock lock, int waitflag)
    {
        status = pthread_mutex_lock( &thelock->mut );
        success = thelock->locked == 0;
        if (success) thelock->locked = 1;
        status = pthread_mutex_unlock( &thelock->mut );

        if ( !success && waitflag ) {
            status = pthread_mutex_lock( &thelock->mut );
            while ( thelock->locked ) {
                status = pthread_cond_wait(&thelock->lock_released,
                                           &thelock->mut);
            }
            thelock->locked = 1;
            status = pthread_mutex_unlock( &thelock->mut );
            success = 1;
        }
        if (error) success = 0;
        return success;
    }

    So if I'm understanding this right, pthread_cond_wait() will only be called
    if the lock is held and ...

    void
    PyThread_release_lock(PyThread_type_lock lock)
    {
        status = pthread_mutex_lock( &thelock->mut );
        thelock->locked = 0;
        status = pthread_mutex_unlock( &thelock->mut );

        status = pthread_cond_signal( &thelock->lock_released );
    }

    ... will be signalled as soon as the GIL is released.

    So IMHO thelock->locked MUST be 1 while pthread_cond_wait is blocking.

    But sometimes it isn't :-(

    > If you have a multi-threaded application, you should look at what all
    > the other threads are doing. One of the other threads has the lock
    > and hasn't released it.

    That was my first idea, but no other thread is doing anything Python-related
    at the moment the one thread is blocking.

    And what is very astonishing is the fact that other threads can (later on)
    quite happily acquire and release the GIL as often as they want to - the
    thread I started first still blocks at the cond_wait() call and will do so
    forever :-(

    > There's a good chance that there is inconsistent use of app-level
    > locking with the GIL.

    Possible, but I triple-checked all parts of my program - and that wouldn't
    explain how a similar block can occur with the import_lock - as I don't use
    this thingie myself anywhere in my code...

    --
    Ciao,

    Gernot
  • Tim Peters at Feb 6, 2003 at 7:48 pm
    [Gernot Hillier]
    > I've been debugging a very nasty bug in a multithreaded program embedding
    > Python for several days now.

    So you've barely got a start on it <wink>.

    > After quite some work I found the following:
    >
    > One of the threads occasionally blocks in
    > PyThread_acquire_lock/pthread_cond_wait when trying to get the
    > interpreter_lock or the import_lock. This thread will block there forever.
    >
    > But other threads can apparently get the same lock without any problem at all.
    > And when I look at it in gdb, it looks even more astonishing:
    >
    > (gdb) bt
    > #0 0x4026cea9 in sigsuspend () from /lib/libc.so.6
    > #1 0x4003bd48 in __pthread_wait_for_restart_signal () from
    > /lib/libpthread.so.0

    That's enough. If it never gets a restart signal, it will stay there
    forever. Whether it does get a restart signal is out of Python's hands,
    though -- that's up to the pthreads implementation.
    ...
    > Ok, it tries to get the global lock but:
    >
    > (gdb) print *((pthread_lock*) interpreter_lock)
    > $7 = {locked = 0 '\0', lock_released = {__c_lock = {__status = 0,
    > __spinlock = 0}, __c_waiting = 0x0}, mut = {__m_reserved = 0, __m_count = 0,
    > __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}
    >
    > So the lock is indeed not locked!! I don't understand this at all.

    "The lock" is ambiguous. There's the pthreads mutex ("mut" in the above),
    and there's the Python GIL (implemented by that entire data structure). As
    Jeremy said, it's normal for mut to be unlocked during a condvar wait.
    Whether the GIL is locked is really irrelevant, because your stack trace
    shows that it's in the bowels of the platform condvar wait implementation,
    presumably waiting for a signal it's never going to get.

    > I saw the same phenomenon once when trying to get the import_lock.
    >
    > pthread_cond_wait () was blocked but import_lock_level was 0 and
    > import_lock_thread was -1.
    >
    > Anybody seen anything like this?
    >
    > It occurs on GNU/Linux using pthreads from glibc 2.2.5. I'm using Python
    > 2.2.1 but can't see that 2.2.2 will improve this somehow...

    Trying Python 2.3a1 might. The GIL under Linux is implemented via POSIX
    semaphores in 2.3, instead of via a condvar+mutex pair.
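
As a rough illustration of the difference, a semaphore-based lock keeps its whole state in one counter, so there is no separate "locked" flag plus wake-up signal to keep in step. The sketch below is a Python model of that shape only, not the code in Python 2.3's thread_pthread.h. `SemaphoreLock` is a name made up here, and Python's `threading.Semaphore` merely stands in for a POSIX semaphore.

```python
import threading


class SemaphoreLock:
    """Hypothetical model of a lock built on one counting semaphore."""

    def __init__(self):
        # Count 1 means "free", count 0 means "held".
        self._sem = threading.Semaphore(1)

    def acquire(self, waitflag=True):
        # Blocks until the count can be decremented; with waitflag=False
        # it returns False at once if the lock is already held.
        return self._sem.acquire(blocking=waitflag)

    def release(self):
        # Incrementing the count both frees the lock and wakes one waiter.
        self._sem.release()
```

Whether a real semaphore implementation avoids the hang still depends on the platform's thread library.
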

    > This is very important for me, as the program will become my diploma
    > thesis, and if I can't get this problem solved in the next week,
    > I'll be in real trouble. :-((

    Then let me ask you an odd question: are you using fork()? If so, move
    heaven and earth to get rid of it. Over a year ago a number of people spent
    more than a week trying to solve a similar problem on Linux, and never did
    manage to solve it. All the evidence pointed to a bug in the Linux pthreads
    implementation, due to improper treatment of internal pthreads memory after
    a fork. Forking and threading mix like bananas and motor oil under the best
    of conditions.

    If you're not using fork(), I have no ideas other than to try a different
    OS, or move to Python 2.3a1 and hope the same bug doesn't plague your
    platform semaphore implementation.

    all-oses-are-buggy-ly y'rs - tim
  • Laura Creighton at Feb 6, 2003 at 9:51 pm

    > Forking and threading mix like bananas and motor oil under the best
    > of conditions. - tim

    +1 for QOTW -- and it's been a good week ...

    Laura
  • Erik Max Francis at Feb 6, 2003 at 11:38 pm

    Laura Creighton wrote:

    > +1 for QOTW -- and it's been a good week ...

    How come nobody was making QOTW recommendations when I was doing the
    Python-URL! digests? :-)

    --
    Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
    __ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
    / \ War is like love, it always finds a way.
    \__/ Bertolt Brecht
    Bosskey.net: Return to Wolfenstein / http://www.bosskey.net/rtcw/
    A personal guide to Return to Castle Wolfenstein.
  • Laura Creighton at Feb 7, 2003 at 8:10 am

    > Laura Creighton wrote:
    >> +1 for QOTW -- and it's been a good week ...
    > How come nobody was making QOTW recommendations when I was doing the
    > Python-URL! digests? :-)

    You were doing such a good job we didn't think you needed any help.

    Laura
  • Mike Meyer at Feb 8, 2003 at 1:56 am

    Laura Creighton <lac at strakt.com> writes:
    >> How come nobody was making QOTW recommendations when I was doing the
    >> Python-URL! digests? :-)
    > You were doing such a good job we didn't think you needed any help.

    Hey!

    <mike
    --
    Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
  • Gernot Hillier at Feb 6, 2003 at 9:38 pm
    Hi!

    Tim Peters wrote:
    > [Gernot Hillier]
    >> (gdb) bt
    >> #0 0x4026cea9 in sigsuspend () from /lib/libc.so.6
    >> #1 0x4003bd48 in __pthread_wait_for_restart_signal () from
    >> /lib/libpthread.so.0
    > That's enough. If it never gets a restart signal, it will stay there
    > forever. Whether it does get a restart signal is out of Python's hands,
    > though -- that's up to the pthreads implementation.

    Anything I can do to see if this signal arrives?

    Any nice gdb magic to get closer to the solution?

    But as this also seems to be a race condition (won't happen every time, seems
    to be dependent on the machine speed, doesn't occur on every machine, ...), I
    doubt it will occur when running in gdb at all. *sigh*

    >> (gdb) print *((pthread_lock*) interpreter_lock)
    >> $7 = {locked = 0 '\0', lock_released = {__c_lock = {__status = 0,
    >> __spinlock = 0}, __c_waiting = 0x0}, mut = {__m_reserved = 0,
    >> __m_count = 0,
    >> __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}
    >>
    >> So the lock is indeed not locked!! I don't understand this at all.
    > "The lock" is ambiguous. There's the pthreads mutex ("mut" in the above),
    > and there's the Python GIL (implemented by that entire data structure).
    > As Jeremy said, it's normal for mut to be unlocked during a condvar wait.

    I referred to interpreter_lock->locked.

    What I wanted to say: when it blocks in pthread_cond_wait() but "locked"
    is 0 at the same time, it can't be that I've forgotten a ReleaseLock()
    anywhere in my source code, so it can't be my fault, can it?

    > Whether the GIL is locked is really irrelevant, because your stack trace
    > shows that it's in the bowels of the platform condvar wait implementation,
    > presumably waiting for a signal it's never going to get.

    Just want to be sure it can't be a problem of my embedding/extending code...

    >> It occurs on GNU/Linux using pthreads from glibc 2.2.5. I'm using Python
    >> 2.2.1 but can't see that 2.2.2 will improve this somehow...
    > Trying Python 2.3a1 might. The GIL under Linux is implemented via POSIX
    > semaphores in 2.3, instead of via a condvar+mutex pair.

    Hmmm... No real solution for me as this program must run on the Python
    stable tree. But anyway surely worth a try. Thx for the suggestion...

    > Then let me ask you an odd question: are you using fork()? If so, move
    > heaven and earth to get rid of it. Over a year ago a number of people
    > [...]

    No, I'm not using fork(). Only threads via the CommonC++ library on top of
    libpthread.

    > If you're not using fork(), I have no ideas other than to try a different
    > OS,

    I doubt that this will be an option for me (look at my mail address) ;-))

    > or move to Python 2.3a1 and hope the same bug doesn't plague your
    > platform semaphore implementation.

    Hmmm... I'll first try to get rid of the CommonC++ library and implement
    my threads myself using pthreads. I already had some very odd bugs which
    turned out to occur only when using the CommonC++ library and which
    disappeared when I used libpthread directly.

    So I'll give that a try in the next hours/days (let's see)...

    > all-oses-are-buggy-ly y'rs - tim

    :)

    --
    Ciao,

    Gernot
  • Gernot Hillier at Feb 7, 2003 at 4:52 pm
    Hi!

    Gernot Hillier wrote:
    >> or move to Python 2.3a1 and hope the same bug doesn't plague your
    >> platform semaphore implementation.
    > Hmmm... I'll first try to get rid of the CommonC++ library and implement
    > my threads myself using pthreads. I already had some very odd bugs which
    > turned out to occur only when using the CommonC++ library and which
    > disappeared when I used libpthread directly.

    Ok, done that. Doesn't help either. :-((

    --
    Ciao,

    Gernot
  • O maj at Feb 7, 2003 at 1:50 am
    Is python ideal for beginners?
    Well, Pedro you do have a point, although you should try and change
    your method of presentation. It does look like a cheap marketing
    gimmick, and i am sure it put lots of people off reading it.

    Okay, 2-3 years ago, i tried to study python as a first language,
    because i went to Eric Raymond's site and basically he recommended it
    for beginners.
    Initially, i did have problems trying to get round it, because i
    couldn't find many basic examples. But because i was determined to
    learn it, i simply borrowed a book of basic algorithms and studied
    them. I then transferred this knowledge to python.

    I also found it very easy then to pick up vb, vbscript and later c. I
    doubt i would have found it easy to pick up other languages had i
    started with vb.

    Python is ideal for the determined beginner, but is a poor choice for
    an unmotivated person who wants to be spoon fed.
  • Michael Hudson at Feb 7, 2003 at 12:38 pm

    omarmaj at talk21.com (o maj) writes:

    > Python is ideal for the determined beginner, but is a poor choice
    > for an unmotivated person who wants to be spoon fed.

    What is a good choice for such a person? The OP seemed to suggest C,
    which is quite funny :-)

    Cheers,
    M.

    --
    Emacs is a fashion statement.
    No, Gnus is a fashion statement. Emacs is clothing. Everyone
    else is running around naked.
    -- Karl Kleinpaste & Jonadab the Unsightly One, gnu.emacs.gnus
  • Carlos Ribeiro at Feb 7, 2003 at 8:45 pm

    On Friday 07 February 2003 12:38 pm, Michael Hudson wrote:
    > omarmaj at talk21.com (o maj) writes:
    >> Python is ideal for the determined beginner, but is a poor choice
    >> for an unmotivated person who wants to be spoon fed.
    > What is a good choice for such a person? The OP seemed to suggest C,
    > which is quite funny :-)

    Prozac :-)


    Carlos Ribeiro
    cribeiro at mail.inet.com.br
  • Jimmy Retzlaff at Feb 7, 2003 at 2:28 am

    o maj [omarmaj at talk21.com] wrote:

    > Python is ideal for the determined beginner, but is a poor choice
    > for an unmotivated person who wants to be spoon fed.

    I'm not sure an "unmotivated person who wants to be spoon fed" is going
    to develop into a very effective programmer anyway. In my estimation,
    programming at any non-trivial level takes a fair amount of persistence
    and initiative.

    Jimmy
  • Laura Creighton at Feb 7, 2003 at 8:58 am

    > o maj [omarmaj at talk21.com] wrote:
    >> Python is ideal for the determined beginner, but is a poor choice
    >> for an unmotivated person who wants to be spoon fed.
    > I'm not sure an "unmotivated person who wants to be spoon fed" is going
    > to develop into a very effective programmer anyway. In my estimation,
    > programming at any non-trivial level takes a fair amount of persistence
    > and initiative.
    >
    > Jimmy

    Some bright people, who have always had excellent textbooks written for
    their level of experience in the world at hand whenever they have had to
    learn anything, miss out on developing these skills. They have never
    had to work at learning anything ... it always slipped right into their
    brains with no fuss whatsoever. For them, learning something becomes
    only a matter of reading.

    They don't know that this means that the authors of their books
    and their teachers have been doing a truly excellent job at a really
    hard task. Because they are indeed very, very, bright, it is
    not surprising that they believe that the reason they learn well has
    only to do with how intelligent they are.

    Universities have to watch for these people. Some of them cannot adapt
    to an environment where they are exposed to people who cannot teach,
    textbooks which are clear as mud, and where a failed undergraduate
    <them> is not considered a major existential tragedy. After first term
    exams, a significant subset of them jump off bridges and the like.

    They could learn to program just fine. But first they have to learn how
    to learn.

    Laura
  • Arthur at Feb 7, 2003 at 1:29 pm

    >> Python is ideal for the determined beginner, but is a poor choice
    >> for an unmotivated person who wants to be spoon fed.
    > I'm not sure an "unmotivated person who wants to be spoon fed" is going
    > to develop into a very effective programmer anyway. In my estimation,
    > programming at any non-trivial level takes a fair amount of persistence
    > and initiative.

    I think both points are quite true.

    Which is why I found it so bizarre that there was a period of time here in
    Python land where some of the most trivial of the challenges faced by anyone
    approaching programming via Python were being given so much attention - and
    with so much solemnity. Having felt that is one area that I could speak to
    with some authority - and knowing quite well that I faced some real
    challenges, but they were miles away from the issues on which there was so
    much focus. Having developed some concrete ideas - from my experience - of
    what might be done to tweak things to improve the Python learning experience
    for those approaching it with persistence and initiative - well the time
    seems to have passed where that is a subject of much interest.

    Art
  • Gernot Hillier at Feb 28, 2003 at 1:39 pm
    Hi!

    JFYI: I once reported a problem where one of my Python threads froze in
    pthread_cond_wait on GNU/Linux although the Python lock was not held:

    http://groups.google.com/groups?dq=&hl=de&lr=&ie=UTF-8&selm=b1tqqs%24q4e%241%40Fourier.suse.de

    I've found the reason now, and it was my fault: I created multiple threads as
    root and called os.setuid() in ONE thread only.

    Now this thread has a different UID than the others, which isn't allowed in
    pthreads.

    This leads to pthread_cond_signal not working because it uses kill()
    internally to wake up the other thread. But as a non-privileged thread
    can't kill() a root thread, this failed. Unfortunately, pthreads does no
    error checking for this case, so it just silently loses the
    pthread_cond_signal event.
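
The effect of such a lost wake-up can be modelled in a few lines of Python. This is a sketch under assumptions, not the LinuxThreads or CPython code: `CondvarLock`, `release(deliver_signal=...)`, `resend_signal` and `demo_lost_wakeup` are hypothetical names, and the skipped `notify()` stands in for the `kill()` that failed. It shows a waiter that stays blocked even though the `locked` flag is already clear, which is the state gdb displayed.

```python
import threading
import time


class CondvarLock:
    """Model of a lock made of a mutex, a flag and a condition variable."""

    def __init__(self):
        self._mut = threading.Lock()
        self._lock_released = threading.Condition(self._mut)
        self.locked = False
        self.waiters = 0  # bookkeeping for the demo only

    def acquire(self):
        with self._mut:
            while self.locked:
                self.waiters += 1
                self._lock_released.wait()  # releases _mut while sleeping
                self.waiters -= 1
            self.locked = True

    def release(self, deliver_signal=True):
        # Unlike the C code, Python requires notify() with the mutex held.
        with self._mut:
            self.locked = False
            if deliver_signal:
                self._lock_released.notify()

    def resend_signal(self):
        with self._mut:
            self._lock_released.notify()


def demo_lost_wakeup(observe_seconds=0.3):
    lock = CondvarLock()
    lock.acquire()  # main thread holds the lock
    acquired = threading.Event()

    def waiter():
        lock.acquire()
        acquired.set()

    thread = threading.Thread(target=waiter, daemon=True)
    thread.start()
    while lock.waiters == 0:  # wait until the waiter is about to sleep
        time.sleep(0.01)

    lock.release(deliver_signal=False)  # flag cleared, wake-up lost
    stuck = not acquired.wait(observe_seconds)
    flag_while_stuck = lock.locked

    lock.resend_signal()  # deliver the missing wake-up
    recovered = acquired.wait(5.0)
    thread.join(5.0)
    return stuck, flag_while_stuck, recovered
```

Calling `demo_lost_wakeup()` returns a tuple saying whether the waiter was stuck, what the flag held at that moment, and whether the late signal freed it.
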

    Just removing os.setuid() did the trick.

    You're only allowed to setuid() a whole process including all threads...
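
One way to follow that rule is to change the UID while the process still has a single thread and start the workers afterwards. The sketch below assumes a POSIX system; `run_workers` and `drop_to_uid` are names invented for this illustration, and newer thread libraries may handle per-thread credentials differently from the one described here.

```python
import os
import threading


def run_workers(targets, drop_to_uid=None):
    """Drop privileges first, then start one thread per callable."""
    if drop_to_uid is not None:
        # Done before any thread exists, so every thread created below
        # starts with the new UID. Needs sufficient privileges.
        os.setuid(drop_to_uid)

    threads = [threading.Thread(target=target) for target in targets]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(threads)
```

With `drop_to_uid=None` the function only starts and joins the threads, which makes it easy to try without root.
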

    Learned something the hard way :-}

    --
    Ciao,

    Gernot

Discussion Overview
group: python-list
categories: python
posted: Feb 6, '03 at 2:16p
active: Feb 28, '03 at 1:39p
posts: 17
users: 11
website: python.org
