FAQ
Hi All,

I have a Python program that goes up to 100% CPU. Just like this (top):

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU
COMMAND
80212 user1 2 44 0 70520K 16212K select 1 0:30 100.00%
/usr/local/bin/python process_updates_ss_od.py -l 10

I have added extra logs and it turns out that there are two threads. One
thread is calling "time.sleep()" and the other is calling "os.stat"
call. (Actually it is calling os.path.isfile, but I hunted down the last
link in the chain.) The most interesting thing is that the process is in
"SELECT" state. As far as I know, CPU load should be 0% because "select"
state should block program execution until the I/O completes.

I must also tell you that the os.stat call is taking long because this
system has about 7 million files on a slow disk. It would be normal for
an os.stat call to return after 10 seconds. I have no problem with that.
But I think that the 100% CPU is not acceptable. I guess that the code
is running in kernel mode. I think this because I can send a KILL signal
to it and the state changes to the following:


PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU
COMMAND
80212 user1 2 44 0 70520K 15256K STOP 5 1:27 100.00%
/usr/local/bin/python process_updates_ss_od.py -l 10

So the state of the process changes to "STOP", but the program does not
stop until the os.stat call returns back (sometimes for 30 seconds).

Could it be a problem with the operation system? Is it possible that an
os.stat call requires 100% CPU power from the OS? Or is it a problem
with the Python implementation?

(Unfortunately I cannot give you an example program. Giving an example
would require giving you a slow I/O device with millions of files on it.)

OS version: FreeBSD 8.1-STABLE amd64
Python version: 2.6.6

Thanks,

Laszlo


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Search Discussions

  • Christian Heimes at Mar 21, 2011 at 9:34 am

    Am 21.03.2011 09:32, schrieb Laszlo Nagy:
    I must also tell you that the os.stat call is taking long because this
    system has about 7 million files on a slow disk. It would be normal for
    an os.stat call to return after 10 seconds. I have no problem with that.
    But I think that the 100% CPU is not acceptable. I guess that the code
    is running in kernel mode. I think this because I can send a KILL signal
    to it and the state changes to the following:
    What's your operating system and file system? A better file system or
    system setting may increase your performance a lot. XFS or ext3 / ext4
    with hashed directory index increases the file lookup part from O(n) to
    O(1).
    Could it be a problem with the operation system? Is it possible that an
    os.stat call requires 100% CPU power from the OS? Or is it a problem
    with the Python implementation?
    It's mostly likely a problem with the OS, hardware and/or your
    configuration. Python doesn't come with its own stat() implementation.
    os.stat() just wraps the libc's stat() function. The heavy lifting is
    done inside libc and the kernel.

    Christian
  • Laszlo Nagy at Mar 22, 2011 at 2:33 pm
    I guess that the code
    is running in kernel mode. I think this because I can send a KILL signal
    to it and the state changes to the following:
    What's your operating system and file system? A better file system or
    system setting may increase your performance a lot. XFS or ext3 / ext4
    with hashed directory index increases the file lookup part from O(n) to
    O(1).
    FreeBSD 8.1 and UFS2 + soft updates. Dirhash memory already set to a
    high value:

    vfs.ufs.dirhash_maxmem: 134217728

    It's mostly likely a problem with the OS, hardware and/or your
    configuration. Python doesn't come with its own stat() implementation.
    os.stat() just wraps the libc's stat() function. The heavy lifting is
    done inside libc and the kernel.
    Great, so the I should ask the BSD list about this.

    I'll first try to rewrite the code and replace that some million files
    with a very big hash database (gdbm). I guess that will be faster.

    Thanks.

    Laszlo
  • Martin v. Loewis at Mar 21, 2011 at 9:58 am

    Could it be a problem with the operation system? Is it possible that an
    os.stat call requires 100% CPU power from the OS? Or is it a problem
    with the Python implementation?
    There is a chance that the CPU usage actually comes from the thread
    doing sleep(). If you have a very short sleep time, and a loop around
    it, it may spin in this loop.

    If it's not that, and if it's not any other unrelated application that
    uses CPU that you didn't mention, then chances are high that it's indeed
    the file system code of your operating system that consumes that much
    CPU time.

    Regards,
    Martin
  • Laszlo Nagy at Mar 21, 2011 at 10:12 am

    On 2011-03-21 10:58, Martin v. Loewis wrote:
    Could it be a problem with the operation system? Is it possible that an
    os.stat call requires 100% CPU power from the OS? Or is it a problem
    with the Python implementation?
    There is a chance that the CPU usage actually comes from the thread
    doing sleep(). If you have a very short sleep time, and a loop around
    it, it may spin in this loop.
    Sleep time is 1.0 sec.
    If it's not that, and if it's not any other unrelated application that
    uses CPU that you didn't mention, then chances are high that it's indeed
    the file system code of your operating system that consumes that much
    CPU time.
    Yes. top shows 2 thread in this process, and full 100% CPU time is
    dedicated to the process. So it must be it.

    Thank you,

    Laszlo
  • Nobody at Mar 21, 2011 at 5:44 pm

    On Mon, 21 Mar 2011 09:32:11 +0100, Laszlo Nagy wrote:

    So the state of the process changes to "STOP", but the program does not
    stop until the os.stat call returns back (sometimes for 30 seconds).

    Could it be a problem with the operation system? Is it possible that an
    os.stat call requires 100% CPU power from the OS? Or is it a problem
    with the Python implementation?
    It's the OS kernel. If it was Python or the C library, sending SIGKILL
    would result in immediate termination.

    Is the disk interface operating in PIO mode? A slow disk shouldn't cause
    100% CPU consumption; the OS would just get on with something else (or
    just idle) while waiting for data to become available. But if it's
    having to copy data from the controller one word at a time, that could
    cause it (and would also make the disk appear slow).
  • Laszlo Nagy at Mar 24, 2011 at 3:40 pm

    It's the OS kernel. If it was Python or the C library, sending SIGKILL
    would result in immediate termination.

    Is the disk interface operating in PIO mode? A slow disk shouldn't cause
    100% CPU consumption; the OS would just get on with something else (or
    just idle) while waiting for data to become available. But if it's
    having to copy data from the controller one word at a time, that could
    cause it (and would also make the disk appear slow).
    This is a RAID 1+0 array with 10 hard disks plus a SCSI controller with
    2GB write back cache. I don't think that it has to do anything with disk
    speed. The CPU load goes up to 100% (the disk I/O does not go up that much).

    I'm working about a different storage method. We will be storing these
    logs in a real database instead of separate CSV files. So probably this
    problem will cease.

    Thanks,

    Laszlo
  • Dan Stromberg at Mar 22, 2011 at 11:37 pm

    On Mon, Mar 21, 2011 at 1:32 AM, Laszlo Nagy wrote:
    Hi All,

    I have a Python program that goes up to 100% CPU. Just like this (top):

    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU
    COMMAND
    80212 user1 2 44 0 70520K 16212K select 1 0:30 100.00%
    /usr/local/bin/python process_updates_ss_od.py -l 10

    I have added extra logs and it turns out that there are two threads. One
    thread is calling "time.sleep()" and the other is calling "os.stat" call.
    (Actually it is calling os.path.isfile, but I hunted down the last link in
    the chain.) The most interesting thing is that the process is in "SELECT"
    state. As far as I know, CPU load should be 0% because "select" state should
    block program execution until the I/O completes.

    I must also tell you that the os.stat call is taking long because this
    system has about 7 million files on a slow disk. It would be normal for an
    os.stat call to return after 10 seconds. I have no problem with that. But I
    think that the 100% CPU is not acceptable. I guess that the code is running
    in kernel mode. I think this because I can send a KILL signal to it and the
    state changes to the following:


    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU
    COMMAND
    80212 user1 2 44 0 70520K 15256K STOP 5 1:27 100.00%
    /usr/local/bin/python process_updates_ss_od.py -l 10

    So the state of the process changes to "STOP", but the program does not
    stop until the os.stat call returns back (sometimes for 30 seconds).

    Could it be a problem with the operation system? Is it possible that an
    os.stat call requires 100% CPU power from the OS? Or is it a problem with
    the Python implementation?

    (Unfortunately I cannot give you an example program. Giving an example
    would require giving you a slow I/O device with millions of files on it.)

    OS version: FreeBSD 8.1-STABLE amd64
    Python version: 2.6.6

    Thanks,

    Laszlo <http://mail.python.org/mailman/listinfo/python-list>

    1) Run it under the "time" command, to break down the CPU use like the
    following (on Ubuntu in this example, but your results may resemble these).
    User time is from userspace (the python interpreter: time spent on your
    code, including the Python standard library and C library), sys is the time
    spent in the kernel related to your process. Real is the wall-clock time.
    If the kernel time is high, look into using a database or using a filesystem
    (tweak) that supports large directories well. If the userspace time is
    high, scrutinize the code in more detail:

    $ time python -c 'for i in xrange(100000): pass'
    cmd started 2011 Tue Mar 22 04:26:20 PM

    real 0m0.132s
    user 0m0.012s
    sys 0m0.004s

    2) Does FreeBSD's top command have an option to report on distinct threads
    of a process individually? Some top's do, but I'm not confident all of them
    do.

    3) Does the code run on Pypy? If it does, it might be a lot faster. The
    difference can be pretty dramatic sometimes.

    4) Profile it using something like profile or cProfile. Sometimes the
    issues so identified can be surprising. This should tell you which part of
    the code is consuming the most time.

    5) If the process is staying in select state, then it's probably making
    heavy use of the select syscall - conceivably more use of select than stat.
    If there's a select in the code with a small timeout inside a loop, you
    might check that over.
    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: <http://mail.python.org/pipermail/python-list/attachments/20110322/06cbd755/attachment.html>

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedMar 21, '11 at 8:32a
activeMar 24, '11 at 3:40p
posts8
users5
websitepython.org

People

Translate

site design / logo © 2022 Grokbase