nginx worker process high CPU usage
Hi everyone,

We're having a problem with an nginx worker process that takes up too much
CPU on the vcap router. The only component running on the VM is the router.
After deployment, the nginx worker process uses more and more CPU until it
reaches 100%. After killing the worker process, everything returns to normal.
The component is set up using vcap_dev_setup.

Any idea what might cause this high CPU usage and how to fix it?
Thanks


  • Kwon-Han Bae at Jun 10, 2012 at 12:19 pm
    What's your nginx version?

    Upgrade it to the latest.




    --
    배권한
    KwonHan Bae
    Kris Bae
    http://iz4u.net/blog
    linux, python, php, ruby developer
  • Yongkun Anfernee Gui at Jun 10, 2012 at 4:42 pm
    Hi,

    Can that be reliably reproduced? I've never seen that. Is it 100%
    reproducible even without serving any requests?

    Could you check the nginx logs under $DEPLOY/devbox/log/nginx.*.log?
    Or run strace to find which syscall nginx is spending CPU time on.

    thanks,
    anfernee
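    A quick sketch of the strace approach suggested above (the pgrep pattern
    is an assumption; adjust it for how your worker appears in ps):

```shell
# Build the strace command for the busy worker. -c aggregates time and
# call counts per syscall; attach with -p, let it collect, then Ctrl-C.
worker_pid=$(pgrep -f 'nginx: worker' 2>/dev/null | head -n1)
strace_cmd="strace -c -p ${worker_pid:-<pid>}"
echo "$strace_cmd"
```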
  • Florin Dragos at Jun 12, 2012 at 7:09 am
    Nginx version is 1.2.0.
    The problem reproduces only while nginx is serving requests.
    I ran strace while the server was being accessed; during this time CPU
    reached 100%.
    This is the output:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     80.44    0.105620           4     26281           brk
      5.64    0.007410           0     53047           writev
      4.73    0.006208           0     55666           epoll_wait
      3.96    0.005198           0     23636           mremap
      2.07    0.002716           0     35218     17609 connect
      1.74    0.002289           2      1128           munmap
      0.29    0.000385           0     17410           sendto
      0.28    0.000374           0     35606           close
      0.28    0.000371           0     17611           write
      0.26    0.000336           0    136365     65479 recvfrom
      0.17    0.000217           0     35218           socket
      0.04    0.000056           0     36006           epoll_ctl
      0.03    0.000040           0     17609       241 readv
      0.03    0.000035           0     35218           getsockopt
      0.02    0.000020           0      1128           mmap
      0.02    0.000020           0     35218           ioctl
      0.00    0.000000           0         1           open
      0.00    0.000000           0        31           pwrite
      0.00    0.000000           0         6           sendfile
      0.00    0.000000           0       389           shutdown
      0.00    0.000000           0       395           setsockopt
      0.00    0.000000           0         1           unlink
      0.00    0.000000           0       394           accept4
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.131295                563582     83329 total


  • Yongkun Anfernee Gui at Jun 12, 2012 at 7:45 am
    First, we officially use nginx 0.8.54 in Cloud Foundry, though I think
    1.2.0 should work the same.

    Next, other than upgrading nginx, did you do anything else special, like
    changing the nginx config file?

    Also, what is the output of the following: uname -a, nginx -V,
    lsb_release? I know it works very well on Ubuntu 10.04, x86_64/i686.

    Did your requests fail or become slow when CPU went to 100%? Is there
    anything abnormal in the nginx access and error logs? Can you try a
    simple config, to isolate the nginx problem?

    Thanks,
    Anfernee

  • Chunjie Zhu at Jun 12, 2012 at 9:14 am
    From the strace output, it seems the brk system call consumes most of the CPU.

    brk is the system call glibc's malloc uses to grow the heap, so a naive guess is that the process's heap space keeps running out, and the kernel is struggling to reclaim memory and re-allocate.

    Please look at /proc/<pid>/maps to check the process's virtual memory layout, and find out whether the heap space runs out when this problem happens again.

    Regards,
    Chunjie
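    To make that concrete, here's a small sketch (the helper name is ours, not from the thread) that extracts the [heap] range from /proc/<pid>/maps and reports its size; if the value keeps growing under load, the worker is continually calling brk:

```shell
# heap_size: print the size in KiB of a process's [heap] mapping.
# A maps line looks like: 01ac2000-023f7000 rw-p 00000000 00:00 0  [heap]
heap_size() {
    pid=${1:-self}                      # default to the current process
    range=$(grep '\[heap\]' "/proc/$pid/maps" | cut -d' ' -f1)
    start=${range%-*}                   # hex start address
    end=${range#*-}                     # hex end address
    echo $(( (0x$end - 0x$start) / 1024 ))
}
heap_size self
```

    Running heap_size <worker-pid> repeatedly while load-testing would show whether the heap keeps expanding.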

  • Yongkun Anfernee Gui at Jun 12, 2012 at 9:36 am
    Also, can you try google-perftools for more detailed profiling of nginx?

    - anfernee
  • Florin Dragos at Jun 12, 2012 at 3:07 pm
    I'm not sure how to interpret /proc/<pid>/maps.
    The output right now is:

    01ac2000-023f7000 rw-p 00000000 00:00 0    [heap]

    Before serving requests it stayed at 01ac2000-01e7c000; while serving
    requests, the second number keeps changing.

    The requested outputs:

    uname -a: Linux test-ubuntu1 2.6.32-33-server #70-Ubuntu SMP Thu Jul 7 22:28:30 UTC 2011 x86_64 GNU/Linux

    nginx -V: nginx version: nginx/1.2.0
    built by gcc 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)
    configure arguments: --prefix=/home/cfuser/.deployments/deployment/deploy/nginx/nginx-1.2.0 --with-pcre=../pcre-8.21 --add-module=../nginx_upload_module-2.2.0 --add-module=../agentzh-headers-more-nginx-module-5fac223 --add-module=../simpl-ngx_devel_kit-bc97eea --add-module=../chaoslawful-lua-nginx-module-204ce2b

    lsb_release: No LSB modules are available.

  • Florin Dragos at Jun 12, 2012 at 7:46 pm
    There was a warning when running nginx -t:

    [warn] 2048 worker_connections exceed open file resource limit: 1024

    Reducing worker_connections to 1024 seems to solve the issue. At least
    for now, CPU seems stable.
  • Yongkun Anfernee Gui at Jun 13, 2012 at 12:23 am
    Glad your problem is fixed.

    FYI: adding this to nginx.conf will increase the open-file limit in the
    worker:

    worker_rlimit_nofile 2048;

    Thanks,
    Anfernee
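    For context, a minimal sketch of how these directives fit together in nginx.conf (the surrounding values are illustrative, not from the actual vcap config):

```nginx
# worker_rlimit_nofile raises the worker's RLIMIT_NOFILE so that
# worker_connections (plus log and upstream fds) fits under the limit.
worker_processes 1;
worker_rlimit_nofile 2048;

events {
    worker_connections 2048;
}
```

    Without worker_rlimit_nofile, worker_connections has to stay at or below the inherited soft limit (1024 by default), which is what the nginx -t warning was pointing at.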
  • Chunjie Zhu at Jun 13, 2012 at 2:20 am
    Most likely. The default limit on open file descriptors per Linux process is 1024, and network sockets count toward it:

    chunjie@ubuntu:~$ ulimit -n
    1024

    However, this value is a soft limit, not a hard limit. When the soft limit is exceeded, the kernel still tries to make progress (see the alloc_fd function in the kernel's fs/file.c) and can get stuck in a loop of "expand fd array -> error -> repeat -> expand fd array". When the hard limit or sysctl_nr_open is exceeded, an error is returned directly to userland.

    So, besides Anfernee's suggestion, we can also use ulimit to raise the open-fd rlimit (ulimit -n 2048; additional configuration is needed if we want it to take effect at boot). Under the hood, both approaches use the setrlimit system call. Normally we don't need to touch sysctl_nr_open, because its value is large enough.

    NOTE: The kernel checks these limits in the order: soft limit -> hard limit -> sysctl_nr_open.

    Regards,
    Chunjie
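    To illustrate (a sketch; 2048 mirrors the value from this thread, pick whatever your config needs):

```shell
# Inspect the current per-process fd limits. The soft limit is what
# nginx -t compared worker_connections against.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft=$soft hard=$hard"

# A process may raise its own soft limit up to the hard limit, e.g.:
#   ulimit -n 2048
# To make a higher limit persistent for login sessions, the usual place
# is the nofile entries in /etc/security/limits.conf (pam_limits).
```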

    ----- Original Message -----

    From: "Yongkun Anfernee Gui" <agui@rbcon.com>
    To: vcap-dev@cloudfoundry.org
    Sent: Wednesday, June 13, 2012 8:23:29 AM
    Subject: Re: [vcap-dev] Re: nginx worker process high CPU usage

    Glad your problem is fixed.


    FYI:
    adding this to nginx.conf wil increase the number of open file in worker:


    worker_rlimit_nofile 2048;

    Thanks,
    Anfernee


    On Wed, Jun 13, 2012 at 3:46 AM, Florin Dragos wrote:


    There was a warning when running nginx -t:


    [warn] 2048 worker_connections exceed open
    file resource limit: 1024

    Reducing worker_connections to 1024 seems to solve the issue. At
    least for now, CPU seems stable.
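    The two directives interact: each worker needs at least one file descriptor per connection, so worker_rlimit_nofile should cover worker_connections. A hypothetical fragment that would keep the original 2048 connections instead of lowering them (values are illustrative, not the shipped vcap config):

    ```nginx
    # Give each worker fd headroom above its connection count
    # (fds are also consumed by logs and upstream sockets).
    worker_rlimit_nofile  4096;

    events {
        worker_connections  2048;
    }
    ```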


    On Jun 12, 6:07 pm, Florin Dragos wrote:
    I'm not sure how to interpret /proc/<pid>/maps. The output right now is:

    01ac2000-023f7000 rw-p 00000000 00:00 0        [heap]

    Before serving requests it stayed at 01ac2000-01e7c000; while
    serving requests, the second number keeps changing.

    The requested outputs:

    uname -a: Linux test-ubuntu1 2.6.32-33-server #70-Ubuntu SMP Thu Jul 7
    22:28:30 UTC 2011 x86_64 GNU/Linux

    nginx -V: nginx version: nginx/1.2.0
    built by gcc 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)
    configure arguments: --prefix=/home/cfuser/.deployments/deployment/deploy/nginx/nginx-1.2.0 --with-pcre=../pcre-8.21 --add-module=../nginx_upload_module-2.2.0 --add-module=../agentzh-headers-more-nginx-module-5fac223 --add-module=../simpl-ngx_devel_kit-bc97eea --add-module=../chaoslawful-lua-nginx-module-204ce2b

    lsb_release: No LSB modules are available.

    On Jun 12, 12:36 pm, Yongkun Anfernee Gui wrote:






    also, can you try google-perftools for more detailed profiling of nginx.
    - anfernee
    On Tue, Jun 12, 2012 at 5:14 PM, Chunjie Zhu wrote:
    From the strace output, it seems the brk system call consumes most of the CPU.
    As we all know, brk is called by glibc malloc to allocate heap memory. So a naive guess is that the process's heap space runs out, and the kernel struggles to reclaim memory and reallocate.
    Please check /proc/<pid>/maps for the process's virtual memory
    layout and find out whether the heap space runs out when this problem
    happens again.
    Regards,
    Chunjie
    ------------------------------
    *From: *"Yongkun Anfernee Gui" < a...@rbcon.com >
    *To: * vcap-...@cloudfoundry.org
    *Sent: *Tuesday, June 12, 2012 3:45:01 PM
    *Subject: *Re: [vcap-dev] Re: nginx worker process high CPU usage
    First thing: we are officially using nginx 0.8.54 in Cloud Foundry, though I think 1.2.0 should work the same.
    Next, other than upgrading nginx, did you do any other special things, like changing the nginx config file?
    Next, what is the output of the following: uname -a, nginx -V, lsb_release? I know it works very well on Ubuntu 10.04, x86_64/i686.
    Did your requests fail or become slow when CPU went to 100%? Is there anything abnormal in the nginx access and error logs? Can you try a simple config, to isolate the nginx problem?
    Thanks,
    Anfernee
    On Tue, Jun 12, 2012 at 3:08 PM, Florin Dragos wrote:

    Nginx version is 1.2.0.
    It reproduces only if it is serving requests.
    I ran strace while the server was being accessed. During this time CPU
    even reached 100%.
    This is the output:
    % time seconds usecs/call calls errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    80.44 0.105620 4 26281 brk
    5.64 0.007410 0 53047 writev
    4.73 0.006208 0 55666 epoll_wait
    3.96 0.005198 0 23636 mremap
    2.07 0.002716 0 35218 17609 connect
    1.74 0.002289 2 1128 munmap
    0.29 0.000385 0 17410 sendto
    0.28 0.000374 0 35606 close
    0.28 0.000371 0 17611 write
    0.26 0.000336 0 136365 65479 recvfrom
    0.17 0.000217 0 35218 socket
    0.04 0.000056 0 36006 epoll_ctl
    0.03 0.000040 0 17609 241 readv
    0.03 0.000035 0 35218 getsockopt
    0.02 0.000020 0 1128 mmap
    0.02 0.000020 0 35218 ioctl
    0.00 0.000000 0 1 open
    0.00 0.000000 0 31 pwrite
    0.00 0.000000 0 6 sendfile
    0.00 0.000000 0 389 shutdown
    0.00 0.000000 0 395 setsockopt
    0.00 0.000000 0 1 unlink
    0.00 0.000000 0 394 accept4
    ------ ----------- ----------- --------- --------- ----------------
    100.00 0.131295 563582 83329 total
  • Yssk22 at Aug 16, 2012 at 9:29 am
    Hi,

    I also encountered the same issue. I could reproduce it even after setting
    worker_connections to 1024. The reproduction procedure is simply to stress
    the nginx server like this:

    - restart nginx server
    - perform httperf as 'httperf --hog --server={nginx-ip} --port=80 --uri=/
    --num-conns=300000 --rate=500 --timeout 5 --send-buffer=4096
    --recv-buffer=16384 --server-name=non-existent.example.com'
    (num-conns and rate depend on your system)
    - at first, CPU usage is around 10% (depending on your system and the --rate
    param) and it works well.
    - after a while, CPU usage reaches 100%.
    - strace reports heavy 'brk' system call usage, as mentioned.
    - if I kill httperf and resume it, CPU usage hits 100% again soon.

    I found a string leak via 'package.path' and 'package.cpath' in the nginx
    setup cookbook:

    https://github.com/cloudfoundry/vcap/blob/master/dev_setup/cookbooks/nginx/templates/default/router-nginx.conf.erb#L121

    This appends strings to package.(c)path without bound, once per request. I
    guess this causes the large number of 'malloc' and 'brk' calls.

    That line should be removed; we should use the 'lua_package_path' directive
    instead, which sets the path rather than appending to it. That should solve
    this issue.
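    The difference can be sketched as follows (hypothetical paths; the real template lives in router-nginx.conf.erb). Appending to package.path from Lua code that runs per request grows the string forever, whereas lua_package_path sets the search path once at configuration time:

    ```nginx
    http {
        # Leaking pattern (per-request Lua code appending to the search path):
        #   package.path  = package.path  .. ";/var/vcap/packages/lua/?.lua"
        #   package.cpath = package.cpath .. ";/var/vcap/packages/lua/?.so"
        # Every request makes the strings longer, driving the malloc/brk churn
        # seen in the strace output above.

        # Fix: set the search paths once, at config time (";;" keeps defaults):
        lua_package_path  "/var/vcap/packages/lua/?.lua;;";
        lua_package_cpath "/var/vcap/packages/lua/?.so;;";
    }
    ```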

    Thanks.

  • Yongkun Anfernee Gui at Aug 16, 2012 at 11:20 am
    Hi Yohei,

    Thanks for reporting and analyzing the issue. That was an issue
    which only appears in dev_setup. The cause is exactly as you
    said. The fix is submitted here:
    http://reviews.cloudfoundry.org/#/c/8456/

    Thanks again,
    Anfernee
  • Yssk22 at Aug 16, 2012 at 2:54 pm
    Thanks! It seems fine.


Discussion Overview
group: vcap-dev
posted: Jun 10, '12 at 12:01p
active: Aug 16, '12 at 2:54p
posts: 14
users: 5
