Hi all, I have a Go process with a really strange (interesting?) CPU usage
issue. Over the course of about 24 hours, the CPU usage increases from
very low (what I would expect) to 100% of CPU, for no apparent reason.

This process keeps a set of IMAP connections open to send a notification
when a message comes in, and right now it's handling maybe 20 accounts, so
it's got something like 20 persistent connections open and then performs an
HTTP request any time it sees a new message. So it should be sitting idle,
waiting on network data, a very high percentage of the time, and only
occasionally making a call using the built-in HTTP client.

So, I'd expect CPU usage to be very low until some messages come in and
then spike, but instead I see this slow, steady CPU usage increase over
time. I would suspect that is the result of some data structure somewhere
getting steadily larger, but the memory usage stays very steady around
15MB, so that doesn't seem to be happening, and I don't see anything
obvious growing when looking at a heap profile.

Ordinarily, here's where I'd expect doing a CPU profile to be quite useful,
but I'm seeing some really strange results. I've set up a goroutine to do a
3 minute profile and then sleep for 5 minutes continuously, so I can look
at the results over time. The first profile looks good and makes sense, but
the issue hasn't shown up yet. All subsequent profiles seem valid in the
pprof tool, but always contain fewer than 10 samples, which is obviously not
right for a 3 minute profile. Those samples don't really give me any info
to go on.

So, does anybody have any suggestions about what could be causing this or
how I might track it down? A heap profile doesn't show any large data
structures, a goroutine sample doesn't show a large number of goroutines piled
up (and the memory usage would show that), and a CPU profile isn't showing
me anything. I'm kind of stuck.

Thanks,
Ian

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

  • Minux at Jan 30, 2013 at 5:18 pm

    Perhaps you can use the "net/http/pprof" package to pull live profile data,
    or to view the goroutine stack traces at any instant.
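A minimal sketch of that suggestion follows. Importing `net/http/pprof` for its side effect registers the `/debug/pprof/` handlers on `http.DefaultServeMux`. The helper names `startDebugServer` and `goroutineDump` and the listen address are assumptions made for this example, and the code targets a current Go toolchain. A real long-running process would simply leave the server running; here `main` fetches one goroutine dump and exits so the example terminates.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

// startDebugServer serves http.DefaultServeMux (and with it the pprof
// handlers) on addr in a background goroutine. It returns the address
// that was actually bound, which matters when addr uses port 0.
func startDebugServer(addr string) (string, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return "", err
	}
	go http.Serve(ln, nil) // a nil handler means http.DefaultServeMux
	return ln.Addr().String(), nil
}

// goroutineDump fetches a text stack trace of every goroutine in the
// process that is serving the pprof handlers at addr.
func goroutineDump(addr string) (string, error) {
	resp, err := http.Get("http://" + addr + "/debug/pprof/goroutine?debug=2")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// Port 0 lets the OS pick a free port; a fixed port is more practical
	// for a process you want to inspect from outside.
	addr, err := startDebugServer("127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	dump, err := goroutineDump(addr)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(dump)
}
```

With the server left running, the same endpoints can be queried from outside the process once it has reached the high-CPU state, without having to restart it.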

  • Bryanturley at Jan 30, 2013 at 5:19 pm
    Perhaps track system stats external to your program as well.
    Using a full system profiler might help if your code is causing the system
    (kernel/other) to behave badly.

  • Dave Cheney at Jan 30, 2013 at 10:00 pm
    Hi Ian,

    Can you please provide the following details:

    * your Go version and operating system details
    * the full output from the process when sent a SIGQUIT, once it has entered the high CPU usage scenario.
    * the full output from running the process with GOGCTRACE=1 as above.
    * the source, if possible, or at least a sample that reproduces the issue.
    * if you are using Linux, try running the process under the perf(1) tool.
    * if you are able, try running the process under strace(1).

    My gut feeling is that your code, or a library it uses, is not checking an error returned from a socket operation and is entering a tight loop. If that is the case, temporarily breaking the networking on this host may induce the failure.

    Cheers

    Dave
  • Ian Ragsdale at Jan 31, 2013 at 1:57 am
    Thanks Dave, those look like great suggestions. I'm running Go 1.0.3 on Ubuntu:

    # go version
    go version go1.0.3

    # uname -a
    Linux 4bf343d4-786f-4fa6-b37c-8bdd018fc63b 2.6.32-350-ec2 #57-Ubuntu SMP Thu Nov 15 15:59:03 UTC 2012 x86_64 GNU/Linux

    I'll work on gathering the SIGQUIT & GOGCTRACE output - didn't know about those techniques. I can't really share the source, and I'm not sure if there's a good way to cut it down to a sharable chunk, but I'll give it a shot. Strace was going to be my next move.

    I think your theory makes sense in general, as I'm familiar with that kind of failure, but if that were the case, wouldn't CPU usage shoot up to 100% as soon as the problem occurred? In this situation, the CPU usage creeps up quite slowly, in a nearly perfect linear fashion over the course of many hours.

    - Ian

Discussion Overview
group: golang-nuts
categories: go
posted: Jan 30, '13 at 5:12p
active: Jan 31, '13 at 1:57a
posts: 5
users: 4
website: golang.org
