Hi there,

I'm running a golang webapp in production on about 15 large ec2 instances,
under ELB. It's handling about 11k requests / second, and doing very
little in the way of disk reads or writes.

It's compiled with Go 1.1.

A few hours after starting up an instance, starting the golang webapp server,
and adding the instance to ELB, the server starts to hang. CPU usage, memory
usage, and disk I/O are all almost at zero.

During the "hanging", when we make a request with our browser, the browser
sits there waiting for a response indefinitely. If we ssh to the server and
kill the go webapp process, then the browser immediately shows a connection
terminated.

Then if we restart, we can access it fine for a while, but the inevitable
happens again -- the go webapp server hangs, using almost no resources, but
unable to respond with any content over HTTP.

The way we're handling the requests is this:

   // ... handler registration code ...
   if err := http.ListenAndServe(":80", nil); err != nil {
     log.Fatal("ListenAndServe: ", err)
   }

I'm not quite sure how to diagnose the root cause. It seems to be
request-queue related.

Does anyone have ideas on how we can start to look for the root cause?

Thanks!




  • Dave Cheney at May 29, 2013 at 2:44 am
    I'm assuming that, because you are handling such high request loads, you
    already know how to adjust your kernel and per-process tunables
    appropriately.

    Can you please post the full output from hitting a 'hung' process with
    SIGQUIT? This will cause the process to abort and print the stack traces
    of all goroutines.
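
    A minimal sketch (not code from this thread) of getting the same
    per-goroutine stack dump from inside the process with runtime/pprof; the
    /debug/goroutines path and the localhost:6060 address are illustrative
    assumptions only:

        package main

        import (
            "log"
            "net/http"
            "os"
            "runtime/pprof"
        )

        func main() {
            // Log the pid so SIGQUIT can be sent to the right process.
            log.Printf("pid %d", os.Getpid())

            // Hypothetical debug-only endpoint; debug=2 prints every
            // goroutine's stack, much like the SIGQUIT dump.
            http.HandleFunc("/debug/goroutines", func(w http.ResponseWriter, r *http.Request) {
                pprof.Lookup("goroutine").WriteTo(w, 2)
            })
            log.Fatal(http.ListenAndServe("localhost:6060", nil))
        }
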
  • Matthew Moore at Jun 3, 2013 at 2:38 pm
    Hey -- thanks for the response. I actually didn't know about GOMAXPROCS
    until someone brought it up after this email. Are there any other flags I
    can tune?

    Best,
    Matt
    --
    http://www.linkedin.com/in/matthewpaulmoore
    (650) 888-5962
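
    A minimal sketch of the GOMAXPROCS tunable mentioned above (under Go 1.1
    it defaults to 1 unless the GOMAXPROCS environment variable is set); the
    value chosen here is an illustrative assumption, not a recommendation from
    the thread:

        package main

        import (
            "log"
            "runtime"
        )

        func main() {
            // Without this (or the GOMAXPROCS environment variable), Go 1.1
            // runs Go code on a single OS thread.
            prev := runtime.GOMAXPROCS(runtime.NumCPU())
            log.Printf("GOMAXPROCS raised from %d to %d", prev, runtime.NumCPU())
            // ... register handlers and call http.ListenAndServe as before ...
        }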

  • Brad Fitzpatrick at Jun 3, 2013 at 3:12 pm
    What does "request-queue related" mean?

    I agree with Dave Cheney: send the process SIGQUIT and reply with its
    output (the dump of all goroutines).

    Also, what does lsof -n -p <pid> say for the process (before you kill it)?
    Are you leaking file descriptors? It's possible you're leaking (blocked)
    goroutines and those are retaining references to file descriptors.
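
    A hedged sketch of one mitigation for that kind of leak (an illustration,
    not a fix prescribed in the thread): give the http.Server read and write
    timeouts so a connection stuck on a slow or dead client cannot block its
    goroutine, and hold its file descriptor, forever. The timeout values here
    are assumptions:

        package main

        import (
            "log"
            "net/http"
            "time"
        )

        func main() {
            // ... handler registration code ...
            srv := &http.Server{
                Addr:         ":80",
                ReadTimeout:  10 * time.Second, // illustrative values
                WriteTimeout: 10 * time.Second,
            }
            log.Fatal(srv.ListenAndServe())
        }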


  • Matthew Moore at Jun 14, 2013 at 9:31 pm
    Looks like you guys were right -- GOMAXPROCS seemed to fix everything.
    Thanks for all the help!

    Best,
    Matt
    --
    http://www.linkedin.com/in/matthewpaulmoore
    (650) 888-5962

  • Dave Cheney at Jun 14, 2013 at 11:38 pm
    Umm, nobody suggested that as a solution. It sounds more like a band-aid.


  • Matt Silverlock at Jun 15, 2013 at 12:26 am

    On Saturday, June 15, 2013 5:31:16 AM UTC+8, Matthew Moore wrote:
    Looks like you guys were right -- GOMAXPROCS seemed to fix everything.
    Thanks for all the help!

    Note that when Dave talks about "adjust[ing] your kernel and per process
    tunables" he doesn't mean GOMAXPROCS. He much more likely means tweaking
    your TCP keep-alive settings and kernel file-handle (descriptor) limits,
    which are among the biggest "killers" of high-load HTTP applications.
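
    A minimal sketch of one such per-process tunable (an assumption for
    illustration, not code from the thread): the open-file limit can be
    inspected and raised from inside the program with syscall.Getrlimit and
    syscall.Setrlimit on Linux, provided the hard limit permits it; normally
    this is done with ulimit or the init system instead:

        package main

        import (
            "log"
            "syscall"
        )

        func main() {
            var lim syscall.Rlimit
            if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
                log.Fatal(err)
            }
            lim.Cur = lim.Max // raise the soft limit up to the hard limit
            if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
                log.Fatal(err)
            }
            log.Printf("max open files: %d", lim.Cur)
        }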


  • Matthew Moore at Jun 20, 2013 at 2:58 pm
    Thanks for the clarification! That definitely makes sense. I think that
    will become important as we go past 10k QPS (we're at 4k now) :)

    Best,
    Matt
    --
    http://www.linkedin.com/in/matthewpaulmoore
    (650) 888-5962

