FAQ
How is it possible Go supports 100k+ concurrent goroutines on a single CPU,
doing context switching between all of them without large overhead?

I'm trying to understand how the Go scheduler works and in particular how
it compares to a traditional OS scheduler.

We all know it is a bad idea for a (C, Java) program to create thousands of
OS threads (e.g. one thread per request). This is mainly because:

1. Each thread uses a fixed amount of memory for its stack
     (while in Go, the stack grows and shrinks as needed)

2. With many threads, the overhead of context switching becomes significant
     (isn't this true in Go as well?)

Even if Go context-switched only on IO and channel access, doesn't it still
have to save and restore the state of variables for each goroutine? How is
it possible it is so efficient compared to an OS scheduler?

There's a design doc <https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit> for the Go scheduler
(and an earlier discussion <https://groups.google.com/forum/#!topic/golang-nuts/tGvuvc8Z2IQ> on the topic).

Thanks!
Martin

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

  • Ian Lance Taylor at Aug 12, 2013 at 4:11 am

    On Sun, Aug 11, 2013 at 2:35 PM, wrote:
    How is it possible Go supports 100k+ concurrent goroutines on a single CPU,
    doing context switching between all of them without large overhead?
    If all 100k goroutines are actively doing things in parallel, the Go
    code will tend to have significant overhead. In a normal Go program,
    though, each goroutine will be waiting for network input.

    We all know it is a bad idea for a (C, Java) program to create thousands of
    OS threads (e.g. one thread per request). This is mainly because:

    1. Each thread uses fixed amount of memory for its stack
    (while in Go, the stack grows and shrinks as needed)

    2. With many threads, the overhead of context switching becomes significant
    (isn't this true in Go as well?)
    The context of a goroutine is much smaller and easier to change than
    the context of an OS thread. If nothing else an OS thread has a
    signal mask. At least with the gc toolchain, a goroutine really just
    has two values: a stack pointer and a pointer to the g structure.
    Switching to a new goroutine is just a matter of a few instructions.

    Also goroutines do cooperative scheduling and threads do not.

    Even if Go context-switched only on IO and channel access, doesn't it still
    have to save and restore the state of variables for each goroutine? How is
    it possible it is so efficient compared to an OS scheduler?
    It does not have to save and restore each variable. All goroutines
    see the same global variables, and local variables are always accessed
    via the stack pointer.

    Ian

  • Rodrigo Kochenburger at Aug 12, 2013 at 6:55 am
    These two documents explain in more detail how the scheduler works:

    http://www.cs.columbia.edu/~aho/cs6998/reports/12-12-11_DeshpandeSponslerWeiss_GO.pdf
    https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit

    Cheers

    - RK

    On Sun, Aug 11, 2013 at 9:11 PM, Ian Lance Taylor wrote:
    On Sun, Aug 11, 2013 at 2:35 PM, wrote:

    How is it possible Go supports 100k+ concurrent goroutines on a single CPU,
    doing context switching between all of them without large overhead?
    If all 100k goroutines are actively doing things in parallel, the Go
    code will tend to have significant overhead. In a normal Go program,
    though, each goroutine will be waiting for network input.

    We all know it is a bad idea for a (C, Java) program to create thousands of
    OS threads (e.g. one thread per request). This is mainly because:

    1. Each thread uses fixed amount of memory for its stack
    (while in Go, the stack grows and shrinks as needed)

    2. With many threads, the overhead of context switching becomes
    significant
    (isn't this true in Go as well?)
    The context of a goroutine is much smaller and easier to change than
    the context of an OS thread. If nothing else an OS thread has a
    signal mask. At least with the gc toolchain, a goroutine really just
    has two values: a stack pointer and a pointer to the g structure.
    Switching to a new goroutine is just a matter of a few instructions.

    Also goroutines do cooperative scheduling and threads do not.

    Even if Go context-switched only on IO and channel access, doesn't it still
    have to save and restore the state of variables for each goroutine? How is
    it possible it is so efficient compared to an OS scheduler?
    It does not have to save and restore each variable. All goroutines
    see the same global variables, and local variables are always accessed
    via the stack pointer.

    Ian

  • Adrianschefflerz at Dec 11, 2015 at 1:16 pm

    On Monday, August 12, 2013 at 11:11:28 AM UTC+7, Ian Lance Taylor wrote:
    On Sun, Aug 11, 2013 at 2:35 PM, <martin....@gmail.com> wrote:
    How is it possible Go supports 100k+ concurrent goroutines on a single CPU,
    doing context switching between all of them without large overhead?
    If all 100k goroutines are actively doing things in parallel, the Go
    code will tend to have significant overhead. In a normal Go program,
    though, each goroutine will be waiting for network input.

    Isn't the net/http package essentially doing that by spawning a goroutine
    for every connection?
    Maybe I misunderstand it, but it seems that ServeHTTP is called inside
    these goroutines, which would mean that a lot of computation happens
    inside that goroutine, like dispatching the request based on the URL and
    computing the response. That would mean a lot of scheduling overhead in
    web applications built with this package.

  • Ian Lance Taylor at Dec 11, 2015 at 2:03 pm

    On Fri, Dec 11, 2015 at 12:12 AM, wrote:
    On Monday, August 12, 2013 at 11:11:28 AM UTC+7, Ian Lance Taylor wrote:
    On Sun, Aug 11, 2013 at 2:35 PM, wrote:

    How is it possible Go supports 100k+ concurrent goroutines on a single
    CPU,
    doing context switching between all of them without large overhead?
    If all 100k goroutines are actively doing things in parallel, the Go
    code will tend to have significant overhead. In a normal Go program,
    though, each goroutine will be waiting for network input.
    Isn't the net/http package essentially doing that by calling a go routine
    for every connection?
    Maybe I misunderstand it, but it seems like, that ServeHttp is called inside
    this goroutines. Which would mean, that a lot of computation happens inside
    of that routine, like dispatching the request based on the url and computing
    the response. Which would mean a lot of scheduling overhead in the
    webapplications build with this package.
    That CPU processing time is normally less than the time it takes a new
    network packet to arrive, so it remains true that most goroutines are
    waiting for network input. That might be false if the HTTP server is
    being hit at high speed over a very high speed network connection
    (that is, not the general Internet), but that is not the normal case.

    Ian

  • Adrian Scheffler at Dec 11, 2015 at 3:08 pm

    If all 100k goroutines are actively doing things in parallel, the Go
    code will tend to have significant overhead. In a normal Go program,
    though, each goroutine will be waiting for network input.
    Isn't the net/http package essentially doing that by calling a go routine
    for every connection?
    Maybe I misunderstand it, but it seems like, that ServeHttp is called inside
    this goroutines. Which would mean, that a lot of computation happens inside
    of that routine, like dispatching the request based on the url and computing
    the response. Which would mean a lot of scheduling overhead in the
    webapplications build with this package.
    That CPU processing time is normally less than the time it takes a new
    network packet to arrive, so it remains true that most goroutines are
    waiting for network input. That might be false if the HTTP server is
    being hit at high speed over a very high speed network connection
    (that is, not the general Internet), but that is not the normal case.
    Okay, I just thought it would be most interesting to look at the behaviour
    of webservers when they are under load. So if there is a big number of
    incoming requests, there will be a lot of blocked goroutines, but also a
    lot of active ones. You said that if a lot of goroutines do a lot of
    active work in parallel, there will be significant overhead. So I
    thought maybe it would be better to have a model with a fixed number of
    goroutines that have specialized roles, like accepting incoming
    requests, dispatching URLs, processing requests, and sending responses
    back. I would be interested to know if it's worth the effort to think
    more about this. Can you elaborate a bit more on this overhead under the
    condition of big numbers of goroutines working in parallel, and how it
    compares to a small number of goroutines working in parallel? Is there
    any good material on the goroutine scheduler in its current state?

  • Dave Cheney at Dec 11, 2015 at 3:14 pm

    Okay, I just thought it would be most interesting to look at the behaviour
    of webservers, when they are under load. So if there is a big number of
    incoming requests, there will be a lot of blocked go routines, but also a
    lot of active ones. You said, that if a lot of go routines do a lot of
    active stuff in parallel, there will be a significant overhead. So I
    thought maybe it would be better to have a model with a fixed number of go
    routines, that have specialized professions, like accepting incoming
    requests, dispatching urls, processing requests, sending responses back. I
    would be interested, if its worth the effort to think more about this. Can
    you elaborate a bit more on this overhead under the condition of big
    numbers of goroutines doing stuff in parallel and how it compares to a
    small number of goroutines doing stuff in parallel? Is there any good
    material on the goroutine scheduler in its current state?
    There are never more than GOMAXPROCS goroutines active in a Go program. All
    the other goroutines are either runnable or waiting for some condition
    that will make them runnable.

  • Adrian Scheffler at Dec 11, 2015 at 3:46 pm

    On Friday, December 11, 2015 at 10:14:30 PM UTC+7, Dave Cheney wrote:
    Okay, I just thought it would be most interesting to look at the behaviour
    of webservers, when they are under load. So if there is a big number of
    incoming requests, there will be a lot of blocked go routines, but also a
    lot of active ones. You said, that if a lot of go routines do a lot of
    active stuff in parallel, there will be a significant overhead. So I
    thought maybe it would be better to have a model with a fixed number of go
    routines, that have specialized professions, like accepting incoming
    requests, dispatching urls, processing requests, sending responses back. I
    would be interested, if its worth the effort to think more about this. Can
    you elaborate a bit more on this overhead under the condition of big
    numbers of goroutines doing stuff in parallel and how it compares to a
    small number of goroutines doing stuff in parallel? Is there any good
    material on the goroutine scheduler in its current state?
    There are never more than GOMAXPROCS goroutines active in a Go program.
    All the other goroutines are either runnable; or waiting for some condition
    which will make them runnable.
    Okay, sorry, I am new to all of this. I messed up the terminology. If we
    insist on a distinction between active and runnable goroutines, I would
    interpret the sentence "If all 100k goroutines are actively doing things in
    parallel, the Go code will tend to have significant overhead." from Ian
    Lance Taylor as: "If there are a lot of runnable goroutines around (that
    have CPU-bound behaviour), the Go code will tend to have significant
    overhead." Is that right? I assume runnable means not blocked on I/O and
    wanting to be active. In the net/http package, the Serve() function accepts
    new connections and creates a new goroutine for each. So I would think
    that under load there would be a lot of runnable goroutines, because they
    can't finish as quickly as new connections/requests come in. It would be
    nice to understand why there is a significant overhead from a big number
    of goroutines that are not blocked, and whether this has to do with the
    scheduler (I just assumed that, because it seemed natural).

  • Ian Lance Taylor at Dec 11, 2015 at 5:05 pm

    On Fri, Dec 11, 2015 at 7:46 AM, Adrian Scheffler wrote:
    On Friday, December 11, 2015 at 10:14:30 PM UTC+7, Dave Cheney wrote:

    Okay, I just thought it would be most interesting to look at the
    behaviour of webservers, when they are under load. So if there is a big
    number of incoming requests, there will be a lot of blocked go routines, but
    also a lot of active ones. You said, that if a lot of go routines do a lot
    of active stuff in parallel, there will be a significant overhead. So I
    thought maybe it would be better to have a model with a fixed number of go
    routines, that have specialized professions, like accepting incoming
    requests, dispatching urls, processing requests, sending responses back. I
    would be interested, if its worth the effort to think more about this. Can
    you elaborate a bit more on this overhead under the condition of big numbers
    of goroutines doing stuff in parallel and how it compares to a small number
    of goroutines doing stuff in parallel? Is there any good material on the
    goroutine scheduler in its current state?

    There are never more than GOMAXPROCS goroutines active in a Go program.
    All the other goroutines are either runnable; or waiting for some condition
    which will make them runnable.
    Okay; sorry I am new to all of this. I messed up the terminology. If we
    insist on a distinction between active and runnable goroutines I would
    interpret the sentence "If all 100k goroutines are actively doing things in
    parallel, the Go code will tend to have significant overhead." from Ian
    Lance Taylor as: "If there are a lot of runnable goroutines around(that have
    a CPU-bound behaviour), the Go code will tend to have significant overhead."
    Is that right? I assume runnable means not blocked for I/O and wants to be
    active.
    That is more or less right, though it depends on what you mean by
    significant. If you have 100k CPU-bound goroutines, there will be
    scheduling overhead.

    In the net/http package, the Serve() function accepts new
    connections and creates a new goroutine for each. So I would think, that
    under load, there would be a lot of runnable goroutines, because they cant
    finish, as quickly as new connections/requests come in. it would be nice to
    understand, why there is a significant overhead from a big number of
    goroutines that are not blocked and if this has to do with the scheduler(I
    just assumed that, because it seemed natural).
    If your server gets more connections than it can process, then it is
    overloaded. That is true no matter how you partition your program.

    From your earlier message, above, I think you are suggesting that it
    would be better to have a fixed number of goroutines handling
    different parts of each request. It would not be better. You would
    have the same amount of work, and therefore the same scheduling
    overhead, as if you had a separate goroutine per request.

    That is, when I talked earlier about "actively doing things in
    parallel," I didn't mean that the scheduling overhead increases based
    on the number of goroutines. I meant that the scheduling overhead
    increases based on the amount of work there is to do.

    Ian

  • Luna Duclos at Dec 11, 2015 at 5:14 pm
    Switching goroutines, unlike switching threads, does not incur a huge
    overhead. It's fairly cheap. I'm uncertain where the "significant overhead"
    would be coming from under high load as a result.
    On Fri, Dec 11, 2015 at 4:46 PM, Adrian Scheffler wrote:


    On Friday, December 11, 2015 at 10:14:30 PM UTC+7, Dave Cheney wrote:

    Okay, I just thought it would be most interesting to look at the
    behaviour of webservers, when they are under load. So if there is a big
    number of incoming requests, there will be a lot of blocked go routines,
    but also a lot of active ones. You said, that if a lot of go routines do a
    lot of active stuff in parallel, there will be a significant overhead. So I
    thought maybe it would be better to have a model with a fixed number of go
    routines, that have specialized professions, like accepting incoming
    requests, dispatching urls, processing requests, sending responses back. I
    would be interested, if its worth the effort to think more about this. Can
    you elaborate a bit more on this overhead under the condition of big
    numbers of goroutines doing stuff in parallel and how it compares to a
    small number of goroutines doing stuff in parallel? Is there any good
    material on the goroutine scheduler in its current state?
    There are never more than GOMAXPROCS goroutines active in a Go program.
    All the other goroutines are either runnable; or waiting for some condition
    which will make them runnable.
    Okay; sorry I am new to all of this. I messed up the terminology. If we
    insist on a distinction between active and runnable goroutines I would
    interpret the sentence "If all 100k goroutines are actively doing things in
    parallel, the Go code will tend to have significant overhead." from Ian
    Lance Taylor as: "If there are a lot of runnable goroutines around(that
    have a CPU-bound behaviour), the Go code will tend to have significant
    overhead." Is that right? I assume runnable means not blocked for I/O and
    wants to be active. In the net/http package, the Serve() function accepts
    new connections and creates a new goroutine for each. So I would think,
    that under load, there would be a lot of runnable goroutines, because they
    cant finish, as quickly as new connections/requests come in. it would be
    nice to understand, why there is a significant overhead from a big number
    of goroutines that are not blocked and if this has to do with the
    scheduler(I just assumed that, because it seemed natural).

  • Konstantin Khomoutov at Dec 11, 2015 at 5:43 pm

    On Fri, 11 Dec 2015 07:46:31 -0800 (PST) Adrian Scheffler wrote: [...]
    So I would think, that under load, there
    would be a lot of runnable goroutines, because they cant finish, as
    quickly as new connections/requests come in. it would be nice to
    understand, why there is a significant overhead from a big number of
    goroutines that are not blocked and if this has to do with the
    scheduler(I just assumed that, because it seemed natural).
    The problem is that you are trying to solicit abstract discussion about
    pretty abstract things. The first (some believe the sole) rule of
    optimization is that you never know where exactly it will be required
    until you test your program in production conditions. From what I
    observe lurking on this list for quite some time, people in the high-load
    segment tend to experience problems related to garbage collection, and
    not with some imaginary scheduling overhead you're trying to discuss.
    And that is with servers handling several thousand requests per
    second.

    The Go runtime comes with quite strong support for profiling various
    kinds of resources. So if you're really interested, search the list
    for those profiling techniques, write a sample server, put a load on it
    using `wrk' or another similar tool, and study the profiling data.

  • Dave Cheney at Dec 11, 2015 at 10:23 pm
    You can investigate this yourself: write a small hello-world style Go
    program, run it with the env var GODEBUG=schedtrace=1000, and hit it with
    some kind of load-testing software. The runtime will output a summary of
    the run queue once per second. Dmitry has a good blog post describing this
    output in more
    detail, https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs.
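
    A sketch of the invocation (./myserver and the wrk arguments are
    placeholders, not from the thread):

```shell
# Run the server with the scheduler trace enabled; the runtime prints a
# run-queue summary to stderr every 1000ms.
GODEBUG=schedtrace=1000 ./myserver &

# Put it under load with a tool such as wrk.
wrk -c 100 -d 30s http://localhost:8080/
```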

    Thanks

    Dave
    On Saturday, 12 December 2015 02:46:31 UTC+11, Adrian Scheffler wrote:


    On Friday, December 11, 2015 at 10:14:30 PM UTC+7, Dave Cheney wrote:

    Okay, I just thought it would be most interesting to look at the
    behaviour of webservers, when they are under load. So if there is a big
    number of incoming requests, there will be a lot of blocked go routines,
    but also a lot of active ones. You said, that if a lot of go routines do a
    lot of active stuff in parallel, there will be a significant overhead. So I
    thought maybe it would be better to have a model with a fixed number of go
    routines, that have specialized professions, like accepting incoming
    requests, dispatching urls, processing requests, sending responses back. I
    would be interested, if its worth the effort to think more about this. Can
    you elaborate a bit more on this overhead under the condition of big
    numbers of goroutines doing stuff in parallel and how it compares to a
    small number of goroutines doing stuff in parallel? Is there any good
    material on the goroutine scheduler in its current state?
    There are never more than GOMAXPROCS goroutines active in a Go program.
    All the other goroutines are either runnable; or waiting for some condition
    which will make them runnable.
    Okay; sorry I am new to all of this. I messed up the terminology. If we
    insist on a distinction between active and runnable goroutines I would
    interpret the sentence "If all 100k goroutines are actively doing things in
    parallel, the Go code will tend to have significant overhead." from Ian
    Lance Taylor as: "If there are a lot of runnable goroutines around(that
    have a CPU-bound behaviour), the Go code will tend to have significant
    overhead." Is that right? I assume runnable means not blocked for I/O and
    wants to be active. In the net/http package, the Serve() function accepts
    new connections and creates a new goroutine for each. So I would think,
    that under load, there would be a lot of runnable goroutines, because they
    cant finish, as quickly as new connections/requests come in. it would be
    nice to understand, why there is a significant overhead from a big number
    of goroutines that are not blocked and if this has to do with the
    scheduler(I just assumed that, because it seemed natural).
  • Adrian Scheffler at Dec 12, 2015 at 11:39 am

    On Saturday, December 12, 2015 at 5:23:13 AM UTC+7, Dave Cheney wrote:
    You can investigate this yourself: write a small hello-world style Go
    program, run it with the env var GODEBUG=schedtrace=1000, and hit it with
    some kind of load-testing software. The runtime will output a summary of
    the run queue once per second. Dmitry has a good blog post describing this
    output in more detail,
    https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs.
    Okay, so I am a few days into learning Go and I am trying to understand
    goroutines. The two things I have heard so far about them (in a redundant
    manner) are that they are very cheap and that it is easy to reason about
    architectures that are built with them. I would like to form an opinion on
    different ways of composing these goroutines, because I don't want to
    implement and measure all possibilities for every design decision, even
    though I believe Go comes with good tools for this. :-)
    I didn't mean that the scheduling overhead increases based
    on the number of goroutines. I meant that the scheduling overhead
    increases based on the amount of work there is to do.
    Okay, but there is other overhead than scheduling overhead attached to a
    goroutine, right? Like the blocked OS thread when the goroutine does a
    syscall. And there is a maximum number of threads (set by default to
    10000). So if there is a peak in requests, either from an increased number
    of users or because an adversary floods the server with requests, the
    server could crash pretty quickly. I just somehow have the suspicion that
    it is not a good idea to create goroutines in direct relation to the input
    of a program, especially when the input comes from the network, because it
    could lead to potential denial-of-service vulnerabilities. This can't
    really be mitigated by a higher number of max threads, because eventually
    the operating system will crash. I would be glad to hear if this train of
    thought is right.
    During my digging I found this paper:
    http://www1.cs.columbia.edu/~aho/cs6998/reports/12-12-11_DeshpandeSponslerWeiss_GO.pdf
    but I couldn't find a discussion about it or whether the considerations
    were accepted and implemented. If that is the case, the kind of
    architecture I thought of would be better, because the grouping of
    goroutines proposed in this paper works on the assumption that the
    goroutines communicate with each other over channels. The current net/http
    package has a lot of goroutines that don't interact with each other, so
    this grouping yields no performance benefit. It would be nice if someone
    could point me to the discussion.
    Thanks,
    Adrian

  • Dave Cheney at Dec 12, 2015 at 11:52 am
    The number of concurrent connections per process is limited by the number of file descriptors available to the process. It is correct that some syscalls consume an OS thread while they are in progress. If there is no free OS thread to replace the one that is blocked on a syscall, a new one will be created to service goroutines, up to GOMAXPROCS. But the net package uses the polling mechanisms provided by the operating system to coalesce all outstanding read and write operations onto a single blocking syscall.

    I gave a talk about this at oscon earlier in the year, maybe this blog post will give some useful background. http://dave.cheney.net/2015/08/08/performance-without-the-event-loop

    Thanks

    Dave

  • Adrian Scheffler at Dec 12, 2015 at 12:16 pm

    On Saturday, December 12, 2015 at 6:52:37 PM UTC+7, Dave Cheney wrote:
    The number of concurrent connections per process is limited by the number
    of file descriptors available to the process. It is correct that some
    syscall consume an OS thread while they are in progress. If there is no
    free OS thread to replace the one that is blocked on a syscall a new one
    will be created to service goroutines, up to GOMAXPROCS. But the net
    package uses the available polling mechanisms provided by the operating
    system to coalesce all outstanding read and write operations onto a single
    blocking syscall.

    I gave a talk about this at oscon earlier in the year, maybe this blog
    post will give some useful background.
    http://dave.cheney.net/2015/08/08/performance-without-the-event-loop

    Thanks

    Dave
    I know about the netpoller. To be more specific: I mean the OS-thread
    overhead of a goroutine that is blocked on a syscall that does not get
    handled by the netpoller. For example, a John Doe like me might have a
    website with a page that does some kind of syscall that is not
    intercepted by the netpoller. I don't think this is so far-fetched. So if
    10000 connections for this page come in in a short amount of time, the
    server should crash because of the thread limit. Would be cool if someone
    could falsify that concern.

    If there is no free OS thread to replace the one that is blocked on a
    syscall a new one will be created to service goroutines, up to GOMAXPROCS.
    This is very confusing to me. I thought GOMAXPROCS refers to the maximum
    number of threads executing in parallel, which is now set by default to the
    number of CPUs instead of 1.


  • Dave Cheney at Dec 12, 2015 at 12:24 pm
    GOMAXPROCS is the maximum number of operating system threads assigned to service goroutines. There are other threads in use in a go program, blocking syscalls are one, cgo calls are another.

    The scenario you describe, with a poorly written program consuming lots of OS threads blocked on syscalls, is possible, but it rarely happens, because those would be file system operations, which would quickly overwhelm the I/O subsystem of the host. Also, you'll still run out of file descriptors. Yes, it can happen, possibly easily, but it is also easily debugged.

    There is an open issue to use the net package's poller code for local file system operations, which would reduce the number of blocking syscalls.

  • Adrian Scheffler at Dec 13, 2015 at 7:56 am

    On Saturday, December 12, 2015 at 7:24:22 PM UTC+7, Dave Cheney wrote:
    GOMAXPROCS is the maximum number of operating system threads assigned to
    service goroutines. There are other threads in use in a go program,
    blocking syscalls are one, cgo calls are another.

    The scenario you describe with a poorly written program consuming lots of
    a OS threads blocked on syscalls is possible, but rarely happens because
    those would be file system operations which would quickly overwhelm the io
    subsystem of the host. Also, you'll still run out of file descriptors.
    Yes, it can happen, possibly easily, but it is also easily debugged.

    There is an open issue to use the net packages polled code for local
    systems operations, is would reduce the number of blocking syscalls.
    I am sorry, I did not look closely enough at the sentence you wrote. It
    makes total sense to me now. I don't know ... if you look at the canonical
    "Writing Web Applications" tutorial https://golang.org/doc/articles/wiki/ it
    seems like there will be some filesystem I/O in these HTTP-request-serving
    goroutines. So in theory, if there is a traffic spike, the webserver will
    crash. But maybe it has no relevance in practice, because the threads
    unblock faster than new requests (connections) come in. So the traffic
    spike would need to be so big that the server would be overloaded anyway. I
    will just go and play around with it. Thanks for your help!
    Peace out! :-)

  • Jesper Louis Andersen at Dec 13, 2015 at 2:38 pm

    On Sat, Dec 12, 2015 at 1:16 PM, Adrian Scheffler wrote:

    I know about the netpoller. To be more specific: I mean the OS-thread
    overhead, by a goroutine, that is blocked on a syscall, that does not get
    handled by the netpoller. For example a John Doe like me might have a
    website with a page, that does some kind of syscall, that is not
    intercepted by the netpoller. I don't think this is so far fetched. So if
    10000 connections for this page come in in a short amount of time, the
    server should crash, because of the thread limit. Would be cool, if someone
    could falsify that concern.
    Any system will have a capacity limit somewhere. It is, for the most part,
    not 10k syscalls, since database connection pools are far more likely to
    queue you long before you run out of syscall threads. In a real system, once
    you know your engineered capacity limit, you can just start rejecting work
    as you approach it.

    An observation from the Erlang community is that the best design choice is
    often to limit at the border of your application as early as possible and
    then not worry about internal resource usage down the line. That is, most
    internal queues/channels are effectively unbounded in size, whereas you
    bound incoming requests, perhaps also with a short queue to smooth out
    spikes and make the processing a bit more stable and less bursty. Erlang
    systems also picked a different solution, in which you run a pool of threads
    to do the blocking I/O calls. The default is 5 threads, and I have never
    ever seen more than 300 in an application. We also have monitors in apps
    for when you hit the limit, and it usually happens when you end up being
    contended on disk, at which point you've lost in the speed department
    anyway...


    --
    J.

  • Adrian Scheffler at Dec 13, 2015 at 7:55 pm

    My critique is that the paper doesn't experiment with this proposal. You
    have to demonstrate that the recording cost is outweighed by faster
    execution, and the Go code exists as Open Source, which leaves you with no
    excuse. Barring a concrete design and algorithm, you can only speculate
    about its efficiency, and you have no chance of pointing out pathological
    behavior in the implementation. I think the Occam-Pi design is nice and it
    could work as a good batcher, but I'd like it demonstrated on some
    real-world examples first.
    But nevertheless, someone should think about this. I looked at proc.go,
    which seems to implement the scheduler, and it links to a design document
    dated May 2012
    (
    https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit#
    ). So either the comments are not updated, or there was no progress on
    this in the last 3 years.

    An observation from the Erlang community is that the best design choice is
    often to limit at the border of your application as early as possible and
    then don't worry about internal resource usage down the line. That is most
    internal queues/channels are effectively unbounded in size, whereas you
    bound incoming requests, perhaps also with a short queue to smoothen out
    spikes and make the processing a bit more stable and less spiky. Erlang
    systems also picked a different solution in which you run a pool of threads
    to do the blocking I/O calls. The default is 5 threads, and I have never
    ever seen more than 300 in an application. We also have monitors in apps
    when you hit the limit and it usually happens when you end up being
    contended on disk, at which point you've lost in the speed department
    anyway...
    I like this kind of thinking. That's why I got interested in Go. Maybe I
    should look into Erlang instead. What would you recommend? The net/http
    package seems to care about other things than that. And it's part of the
    standard library, so it reflects the culture of Go somehow.

  • Jesper Louis Andersen at Dec 13, 2015 at 2:25 pm

    On Sat, Dec 12, 2015 at 12:39 PM, Adrian Scheffler wrote:

    During my digging I found this paper:
    http://www1.cs.columbia.edu/~aho/cs6998/reports/12-12-11_DeshpandeSponslerWeiss_GO.pdf but
    I couldn't find a discussion about it and if the considerations were
    accepted and implemented.
    The proposal is, rather simplified, to record which goroutines interact on
    which channels and then keep those goroutines on the same CPU core. The
    observation stems from Occam-Pi and an implementation by Ritson, in which
    one uses the notion of counting "process was blocked on contention" to
    dynamically create batches of processes which are likely to be blocked on
    the same contention. The hope is to speed up computation through better
    locality of data.

    My critique is that the paper doesn't experiment with this proposal. You
    have to demonstrate that the recording cost is outweighed by faster
    execution, and the Go code exists as Open Source, which leaves you with no
    excuse. Barring a concrete design and algorithm, you can only speculate
    about its efficiency, and you have no chance of pointing out pathological
    behavior in the implementation. I think the Occam-Pi design is nice and it
    could work as a good batcher, but I'd like it demonstrated on some
    real-world examples first.




    --
    J.

  • Tamás Gulácsi at Dec 11, 2015 at 2:05 pm
    Those computations must be done somewhere. Goroutine switching is very cheap, and is done at specific points in the code (esp. at function calls), so it won't cut your important loop into pieces.

  • Dmitry Vyukov at Aug 12, 2013 at 8:23 am

    On Mon, Aug 12, 2013 at 1:35 AM, wrote:

    How is it possible Go supports 100k+ concurrent goroutines on a single
    CPU, doing context switching between all of them without large overhead?

    I'm trying to understand how the Go scheduler works and in particular how
    it compares to a traditional OS scheduler.

    We all know it is a bad idea for a (C, Java) program to create thousands
    of OS threads (e.g. one thread per request). This is mainly because:

    1. Each thread uses fixed amount of memory for its stack
    (while in Go, the stack grows and shrinks as needed)

    2. With many threads, the overhead of context switching becomes significant
    (isn't this true in Go as well?)

    The Go scheduler, just like thread schedulers in modern OSes, is O(1). The
    overhead of making a goroutine runnable and scheduling it for execution does
    not depend on the number of goroutines.



    Even if Go context-switched only on IO and channel access, doesn't it still
    have to save and restore the state of variables for each goroutine? How is
    it possible it is so efficient compared to an OS scheduler?

    There's a design doc<https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit>for the Go scheduler
    (and an earlier discussion<https://groups.google.com/forum/#!topic/golang-nuts/tGvuvc8Z2IQ>on the topic).
  • Martin Koníček at Aug 15, 2013 at 9:43 pm
    Thank you very much Ian, Rodrigo and Dmitry for helping me understand how
    goroutines work.

    Let me just ask a clarifying follow-up question:

    When writing a server in Go, is it OK to spawn one goroutine per connection
    and do some blocking I/O inside each goroutine?

    I've seen a presentation by a member of the Go team which said that "Go
    APIs are blocking, and blocking in Go is fine."

    Is this really the case?


    Ian, it's interesting to know that local variables are always accessed via
    the stack pointer - thank you! I assumed Go would use registers for local
    variables and would have to save and restore all registers when switching a
    goroutine.

    I can see how the smaller context helps to keep goroutines more
    "lightweight" than OS threads. I can also imagine that doing scheduling in
    user mode helps.
    I still don't see why cooperative scheduling helps as opposed to, say,
    rescheduling every 10ms.

    Thanks for your patience!

    Best regards,
    Martin

    On Mon, Aug 12, 2013 at 10:23 AM, Dmitry Vyukov wrote:
    On Mon, Aug 12, 2013 at 1:35 AM, wrote:

    How is it possible Go supports 100k+ concurrent goroutines on a single
    CPU, doing context switching between all of them without large overhead?

    I'm trying to understand how the Go scheduler works and in particular how
    it compares to a traditional OS scheduler.

    We all know it is a bad idea for a (C, Java) program to create thousands
    of OS threads (e.g. one thread per request). This is mainly because:

    1. Each thread uses fixed amount of memory for its stack
    (while in Go, the stack grows and shrinks as needed)

    2. With many threads, the overhead of context switching becomes
    significant
    (isn't this true in Go as well?)

    Go scheduler just as thread schedulers in modern OSes is O(1). Overhead of
    making a goroutine runnable and scheduling it for execution is not
    dependent on number of goroutines.



    Even if Go context-switched only on IO and channel access, doesn't it
    still have to save and restore the state of variables for each goroutine?
    How is it possible it is so efficient compared to an OS scheduler?

    There's a design doc<https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit>for the Go scheduler
    (and an earlier discussion<https://groups.google.com/forum/#!topic/golang-nuts/tGvuvc8Z2IQ>on the topic).
  • Martin Koníček at Aug 15, 2013 at 10:00 pm
    I think I found a satisfactory
    answer<http://stackoverflow.com/questions/1714136/can-you-detect-how-many-threads-a-given-number-of-goroutines-will-create?rq=1>
    :

    The following operations do not cause the goroutine to use a thread when
    they block:

        - channel operations
        - network operations
        - sleeping

    This means, for example, that if you have many goroutines that open
    /dev/ttyxx and block on read, you'll be using a thread for each one.


    Thanks again! Note that Go is my first language with non-(thread)-blocking
    APIs that don't rely on callbacks (apart from C#, which uses a compiler
    trick to hide the callbacks from the user).


    On Thu, Aug 15, 2013 at 11:42 PM, Martin Koníček
    wrote:
    Thank you very much Ian, Rodrigo and Dmitry for helping me understand how
    goroutines work.

    Let me just a ask a clarifying followup question:

    When writing a server in Go, is it OK to spawn one goroutine per
    connection and do some blocking I/O inside each goroutine?

    I've seen a presentation by a member of the Go team which said that "Go
    APIs are blocking, and blocking in Go is fine."

    Is this really the case?


    Ian, it's interesting to know that local variables are always accessed via
    the stack pointer - thank you! I assumed Go would use registers for local
    variables and would have to save and restore all registers when switching a
    goroutine.

    I can see how the smaller context helps to keep goroutines more
    "lightweight" than OS threads. I can also imagine that doing scheduling in
    user mode helps.
    Still don't see why cooperative scheduling helps as opposed to say
    rescheduling every 10ms.

    Thanks for your patience!

    Best regards,
    Martin

    On Mon, Aug 12, 2013 at 10:23 AM, Dmitry Vyukov wrote:
    On Mon, Aug 12, 2013 at 1:35 AM, wrote:

    How is it possible Go supports 100k+ concurrent goroutines on a single
    CPU, doing context switching between all of them without large overhead?


    I'm trying to understand how the Go scheduler works and in particular
    how it compares to a traditional OS scheduler.

    We all know it is a bad idea for a (C, Java) program to create thousands
    of OS threads (e.g. one thread per request). This is mainly because:

    1. Each thread uses fixed amount of memory for its stack
    (while in Go, the stack grows and shrinks as needed)

    2. With many threads, the overhead of context switching becomes
    significant
    (isn't this true in Go as well?)

    Go scheduler just as thread schedulers in modern OSes is O(1). Overhead
    of making a goroutine runnable and scheduling it for execution is not
    dependent on number of goroutines.



    Even if Go context-switched only on IO and channel access, doesn't it
    still have to save and restore the state of variables for each goroutine?
    How is it possible it is so efficient compared to an OS scheduler?

    There's a design doc<https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit>for the Go scheduler
    (and an earlier discussion<https://groups.google.com/forum/#!topic/golang-nuts/tGvuvc8Z2IQ>on the topic).
  • Dmitry Vyukov at Aug 15, 2013 at 10:09 pm

    On Fri, Aug 16, 2013 at 2:00 AM, Martin Koníček wrote:

    I think I found a satisfactory answer<http://stackoverflow.com/questions/1714136/can-you-detect-how-many-threads-a-given-number-of-goroutines-will-create?rq=1>
    :

    The following operations do not cause the goroutine to use a thread when
    they block:

    - channel operations
    - network operations
    - sleeping

    This also includes blocking on all primitives in the sync package.
    This means, for example, that if you have many goroutines that open
    /dev/ttyxx and block on read, you'll be using a thread for each one.
    Yes, that's correct.
    There is no reason why we can not do "non-blocking under the hood"
    handling of files/devices/etc. It's just not done yet. On Windows it's easy
    with IOCP, where nonblocking network and file operations look the same. I am
    not sure what the state of AIO on Unixes is today.





    Thanks again! Note that Go is my first language with non-(thread)-blocking
    APIs that don't rely on callbacks (apart from C#, which uses a compiler
    trick to hide the callbacks from the user).


    On Thu, Aug 15, 2013 at 11:42 PM, Martin Koníček <martin.konicek@gmail.com
    wrote:
    Thank you very much Ian, Rodrigo and Dmitry for helping me understand how
    goroutines work.

    Let me just a ask a clarifying followup question:

    When writing a server in Go, is it OK to spawn one goroutine per
    connection and do some blocking I/O inside each goroutine?

    I've seen a presentation by a member of the Go team which said that "Go
    APIs are blocking, and blocking in Go is fine."

    Is this really the case?


    Ian, it's interesting to know that local variables are always accessed via
    the stack pointer - thank you! I assumed Go would use registers for local
    variables and would have to save and restore all registers when switching a
    goroutine.

    I can see how the smaller context helps to keep goroutines more
    "lightweight" than OS threads. I can also imagine that doing scheduling in
    user mode helps.
    Still don't see why cooperative scheduling helps as opposed to say
    rescheduling every 10ms.

    Thanks for your patience!

    Best regards,
    Martin

    On Mon, Aug 12, 2013 at 10:23 AM, Dmitry Vyukov wrote:
    On Mon, Aug 12, 2013 at 1:35 AM, wrote:

    How is it possible Go supports 100k+ concurrent goroutines on a single
    CPU, doing context switching between all of them without large overhead?


    I'm trying to understand how the Go scheduler works and in particular
    how it compares to a traditional OS scheduler.

    We all know it is a bad idea for a (C, Java) program to create
    thousands of OS threads (e.g. one thread per request). This is mainly
    because:

    1. Each thread uses fixed amount of memory for its stack
    (while in Go, the stack grows and shrinks as needed)

    2. With many threads, the overhead of context switching becomes
    significant
    (isn't this true in Go as well?)

    Go scheduler just as thread schedulers in modern OSes is O(1). Overhead
    of making a goroutine runnable and scheduling it for execution is not
    dependent on number of goroutines.



    Even if Go context-switched only on IO and channel access, doesn't it
    still have to save and restore the state of variables for each goroutine?
    How is it possible it is so efficient compared to an OS scheduler?

    There's a design doc<https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit>for the Go scheduler
    (and an earlier discussion<https://groups.google.com/forum/#!topic/golang-nuts/tGvuvc8Z2IQ>on the topic).
  • Dmitry Vyukov at Aug 15, 2013 at 10:04 pm

    On Fri, Aug 16, 2013 at 1:42 AM, Martin Koníček wrote:

    Thank you very much Ian, Rodrigo and Dmitry for helping me understand how
    goroutines work.

    Let me just a ask a clarifying followup question:

    When writing a server in Go, is it OK to spawn one goroutine per
    connection and do some blocking I/O inside each goroutine?

    Yes, this is the way to go.
    At the "physical" (OS) level, such a program behaves the same way as an
    event-driven C program that uses epoll to handle a large number of
    connections on a few threads.



    I've seen a presentation by a member of the Go team which said that "Go
    APIs are blocking, and blocking in Go is fine."

    Is this really the case?


    Ian, it's interesting to know that local variables are always accessed via
    the stack pointer - thank you! I assumed Go would use registers for local
    variables and would have to save and restore all registers when switching a
    goroutine.

    I can see how the smaller context helps to keep goroutines more
    "lightweight" than OS threads. I can also imagine that doing scheduling in
    user mode helps.
    Still don't see why cooperative scheduling helps as opposed to say
    rescheduling every 10ms.

    If you do asynchronous preemption you need to save/restore ALL registers,
    that is, 16 general-purpose registers, PC, SP, segment registers, 16 XMM
    registers, FP coprocessor state, X AVX registers, all MSRs and so on.
    If you cooperatively preempt at known points, then you need to save/restore
    only what is known to be live at these points. In Go that's just 3
    registers -- PC, SP and DX.




    Thanks for your patience!

    Best regards,
    Martin

    On Mon, Aug 12, 2013 at 10:23 AM, Dmitry Vyukov wrote:
    On Mon, Aug 12, 2013 at 1:35 AM, wrote:

    How is it possible Go supports 100k+ concurrent goroutines on a single
    CPU, doing context switching between all of them without large overhead?


    I'm trying to understand how the Go scheduler works and in particular
    how it compares to a traditional OS scheduler.

    We all know it is a bad idea for a (C, Java) program to create thousands
    of OS threads (e.g. one thread per request). This is mainly because:

    1. Each thread uses fixed amount of memory for its stack
    (while in Go, the stack grows and shrinks as needed)

    2. With many threads, the overhead of context switching becomes
    significant
    (isn't this true in Go as well?)

    Go scheduler just as thread schedulers in modern OSes is O(1). Overhead
    of making a goroutine runnable and scheduling it for execution is not
    dependent on number of goroutines.



    Even if Go context-switched only on IO and channel access, doesn't it
    still have to save and restore the state of variables for each goroutine?
    How is it possible it is so efficient compared to an OS scheduler?

    There's a design doc<https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit>for the Go scheduler
    (and an earlier discussion<https://groups.google.com/forum/#!topic/golang-nuts/tGvuvc8Z2IQ>on the topic).
  • Martin Koníček at Aug 15, 2013 at 10:05 pm
    Wow, thank you for the in-depth explanation Dmitry!

    On Fri, Aug 16, 2013 at 12:03 AM, Dmitry Vyukov wrote:
    On Fri, Aug 16, 2013 at 1:42 AM, Martin Koníček wrote:

    Thank you very much Ian, Rodrigo and Dmitry for helping me understand how
    goroutines work.

    Let me just a ask a clarifying followup question:

    When writing a server in Go, is it OK to spawn one goroutine per
    connection and do some blocking I/O inside each goroutine?

    Yes, this is the way to go.
    On "physical" (OS) level such program behaves the same way as an
    event-driven C program that uses epoll to handle large number of
    connections on few threads.



    I've seen a presentation by a member of the Go team which said that "Go
    APIs are blocking, and blocking in Go is fine."

    Is this really the case?


    Ian, it's interesting to know that local variables are always accessed via
    the stack pointer - thank you! I assumed Go would use registers for local
    variables and would have to save and restore all registers when switching a
    goroutine.

    I can see how the smaller context helps to keep goroutines more
    "lightweight" than OS threads. I can also imagine that doing scheduling in
    user mode helps.
    Still don't see why cooperative scheduling helps as opposed to say
    rescheduling every 10ms.

    If you do asynchronous preemption you need save/restore ALL registers,
    that is, 16 general purpose registers, PC, SP, segment registers, 16 XMM
    registers, FP coprocessor state, X AVX registers, all MSRs and so on.
    If you cooperatively preempt at known points, then you need to
    safe/restore only what is known to be alive at these points. In Go that's
    just 3 registers -- PC, SP and DX.




    Thanks for your patience!

    Best regards,
    Martin

    On Mon, Aug 12, 2013 at 10:23 AM, Dmitry Vyukov wrote:
    On Mon, Aug 12, 2013 at 1:35 AM, wrote:

    How is it possible Go supports 100k+ concurrent goroutines on a single
    CPU, doing context switching between all of them without large overhead?


    I'm trying to understand how the Go scheduler works and in particular
    how it compares to a traditional OS scheduler.

    We all know it is a bad idea for a (C, Java) program to create
    thousands of OS threads (e.g. one thread per request). This is mainly
    because:

    1. Each thread uses fixed amount of memory for its stack
    (while in Go, the stack grows and shrinks as needed)

    2. With many threads, the overhead of context switching becomes
    significant
    (isn't this true in Go as well?)

    Go scheduler just as thread schedulers in modern OSes is O(1). Overhead
    of making a goroutine runnable and scheduling it for execution is not
    dependent on number of goroutines.



    Even if Go context-switched only on IO and channel access, doesn't it
    still have to save and restore the state of variables for each goroutine?
    How is it possible it is so efficient compared to an OS scheduler?

    There's a design doc<https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit>for the Go scheduler
    (and an earlier discussion<https://groups.google.com/forum/#!topic/golang-nuts/tGvuvc8Z2IQ>on the topic).
  • Alexei Sholik at Aug 15, 2013 at 11:18 pm
    If you cooperatively preempt at known points, then you need to
    safe/restore only what is known to be alive at these points. In Go that's
    just 3 registers -- PC, SP and DX.

    You can still have values stored in other registers. Does this mean that
    the compiler is aware of those scheduling points, so that it emits spilling
    code before, say, reading from a channel?

    On Fri, Aug 16, 2013 at 1:03 AM, Dmitry Vyukov wrote:
    On Fri, Aug 16, 2013 at 1:42 AM, Martin Koníček wrote:

    Thank you very much Ian, Rodrigo and Dmitry for helping me understand how
    goroutines work.

    Let me just a ask a clarifying followup question:

    When writing a server in Go, is it OK to spawn one goroutine per
    connection and do some blocking I/O inside each goroutine?

    Yes, this is the way to go.
    On "physical" (OS) level such program behaves the same way as an
    event-driven C program that uses epoll to handle large number of
    connections on few threads.



    I've seen a presentation by a member of the Go team which said that "Go
    APIs are blocking, and blocking in Go is fine."

    Is this really the case?


    Ian, it's interesting to know that local variables are always accessed via
    the stack pointer - thank you! I assumed Go would use registers for local
    variables and would have to save and restore all registers when switching a
    goroutine.

    I can see how the smaller context helps to keep goroutines more
    "lightweight" than OS threads. I can also imagine that doing scheduling in
    user mode helps.
    I still don't see why cooperative scheduling helps, as opposed to, say,
    rescheduling every 10ms.

    If you do asynchronous preemption you need to save/restore ALL registers,
    that is, 16 general-purpose registers, PC, SP, segment registers, 16 XMM
    registers, FP coprocessor state, X AVX registers, all MSRs and so on.
    If you cooperatively preempt at known points, then you need to
    save/restore only what is known to be alive at these points. In Go that's
    just 3 registers -- PC, SP and DX.
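    [Editor's note: a small demonstration of switching at known points. Pinned
    to a single OS thread with GOMAXPROCS(1), two goroutines still interleave
    through 100,000 round trips, because every channel operation is a
    scheduling point where only the minimal context needs saving. The
    iteration count is illustrative.]

    ```go
    package main

    import (
    	"fmt"
    	"runtime"
    )

    func main() {
    	// One OS thread: all interleaving below is pure user-mode scheduling.
    	runtime.GOMAXPROCS(1)

    	ping := make(chan int)
    	pong := make(chan int)
    	done := make(chan int)

    	go func() {
    		for i := 0; i < 100000; i++ {
    			ping <- i // switch point: this goroutine parks, the other runs
    			<-pong    // switch point again on the way back
    		}
    		close(ping)
    		done <- 1
    	}()

    	go func() {
    		sum := 0
    		for v := range ping {
    			sum += v
    			pong <- v
    		}
    		done <- sum
    	}()

    	<-done
    	fmt.Println("ping-pong finished: 100000 round trips on one OS thread")
    }
    ```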




    Thanks for your patience!

    Best regards,
    Martin

    On Mon, Aug 12, 2013 at 10:23 AM, Dmitry Vyukov wrote:
    On Mon, Aug 12, 2013 at 1:35 AM, wrote:

    How is it possible Go supports 100k+ concurrent goroutines on a single
    CPU, doing context switching between all of them without large overhead?


    I'm trying to understand how the Go scheduler works and in particular
    how it compares to a traditional OS scheduler.

    We all know it is a bad idea for a (C, Java) program to create
    thousands of OS threads (e.g. one thread per request). This is mainly
    because:

    1. Each thread uses a fixed amount of memory for its stack
    (while in Go, the stack grows and shrinks as needed)

    2. With many threads, the overhead of context switching becomes
    significant
    (isn't this true in Go as well?)
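    [Editor's note: point 1 in action. Goroutine stacks start small and grow
    on demand, so recursion deep enough to overflow a fixed thread stack in C
    simply works; the runtime moves the stack to a larger allocation whenever
    the current one fills up. The depth chosen here is arbitrary.]

    ```go
    package main

    import "fmt"

    // depth recurses n frames deep; Go has no tail-call elimination,
    // so every call really adds a stack frame.
    func depth(n int) int {
    	if n == 0 {
    		return 0
    	}
    	return 1 + depth(n-1)
    }

    func main() {
    	done := make(chan int)
    	go func() {
    		// ~1M frames, far beyond the initial few-KB goroutine stack.
    		done <- depth(1000000)
    	}()
    	fmt.Println("recursed to depth", <-done)
    }
    ```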

    The Go scheduler, just like the thread schedulers in modern OSes, is O(1).
    The overhead of making a goroutine runnable and scheduling it for
    execution does not depend on the number of goroutines.
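    [Editor's note: a rough sketch of that O(1) claim. Park a large number of
    goroutines, then time channel wakeups between two others: the per-wakeup
    latency does not scale with the number of parked goroutines. The absolute
    numbers depend on hardware; the counts are illustrative.]

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    func main() {
    	const parked = 100000
    	block := make(chan struct{})
    	for i := 0; i < parked; i++ {
    		go func() { <-block }() // 100k goroutines parked on a channel
    	}

    	ping := make(chan struct{})
    	pong := make(chan struct{})
    	go func() {
    		for range ping {
    			pong <- struct{}{}
    		}
    	}()

    	const iters = 100000
    	start := time.Now()
    	for i := 0; i < iters; i++ {
    		ping <- struct{}{} // wake the partner goroutine
    		<-pong             // get woken back
    	}
    	elapsed := time.Since(start)
    	fmt.Printf("%d wakeups with %d goroutines parked: %v/op\n",
    		iters, parked, elapsed/iters)
    	close(block)
    	close(ping)
    }
    ```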



    Even if Go context-switched only on IO and channel access, doesn't it
    still have to save and restore the state of variables for each goroutine?
    How is it possible it is so efficient compared to an OS scheduler?

    There's a design doc <https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit> for the Go scheduler
    (and an earlier discussion <https://groups.google.com/forum/#!topic/golang-nuts/tGvuvc8Z2IQ> on the topic).


    --
    Best regards
    Alexei Sholik

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Dmitry Vyukov at Aug 16, 2013 at 8:56 am

    On Fri, Aug 16, 2013 at 3:18 AM, Alexei Sholik wrote:

    If you cooperatively preempt at known points, then you need to
    save/restore only what is known to be alive at these points. In Go that's
    just 3 registers -- PC, SP and DX.

    You can still have values stored in other registers. Does this mean that
    the compiler is aware of those scheduling points, so that it emits spilling
    code before, say, reading from a channel?

    Yes, the calling convention is that the callee can use all registers, so
    the spilling needs to be done anyway. So if we preempt at function entry,
    we get only-3-live-registers for free.



  • Alexei Sholik at Aug 17, 2013 at 5:11 pm
    Thanks for the clarification, Dmitry. Is Go's calling convention documented
    anywhere?

  • Dmitry Vyukov at Aug 17, 2013 at 5:13 pm
    I have not seen any description.


    On Sat, Aug 17, 2013 at 9:11 PM, Alexei Sholik wrote:

    Thanks for the clarification, Dmitry. Is Go's calling convention
    documented anywhere?


Discussion Overview
group: golang-nuts
categories: go
posted: Aug 12, '13 at 4:03a
active: Dec 13, '15 at 7:55p
posts: 31
users: 11
website: golang.org
