FAQ
This looks nice, thanks. Is the speedup correlated with number of cores?
Should we use runtime.NumCPUs() as the number of poll servers instead of
a fixed constant? Dmitriy?



http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go
File src/pkg/net/fd.go (right):

http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go#newcode307
src/pkg/net/fd.go:307: if err =
pollservers[fd.sysfd%pollN].WaitWrite(fd); err != nil {
This calculation seems like it might change over time. Could you please
add


func (fd *netFD) pollServer() *pollServer {
return pollservers[fd.sysfd % pollN]
}

and then use fd.pollServer().WaitWrite(fd) etc, so that the assignment
detail is in just one place?

http://codereview.appspot.com/6496054/

Search Discussions

  • Dave Cheney at Sep 11, 2012 at 1:32 am
    I'm -1 on this change as it stands. Spreading the fds over a number of
    pollservers will reduce the latency between the fd becoming ready and
    the caller being signaled, but I think it is overkill to spawn N
    pollservers in the case of a simple program that just wants to do a
    http get, or respond to a few tcp requests.

    Would it be possible to make the process of spawning (and reaping
    pollservers) sensitive to the load on the net package ?
    On Tue, Sep 11, 2012 at 11:22 AM, wrote:
    This looks nice, thanks. Is the speedup correlated with number of cores?
    Should we use runtime.NumCPUs() as the number of poll servers instead of
    a fixed constant? Dmitriy?



    http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go
    File src/pkg/net/fd.go (right):

    http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go#newcode307
    src/pkg/net/fd.go:307: if err =
    pollservers[fd.sysfd%pollN].WaitWrite(fd); err != nil {
    This calculation seems like it might change over time. Could you please
    add


    func (fd *netFD) pollServer() *pollServer {
    return pollservers[fd.sysfd % pollN]
    }

    and then use fd.pollServer().WaitWrite(fd) etc, so that the assignment
    detail is in just one place?

    http://codereview.appspot.com/6496054/
  • Russ Cox at Sep 11, 2012 at 2:58 am
    It would probably be okay to create poll servers on demand up to the max?
  • Dave Cheney at Sep 11, 2012 at 2:59 am
    SGTM
    On Tue, Sep 11, 2012 at 12:58 PM, Russ Cox wrote:
    It would probably be okay to create poll servers on demand up to the max?
  • Dvyukov at Sep 11, 2012 at 3:25 am

    On 2012/08/29 06:53:20, Sebastien Paolacci wrote:
    PTAL.
    As already observed by Dmitry, numbers really get crazy (and stable)
    for any cpu
    number greater than 8.
    Does anybody have an explanation? It seems to correlate with number of
    connections. Is it a kernel issue?

    I didn't investigate around the "Persistent*" variants being
    slowed-down for
    cpu=1, it just seems counter intuitive at first.
    I think it has to do with the fact that now you have 8 additional
    polling goroutines. It must go away once you limit number of poll
    servers.


    http://codereview.appspot.com/6496054/
  • Dvyukov at Sep 11, 2012 at 3:30 am

    On 2012/09/11 01:22:15, rsc wrote:
    This looks nice, thanks. Is the speedup correlated with number of
    cores? Should
    we use runtime.NumCPUs() as the number of poll servers instead of a fixed
    constant? Dmitriy?
    I think you are right. Because on a 64-way machine one will have the
    same contention as he has now on an 8-way machine.
    It should be limited by min(GOMAXPROCS, NumCPU), because it makes no
    sense to have 8 poll servers with GOMAXPROCS=1 (I believe that is what
    causes slowdown on BenchmarkTCPPersistent-1).



    http://codereview.appspot.com/6496054/
  • Sebastien Paolacci at Sep 11, 2012 at 7:23 pm

    On 2012/09/11 01:22:15, rsc wrote:
    This looks nice, thanks. Is the speedup correlated with number of
    cores?
    It doesn't seems to be. On a 4 cores @ 3.4GHz, amd/linux-3.2.23, I have

    BenchmarkTCPOneShot 173950 ns/op 179293 ns/op 3.07%

    BenchmarkTCPOneShot-2 102390 ns/op 107469 ns/op 4.96%

    BenchmarkTCPOneShot-4 92847 ns/op 35918 ns/op -61.31%

    BenchmarkTCPOneShot-6 92034 ns/op 29586 ns/op -67.85%

    BenchmarkTCPOneShot-8 91204 ns/op 28002 ns/op -69.30%

    BenchmarkTCPOneShot-16 101174 ns/op 122650 ns/op 21.23%


    BenchmarkTCPOneShotTimeout 177841 ns/op 182579 ns/op 2.66%

    BenchmarkTCPOneShotTimeout-2 103251 ns/op 109657 ns/op 6.20%

    BenchmarkTCPOneShotTimeout-4 93524 ns/op 36019 ns/op -61.49%

    BenchmarkTCPOneShotTimeout-6 93358 ns/op 30165 ns/op -67.69%

    BenchmarkTCPOneShotTimeout-8 92107 ns/op 28542 ns/op -69.01%

    BenchmarkTCPOneShotTimeout-16 106491 ns/op 122996 ns/op 15.50%


    BenchmarkTCPPersistent 52489 ns/op 65834 ns/op 25.42%

    BenchmarkTCPPersistent-2 33433 ns/op 34018 ns/op 1.75%

    BenchmarkTCPPersistent-4 26408 ns/op 14033 ns/op -46.86%

    BenchmarkTCPPersistent-6 28270 ns/op 9917 ns/op -64.92%

    BenchmarkTCPPersistent-8 27838 ns/op 9671 ns/op -65.26%

    BenchmarkTCPPersistent-16 110090 ns/op 102760 ns/op -6.66%


    BenchmarkTCPPersistentTimeout 52612 ns/op 65356 ns/op 24.22%

    BenchmarkTCPPersistentTimeout-2 33480 ns/op 33967 ns/op 1.45%

    BenchmarkTCPPersistentTimeout-4 26460 ns/op 11179 ns/op -57.75%

    BenchmarkTCPPersistentTimeout-6 28208 ns/op 9916 ns/op -64.85%

    BenchmarkTCPPersistentTimeout-8 27660 ns/op 9696 ns/op -64.95%

    BenchmarkTCPPersistentTimeout-16 103784 ns/op 103534 ns/op -0.24%

    Should
    we use runtime.NumCPUs() as the number of poll servers instead of a fixed
    constant? Dmitriy?
    That what I used first, but on a 12 core box.. argh 12 is just a weird
    number. More seriously, there's currently no scalabilty beyond 8 cores,
    so I arbitrary cut here.
    I would actually vote for min(NumCPU, 8), and then increase when it will
    make sense.


    http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go#newcode307
    src/pkg/net/fd.go:307: if err =
    pollservers[fd.sysfd%pollN].WaitWrite(fd); err
    != nil {
    This calculation seems like it might change over time. Could you
    please add
    func (fd *netFD) pollServer() *pollServer {
    return pollservers[fd.sysfd % pollN]
    }
    and then use fd.pollServer().WaitWrite(fd) etc, so that the assignment detail is
    in just one place?
    Done.

    http://codereview.appspot.com/6496054/
  • Sebastien Paolacci at Sep 11, 2012 at 7:58 pm

    On 2012/09/11 03:25:12, dvyukov wrote:
    On 2012/08/29 06:53:20, Sebastien Paolacci wrote:
    PTAL.

    As already observed by Dmitry, numbers really get crazy (and stable)
    for any
    cpu
    number greater than 8.
    Does anybody have an explanation? It seems to correlate with number of
    connections. Is it a kernel issue?
    Kernel is the easy culprit.. but I honestly first thought about Go's
    scheduler;) . I have the very same behavior on both 2.6.32 and 3.2
    kernels, but didn't found any time to further investigate.

    I didn't investigate around the "Persistent*" variants being
    slowed-down for
    cpu=1, it just seems counter intuitive at first.
    I think it has to do with the fact that now you have 8 additional polling
    goroutines. It must go away once you limit number of poll servers.
    I think I still don't get it. I'm puzzled about how the connection
    persistence "alone" can affect that patch so much.
    It obviously must have some explanation, but it is not natural to me.
    I'll run some more tests (the patch was actually left alone since the
    initial upload).

    Sebastien

    http://codereview.appspot.com/6496054/
  • Sebastien Paolacci at Sep 11, 2012 at 8:01 pm

    On 2012/09/11 02:58:02, rsc wrote:
    It would probably be okay to create poll servers on demand up to the
    max?

    A pollServer, despite its name, might not be a so heavy beast. I'll try
    to measure it tomorrow (out of curiosity), and go for a lazy
    instantiation unless someone really advocates the opposite.

    Sebastien

    http://codereview.appspot.com/6496054/
  • Sebastien Paolacci at Sep 11, 2012 at 8:13 pm

    On 2012/09/11 03:29:55, dvyukov wrote:
    It should be limited by min(GOMAXPROCS, NumCPU), because it makes no sense to
    have 8 poll servers with GOMAXPROCS=1 (I believe that is what causes
    slowdown on
    BenchmarkTCPPersistent-1).
    I think we should avoid GOMAXPROCS and use something more static. Every
    process that is playing with GOMAXPROC, e.g `go test', would otherwise
    miss the point (and I would avoid dynamically reassigning fds).

    I would consider that anything beyond NumCPU is useless, that anything
    smaller is not really damaging, and that anything greater than 8
    currently doesn't bring any improvements.

    Min(NumCPU, 8) + lazy instantiation as suggested by Dave/Russ ?

    Sebastien

    http://codereview.appspot.com/6496054/
  • Russ Cox at Sep 21, 2012 at 3:04 pm

    I would consider that anything beyond NumCPU is useless, that anything
    smaller is not really damaging, and that anything greater than 8
    currently doesn't bring any improvements.

    Min(NumCPU, 8) + lazy instantiation as suggested by Dave/Russ ?
    This is okay with me but I would like dvyukov to agree.

    Russ

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-dev @
categoriesgo
postedSep 11, '12 at 1:22a
activeSep 21, '12 at 3:04p
posts11
users4
websitegolang.org

People

Translate

site design / logo © 2022 Grokbase