Hello rsc@golang.org, dvyukov@google.com, dave@cheney.net (cc:
golang-dev@googlegroups.com),

I'd like you to review this change to
https://code.google.com/p/go/


http://codereview.appspot.com/6496054/


  • Dvyukov at Sep 16, 2012 at 5:05 am

    On 2012/09/11 19:50:49, Sebastien Paolacci wrote:
    On 2012/09/11 03:25:12, dvyukov wrote:
    On 2012/08/29 06:53:20, Sebastien Paolacci wrote:
    PTAL.

    As already observed by Dmitry, numbers really get crazy (and stable) for
    any cpu number greater than 8.

    Does anybody have an explanation? It seems to correlate with the number
    of connections. Is it a kernel issue?

    The kernel is the easy culprit... but I honestly first thought about Go's
    scheduler ;). I have the very same behavior on both 2.6.32 and 3.2
    kernels, but didn't find any time to further investigate.

    I observe it on 3.6.38 with both the current and my new scheduler and
    poll server.

    I didn't investigate around the "Persistent*" variants being slowed down
    for cpu=1, it just seems counter-intuitive at first.

    I think it has to do with the fact that now you have 8 additional polling
    goroutines. It must go away once you limit the number of poll servers.

    I think I still don't get it. I'm puzzled about how the connection
    persistence "alone" can affect that patch so much. It obviously must have
    some explanation, but it is not natural to me. I'll run some more tests
    (the patch was actually left alone since the initial upload).

    Each pollServer is a separate thread and goroutine. The hypothesis is
    that when you have 8 pollServers with GOMAXPROCS=1, there are additional
    latencies and penalties caused by constant thread and goroutine
    switching, where each goroutine does very little work.


    http://codereview.appspot.com/6496054/
  • Dvyukov at Sep 16, 2012 at 5:07 am

    On 2012/09/11 19:30:26, Sebastien Paolacci wrote:
    On 2012/09/11 02:58:02, rsc wrote:
    It would probably be okay to create poll servers on demand up to the max?

    A pollServer, despite its name, might not be such a heavy beast. I'll try
    to measure it tomorrow (out of curiosity), and go for lazy instantiation
    unless someone really advocates the opposite.

    Each pollServer is a dedicated persistent thread + I think it may benefit
    from some amount of batching.

    http://codereview.appspot.com/6496054/
  • Dvyukov at Sep 16, 2012 at 5:09 am

    On 2012/09/11 20:05:38, Sebastien Paolacci wrote:
    On 2012/09/11 03:29:55, dvyukov wrote:
    It should be limited by min(GOMAXPROCS, NumCPU), because it makes no
    sense to have 8 poll servers with GOMAXPROCS=1 (I believe that is what
    causes the slowdown on BenchmarkTCPPersistent-1).

    I think we should avoid GOMAXPROCS and use something more static. Every
    process that is playing with GOMAXPROCS, e.g. `go test', would otherwise
    miss the point (and I would avoid dynamically reassigning fds). I would
    consider that anything beyond NumCPU is useless, that anything smaller is
    not really damaging, and that anything greater than 8 currently doesn't
    bring any improvements. Min(NumCPU, 8) + lazy instantiation as suggested
    by Dave/Russ?

    25% performance penalty does not look particularly cool. I would
    experiment with Min(GOMAXPROCS, NumCPU, 8). I think you do not need to
    re-distribute fds, just grow up to Min(GOMAXPROCS, NumCPU, 8) using the
    current GOMAXPROCS value. I do not care a lot about programs that "play
    with GOMAXPROCS" (especially about go test).


    http://codereview.appspot.com/6496054/
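
    For concreteness, the cap being discussed, Min(GOMAXPROCS, NumCPU, 8),
    could be computed along the following lines. This is only an illustrative
    sketch; the helper name `pollMax' is made up here and is not taken from
    the CL:

        package net

        import "runtime"

        // pollMax returns Min(GOMAXPROCS, NumCPU, 8). runtime.GOMAXPROCS(0)
        // only reports the current setting without changing it.
        func pollMax() int {
            n := runtime.GOMAXPROCS(0)
            if c := runtime.NumCPU(); c < n {
                n = c
            }
            if n > 8 {
                n = 8
            }
            if n < 1 {
                n = 1
            }
            return n
        }
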
  • Sebastien Paolacci at Sep 19, 2012 at 8:39 pm

    On 2012/09/16 05:09:07, dvyukov wrote:
    25% performance penalty does not look particularly cool.

    Indeed it is weird, fixed. The point was more about understanding why it
    just impacted the "Persistent" benchmarks, but decreasing the number of
    pollservers does actually fix the situation.

    I would experiment with Min(GOMAXPROCS, NumCPU, 8).

    Done.

    I think you do not need to re-distribute fds, just grow up to
    Min(GOMAXPROCS, NumCPU, 8) using the current GOMAXPROCS value.

    I incidentally realized that assignment stability was not required at
    all, so the various requirements can be met (i.e. changing GOMAXPROCS +
    Dave's lazy instantiation).

    I do not care a lot about programs that "play with GOMAXPROCS"
    (especially about go test).

    I do care about those programs, I wouldn't even be able to bench that
    very patch otherwise ;)

    Please take another look.

    Sebastien

    http://codereview.appspot.com/6496054/
  • Sebastien Paolacci at Sep 19, 2012 at 8:45 pm
    CL's description updated; here are the last patch set's results:

    Median of 10 runs, 4 cores @ 3.4GHz, amd/linux-3.2:

    benchmark                         old ns/op    new ns/op      delta
    BenchmarkTCPOneShot                  171917       175194      1.91%
    BenchmarkTCPOneShot-2                101413       109462      7.94%
    BenchmarkTCPOneShot-4                 91796        35712    -61.10%
    BenchmarkTCPOneShot-6                 90938        30607    -66.34%
    BenchmarkTCPOneShot-8                 90374        29150    -67.75%
    BenchmarkTCPOneShot-16               101089       111526     10.32%

    BenchmarkTCPOneShotTimeout           174986       178606      2.07%
    BenchmarkTCPOneShotTimeout-2         101585       110678      8.95%
    BenchmarkTCPOneShotTimeout-4          91547        35931    -60.75%
    BenchmarkTCPOneShotTimeout-6          91496        31019    -66.10%
    BenchmarkTCPOneShotTimeout-8          90670        29531    -67.43%
    BenchmarkTCPOneShotTimeout-16        101013       106026      4.96%

    BenchmarkTCPPersistent                51731        53324      3.08%
    BenchmarkTCPPersistent-2              32888        30678     -6.72%
    BenchmarkTCPPersistent-4              25751        15595    -39.44%
    BenchmarkTCPPersistent-6              26737         9805    -63.33%
    BenchmarkTCPPersistent-8              26850         9730    -63.76%
    BenchmarkTCPPersistent-16            104449       102838     -1.54%

    BenchmarkTCPPersistentTimeout         51806        53281      2.85%
    BenchmarkTCPPersistentTimeout-2       32956        30895     -6.25%
    BenchmarkTCPPersistentTimeout-4       25994        18111    -30.33%
    BenchmarkTCPPersistentTimeout-6       26679         9846    -63.09%
    BenchmarkTCPPersistentTimeout-8       26810         9727    -63.72%
    BenchmarkTCPPersistentTimeout-16     101652       104410      2.71%

    http://codereview.appspot.com/6496054/
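
    For reference, numbers in this shape are presumably produced with the net
    package benchmarks and go test's -cpu flag, along these lines (an
    illustrative invocation, not quoted from the thread):

        go test -run NONE -bench 'BenchmarkTCP(OneShot|Persistent)' -cpu 1,2,4,6,8,16 net
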
  • Dvyukov at Sep 19, 2012 at 9:03 pm
  • Sebastien Paolacci at Sep 19, 2012 at 9:34 pm
    Hello rsc@golang.org, dvyukov@google.com, dave@cheney.net (cc:
    golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6496054/
  • Mikioh Mikioh at Sep 19, 2012 at 9:57 pm
    https://codereview.appspot.com/6496054/diff/13002/src/pkg/net/sendfile_linux.go
    File src/pkg/net/sendfile_linux.go (right):

    https://codereview.appspot.com/6496054/diff/13002/src/pkg/net/sendfile_linux.go#newcode62
    src/pkg/net/sendfile_linux.go:62: if err1 = c.pollServer().WaitWrite(c); err1 == nil {
    sendfile_freebsd.go needs the same change too.

    https://codereview.appspot.com/6496054/
  • Sebastien Paolacci at Sep 19, 2012 at 10:13 pm
    https://codereview.appspot.com/6496054/diff/13002/src/pkg/net/sendfile_linux.go
    File src/pkg/net/sendfile_linux.go (right):

    https://codereview.appspot.com/6496054/diff/13002/src/pkg/net/sendfile_linux.go#newcode62
    src/pkg/net/sendfile_linux.go:62: if err1 = c.pollServer().WaitWrite(c); err1 == nil {
    On 2012/09/19 21:57:11, mikio wrote:
    sendfile_freebsd.go needs the same change too.
    Done, thanks.

    https://codereview.appspot.com/6496054/
  • Mikioh Mikioh at Sep 19, 2012 at 10:19 pm
  • Brad Fitzpatrick at Sep 19, 2012 at 10:33 pm
    So if GOMAXPROCS changes at runtime, the pollServer() accessor can lose its
    *pollServer?

    Why not just store the *pollServer in the *netFD?

    GOMAXPROCS changing at runtime used to sound crazy, but it's not uncommon
    now to adjust the number of VCPUs a VM gets at runtime due to changing
    demand.
  • Sebastien Paolacci at Sep 20, 2012 at 8:28 am
    Hello Brad,
    So if GOMAXPROCS changes at runtime, the pollServer() accessor can lose
    its *pollServer?

    Yes and no. The same `*netFD' can indeed get buried in different
    pollServers over time, but we don't really care as long as the selected
    pollServer is maintained for each single {wait,unblock} sequence, which
    always happens in e.g. `fd.pollServer().WaitWrite(fd)'. I/O also can't
    happen concurrently, thanks to the rio/wio mutexes. All the tests pass;
    an aborting Close should however be racy if GOMAXPROCS changes in
    between.

    Why not just store the *pollServer in the *netFD?

    Indeed, that seems to be the safest/easiest way to maintain hard
    coherency at the *netFD level. Thanks.

    GOMAXPROCS changing at runtime used to sound crazy, but it's not
    uncommon now

    It's exactly the purpose of the last patch set; the (quite used)
    benchmark framework relies on that feature too.

    Sebastien

    http://codereview.appspot.com/6496054/
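
    To illustrate the {wait,unblock} point: with the accessor-style design, a
    blocking operation resolves its pollserver exactly once per call and
    keeps using that instance for the whole wait. A rough sketch, reusing the
    hypothetical `pollMax' above and assuming slice names like those
    discussed later in the thread (not the literal patch):

        // pollServer picks a pollserver for this fd. Different calls may
        // pick different instances once GOMAXPROCS changes, but within a
        // single fd.pollServer().WaitWrite(fd) the wait and its wake-up are
        // handled by the one instance resolved here.
        func (fd *netFD) pollServer() *pollServer {
            k := fd.sysfd % pollMax()
            startServersOnce[k]() // lazily start that pollserver if needed
            return pollservers[k]
        }
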
  • Sebastien Paolacci at Sep 20, 2012 at 11:03 am
    I eventually managed to make `TestTCPListenClose' fail by injecting a
    dummy `rand.Int() % pollN' selection code in `pollserver()'. Even if it
    doesn't make much sense, it should have worked.

    CL updated with Brad's suggestion, please take another look.

    Thanks,
    Sebastien

    https://codereview.appspot.com/6496054/
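
    Brad's suggestion, picked up in this patch set, pins the choice when the
    fd is created instead of re-deriving it on every call; schematically
    (the field name and the newFD signature here are illustrative only):

        type netFD struct {
            sysfd int
            // ... other fields elided ...
            pollsrv *pollServer // chosen once at creation, stable for the fd's lifetime
        }

        func newFD(fd, family, proto int, net string) (*netFD, error) {
            k := fd % pollMax()
            startServersOnce[k]()
            netfd := &netFD{
                sysfd:   fd,
                pollsrv: pollservers[k],
            }
            // ... remaining initialization elided ...
            return netfd, nil
        }
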
  • Dave at Sep 20, 2012 at 11:12 am
    Thank you for your proposal, but I remain uncomfortable with it. In
    return for additional scalability, a lot more code has been added in the
    hot path of newFD which will impact the latency of every accept and dial
    operation. I am concerned that this is a price that all users of net
    have to pay even if they are not interested in handling tens of
    thousands of active connections.


    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go
    File src/pkg/net/fd_unix.go (right):

    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newcode285
    src/pkg/net/fd_unix.go:285: panic(err)
    if any error from newPollServer is a panic, newPollServer() itself
    should panic.

    https://codereview.appspot.com/6496054/
  • Sebastien Paolacci at Sep 20, 2012 at 2:16 pm
    Hello Dave,

    I'm not sure I really got your point. It indeed adds a few lines of code
    in the "hot" `newFD()' (not in read and/or write), but if you look at
    them you'll find:
    - an access to an already burned `Once' (that was already present),
    - an additional {modulo + test + two slice lookups} sequence.

    Given that we're anyway going through a few syscalls, two/three mutexes,
    a sleep/wake-up sequence, and hopefully a few real bytes of transfer, I
    can't see that as a major latency impact. The benchmarks, given that a
    few percent are really within the confidence interval for that kind of
    run, seem decent (to me) even in the cpu=1 situation.

    On the other hand, handling a few tens of thousands of active connections
    is honestly not a corner case.

    Cheers,
    Sebastien

    http://codereview.appspot.com/6496054/
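
    The "already burned Once" above refers to the lazy start-up path. One way
    to picture it is a fixed-size setup done in init(), sized by NumCPU so
    that later GOMAXPROCS changes can never index past the end. Again a
    sketch under the same assumed names; it needs the "runtime" and "sync"
    imports:

        var (
            pollMaxN         int // Min(NumCPU, 8), fixed at init time
            pollservers      []*pollServer
            startServersOnce []func()
        )

        func init() {
            pollMaxN = runtime.NumCPU()
            if pollMaxN > 8 {
                pollMaxN = 8
            }
            pollservers = make([]*pollServer, pollMaxN)
            startServersOnce = make([]func(), pollMaxN)
            // pollMax() never exceeds pollMaxN, so startServersOnce[k] stays
            // in bounds even if GOMAXPROCS grows later.
            for i := 0; i < pollMaxN; i++ {
                k := i
                once := new(sync.Once)
                startServersOnce[i] = func() { once.Do(func() { startServer(k) }) }
            }
        }

        func startServer(k int) {
            p, err := newPollServer() // existing constructor in fd_unix.go
            if err != nil {
                panic(err) // error handling simplified to a panic here, cf. Dave's comment above
            }
            pollservers[k] = p
        }
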
  • Bradfitz at Sep 20, 2012 at 6:43 pm
    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go
    File src/pkg/net/fd_unix.go (right):

    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newcode308
    src/pkg/net/fd_unix.go:308: startServersOnce[k]()
    this line may crash if GOMAXPROCS has increased since the size it was
    during init(), when startServersOnce was initialized.

    https://codereview.appspot.com/6496054/
  • Bradfitz at Sep 20, 2012 at 7:18 pm
    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go
    File src/pkg/net/fd_unix.go (right):

    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newcode308
    src/pkg/net/fd_unix.go:308: startServersOnce[k]()
    On 2012/09/20 19:15:11, dvyukov wrote:
    On 2012/09/20 18:43:18, bradfitz wrote:
    this line may crash if GOMAXPROCS has increased since the size it was
    during init(), when startServersOnce was initialized.

    init() does not call GOMAXPROCS, it calls NumCPU()

    sorry, I see. Teaches me to read code on a bus.

    https://codereview.appspot.com/6496054/
  • Dvyukov at Sep 20, 2012 at 7:22 pm
    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go
    File src/pkg/net/fd_unix.go (right):

    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newcode303
    src/pkg/net/fd_unix.go:303: pollN := runtime.GOMAXPROCS(0)
    Please move this out of newFD().
    This change addresses inefficiency in the current pollserver impl. I
    anticipate that the pollserver impl will not only change in the future,
    but also diverge for different OSes. The netFD part of this file
    implements generic functionality that won't diverge, so please split it
    from the partitioning logic (most OSes are perfectly able to implement a
    scalable pollserver, and so won't need it).

    https://codereview.appspot.com/6496054/
  • Dvyukov at Sep 20, 2012 at 8:18 pm
    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go
    File src/pkg/net/fd_unix.go (right):

    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newcode308
    src/pkg/net/fd_unix.go:308: startServersOnce[k]()
    On 2012/09/20 18:43:18, bradfitz wrote:
    this line may crash if GOMAXPROCS has increased since the size it was during
    init(), when startServersOnce was initialized.
    init() does not call GOMAXPROCS, it calls NumCPU()

    https://codereview.appspot.com/6496054/
  • Sebastien Paolacci at Sep 21, 2012 at 11:13 am
    On 2012/09/20 19:22:20, dvyukov wrote:
    https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newcode303
    src/pkg/net/fd_unix.go:303: pollN := runtime.GOMAXPROCS(0)
    Please move this out of newFD().
    Done.

    https://codereview.appspot.com/6496054/
  • Sebastien Paolacci at Sep 21, 2012 at 4:43 pm

    On 2012/09/21 14:36:13, rsc wrote:
    I would consider that anything beyond NumCPU is useless, that anything
    smaller is not really damaging, and that anything greater than 8
    currently doesn't bring any improvements.

    Min(NumCPU, 8) + lazy instantiation as suggested by Dave/Russ?

    This is okay with me but I would like dvyukov to agree.

    Russ

    Hello Russ,

    Just for the record, the last patch round is stabilizing on
    `Min(GOMAXPROCS, NumCPU, 8)' with lazy instantiation. The CL's
    description also mentions (but does not explain) that choice.

    Sebastien

    http://codereview.appspot.com/6496054/
  • Dvyukov at Sep 21, 2012 at 8:16 pm
  • Remyoudompheng at Sep 26, 2012 at 7:27 pm
    Is this going to be submitted?

    http://codereview.appspot.com/6496054/
  • Rsc at Sep 26, 2012 at 7:33 pm
    *** Submitted as
    http://code.google.com/p/go/source/detail?r=19cf9361902a ***

    net: spread fd over several pollservers.

    Lighten contention without preventing further improvements on
    pollservers.
    Connections are spread over Min(GOMAXPROCS, NumCPU, 8) pollserver
    instances.

    Median of 10 runs, 4 cores @ 3.4GHz, amd/linux-3.2:

    benchmark                         old ns/op    new ns/op      delta
    BenchmarkTCPOneShot                  171917       175194      1.91%
    BenchmarkTCPOneShot-2                101413       109462      7.94%
    BenchmarkTCPOneShot-4                 91796        35712    -61.10%
    BenchmarkTCPOneShot-6                 90938        30607    -66.34%
    BenchmarkTCPOneShot-8                 90374        29150    -67.75%
    BenchmarkTCPOneShot-16               101089       111526     10.32%

    BenchmarkTCPOneShotTimeout           174986       178606      2.07%
    BenchmarkTCPOneShotTimeout-2         101585       110678      8.95%
    BenchmarkTCPOneShotTimeout-4          91547        35931    -60.75%
    BenchmarkTCPOneShotTimeout-6          91496        31019    -66.10%
    BenchmarkTCPOneShotTimeout-8          90670        29531    -67.43%
    BenchmarkTCPOneShotTimeout-16        101013       106026      4.96%

    BenchmarkTCPPersistent                51731        53324      3.08%
    BenchmarkTCPPersistent-2              32888        30678     -6.72%
    BenchmarkTCPPersistent-4              25751        15595    -39.44%
    BenchmarkTCPPersistent-6              26737         9805    -63.33%
    BenchmarkTCPPersistent-8              26850         9730    -63.76%
    BenchmarkTCPPersistent-16            104449       102838     -1.54%

    BenchmarkTCPPersistentTimeout         51806        53281      2.85%
    BenchmarkTCPPersistentTimeout-2       32956        30895     -6.25%
    BenchmarkTCPPersistentTimeout-4       25994        18111    -30.33%
    BenchmarkTCPPersistentTimeout-6       26679         9846    -63.09%
    BenchmarkTCPPersistentTimeout-8       26810         9727    -63.72%
    BenchmarkTCPPersistentTimeout-16     101652       104410      2.71%

    R=rsc, dvyukov, dave, mikioh.mikioh, bradfitz, remyoudompheng
    CC=golang-dev
    http://codereview.appspot.com/6496054

    Committer: Russ Cox <rsc@golang.org>


    http://codereview.appspot.com/6496054/
