FAQ
Reviewers: golang-dev_googlegroups.com,

Message:
Hello golang-dev@googlegroups.com,

I'd like you to review this change to
https://dvyukov%40google.com@code.google.com/p/go/


Description:
net: fix flaky test
The test failed on one of the builders with:
timeout_test.go:594: ln.Accept: accept tcp 127.0.0.1:19373: use of
closed network connection
http://build.golang.org/log/e83f4a152b37071b9d079096e15913811ad296b5

Please review this at http://codereview.appspot.com/6859043/

Affected files:
M src/pkg/net/timeout_test.go


Index: src/pkg/net/timeout_test.go
===================================================================
--- a/src/pkg/net/timeout_test.go
+++ b/src/pkg/net/timeout_test.go
@@ -588,8 +588,10 @@

ln := newLocalListener(t)
defer ln.Close()
+ connected := make(chan bool)
go func() {
s, err := ln.Accept()
+ connected <- true
if err != nil {
t.Fatalf("ln.Accept: %v", err)
}
@@ -619,6 +621,7 @@
t.Fatalf("DialTCP: %v", err)
}
defer c.Close()
+ <-connected
for i := 0; i < 1024; i++ {
var buf [1]byte
c.Write(buf[:])

Search Discussions

  • Brad Fitzpatrick at Nov 26, 2012 at 8:29 pm
    LGTM
    On Mon, Nov 26, 2012 at 11:15 AM, wrote:

    Reviewers: golang-dev_googlegroups.com,

    Message:
    Hello golang-dev@googlegroups.com,

    I'd like you to review this change to
    https://dvyukov%40google.com@**code.google.com/p/go/<http://40google.com@code.google.com/p/go/>


    Description:
    net: fix flaky test
    The test failed on one of the builders with:
    timeout_test.go:594: ln.Accept: accept tcp 127.0.0.1:19373: use of
    closed network connection
    http://build.golang.org/log/**e83f4a152b37071b9d079096e15913**811ad296b5<http://build.golang.org/log/e83f4a152b37071b9d079096e15913811ad296b5>

    Please review this at http://codereview.appspot.com/**6859043/<http://codereview.appspot.com/6859043/>

    Affected files:
    M src/pkg/net/timeout_test.go


    Index: src/pkg/net/timeout_test.go
    ==============================**==============================**=======
    --- a/src/pkg/net/timeout_test.go
    +++ b/src/pkg/net/timeout_test.go
    @@ -588,8 +588,10 @@

    ln := newLocalListener(t)
    defer ln.Close()
    + connected := make(chan bool)
    go func() {
    s, err := ln.Accept()
    + connected <- true
    if err != nil {
    t.Fatalf("ln.Accept: %v", err)
    }
    @@ -619,6 +621,7 @@
    t.Fatalf("DialTCP: %v", err)
    }
    defer c.Close()
    + <-connected
    for i := 0; i < 1024; i++ {
    var buf [1]byte
    c.Write(buf[:])

  • Dave at Nov 26, 2012 at 8:57 pm
    +mikioh

    I'm not sure why this works, Dial should not return until the connection
    has been Accept'd, right ? We use the same pattern in the test
    immediately above this one.

    What looks like is happening is the client is connecting, pumping out
    1024 x 1 byte writes, then shutting the connection before the Accept has
    completed.

    Also, why is this only failing on FreeBSD/386? Then again most of the
    other *BSD builders are flaky at the same time so we don't have the best
    coverage.


    https://codereview.appspot.com/6859043/diff/4001/src/pkg/net/timeout_test.go
    File src/pkg/net/timeout_test.go (right):

    https://codereview.appspot.com/6859043/diff/4001/src/pkg/net/timeout_test.go#newcode596
    src/pkg/net/timeout_test.go:596: t.Fatalf("ln.Accept: %v", err)
    Must use t.Errorf here

    https://codereview.appspot.com/6859043/diff/4001/src/pkg/net/timeout_test.go#newcode604
    src/pkg/net/timeout_test.go:604: if err != nil {
    Can this be tightened up a bit here, the expect errors are EOF and
    timeout, right ?

    https://codereview.appspot.com/6859043/diff/4001/src/pkg/net/timeout_test.go#newcode627
    src/pkg/net/timeout_test.go:627: c.Write(buf[:])
    Please add a check for the err value here.

    https://codereview.appspot.com/6859043/
  • Rémy Oudompheng at Nov 26, 2012 at 9:08 pm

    On 2012/11/26 wrote:
    +mikioh

    I'm not sure why this works, Dial should not return until the connection
    has been Accept'd, right ? We use the same pattern in the test
    immediately above this one.
    I suppose it should return when the connection is established at the
    TCP level, but being accepted by a process is a different thing.

    Rémy.
  • Russ Cox at Nov 26, 2012 at 9:09 pm
    I believe Dial can return before the Accept executes. On most systems
    I believe that the completion of the TCP 3-way handshake is what
    places the socket on the queue of connections that can be returned by
    accept(2) and also what makes the connect(2) on the client side
    finish. It is easy to believe that a fast client might close the
    connection (send a FIN) before the accept(2) happens.

    Of course implementations differ on whether a closed connection turns
    into a failed accept(2).
    http://www.cl.cam.ac.uk/cgi-bin/manpage?2+accept says:

    ERROR HANDLING

    Linux accept() passes already-pending network errors on the new socket
    as an error code from accept(). This behaviour differs from other BSD
    socket implementations. For reliable operation the application should
    detect the network errors defined for the protocol after accept() and
    treat them like EAGAIN by retrying. In case of TCP/IP these are ENET-
    DOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP,
    and ENETUNREACH.

    but of course you are on BSD so maybe the characterization is wrong,
    outdated, or backward.

    Russ
  • Brad Fitzpatrick at Nov 26, 2012 at 9:21 pm

    On Mon, Nov 26, 2012 at 1:09 PM, Russ Cox wrote:

    I believe Dial can return before the Accept executes. On most systems
    I believe that the completion of the TCP 3-way handshake is what
    places the socket on the queue of connections that can be returned by
    accept(2) and also what makes the connect(2) on the client side
    finish. It is easy to believe that a fast client might close the
    connection (send a FIN) before the accept(2) happens.

    Of course implementations differ on whether a closed connection turns
    into a failed accept(2).
    http://www.cl.cam.ac.uk/cgi-bin/manpage?2+accept says:

    ERROR HANDLING

    Linux accept() passes already-pending network errors on the new
    socket
    as an error code from accept(). This behaviour differs from other
    BSD
    socket implementations. For reliable operation the application
    should
    detect the network errors defined for the protocol after accept()
    and
    treat them like EAGAIN by retrying. In case of TCP/IP these are
    ENET-
    DOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH,
    EOPNOTSUPP,
    and ENETUNREACH.
    Do that mean there's a possible series of packets I can throw at a Linux Go
    server to make syscall.Accept return an error and end its
    http.ListenAndServe remotely?

    Do we need to change the net package on Linux to treat all those EFOO
    errors as Temporary in our accept?
  • Russ Cox at Nov 26, 2012 at 9:53 pm

    Do that mean there's a possible series of packets I can throw at a Linux Go
    server to make syscall.Accept return an error and end its
    http.ListenAndServe remotely?
    Yes, but that's not what is happening here and should already be fixed
    (see Remy's link).

    What's happening here is that after a Listen, one goroutine does c :=
    Dial, c.Write c.Write c.Write, c.Close, ln.Close while another
    goroutine does ln.Accept. In some cases the first goroutine can do its
    whole sequence (including ln.Close) before the other goroutine does
    ln.Accept. That is, the one goroutine is closing the *listener* out
    from under the other.

    Dave suggested that perhaps the Dial would not return until the
    ln.Accept had completed; if so the ln.Close could not possibly run
    before the ln.Accept finished, but I was explaining why that's not the
    case.

    Russ
  • Brad Fitzpatrick at Nov 26, 2012 at 10:03 pm

    On Mon, Nov 26, 2012 at 1:53 PM, Russ Cox wrote:

    Do that mean there's a possible series of packets I can throw at a Linux Go
    server to make syscall.Accept return an error and end its
    http.ListenAndServe remotely?
    Yes, but that's not what is happening here and should already be fixed
    (see Remy's link).
    Yeah, I was just going on a tangent. Issue 3395 only fixed ECONNABORTED.
    The manpage suggests there are more? But I guess most servers would have
    bigger problems if ENETDOWN or ENONET.
  • Russ Cox at Nov 26, 2012 at 10:17 pm

    Yeah, I was just going on a tangent. Issue 3395 only fixed ECONNABORTED.
    The manpage suggests there are more? But I guess most servers would have
    bigger problems if ENETDOWN or ENONET.
    No idea. Linux man pages are somewhere below Wikipedia on the reliability scale.

    Russ
  • Dmitry Vyukov at Nov 27, 2012 at 5:49 am
    So, is this change LGTM?

    On Tue, Nov 27, 2012 at 2:10 AM, Russ Cox wrote:
    Yeah, I was just going on a tangent. Issue 3395 only fixed ECONNABORTED.
    The manpage suggests there are more? But I guess most servers would have
    bigger problems if ENETDOWN or ENONET.
    No idea. Linux man pages are somewhere below Wikipedia on the reliability scale.

    Russ
  • Rémy Oudompheng at Nov 26, 2012 at 11:19 pm

    On 2012/11/26 Brad Fitzpatrick wrote:
    On Mon, Nov 26, 2012 at 1:09 PM, Russ Cox wrote:

    I believe Dial can return before the Accept executes. On most systems
    I believe that the completion of the TCP 3-way handshake is what
    places the socket on the queue of connections that can be returned by
    accept(2) and also what makes the connect(2) on the client side
    finish. It is easy to believe that a fast client might close the
    connection (send a FIN) before the accept(2) happens.

    Of course implementations differ on whether a closed connection turns
    into a failed accept(2).
    http://www.cl.cam.ac.uk/cgi-bin/manpage?2+accept says:

    ERROR HANDLING

    Linux accept() passes already-pending network errors on the new
    socket
    as an error code from accept(). This behaviour differs from other
    BSD
    socket implementations. For reliable operation the application
    should
    detect the network errors defined for the protocol after accept()
    and
    treat them like EAGAIN by retrying. In case of TCP/IP these are
    ENET-
    DOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH,
    EOPNOTSUPP,
    and ENETUNREACH.

    Do that mean there's a possible series of packets I can throw at a Linux Go
    server to make syscall.Accept return an error and end its
    http.ListenAndServe remotely?

    Do we need to change the net package on Linux to treat all those EFOO errors
    as Temporary in our accept?
    We already fixed issue 3395 but indeed more may be hidden.

    http://code.google.com/p/go/issues/detail?id=3395

    Rémy.
  • Dave Cheney at Nov 26, 2012 at 9:28 pm
    LGTM assuming the other comments are addressed.

    Sorry for being a stick in the mud about this. I can see how, if nagel
    was in effect, then those 1024 writes would be coalesced into a single
    write, which could piggy back on the 3rd packet in the handshake (TCP
    allows data on the 3rd ACK, assuming there is window size available,
    right?) then follow up with a FIN before the server even saw the
    packet.
    On Tue, Nov 27, 2012 at 8:09 AM, Russ Cox wrote:
    I believe Dial can return before the Accept executes. On most systems
    I believe that the completion of the TCP 3-way handshake is what
    places the socket on the queue of connections that can be returned by
    accept(2) and also what makes the connect(2) on the client side
    finish. It is easy to believe that a fast client might close the
    connection (send a FIN) before the accept(2) happens.

    Of course implementations differ on whether a closed connection turns
    into a failed accept(2).
    http://www.cl.cam.ac.uk/cgi-bin/manpage?2+accept says:

    ERROR HANDLING

    Linux accept() passes already-pending network errors on the new socket
    as an error code from accept(). This behaviour differs from other BSD
    socket implementations. For reliable operation the application should
    detect the network errors defined for the protocol after accept() and
    treat them like EAGAIN by retrying. In case of TCP/IP these are ENET-
    DOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP,
    and ENETUNREACH.

    but of course you are on BSD so maybe the characterization is wrong,
    outdated, or backward.

    Russ
  • Russ Cox at Nov 26, 2012 at 9:50 pm

    Sorry for being a stick in the mud about this. I can see how, if nagel
    was in effect, then those 1024 writes would be coalesced into a single
    write, which could piggy back on the 3rd packet in the handshake (TCP
    allows data on the 3rd ACK, assuming there is window size available,
    right?) then follow up with a FIN before the server even saw the
    packet.
    You don't need any packet coalescing. The kernel is running the TCP
    protocol independently of the accept(2) system call finishing. You
    just need enough delay on the machine that all the 1024 writes and the
    final FIN get processed by the server half of the kernel before the
    user-level Go program gets around to calling accept(2) to retrieve an
    fd for the connection. And by then the function has returned and done
    the deferred ln.Close, so the ln.Accept fails.

    Russ
  • Dave at Nov 27, 2012 at 6:10 am

    On 2012/11/27 05:49:41, dvyukov wrote:
    So, is this change LGTM?

    Yes.

    https://codereview.appspot.com/6859043/
  • Mikioh Mikioh at Nov 27, 2012 at 7:32 am
    LGTM

    I couldn't repro the issue on freebsd/386 9.0-release
    but let's see what happens.

    https://codereview.appspot.com/6859043/
  • Dvyukov at Nov 27, 2012 at 8:24 am
    *** Submitted as
    http://code.google.com/p/go/source/detail?r=34e54cf71fed ***

    net: fix flaky test
    The test failed on one of the builders with:
    timeout_test.go:594: ln.Accept: accept tcp 127.0.0.1:19373: use of
    closed network connection
    http://build.golang.org/log/e83f4a152b37071b9d079096e15913811ad296b5

    R=golang-dev, bradfitz, dave, mikioh.mikioh, remyoudompheng, rsc
    CC=golang-dev
    http://codereview.appspot.com/6859043


    http://codereview.appspot.com/6859043/

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-dev @
categoriesgo
postedNov 26, '12 at 8:17p
activeNov 27, '12 at 8:24a
posts16
users6
websitegolang.org

People

Translate

site design / logo © 2022 Grokbase