Hi All,

This is my first post, so please excuse me if I break any forum rules :-)

*Background*:
We are running a proxy server in Go that takes requests from clients, fetches
the content each client requested, and sends a response back. We run the
server on port 8080.
Go version 1.2.1. The code runs on CentOS 6. TCP parameters are tuned for
faster use of connections.
The process's max open file limit is set to 200000.

*Issue*:
Once every 3-5 days, the server becomes unresponsive, logging:
Accept error: accept tcp [::]:8080: too many open files; retrying in 1s
Accept error: accept tcp [::]:8080: too many open files; retrying in 1s

A manual restart is required every time we see this.

*Suspicious findings*:
*->* When we looked at the lsof output in our internal testing, some
*connections stay open forever*, and their number keeps growing at random. So
we suspect an FD leak, and the number goes much higher in the real-world case.
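One way to watch for a descriptor leak from inside the process is to periodically compare the number of open descriptors with the limit the process actually runs under. The sketch below assumes Linux (it lists `/proc/self/fd`, which CentOS 6 provides), and the helper names `openFDCount` and `fdSoftLimit` are hypothetical. Because the soft limit is read with `getrlimit` inside the running process, it also shows whether the 200000 setting really applies to the server.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// openFDCount returns the number of file descriptors open in this process by
// listing /proc/self/fd (Linux only). The directory handle used for the
// listing is itself included, so the count is one higher than what the
// application holds.
func openFDCount() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

// fdSoftLimit returns the soft RLIMIT_NOFILE limit of this process; hitting
// it is what produces "too many open files" on accept.
func fdSoftLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return rl.Cur, nil
}

func main() {
	n, err := openFDCount()
	if err != nil {
		fmt.Println("counting fds:", err)
		return
	}
	limit, err := fdSoftLimit()
	if err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	fmt.Printf("open fds: %d / soft limit: %d\n", n, limit)
}
```

Logging these two numbers every minute or so (for example from a goroutine driven by a `time.Ticker`) should show whether the count climbs steadily between restarts.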
Example:
agent 13771 agent 7u IPv6 75640925 0t0 TCP 128.199.211.246:webcache->123.63.202.169:57585 (ESTABLISHED)
agent 13771 agent 9u IPv4 75636003 0t0 TCP 10.130.203.8:37627->10.130.203.8:6379 (ESTABLISHED)
agent 13771 agent 10u IPv6 75640287 0t0 TCP 128.199.211.246:webcache->222.186.129.5:fotogcad (ESTABLISHED)
agent 13771 agent 11u IPv6 75912851 0t0 TCP 128.199.211.246:webcache->222.186.129.5:ieee-mih (ESTABLISHED)
agent 13771 agent 12u IPv6 76102384 0t0 TCP 128.199.211.246:webcache->123.63.202.169:56670 (ESTABLISHED)
agent 13771 agent 13u IPv6 76080513 0t0 TCP 128.199.211.246:webcache->123.63.202.169:49662 (ESTABLISHED)
agent 13771 agent 14u IPv4 75645718 0t0 TCP 10.130.203.8:37962->10.130.203.8:6379 (ESTABLISHED)
agent 13771 agent 15u IPv6 76080655 0t0 TCP 128.199.211.246:webcache->123.63.202.169:49666 (ESTABLISHED)
agent 13771 agent 18u IPv6 76080514 0t0 TCP 128.199.211.246:webcache->123.63.202.169:49663 (ESTABLISHED)

*->* In the production lsof output, we found some hundreds of lines like this:
agent 15976 agent 1563u sock 0,6 0t0 39770995 *can't identify protocol*

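*->* In the server handler, we set the Connection header to close:
    w.Header().Set("Connection", "close")

This is how we fetch the content:
    page, err := GetPage(origURL)
    if err != nil { http.Error(w, err.Error(), http.StatusBadGateway); return }
    defer page.Body.Close() // only after err is checked: page may be nil on error

This is how we listen:
    l, e := net.Listen(proto, srv.Addr)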

These are the only relevant things we control in application code. *No
timeouts are specified*. We have read in other, similar posts that we need to
specify read/write timeouts.

Please let me know how to identify the problem, and any pointers on how to
solve it. I'm not sure how to reproduce the issue. Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • James Bardin at Jul 17, 2015 at 3:26 pm

    On Friday, July 17, 2015 at 10:57:27 AM UTC-4, Adithya Vendra wrote:

    > These are the only things controlled in application code. *No timeouts
    > specified*. Read from other similar posts that we need to specify
    > timeouts for read/write.

    You seemed to have answered your own question. When you have a long running
    server, you need timeouts on everything. The http.Server, http.Client, and
    http.Transport (including the Transport.Dial function) all have applicable
    settings.

    You also need to update your Go version. Besides numerous other changes,
    some related bugs in net/http have been fixed.

    Once you have an updated version of Go and reasonable timeouts, if you're
    still losing track of connections, we will need to see more specific code
    to reproduce the problem. (FYI, the `can't identify protocol` output usually
    comes from not closing a connection in your code: the FD stays open after
    the socket itself has been cleaned up, so there's no way to identify the
    protocol.)

  • Adithya Vendra at Jul 20, 2015 at 6:12 am

    On Friday, July 17, 2015 at 8:56:29 PM UTC+5:30, James Bardin wrote:

    > On Friday, July 17, 2015 at 10:57:27 AM UTC-4, Adithya Vendra wrote:
    >
    >> These are the only things controlled in application code. *No timeouts
    >> specified*. Read from other similar posts that we need to specify
    >> timeouts for read/write.
    >
    > You seemed to have answered your own question. When you have a long
    > running server, you need timeouts on everything. The http.Server,
    > http.Client, and http.Transport (including the Transport.Dial function) all
    > have applicable settings.

    Thanks for the answer :-) I will put timeouts on all the aspects you
    mentioned and check again. If you have any ideas on how to reproduce the
    issue, that would be helpful :-) Thanks again.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Roger Pack at Jul 20, 2015 at 5:26 pm

    On Monday, July 20, 2015 at 12:12:20 AM UTC-6, Adithya Vendra wrote:

    > On Friday, July 17, 2015 at 8:56:29 PM UTC+5:30, James Bardin wrote:
    >
    >> On Friday, July 17, 2015 at 10:57:27 AM UTC-4, Adithya Vendra wrote:
    >>
    >>> These are the only things controlled in application code. *No timeouts
    >>> specified*. Read from other similar posts that we need to specify
    >>> timeouts for read/write.
    >>
    >> You seemed to have answered your own question. When you have a long
    >> running server, you need timeouts on everything. The http.Server,
    >> http.Client, and http.Transport (including the Transport.Dial function) all
    >> have applicable settings.
    >
    > Thanks for the answer :-) I will put some timeouts on all the aspects you
    > mentioned and will check again. If you have any idea on reproducing the
    > issue, that will be helpful :-) Thanks again.

    I assume that to reproduce the problem you'd want to connect a lot of
    clients and have them "wait forever". In this case, what's probably
    happening is that a client's connection is being aborted, but until you send
    something to verify the connection is still good (like a timeout ping), you
    won't detect it, so the number of connections grows forever.




Discussion Overview
group: golang-nuts
categories: go
posted: Jul 17, '15 at 2:57p
active: Jul 20, '15 at 5:26p
posts: 4
users: 3
website: golang.org
