We're trying to migrate from node 0.6.21 to 0.8.9, and one of our processes
that does a database backup (a couple million items fetched over HTTP and
HTTPS from Amazon SimpleDB and DynamoDB APIs) has started getting
"HPE_INVALID_CHUNK_SIZE" errors, presumably from node's http_parser.c (and
then after that, it seems we're generally handed back a partial response,
and sometimes also a disconnected socket on the next request, which
black-holes the connection without some other fixes). It generally gets
one or two of these errors in the millions of fetched items (maybe 10s-100s
of thousands of individual HTTP requests over just a handful of HTTP 1.1
connections with connection keep-alive). Rolling back to 0.6.21 makes
everything fine. Doing a diff of http_parser.c between the two versions
shows it's had some major work done (though nothing right around where
this error is emitted), so perhaps a bug has been introduced, but I know
nothing of that code and don't know where to start debugging it. Any
suggestions? Any recent changes to related code I should try bisecting?
Anyone else getting this error?

Jimb Esser
Cloud Party, Inc

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
nodejs+[email protected]
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en


  • Jimb Esser at Sep 20, 2012 at 10:00 pm
    As a follow-up, it looks like the data I get back from the response often
    has "\r\n2000\r\n" in the middle of what otherwise looks like good data, so
    I'm guessing something is going wrong with processing multi-part (chunked?
    whatever is the right term?) responses, and it sometimes reads too much and
    then tries to read the next chunk size from the middle of the response data
    (sometimes getting an "invalid chunk size" error, or, if it gets lucky and
    reads a small number, it just truncates my response, though it leaves the
    "\r\n2000\r\n" in the middle of it).
  • Ben Noordhuis at Sep 21, 2012 at 12:45 pm

    Do you have some captured traffic I can look at?

  • Jimb Esser at Sep 21, 2012 at 4:48 pm
    I have no idea how to capture traffic on Linux, but I could do so if there
    are some good instructions somewhere. The problem generally takes hundreds
    of MB of traffic to reproduce, however, so I'm not sure if logging that
    traffic will affect it and cause it to not occur, but I'm willing to try.

    I did a git bisect and found the revision in the v0.8 history which
    "caused" the issue, but unfortunately, I think that is a red herring, as
    the revision is just one that fixed keep-alive being broken on that branch
    [1], and older than that revision keep-alive is simply off, so the traffic
    is drastically different (not to mention orders of magnitude slower in my
    test). On the 0.6 branch, however, even with keep-alive functioning
    correctly, I do not see this problem, so that leads me to believe that
    something broke while keep-alive was also broken on the 0.8 branch, which
    makes it much harder to bisect. I may try cherry-picking in the keep-alive
    fix while doing a bisect to see if I can narrow it down, though since each
    test can take up to a half hour before it's conclusive, I'm not sure how
    productive that will be.

    [1]
    https://github.com/Joyent/node/commit/a6f3451e2549d0b82ff18990e8bfe28ea4386aba

  • Ben Noordhuis at Sep 21, 2012 at 11:27 pm

    tcpdump? It's easy:

    tcpdump -C 1 -G 5 -i <interface> -n -s 1500 -w <filename> tcp port <port>

    The -C and -G options make it rotate the log file every 5 seconds or
    when it grows to 1 million bytes, whichever comes first. You may need
    to increase the snarf size (-s) to 8192 or higher if you're capturing
    loopback traffic.

    You can replay the traffic with `tcpdump -A -r <filename>`.

  • Jimb Esser at Sep 24, 2012 at 9:37 pm
    I tried tcpdump, and it starts saving off packets (writes a few dump files,
    a couple 10s of MBs), but then seems to stall out - nothing more gets
    written even though my app is clearly still receiving data. By the time the
    bad data arrives, the capture has inevitably stopped, and searching through
    the dumps turns up no trace of the bad data.

    I added some logging to http_parser.c, however, simply dumping out the raw
    data that it is parsing, and it looks like the data is bad before it gets
    in there. Specifically, in one case I'm looking at (raw data, though all
    text at [1]) it looks like the tail of the buffer has 4377 bytes that
    begins what should be an 8192-byte chunk, and then a duplicated copy of the
    last ~4708 bytes or so, followed by "0\r\n\r\n". Sadly, now that I'm
    familiar with the http_parser code, this really points to a problem
    somewhere higher up... -_- I'll do some poking around with tcpdump to see
    if I can get it or something else to dump more reliably, it would be good
    to have absolute confirmation that the data was fine coming in at the TCP
    level, although the fact that node 0.6 has no problems makes me pretty
    certain the input is fine.

    On other fronts, I did more tests searching through node's git history, and
    found that before a specific revision [2], this seems to happen much less
    frequently (happened once in 6 runs, as opposed to almost every run), but
    I'm pretty convinced there's nothing wrong with this change (I added some
    logging both before and after it and found nothing behaving differently,
    though the frequency changed when I added logging, so it may just indicate
    it's a delicate timing issue). It does happen reliably on my work machine,
    my home machine, and AWS instances - all pretty drastically different
    hardware - and it similarly never seems to happen on node 0.6.21 on any of
    those machines (although if it was only happening 1 in 6 times before, I
    might simply never have seen it in my earlier tests).

    [1] http://thesilentb.com/files/bad_http_stream.txt
    [2] https://github.com/joyent/node/commit/70033bd
  • Ben Noordhuis at Sep 24, 2012 at 10:27 pm

    Can you post the patch you used for logging? I don't see anything
    obviously wrong in the dump itself.

    It's kind of odd that tcpdump hangs, I've never had issues with it
    myself, even on maxed out Gbit interfaces.

    On a completely unrelated note:

    {"S":"[6.24719116,-3.552713678800501e-15,2.252045820000001]"}

    Clamp your floats, man! :-)

  • Jimb Esser at Sep 24, 2012 at 10:49 pm
    Patch for logging was to replace the two occurrences of
    SET_ERRNO(HPE_INVALID_CHUNK_SIZE) with:

        {
          const char *p2;
          for (p2 = data; p2 != data + len; p2++) {
            /* cast to unsigned char so high bytes don't sign-extend */
            fprintf(stderr, "0x%02x, ", (unsigned char) *p2);
            if (p == p2) {
              fprintf(stderr, "/******/ ");
            }
          }
        }
        fprintf(stderr, "HPE_INVALID_CHUNK_SIZE case 1\n");
        SET_ERRNO(HPE_INVALID_CHUNK_SIZE);

    I then loaded the stderr output into a trivial JS program to write it out
    as a binary file.

    When you say you don't see anything obviously wrong in the dump itself, do
    you mean it looks like a valid HTTP stream? It looks to me like, near the
    end, it says an 8k data chunk is coming (2000\r\n), and then provides less
    than 8k of data and starts another chunk (with the original 8k terminating
    in the middle of the second chunk). I am basing this entirely on
    assumptions about the HTTP protocol gleaned from reading http_parser.c,
    though, so I could be quite mistaken =).
  • Ben Noordhuis at Sep 25, 2012 at 1:17 am

    The HTTP parser works across TCP packet boundaries. A request or
    response doesn't necessarily fit in a single packet.

    Unless you're benchmarking with a concurrency of 1 (a single client
    that issues sequential requests), you'll see HTTP requests and
    responses getting interleaved.

  • Jimb Esser at Sep 25, 2012 at 4:38 am
    Yes, but, as I understand, this is logging during processing a single
    stream of data, in a single call to http_parser_execute(), specifically
    inside of a for loop looping over exactly the data I logged. That should
    be internally consistent, at least, right? If the stream says it has an
    8192-byte chunk coming, and there's more than that many bytes of data, then
    8192 bytes later there should be another chunk size in the stream (which
    there is not, which of course is why it's throwing the
    HPE_INVALID_CHUNK_SIZE error as it reads some non-hex characters at the
    offset where it's expecting the next chunk size).

    Anyway, I managed to capture it as it went bad with tcpdump (I added -U -B
    1000000, not sure if that helped or the timing was just different). Short
    result is: it looks like the data was bad as it came in over TCP! I'll go
    into detail and post the dumps below, but this quickly leads me to ask
    (after apologizing about thinking this was a node bug): why does this
    generally fail on 0.8 and never on 0.6?

    My guess is that on 0.6 an HTTP parse error simply disconnected the socket,
    and all of our AWS APIs simply retry any requests which get a socket
    disconnect (as those happen rather frequently with AWS). We also
    automatically retry on a bunch of 5xx http response codes, so if any of
    those were being generated, we'd retry on those as well. I'll add some
    additional logging and run this on node 0.6 again to see if I can verify
    this.
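    The retry logic itself isn't shown anywhere in this thread; a sketch of
    what that policy might look like (names hypothetical, written in modern
    JS rather than 0.6-era callbacks, with `doRequest` standing in for the
    real AWS API call):

```javascript
// Retry on socket-level errors (e.g. ECONNRESET, parse errors) and on 5xx
// responses, up to a fixed number of attempts. `doRequest` is assumed to
// resolve with { statusCode, body } or reject with a socket/parse error.
async function requestWithRetry(doRequest, maxAttempts = 3) {
  let lastErr;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await doRequest();
      if (res.statusCode >= 500) { // transient server error: try again
        lastErr = new Error('HTTP ' + res.statusCode);
        continue;
      }
      return res;
    } catch (err) { // disconnect, reset, or parse error: try again
      lastErr = err;
    }
  }
  throw lastErr; // every attempt failed
}
```

    Under a policy like this, 0.6's disconnect-on-parse-error would be
    silently absorbed as just another retried request, which would explain
    never noticing the bad responses there.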

    Back to the details on the symptom at hand (just in case
    I misinterpreted some of this data, it might be good to get another set of
    eyes on it). In this case, it didn't actually get an HTTP parser error; it
    simply handed back what looks like bad data (with the HTTP chunk size
    lines inserted in the middle of it), perhaps because the socket got closed
    or something (although if it got closed prematurely, shouldn't that have
    generated a parsing error, since the parser was expecting more bytes? I'm
    not sure on this, exactly). I've uploaded the parsed/printed
    TCP dump of the end of the relevant stream (starting from before the last
    good chunk size line) [1], as well as the tail of the data that was
    returned to my app, starting from the same point [2].

    As far as I can tell, the stream indicated there was an 8kb chunk coming,
    8kb later it indicates another 8kb chunk is coming, but then 1296 bytes
    later there's another chunk size line ("\r\n2000\r\n"). At the packet
    level, the final packet (starting at seq 230681) appears to be simply
    repeated data from 1304 bytes back in the stream, but then terminated
    unexpectedly (never gets to the 8kb of data the earlier chunk size
    indicated was coming). No other packets show up in this stream in my dump.

    I cannot think of any possible thing on our side (either my code or
    node's) that could cause this, so my conclusion is that something is going
    wrong within Amazon's DynamoDB (or in routing between there and our servers,
    though I've tried both inside and outside of their datacenters with the
    same results). In retrospect, every bad packet I've gotten has been on
    DynamoDB queries, despite us slamming a couple other Amazon APIs with
    similar levels of traffic during this test, so that also points the finger
    in that direction. I think I will try and bring this up on the DynamoDB
    forums, and just make our DDB client more robust if it gets low-level
    parsing errors (although I'm a little afraid of what could happen if it
    happens to accidentally parse correctly because of some of our data
    happening to match up!).

    Thank you very much for your help with this.

    [1] http://thesilentb.com/files/tcpdump.txt
    [2] http://thesilentb.com/files/packet.txt

    Jimb Esser
    Cloud Party, Inc
  • Ben Noordhuis at Sep 25, 2012 at 12:32 pm

    Glad you got it solved - or the next best thing, knowing what the culprit is. :-)

    The reason that you're seeing it with 0.8 and not 0.6 probably is that
    http_parser.c became considerably more strict. Error handling on the
    JS level hasn't changed significantly.

  • Jimb Esser at Sep 25, 2012 at 5:00 pm
    Well, I did a bunch of testing on node 0.6, and after running for hours, I
    received no http parsing errors, no JSON parsing errors on the data
    returned, and no unexpected socket disconnects. I generally get one or two
    of these within 5-10 minutes on node 0.8. So there's definitely something
    different about what node 0.8 is doing that affects this, though the
    tcpdump definitely indicates the problem is on the other end, so there's
    basically nothing more to try, but it is still worrisome :(. I think we'll
    have to stick with node 0.6, at least in production, for a while yet.

    For posterity, the thread I started with Amazon is here (no responses yet):
    https://forums.aws.amazon.com/thread.jspa?threadID=105268

    Jimb Esser
    On Tuesday, September 25, 2012 5:32:47 AM UTC-7, Ben Noordhuis wrote:
    On Tue, Sep 25, 2012 at 6:38 AM, Jimb Esser wrote:
    Yes, but, as I understand, this is logging during processing a single stream
    of data, in a single call to http_parser_execute(), specifically inside of a
    for loop looping over exactly the data I logged. That should be
    internally
    consistent, at least, right? If the stream says it has an 8192 byte chunk
    coming, and there's more than that many bytes of date, 8192 bytes later
    there should be another chunk size in the stream (which, there is not, which
    of course is why it's throwing the HPE_INVALID_CHUNK_SIZE error as it reads
    some non-hex characters at the offset its expecting the next chunk size).
    Anyway, I managed to capture it going bad with tcpdump (I added -U -B
    1000000; not sure if that helped or if the timing was just different). The
    short result is: it looks like the data was bad as it came in over TCP!
    I'll go into detail and post the dumps below, but this quickly leads me to
    ask (after apologizing for thinking this was a node bug): why does this
    generally fail on 0.8 and never on 0.6?

    My guess is that on 0.6 an HTTP parse error simply disconnected the socket,
    and all of our AWS APIs simply retry any requests which get a socket
    disconnect (as those happen rather frequently with AWS). We also
    automatically retry on a bunch of 5xx http response codes, so if any of
    those were being generated, we'd retry on those as well. I'll add some
    additional logging and run this on node 0.6 again to see if I can verify
    this.
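    As a sketch of that retry policy (the names fetchWithRetry and isRetryable
    are illustrative, not our actual AWS client code), the idea is simply to
    treat socket-level failures and a handful of 5xx statuses as retryable:

    ```javascript
    // Hypothetical retry wrapper: re-issue a request when it fails with a
    // socket disconnect or one of the 5xx statuses we consider transient.
    function isRetryable(err) {
      const retryableCodes = ['ECONNRESET', 'ECONNREFUSED', 'EPIPE'];
      const retryableStatus = [500, 502, 503];
      return retryableCodes.indexOf(err.code) !== -1 ||
             retryableStatus.indexOf(err.statusCode) !== -1;
    }

    function fetchWithRetry(doRequest, maxAttempts, cb) {
      let attempt = 0;
      function tryOnce() {
        attempt += 1;
        doRequest(function (err, result) {
          if (err && attempt < maxAttempts && isRetryable(err)) {
            return tryOnce(); // a real client would add backoff here
          }
          cb(err, result);
        });
      }
      tryOnce();
    }
    ```

    With a policy like this in place, a parse error that merely kills the
    socket gets silently absorbed as "yet another AWS disconnect", which would
    explain never noticing it on 0.6.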

    Back to the details of the symptom at hand (just in case I misinterpreted
    some of this data, it might be good to get another set of eyes on it). In
    this case it didn't actually get an http parser error; it simply handed
    back what looks like bad data (with the HTTP chunk-size lines inserted in
    the middle of it), perhaps because the socket got closed or something
    (although if it was closed prematurely, it seems that should have generated
    a parsing error, since the parser was expecting more bytes? I'm not sure on
    this, exactly). I've uploaded the parsed/printed TCP dump of the end of the
    relevant stream (starting from before the last good chunk-size line) [1],
    as well as the tail of the data that was returned to my app, starting from
    the same point [2].

    As far as I can tell, the stream indicated there was an 8 KB chunk coming;
    8 KB later it indicated another 8 KB chunk was coming, but then 1296 bytes
    later there's another chunk-size line ("\r\n2000\r\n"). At the packet
    level, the final packet (starting at seq 230681) appears to simply repeat
    data from 1304 bytes back in the stream, but then terminates unexpectedly
    (it never gets to the 8 KB of data the earlier chunk-size line indicated
    was coming). No other packets show up in this stream in my dump.
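    For what it's worth, those numbers are self-consistent: "2000" is hex for
    the 8 KB chunk size, and the 1304-byte repeat offset is exactly the 1296
    bytes of chunk data plus the 8 bytes of the "\r\n2000\r\n" chunk-size line
    itself:

    ```javascript
    // "2000" in a chunk-size line is hexadecimal.
    console.log(parseInt('2000', 16)); // 8192

    // The chunk-size line itself occupies 8 bytes in the stream...
    console.log('\r\n2000\r\n'.length); // 8

    // ...so 1296 bytes of data plus the 8-byte size line lands at offset 1304.
    console.log(1296 + '\r\n2000\r\n'.length); // 1304
    ```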

    I cannot think of anything on our side (either my code or node's) that
    could cause this, so my conclusion is that something is going wrong within
    Amazon's DynamoDB (or in the routing between there and our servers, though
    I've tried both inside and outside their datacenters with the same
    results). In retrospect, every bad packet I've gotten has been on DynamoDB
    queries, despite us slamming a couple of other Amazon APIs with similar
    levels of traffic during this test, so that also points the finger in that
    direction. I think I will bring this up on the DynamoDB forums, and just
    make our DDB client more robust when it gets low-level parsing errors
    (although I'm a little afraid of what could happen if a corrupt response
    accidentally parses correctly because some of our data happens to match
    up!).
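    One cheap way to make a client more robust along those lines (a sketch;
    parseDynamoResponse is a made-up name, not real client code) is to treat a
    body that fails JSON.parse as retryable rather than fatal, since a stray
    chunk-size line spliced into the payload will almost always break JSON
    syntax:

    ```javascript
    // Sanity-check a response body before trusting it: corrupt framing like a
    // stray "\r\n2000\r\n" in the middle of the payload breaks JSON syntax, so
    // a parse failure is a strong signal the response should be retried.
    function parseDynamoResponse(body) {
      try {
        return { ok: true, data: JSON.parse(body) };
      } catch (e) {
        return { ok: false, error: e };
      }
    }

    console.log(parseDynamoResponse('{"Count":1}').ok);             // true
    console.log(parseDynamoResponse('{"Count":1\r\n2000\r\n}').ok); // false
    ```

    This doesn't eliminate the (scary) possibility of corruption that still
    happens to parse as valid JSON, but it catches the common case.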

    Thank you very much for your help with this.

    [1] http://thesilentb.com/files/tcpdump.txt
    [2] http://thesilentb.com/files/packet.txt
    Glad you got it solved - or the next best thing, know what the culprit is.
    :-)

    The reason you're seeing it with 0.8 and not 0.6 is probably that
    http_parser.c became considerably more strict. Error handling at the
    JS level hasn't changed significantly.

Discussion Overview
group: nodejs
categories: nodejs
posted: Sep 20, '12 at 8:48p
active: Sep 25, '12 at 5:00p
posts: 12
users: 2
website: nodejs.org
irc: #node.js

2 users in discussion
Jimb Esser: 7 posts, Ben Noordhuis: 5 posts
