Yes, but as I understand it, this is logging during the processing of a single
stream of data, in a single call to http_parser_execute(), specifically
inside a for loop iterating over exactly the data I logged. That should
be internally consistent, at least, right? If the stream says it has an
8192 byte chunk coming, and there are more than that many bytes of data, then
8192 bytes later there should be another chunk size in the stream (there
is not, which of course is why it's throwing the HPE_INVALID_CHUNK_SIZE
error when it reads non-hex characters at the offset where it's expecting the
next chunk size).
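(For reference, my understanding is that a well-formed chunked body should
look roughly like this, each hex size line followed by exactly that many
bytes of data and a trailing CRLF, ending with a zero-size chunk:

  2000\r\n
  <8192 bytes of data>\r\n
  2000\r\n
  <8192 bytes of data>\r\n
  ...
  0\r\n
  \r\n
)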
Anyway, I managed to capture it as it went bad with tcpdump (I added -U -B
1000000; not sure if that helped or the timing was just different). The short
result is: it looks like the data was bad as it came in over TCP! I'll go
into detail and post the dumps below, but this quickly leads me to ask
(after apologizing for thinking this was a node bug): why does this fail
regularly on 0.8 and never on 0.6?
My guess is that on 0.6 an HTTP parse error simply disconnected the socket,
and all of our AWS APIs retry any request which gets a socket
disconnect (as those happen rather frequently with AWS). We also
automatically retry on a number of 5xx HTTP response codes, so if any of
those were being generated, we'd retry on those as well. I'll add some
additional logging and run this on node 0.6 again to see if I can verify
this.
Back to the details of the symptom at hand (just in case
I misinterpreted some of this data, it might be good to get another set of
eyes on it). In this case, it didn't actually get an HTTP parser error; it
simply handed back what looks like bad data (it includes HTTP chunk-size
lines inserted in the middle of it), perhaps because the socket got closed
or something (although if it was closed prematurely, it seems that should
have generated a parsing error, since the parser was expecting more
bytes? I'm not sure about this). I've uploaded the parsed/printed
TCP dump of the end of the relevant stream (starting from before the last
good chunk-size line) [1], as well as the tail of the data that was
returned to my app, starting from the same point [2].
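(A quick scan along these lines would flag anything in a buffer that looks
like an embedded chunk-size line; this is a rough sketch rather than the
exact code I ran, and find_chunk_headers is just an illustrative name:

#include <ctype.h>
#include <stdio.h>

/* Rough sketch: report anything in a buffer that looks like an
   embedded HTTP chunk-size line ("\r\n<hex digits>\r\n"). It may
   over-report on binary data that happens to contain such bytes. */
static void find_chunk_headers(const char *buf, size_t len) {
  size_t i;
  for (i = 0; i + 4 < len; i++) {
    if (buf[i] != '\r' || buf[i + 1] != '\n')
      continue;
    size_t j = i + 2;
    while (j < len && isxdigit((unsigned char) buf[j]))
      j++;
    if (j > i + 2 && j + 1 < len && buf[j] == '\r' && buf[j + 1] == '\n')
      printf("possible chunk-size line at offset %zu: %.*s\n",
             i, (int) (j - i - 2), buf + i + 2);
  }
}

int main(void) {
  const char sample[] = "some data\r\n2000\r\nmore data";
  find_chunk_headers(sample, sizeof(sample) - 1);
  return 0;
}
)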
As far as I can tell, the stream indicated there was an 8 KB chunk coming;
8 KB later it indicated another 8 KB chunk was coming; but then 1296 bytes
later there's another chunk-size line ("\r\n2000\r\n"). At the packet
level, the final packet (starting at seq 230681) appears to be simply
repeated data from 1304 bytes back in the stream, but then it terminates
unexpectedly (it never gets to the 8 KB of data the earlier chunk-size line
indicated was coming). No other packets show up in this stream in my dump.
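(If I'm counting bytes right, those offsets are at least internally
consistent with each other:

  0x2000 = 8192 bytes (the advertised chunk size)
  1296 bytes of data + 8 bytes of header ("\r\n2000\r\n") = 1304 bytes

which matches the repeated data starting 1304 bytes back.)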
I cannot think of anything on our side (either my code or
node's) that could cause this, so my conclusion is that something is going
wrong within Amazon's DynamoDB (or in the routing between there and our
servers, though I've tried from both inside and outside their datacenters
with the same results). In retrospect, every bad packet I've gotten has been
on DynamoDB queries, despite us slamming a couple of other Amazon APIs with
similar levels of traffic during this test, so that also points the finger
in that direction. I think I will try to bring this up on the DynamoDB
forums, and just make our DDB client more robust when it gets low-level
parsing errors (although I'm a little afraid of what could happen if a bad
stream accidentally parses correctly because some of our data happens to
match up!).
Thank you very much for your help with this.
[1] http://thesilentb.com/files/tcpdump.txt
[2] http://thesilentb.com/files/packet.txt

Jimb Esser
Cloud Party, Inc
On Monday, September 24, 2012 6:17:18 PM UTC-7, Ben Noordhuis wrote:

On Tue, Sep 25, 2012 at 12:48 AM, Jimb Esser wrote:
Patch for logging was to replace the two occurrences of
SET_ERRNO(HPE_INVALID_CHUNK_SIZE) with:

{
  /* Dump every byte the parser was fed in this call, with a marker
     right after the byte it choked on. The cast to unsigned char
     keeps signed chars from sign-extending and printing as
     0xffffffXX. */
  const char *p2;
  for (p2 = data; p2 != data + len; p2++) {
    fprintf(stderr, "0x%02x, ", (unsigned char) *p2);
    if (p == p2) {
      fprintf(stderr, "/******/ ");
    }
  }
}
fprintf(stderr, "HPE_INVALID_CHUNK_SIZE case 1\n");
SET_ERRNO(HPE_INVALID_CHUNK_SIZE);
I then loaded the stderr output into a trivial JS program to write it out as
a binary file.
When you say you don't see anything obviously wrong in the dump itself, do
you mean it looks like a valid HTTP stream? It looks to me like, near the
end, it says an 8k data chunk is coming (2000\r\n), and then provides less
than 8k of data and starts another chunk (with the original 8k chunk
terminating in the middle of the second chunk). I am basing this entirely on
assumptions about the HTTP protocol gleaned from reading http_parser.c,
though, so I could be quite mistaken =).
The HTTP parser works across TCP packet boundaries. A request or
response doesn't necessarily fit in a single packet.
Unless you're benchmarking with a concurrency of 1 (a single client
that issues sequential requests), you'll see HTTP requests and
responses getting interleaved.
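(A minimal standalone sketch of that point, assuming the joyent/http_parser
C API (http_parser_init, http_parser_execute, and an on_body callback): the
parser is handed a chunked response in two pieces and picks up mid-chunk on
the second call.

#include <stdio.h>
#include <string.h>
#include "http_parser.h"

/* Print each piece of body data the parser hands back. */
static int on_body(http_parser *parser, const char *at, size_t length) {
  printf("body: %.*s\n", (int) length, at);
  return 0;
}

int main(void) {
  http_parser parser;
  http_parser_settings settings;

  memset(&settings, 0, sizeof(settings));
  settings.on_body = on_body;
  http_parser_init(&parser, HTTP_RESPONSE);

  /* A chunked response split across two "packets"; the 4-byte chunk
     straddles the boundary, and the parser carries its state from
     one http_parser_execute() call to the next. */
  const char part1[] =
      "HTTP/1.1 200 OK\r\n"
      "Transfer-Encoding: chunked\r\n"
      "\r\n"
      "4\r\nWi";
  const char part2[] = "ki\r\n0\r\n\r\n";

  http_parser_execute(&parser, &settings, part1, sizeof(part1) - 1);
  http_parser_execute(&parser, &settings, part2, sizeof(part2) - 1);
  return 0;
}
)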