On Apr 13, 2011 at 8:07 pm:
Thanks for replying. By now we are also pretty sure it's an issue in the
hardware layer. We have updated the system (kernel/NIC drivers), thereby
eliminating any possible bugs there, but we are still encountering
timeouts and dropped packets.
My bad, I was not aware that Cloudera releases could not be discussed
here at all. I thought that even though Cloudera releases are somewhat
different, issues that are probably generic could still be discussed
here. (I will of course use the Cloudera lists when I'm pretty sure an
issue is specific to Cloudera.)
Anyway, I will update the list when we have figured out the problem. The
right list, cdh-user ;)
On 04/13/2011 09:22 PM, Eli Collins wrote:
If you're seeing this after bumping dfs.datanode.max.xcievers and the
nofile ulimit, and you're also seeing dropped packets, it sounds like
you're having networking issues.
See the following as well: https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/1d3a377bd605e1bd/d3d8ec0d14c065bb?#d3d8ec0d14c065bb
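
For reference, the two knobs mentioned above are set in different places. A minimal sketch, assuming a CDH3-era hdfs-site.xml; the value shown is a commonly suggested one, not the poster's actual configuration (and note the historical misspelling "xcievers" in the real property name):

```xml
<!-- hdfs-site.xml on each datanode: raise the transceiver thread cap -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

The open-files limit is raised separately for the user running the datanode, e.g. via `nofile` entries in /etc/security/limits.conf; the datanode must be restarted for either change to take effect.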
On Tue, Apr 12, 2011 at 10:37 AM, Ferdy Galema wrote:
We're running into issues where we are seeing timeouts when writing/reading a
lot of HDFS data. (Hadoop version is CDH4B3 and HDFS appending is enabled.)
The type of exceptions varies a lot, but most of the time it happens when a
DFSClient writes data into the datanode pipeline.
For example, one datanode logs "Exception in receiveBlock for block
blk_5476601577216704980_62953994 java.io.EOFException: while trying to read
65557 bytes" and the other side logs "writeBlock
blk_5476601577216704980_62953994 received exception
java.net.SocketTimeoutException: Read timed out". That's it.
We cannot seem to determine the exact problem. The read timeout is the default
(60 sec). The open files limit and the number of xceivers have been raised
considerably. A full GC never takes longer than a second.
However, we are seeing a lot of dropped packets on the network
interface. Could these problems be related?
Any advice will be helpful.
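
For anyone hitting the same symptoms: one way to quantify interface drops on Linux is to read the per-interface counters the kernel exposes in /proc/net/dev. A small sketch; the SAMPLE text and its counter values below are illustrative, and on a live box you would read the file itself instead:

```python
# Parse /proc/net/dev-style output and report RX/TX drop counters.
# The SAMPLE string is made-up data for illustration; on a real host:
#   with open("/proc/net/dev") as f: text = f.read()

SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  123456     789    0    0    0     0          0         0   123456     789    0    0    0     0       0          0
  eth0: 9876543   65432    0  317    0     0          0         0  8765432   54321    0    5    0     0       0          0
"""

def interface_drops(text):
    """Return {interface: (rx_drops, tx_drops)} from /proc/net/dev text."""
    drops = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        name, counters = line.split(":", 1)
        fields = counters.split()
        # Layout: 8 receive fields (bytes packets errs drop ...)
        # followed by 8 transmit fields (bytes packets errs drop ...),
        # so RX drops is field 3 and TX drops is field 11.
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops

if __name__ == "__main__":
    for iface, (rx, tx) in interface_drops(SAMPLE).items():
        print(f"{iface}: rx_drops={rx} tx_drops={tx}")
```

If the drop counters keep climbing while HDFS clients see read timeouts, that points at the NIC/switch layer rather than at Hadoop itself.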