On Wed, 24 Nov 2010 10:30:09 +0100 Erik Forsberg wrote:
Hi!
I'm having some trouble with Map/Reduce jobs failing due to HDFS
errors. I've been digging around the logs trying to figure out what's
happening, and I see the following in the datanode logs:
2010-11-19 10:27:01,059 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in
BlockReceiver.lastNodeRun: java.io.IOException: No temporary
file /opera/log4/hadoop/dfs/data/tmp/blk_-8143694940938019938 for
block blk_-8143694940938019938_6144372 at <snip>
What would be the possible causes of such exceptions?
It turned out that a datanode start was attempted while the datanode was
already running, which caused it to try to start a second datanode. That
in turn seems to cause the tmp directories to be cleaned before the
second datanode finds out that the storage directories are locked. Some
kind of race condition, I would guess, because it only happens on
systems with high load.
More details here:
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_frm/thread/d4572d2d1191be91#
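For anyone trying to picture the ordering problem, here is a minimal
sketch of it. This is not Hadoop code - the directory layout, the
"in_use.lock" file, the block name "blk_1234" and the method names are
all just illustrative - but it shows how a second startup that wipes
tmp/ *before* checking the storage lock can destroy a block file the
running instance is still writing, which would produce exactly the "No
temporary file ... for block ..." error above.

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/**
 * Illustrative sketch (not Hadoop source) of the startup race described
 * above: the second "datanode" cleans the tmp directory before it checks
 * the storage lock, so the damage is done even though it then aborts.
 */
public class StartupRaceSketch {

    public static void main(String[] args) throws Exception {
        File storageDir = new File("dfs-data");       // hypothetical storage dir
        File tmpDir = new File(storageDir, "tmp");
        tmpDir.mkdirs();

        // First instance: holds the storage lock and has a block in flight in tmp/.
        RandomAccessFile lockRaf = new RandomAccessFile(new File(storageDir, "in_use.lock"), "rw");
        FileLock firstInstanceLock = lockRaf.getChannel().tryLock();
        File inFlightBlock = new File(tmpDir, "blk_1234");
        inFlightBlock.createNewFile();

        // Second instance starting up with the unsafe ordering:
        // clean tmp/ first, only then notice the lock.
        secondInstanceStartupUnsafe(storageDir, tmpDir);

        // The first instance's temporary block file is now gone,
        // which is the failure mode in the log excerpt.
        System.out.println("in-flight block still present? " + inFlightBlock.exists());

        firstInstanceLock.release();
        lockRaf.close();
    }

    /** Unsafe ordering: destructive cleanup happens before the lock check. */
    static void secondInstanceStartupUnsafe(File storageDir, File tmpDir) throws Exception {
        for (File f : tmpDir.listFiles()) {
            f.delete();                               // damage done here
        }
        RandomAccessFile raf = new RandomAccessFile(new File(storageDir, "in_use.lock"), "rw");
        FileLock lock = null;
        try {
            lock = raf.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // within a single JVM the conflict surfaces as an exception;
            // between two real processes tryLock() would simply return null
        }
        if (lock == null) {
            System.out.println("storage dir already locked, aborting second instance");
        } else {
            lock.release();
        }
        raf.close();                                  // ...but tmp/ was already wiped
    }
}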
\EF