On Wed, 20 Oct 2010 19:49:09 +0200 Erik Forsberg wrote:
Hi!
I'm running Cloudera CDH2 update 2 (hadoop-0.20 0.20.1+169.113), and
after the upgrade I'm getting the following error in the reducers
during the copy phase in one of my larger jobs:
2010-10-20 17:43:22,343 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 12 segments...
2010-10-20 17:43:22,344 INFO org.apache.hadoop.mapred.Merger: Merging 12 sorted segments
2010-10-20 17:43:22,344 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 12 segments left of total size: 382660295 bytes
2010-10-20 17:43:22,517 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201010201640_0001_r_000000_0 Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
Is there some other error that could tell me more about what's wrong?
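If I read WritableComparator.compare() right, it deserializes both keys with readFields() before comparing them, so an EOFException there would mean the serialized key bytes are shorter than readFields() expects. For context, a minimal sketch of the kind of key class I mean (class name and fields are made up, not my actual job's key):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical key type, only to illustrate the write()/readFields() symmetry
// that WritableComparator.compare() relies on when it deserializes raw bytes.
public class ExampleKey implements WritableComparable<ExampleKey> {
    private long id;
    private String name;

    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(name);
    }

    // Must read back exactly the fields write() emitted, in the same order;
    // a mismatch tends to show up as EOFException inside WritableComparator.compare().
    public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        name = in.readUTF();
    }

    public int compareTo(ExampleKey other) {
        if (id != other.id) {
            return (id < other.id) ? -1 : 1;
        }
        return name.compareTo(other.name);
    }
}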
I'm seeing quite a few of these in my datanode logs:
2010-10-21 10:21:01,149 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.20.11.66:50010, storageID=DS-71308762-10.20.11.66-50010-1269957604444, infoPort=50075, ipcPort=50020):Got exception while serving blk_1081044479123523815_4852013 to /10.20.11.88:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.20.11.66:50010 remote=/10.20.11.88:41347]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)
Could that be related somehow?
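The 480000 millis matches what I believe is the default dfs.datanode.socket.write.timeout (8 minutes), so it looks more like stalled readers than a misconfigured timeout. If it does turn out to matter, I suppose I could raise it, roughly like the sketch below (the property names are my assumption for 0.20, and in practice this would go in hdfs-site.xml on the datanodes rather than in code):

import org.apache.hadoop.conf.Configuration;

// Sketch only: bump the datanode write timeout from its 8 minute default.
// Property names are assumptions for 0.20; normally these belong in
// hdfs-site.xml on the datanodes, not in client code.
public class TimeoutSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // 480000 ms (8 min) is the timeout reported in the log above.
        conf.setLong("dfs.datanode.socket.write.timeout", 2 * 480000L);
        // Read-side counterpart, default 60000 ms.
        conf.setLong("dfs.socket.timeout", 2 * 60000L);
        System.out.println(conf.get("dfs.datanode.socket.write.timeout"));
    }
}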
I'm also seeing a large number of mortbay exceptions, but MAPREDUCE-5 says they are harmless.
Things I've tried so far:
*) Running with and without compressed map output, no difference.
*) With -Xmx512m and -Xmx768m, no difference.
*) Decreasing the number of mappers and reducers on all nodes to decrease overall load.
*) Decreasing mapred.reduce.parallel.copies from 16 to 5 (default)
Also tried doubling the number of reducers to get each reducer to process less data, but that didn't help either :-(
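For completeness, this is roughly how I flip those knobs between runs (a sketch against the 0.20 JobConf API; the actual job setup, mapper and reducer classes are omitted, and the concrete values are just examples):

import org.apache.hadoop.mapred.JobConf;

// Sketch of the settings I've been varying between runs; not my real job driver.
public class TuningSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setCompressMapOutput(false);                  // tried both true and false
        conf.set("mapred.child.java.opts", "-Xmx768m");    // tried -Xmx512m and -Xmx768m
        conf.setInt("mapred.reduce.parallel.copies", 5);   // back from 16 to the default 5
        conf.setNumReduceTasks(2 * 24);                    // doubled the reducer count (example number)
        System.out.println(conf.get("mapred.reduce.parallel.copies"));
    }
}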
\EF