Hi!

After a rewrite of one of my larger jobs, there are always a few reducer failures. The job usually manages to get through because Hadoop restarts the failing tasks, but this obviously slows down my total execution time.

I'm running Cloudera's distribution, CDH2, Hadoop 0.20.1+169.89.

The copy and shuffle phases are OK, but after a while the reduce phase runs into HDFS trouble. I've followed the error messages for a couple of blocks below. So, given those errors, any ideas on what to try?

*) Am I overloading my cluster somehow? What should I look for to find out?
*) Should I maybe decrease my number of reducer slots?
*) Can I set, on a per-job basis, the maximum number of reducers to run concurrently (see the sketch after this list)? Only this job is causing trouble; other jobs that write less data have no problems.
*) The job writes its output through MultipleSequenceFileOutputFormat; could that be related?
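
To make the third question concrete, here is a minimal Java sketch of the knobs I'm referring to (the class name and paths are illustrative only; this is not my actual job, which runs through streaming):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat;

public class JobSetupSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(JobSetupSketch.class);

    // Total number of reduce tasks for the job (mapred.reduce.tasks).
    // This controls how many reducers the job has in total, not how many
    // run at the same time on the cluster.
    conf.setNumReduceTasks(100);

    // Concurrency is, as far as I understand, a tasktracker-wide slot
    // setting in mapred-site.xml rather than a per-job one:
    //   mapred.tasktracker.reduce.tasks.maximum

    // The job writes through MultipleSequenceFileOutputFormat, so each
    // reducer ends up writing several part files (daily=.../hourly=...).
    conf.setOutputFormat(MultipleSequenceFileOutputFormat.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path("/inputdir"));
    FileOutputFormat.setOutputPath(conf, new Path("/outputdir"));

    JobClient.runJob(conf);
  }
}

If there is a per-job equivalent of mapred.tasktracker.reduce.tasks.maximum that I've missed, that would answer question three.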

I see two different types of errors. The first seems to trigger java.io.IOException: No temporary file /opera/log1/hadoop/dfs/data/tmp/<file> on the datanodes, while the second seems to trigger java.io.IOException: Meta file not found, blockFile=/opera/log4/hadoop/dfs/data/tmp

Thanks,
\EF

Here's an example of the first type, following the block through the logs. Reducer log (the task ran on the tasktracker on 10.20.11.89):

2010-10-28 10:43:12,392 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_1784371483411986483_5140381java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2410)

2010-10-28 10:43:12,395 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1784371483411986483_5140381 bad datanode[0] 10.20.11.89:50010
2010-10-28 10:43:12,395 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1784371483411986483_5140381 in pipeline 10.20.11.89:50010, 10.20.11.65:50010, 10.20.11.56:50010: bad datanode 10.20.11.89:50010

Namenode log for block:

2010-10-28 10:31:37,431 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /outputdir/daily=20101027/part-00057. blk_1784371483411986483_5140381
2010-10-28 10:43:12,409 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.20.11.56:50010 is added to blk_1784371483411986483_5140381 size 99916481
2010-10-28 10:43:12,409 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.20.11.65:50010 is added to blk_1784371483411986483_5140381 size 99916481
2010-10-28 10:43:12,458 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_1784371483411986483_5140381, newgenerationstamp=5141711, newlength=99916481, newtargets=[10.20.11.65:50010, 10.20.11.56:50010], closeFile=false, deleteBlock=false)

Datanode log on 10.20.11.89:

2010-10-28 10:31:37,412 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1784371483411986483_5140381 src: /10.20.11.89:54906 dest: /10.20.11.89:50010
java.io.IOException: No temporary file /opera/log1/hadoop/dfs/data/tmp/blk_1784371483411986483 for block blk_1784371483411986483_5140381
at org.apache.hadoop.hdfs.server.datanode.FSDataset.finalizeBlock(FSDataset.java:1240)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:955)
at java.lang.Thread.run(Thread.java:619)
2010-10-28 10:43:12,390 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_1784371483411986483_5140381 terminating

The datanode log on 10.20.11.65 has no mention of the block. Datanode log on 10.20.11.56:

2010-10-28 10:31:37,443 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1784371483411986483_5140381 src: /10.20.11.65:58194 dest: /10.20.11.56:50010
2010-10-28 10:43:12,373 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.20.11.65:58194, dest: /10.20.11.56:50010, bytes: 99916481, op: HDFS_WRITE, cliID: DFSClient_attempt_201010251211_4053_r_000057_1, offset: 0, srvID: DS-869973382-10.20.11.56-50010-1254491292131, blockid: blk_1784371483411986483_5140381, duration: 694916828459
2010-10-28 10:43:12,373 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_1784371483411986483_5140381 terminating
2010-10-28 10:43:12,408 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls recoverBlock(block=blk_1784371483411986483_5140381, targets=[10.20.11.65:50010, 10.20.11.56:50010])
2010-10-28 10:43:12,422 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: oldblock=blk_1784371483411986483_5140381(length=99916481), newblock=blk_1784371483411986483_5141711(length=99916481), datanode=10.20.11.56:50010


And then there's the second type, which starts in the same reducer log like this:

2010-10-28 10:43:14,138 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_4986602050098481381_5140430java.io.IOException: Bad response 1 for block blk_4986602050098481381_5140430 from datanode 10.20.11.86:50010
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2440)

2010-10-28 10:43:14,139 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 bad datanode[2] 10.20.11.86:50010
2010-10-28 10:43:14,139 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 in pipeline 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010: bad datanode 10.20.11.86:50010
2010-10-28 10:43:14,149 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.88:50010 failed 1 times. Pipeline was 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010. Will retry...
2010-10-28 10:43:15,151 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 bad datanode[2] 10.20.11.86:50010
2010-10-28 10:43:15,151 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 in pipeline 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010: bad datanode 10.20.11.86:50010
2010-10-28 10:43:15,164 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.88:50010 failed 2 times. Pipeline was 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010. Will retry...
2010-10-28 10:43:16,164 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 bad datanode[2] 10.20.11.86:50010
2010-10-28 10:43:16,164 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 in pipeline 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010: bad datanode 10.20.11.86:50010
2010-10-28 10:43:16,177 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.88:50010 failed 3 times. Pipeline was 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010. Will retry...
2010-10-28 10:43:17,177 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 bad datanode[2] 10.20.11.86:50010
2010-10-28 10:43:17,177 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 in pipeline 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010: bad datanode 10.20.11.86:50010
2010-10-28 10:43:17,192 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.88:50010 failed 4 times. Pipeline was 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010. Will retry...
2010-10-28 10:43:18,192 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 bad datanode[2] 10.20.11.86:50010
2010-10-28 10:43:18,192 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 in pipeline 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010: bad datanode 10.20.11.86:50010
2010-10-28 10:43:18,197 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.88:50010 failed 5 times. Pipeline was 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010. Will retry...
2010-10-28 10:43:19,197 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 bad datanode[2] 10.20.11.86:50010
2010-10-28 10:43:19,198 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 in pipeline 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010: bad datanode 10.20.11.86:50010
2010-10-28 10:43:19,203 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.88:50010 failed 6 times. Pipeline was 10.20.11.89:50010, 10.20.11.88:50010, 10.20.11.86:50010. Marking primary datanode as bad.
2010-10-28 10:43:20,213 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.86:50010 failed 1 times. Pipeline was 10.20.11.89:50010, 10.20.11.86:50010. Will retry...
2010-10-28 10:43:21,220 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.86:50010 failed 2 times. Pipeline was 10.20.11.89:50010, 10.20.11.86:50010. Will retry...
2010-10-28 10:43:22,228 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.86:50010 failed 3 times. Pipeline was 10.20.11.89:50010, 10.20.11.86:50010. Will retry...
2010-10-28 10:43:23,237 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.86:50010 failed 4 times. Pipeline was 10.20.11.89:50010, 10.20.11.86:50010. Will retry...
2010-10-28 10:43:24,243 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.86:50010 failed 5 times. Pipeline was 10.20.11.89:50010, 10.20.11.86:50010. Will retry...
2010-10-28 10:43:25,253 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.86:50010 failed 6 times. Pipeline was 10.20.11.89:50010, 10.20.11.86:50010. Marking primary datanode as bad.
2010-10-28 10:43:26,262 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.89:50010 failed 1 times. Pipeline was 10.20.11.89:50010. Will retry...
2010-10-28 10:43:27,268 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.89:50010 failed 2 times. Pipeline was 10.20.11.89:50010. Will retry...
2010-10-28 10:43:28,273 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.89:50010 failed 3 times. Pipeline was 10.20.11.89:50010. Will retry...
2010-10-28 10:43:29,276 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.89:50010 failed 4 times. Pipeline was 10.20.11.89:50010. Will retry...
2010-10-28 10:43:30,280 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.89:50010 failed 5 times. Pipeline was 10.20.11.89:50010. Will retry...
2010-10-28 10:43:31,284 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.89:50010 failed 6 times. Pipeline was 10.20.11.89:50010. Aborting...
2010-10-28 10:43:31,285 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2010-10-28 10:43:31,288 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Error Recovery for block blk_4986602050098481381_5140430 failed because recovery from primary datanode 10.20.11.89:50010 failed 6 times. Pipeline was 10.20.11.89:50010. Aborting...
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2601)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2087)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2251)
2010-10-28 10:43:31,291 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task

Namenode log for block:

2010-10-28 10:34:11,880 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /outputdir/_temporary/_attempt_201010251211_4053_r_000057_1/hourly=2010102703/part-00057. blk_4986602050098481381_5140430

(Yes, that's all)

Datanode log on 10.20.11.86:

2010-10-28 10:34:11,858 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_4986602050098481381_5140430 src: /10.20.11.88:60247 dest: /10.20.11.86:50010
java.io.IOException: No temporary file /opera/log3/hadoop/dfs/data/tmp/blk_4986602050098481381 for block blk_4986602050098481381_5140430
at org.apache.hadoop.hdfs.server.datanode.FSDataset.finalizeBlock(FSDataset.java:1240)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:809)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:859)
at java.lang.Thread.run(Thread.java:619)
2010-10-28 10:43:14,013 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.lastNodeRun:
java.io.IOException: No temporary file /opera/log3/hadoop/dfs/data/tmp/blk_4986602050098481381 for block blk_4986602050098481381_5140430
at org.apache.hadoop.hdfs.server.datanode.FSDataset.finalizeBlock(FSDataset.java:1240)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:809)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:859)
at java.lang.Thread.run(Thread.java:619)
2010-10-28 10:43:14,013 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskError: exception:
java.io.IOException: No temporary file /opera/log3/hadoop/dfs/data/tmp/blk_4986602050098481381 for block blk_4986602050098481381_5140430
at org.apache.hadoop.hdfs.server.datanode.FSDataset.finalizeBlock(FSDataset.java:1240)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:809)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:859)
at java.lang.Thread.run(Thread.java:619)
2010-10-28 10:43:14,140 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_4986602050098481381_5140430 0 Exception java.io.IOException: No temporary file /opera/log3/hadoop/dfs/data/tmp/blk_4986602050098481381 for block blk_4986602050098481381_5140430
at org.apache.hadoop.hdfs.server.datanode.FSDataset.finalizeBlock(FSDataset.java:1240)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:809)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:859)
at java.lang.Thread.run(Thread.java:619)

2010-10-28 10:43:14,140 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_4986602050098481381_5140430 terminating
2010-10-28 10:43:20,211 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls recoverBlock(block=blk_4986602050098481381_5140430, targets=[10.20.11.89:50010, 10.20.11.86:50010])
2010-10-28 10:43:20,215 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to getBlockMetaDataInfo for block (=blk_4986602050098481381_5140430) from datanode (=10.20.11.89:50010)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Meta file not found, blockFile=/opera/log4/hadoop/dfs/data/tmp/blk_4986602050098481381
at org.apache.hadoop.hdfs.server.datanode.FSDataset.findMetaFile(FSDataset.java:674)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getStoredBlock(FSDataset.java:723)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockMetaDataInfo(DataNode.java:1431)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)

at org.apache.hadoop.ipc.Client.call(Client.java:740)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
at $Proxy5.getBlockMetaDataInfo(Unknown Source)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1542)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1637)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
2010-10-28 10:43:20,215 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to getBlockMetaDataInfo for block (=blk_4986602050098481381_5140430) from datanode (=10.20.11.86:50010)
2010-10-28 10:43:20,216 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020, call recoverBlock(blk_4986602050098481381_5140430, false, [Lorg.apache.hadoop.hdfs.protocol.DatanodeInfo;@3956cce8) from 10.20.11.89:43120: error: java.io.IOException: All datanodes failed: block=blk_4986602050098481381_5140430, datanodeids=[10.20.11.89:50010, 10.20.11.86:50010]
java.io.IOException: All datanodes failed: block=blk_4986602050098481381_5140430, datanodeids=[10.20.11.89:50010, 10.20.11.86:50010]
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1565)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1637)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
2010-10-28 10:43:21,218 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls recoverBlock(block=blk_4986602050098481381_5140430, targets=[10.20.11.89:50010, 10.20.11.86:50010])
2010-10-28 10:43:21,222 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to getBlockMetaDataInfo for block (=blk_4986602050098481381_5140430) from datanode (=10.20.11.89:50010)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Meta file not found, blockFile=/opera/log4/hadoop/dfs/data/tmp/blk_4986602050098481381
at org.apache.hadoop.hdfs.server.datanode.FSDataset.findMetaFile(FSDataset.java:674)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getStoredBlock(FSDataset.java:723)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockMetaDataInfo(DataNode.java:1431)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)

at org.apache.hadoop.ipc.Client.call(Client.java:740)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
at $Proxy5.getBlockMetaDataInfo(Unknown Source)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1542)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1637)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
2010-10-28 10:43:21,222 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to getBlockMetaDataInfo for block (=blk_4986602050098481381_5140430) from datanode (=10.20.11.86:50010)
java.io.IOException: Meta file not found, blockFile=/opera/log3/hadoop/dfs/data/tmp/blk_4986602050098481381
at org.apache.hadoop.hdfs.server.datanode.FSDataset.findMetaFile(FSDataset.java:674)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getStoredBlock(FSDataset.java:723)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockMetaDataInfo(DataNode.java:1431)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1542)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1637)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
2010-10-28 10:43:21,222 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 50020, call recoverBlock(blk_4986602050098481381_5140430, false, [Lorg.apache.hadoop.hdfs.protocol.DatanodeInfo;@726d0fae) from 10.20.11.89:43120: error: java.io.IOException: All datanodes failed: block=blk_4986602050098481381_5140430, datanodeids=[10.20.11.89:50010, 10.20.11.86:50010]
java.io.IOException: All datanodes failed: block=blk_4986602050098481381_5140430, datanodeids=[10.20.11.89:50010, 10.20.11.86:50010]
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1565)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1637)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)

(There are more of these messages in that log)

Datanode log on 10.20.11.89:

2010-10-28 10:34:11,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_4986602050098481381_5140430 src: /10.20.11.89:58621 dest: /10.20.11.89:50010

2010-10-28 10:43:14,138 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_4986602050098481381_5140430 terminating
2010-10-28 10:43:14,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 50020, call getBlockMetaDataInfo(blk_4986602050098481381_5140430) from 10.20.11.88:37156: error: java.io.IOException: Meta file not found, blockFile=/opera/log4/hadoop/dfs/data/tmp/blk_4986602050098481381
java.io.IOException: Meta file not found, blockFile=/opera/log4/hadoop/dfs/data/tmp/blk_4986602050098481381
at org.apache.hadoop.hdfs.server.datanode.FSDataset.findMetaFile(FSDataset.java:674)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getStoredBlock(FSDataset.java:723)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockMetaDataInfo(DataNode.java:1431)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
2010-10-28 10:43:15,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020, call getBlockMetaDataInfo(blk_4986602050098481381_5140430) from 10.20.11.88:37156: error: java.io.IOException: Meta file not found, blockFile=/opera/log4/hadoop/dfs/data/tmp/blk_4986602050098481381
java.io.IOException: Meta file not found, blockFile=/opera/log4/hadoop/dfs/data/tmp/blk_4986602050098481381
at org.apache.hadoop.hdfs.server.datanode.FSDataset.findMetaFile(FSDataset.java:674)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getStoredBlock(FSDataset.java:723)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockMetaDataInfo(DataNode.java:1431)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)

(And then a bunch of similar ones)

Datanode log on 10.20.11.88:

2010-10-28 10:34:11,858 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_4986602050098481381_5140430 src: /10.20.11.89:43997 dest: /10.20.11.88:50010
2010-10-28 10:43:14,143 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_4986602050098481381_5140430 1 Exception java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:882)
at java.lang.Thread.run(Thread.java:619)

2010-10-28 10:43:14,143 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_4986602050098481381_5140430 terminating
2010-10-28 10:43:14,147 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls recoverBlock(block=blk_4986602050098481381_5140430, targets=[10.20.11.89:50010, 10.20.11.88:50010])
2010-10-28 10:43:14,152 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to getBlockMetaDataInfo for block (=blk_4986602050098481381_5140430) from datanode (=10.20.11.89:50010)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Meta file not found, blockFile=/opera/log4/hadoop/dfs/data/tmp/blk_4986602050098481381
at org.apache.hadoop.hdfs.server.datanode.FSDataset.findMetaFile(FSDataset.java:674)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getStoredBlock(FSDataset.java:723)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockMetaDataInfo(DataNode.java:1431)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)

at org.apache.hadoop.ipc.Client.call(Client.java:740)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
at $Proxy5.getBlockMetaDataInfo(Unknown Source)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1542)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1637)
at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
2010-10-28 10:43:14,153 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to getBlockMetaDataInfo for block (=blk_4986602050098481381_5140430) from datanode (=10.20.11.88:50010)
java.io.IOException: Meta file not found, blockFile=/opera/log3/hadoop/dfs/data/tmp/blk_4986602050098481381
at org.apache.hadoop.hdfs.server.datanode.FSDataset.findMetaFile(FSDataset.java:674)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getStoredBlock(FSDataset.java:723)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockMetaDataInfo(DataNode.java:1431)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1542)
at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1637)
at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
2010-10-28 10:43:14,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020, call recoverBlock(blk_4986602050098481381_5140430, false, [Lorg.apache.hadoop.hdfs.protocol.DatanodeInfo;@58245b57) from 10.20.11.89:34684: error: java.io.IOException: All datanodes failed: block=blk_4986602050098481381_5140430, datanodeids=[10.20.11.89:50010, 10.20.11.88:50010]


--
Erik Forsberg <forsberg@opera.com>
Developer, Opera Software - http://www.opera.com/
