Dead data nodes during job execution and failed tasks
Hi,
I am running a job that repeatedly causes DataNodes to be reported dead (they come back later, spontaneously).

This causes some tasks to fail. My max fd limit is 64k.
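
For context, here is a minimal sketch of the two HDFS knobs that seem most relevant to these symptoms. The key names exist in Hadoop 0.20, but the class and the values are only illustrative, not a tested recommendation:

import org.apache.hadoop.conf.Configuration;

// Hypothetical helper, just to show the configuration keys in one place.
public class HdfsKnobsSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Client side: how many times DFSClient retries locating a block before
        // giving up with "Could not obtain block" (the default is 3).
        conf.setInt("dfs.client.max.block.acquire.failures", 6);
        // DataNode side: cap on concurrent DataXceiver threads; each one holds
        // sockets and file descriptors, so it interacts with the 64k fd limit.
        // (The key really is spelled "xcievers" in this Hadoop version.)
        conf.setInt("dfs.datanode.max.xcievers", 4096);
        System.out.println("acquire failures allowed: "
                + conf.getInt("dfs.client.max.block.acquire.failures", 3));
        System.out.println("max xceivers: "
                + conf.getInt("dfs.datanode.max.xcievers", 256));
    }
}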

Can anyone identify what is causing this?

Here are logs from the failed task and the corresponding DataNode log snippets:


Slave running the map task:


2011-06-30 18:49:00,338 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=

2011-06-30 18:49:00,583 WARN org.apache.hadoop.conf.Configuration: /mnt1/tmp/hadoop-0.20/cache/root/mapred/local/taskTracker/jobcache/job_201106290946_0605/attempt_201106290946_0605_m_004201_0/job.xml:a attempt to override final parameter: dfs.hosts.exclude; Ignoring.

2011-06-30 18:50:00,845 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-3550309578268660022_44008639 from any node: java.io.IOException: No live nodes contain current block

2011-06-30 18:51:03,865 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-3550309578268660022_44008639 from any node: java.io.IOException: No live nodes contain current block

2011-06-30 18:52:07,003 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-3550309578268660022_44008639 from any node: java.io.IOException: No live nodes contain current block

2011-06-30 18:54:10,075 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-3550309578268660022_44008639 file=/user/root/user_login_log/distinct_data/20110629/part-00124
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1797)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1623)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1752)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)



2011-06-30 18:54:12,576 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Could not obtain block: blk_-3550309578268660022_44008639 file=/user/root/user_login_log/distinct_data/20110629/part-00124
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1797)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1623)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1752)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

2011-06-30 18:54:14,635 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task


DataNode on slave25:

2011-06-30 18:52:38,126 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.119:48947, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-3550309578268660022_44008639, duration: 146739000

2011-06-30 18:52:38,126 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.119:55630, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-3550309578268660022_44008639, duration: 147680000

2011-06-30 18:52:38,126 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.119:48589, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-3550309578268660022_44008639, duration: 147021000

2011-06-30 18:52:38,126 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.119:49278, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-3550309578268660022_44008639, duration: 137582000

2011-06-30 18:52:38,135 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.138:47818, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004252_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_341656939579646802_43116464, duration: 30340000

2011-06-30 18:52:38,184 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.115:51798, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004932_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_1807082107808304792_41330622, duration: 195543000

2011-06-30 18:52:38,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-7111600897613399673_44069484 java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-3883253381457682039_44069367 java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,215 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-3883253381457682039_44069367 1 Exception java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:882)
at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,215 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-7111600897613399673_44069484 1 Exception java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:882)
at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,215 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-3883253381457682039_44069367 1 : Thread is interrupted.

2011-06-30 18:52:38,215 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-7111600897613399673_44069484 1 : Thread is interrupted.

2011-06-30 18:52:38,215 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_-3883253381457682039_44069367 terminating

2011-06-30 18:52:38,216 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_-7111600897613399673_44069484 terminating

2011-06-30 18:52:38,216 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-3883253381457682039_44069367 received exception java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,216 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-7111600897613399673_44069484 received exception java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,216 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,216 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,223 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9189210190114502992_32988083 java.io.EOFException: while trying to read 2730 bytes

2011-06-30 18:52:38,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-494032072952213794_44069458 java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-494032072952213794_44069458 2 Exception java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:882)
at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-494032072952213794_44069458 2 : Thread is interrupted.

2011-06-30 18:52:38,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_-494032072952213794_44069458 terminating

2011-06-30 18:52:38,306 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-494032072952213794_44069458 received exception java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,306 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,471 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9220320572849103952_19298441 unfinalized and removed.

2011-06-30 18:52:38,471 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.142:53133, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0608_m_000004_1, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_6335845677046126634_44068668, duration: 296000

2011-06-30 18:52:38,472 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9220320572849103952_19298441 received exception java.io.EOFException: while trying to read 25370 bytes

2011-06-30 18:52:38,472 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 25370 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:355)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,472 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9080487810231223837_12934481 src: /172.28.1.128:48816 dest: /172.28.1.125:50010 of size 934228

2011-06-30 18:52:38,472 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9047841941671274596_33117184 src: /172.28.1.139:57972 dest: /172.28.1.125:50010 of size 436507

2011-06-30 18:52:38,473 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9189210190114502992_32988083 unfinalized and removed.

2011-06-30 18:52:38,473 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9189210190114502992_32988083 received exception java.io.EOFException: while trying to read 2730 bytes

2011-06-30 18:52:38,473 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 2730 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:355)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,473 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9167383395209195324_43113407 src: /172.28.1.129:42479 dest: /172.28.1.125:50010 of size 306521

2011-06-30 18:52:38,474 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9222760032390209760_33640525 unfinalized and removed.

2011-06-30 18:52:38,474 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9222760032390209760_33640525 received exception java.io.EOFException: while trying to read 5034 bytes

2011-06-30 18:52:38,474 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 5034 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:355)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,475 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-4835040186380340610_22556835 src: /172.28.1.129:42385 dest: /172.28.1.125:50010 of size 1873718

2011-06-30 18:52:38,500 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.103:48797, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106220308_2593_r_000103_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-7869815404835809891_44068763, duration: 26414000

2011-06-30 18:52:38,543 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-8830898802917406976_9670906 src: /172.28.1.135:38108 dest: /172.28.1.125:50010 of size 1586179

2011-06-30 18:52:38,545 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.108:49754, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0608_m_000117_1, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_8222968190463524631_44068718, duration: 71801000

2011-06-30 18:52:38,557 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.118:34031, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_005176_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_1403998834571958702_41330639, duration: 85592000

2011-06-30 18:52:38,587 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-3977598016817417325_33983630 src: /172.28.1.107:59369 dest: /172.28.1.125:50010 of size 5787499

2011-06-30 18:52:38,614 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.138:47611, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004132_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-2960100665116847711_42689744, duration: 141415000

2011-06-30 18:52:38,644 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.112:55765, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0608_m_000303_1, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_8063669434333395104_44065929, duration: 1272000

2011-06-30 18:52:38,655 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_3068907439054059419_27968064

2011-06-30 18:52:38,914 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-8758412301877668079_31392851 src: /172.28.1.139:60920 dest: /172.28.1.125:50010 of size 3166291

:


  • Allen Wittenauer at Jun 30, 2011 at 6:46 pm

    On Jun 30, 2011, at 10:01 AM, David Ginzburg wrote:

    > Hi,
    > I am running a job that repeatedly causes DataNodes to be reported dead (they come back later, spontaneously).

    Check your memory usage during the job run. Chances are good the DataNode is getting swapped out.
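
    A rough way to watch that from the DataNode host is to sample the JVM platform MXBean; a minimal sketch, assuming a Sun/Oracle JVM (the cast below is HotSpot-specific, and the MemWatch class is hypothetical):

    import java.lang.management.ManagementFactory;

    public class MemWatch {
        public static void main(String[] args) throws InterruptedException {
            // On the Sun JVM the platform bean also implements the
            // com.sun.management variant, which exposes physical memory and swap.
            com.sun.management.OperatingSystemMXBean os =
                    (com.sun.management.OperatingSystemMXBean)
                            ManagementFactory.getOperatingSystemMXBean();
            while (true) {
                System.out.println("free physical MB: "
                        + (os.getFreePhysicalMemorySize() >> 20)
                        + ", free swap MB: " + (os.getFreeSwapSpaceSize() >> 20));
                Thread.sleep(5000); // sample every 5 seconds during the job
            }
        }
    }

    If free swap keeps shrinking while the job runs, the DataNode is a likely paging victim.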
  • David Ginzburg at Jun 30, 2011 at 7:37 pm
    Is it possible even though the server runs with vm.swappiness = 5?


  • Allen Wittenauer at Jul 1, 2011 at 1:19 am

    On Jun 30, 2011, at 12:36 PM, David Ginzburg wrote:

    > Is it possible even though the server runs with vm.swappiness = 5?

    That only controls how aggressively the system swaps. If you eat all the RAM in user space, the system is going to start paging memory regardless of swappiness.
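
    One Linux-specific way to confirm that is to sample the cumulative swap counters in /proc/vmstat; a minimal sketch (the SwapActivity class is hypothetical):

    import java.io.BufferedReader;
    import java.io.FileReader;

    public class SwapActivity {
        public static void main(String[] args) throws Exception {
            while (true) {
                BufferedReader r = new BufferedReader(new FileReader("/proc/vmstat"));
                String line;
                while ((line = r.readLine()) != null) {
                    // pswpin/pswpout are cumulative pages swapped in/out since
                    // boot; if they climb during the job, the box is paging
                    // regardless of the vm.swappiness setting.
                    if (line.startsWith("pswpin") || line.startsWith("pswpout")) {
                        System.out.println(line);
                    }
                }
                r.close();
                Thread.sleep(5000); // sample every 5 seconds
            }
        }
    }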
