Hi,

I am having a hard time debugging this. I have a Hadoop cluster and one of
the map tasks is taking a long time. I checked the datanode logs and can see
no activity for around 10 minutes:

2011-06-03 10:09:06,772 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.240.222.218:50010, storageID=DS-1909388466-10.240.222.218-50010-1307002238331, infoPort=50075, ipcPort=50020):Number of active connections is: 2
2011-06-03 10:19:41,033 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-9115985339102075853_6140 file /mnt/hadoop/hadoop-hadoop/dfs/data/current/blk_-9115985339102075853

The task running on this node is eventually killed.
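
A silence like the one in the datanode log above can be located mechanically rather than by eye. The following is a minimal sketch, assuming log4j-style timestamps in the first 23 characters of each entry (as in the two entries quoted above); the function name `find_gaps` and the five-minute threshold are illustrative choices, not part of Hadoop.

```python
from datetime import datetime, timedelta

# Timestamp layout assumed from the quoted log lines: "2011-06-03 10:09:06,772".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP_LENGTH = 23


def find_gaps(log_lines, threshold=timedelta(minutes=5)):
    """Return (previous, current) timestamp pairs that are further apart than threshold."""
    gaps = []
    previous = None
    for line in log_lines:
        try:
            current = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            # Continuation lines (wrapped messages, stack traces) carry no timestamp.
            continue
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Running it over the datanode log and the tasktracker log for the same period shows whether both went quiet at the same time, which helps separate a stalled node from a stalled task.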


Task attempt: attempt_201106011013_0023_m_000013_0
Machine (task attempt and cleanup attempt): /default-rack/domU-12-31-39-04-D9-2C.compute-1.internal <http://domu-12-31-39-04-d9-2c.compute-1.internal:50060/>
Status: KILLED
Progress: 100.00%
Start: 3-Jun-2011 10:09:41
Finish: 3-Jun-2011 10:19:26 (9mins, 44sec)
Task attempt log (all): <http://domu-12-31-39-04-d9-2c.compute-1.internal:50060/tasklog?taskid=attempt_201106011013_0023_m_000013_0&all=true>
Cleanup attempt log (all): <http://domu-12-31-39-04-d9-2c.compute-1.internal:50060/tasklog?taskid=attempt_201106011013_0023_m_000013_0&all=true&cleanup=true>
Counters: 7 <http://ec2-50-19-143-103.compute-1.amazonaws.com:50030/taskstats.jsp?jobid=job_201106011013_0023&tipid=task_201106011013_0023_m_000013&taskid=attempt_201106011013_0023_m_000013_0>
I added this node recently, and I am trying to confirm the hypothesis that it
does not have the required data locally and has to fetch it over the network.
How do I prove that this is the case here? I cannot see anything in the log.
Is there a way to identify the blocks that the map task was reading, and
verify that they were not present on the machine when the map ran?

Thanks in advance.

Regards,
Mayuresh


  • Steve Loughran at Jun 5, 2011 at 10:09 am

    On 03/06/2011 12:24, Mayuresh wrote:
    > Hi,
    >
    > I am really having a hard time debugging this. I have a hadoop cluster and
    > one of the maps is taking time. I checked the "datanode" logs and can see no
    > activity for around 10 minutes!

    The usual cause here is imminent disk failure, as reads start to take
    longer and longer. Look at your SMART disk logs and run performance
    tests on all the drives.
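
On the original question of which blocks live where: `hadoop fsck <path> -files -blocks -locations` lists each block of a file together with the datanodes holding a replica. The sketch below parses that output and reports the blocks with no replica on a given datanode. It is a minimal example, not an official tool; it assumes each block appears on its own line with its replica addresses in `ip:port` form, and the function names are made up for illustration. Note that fsck reports current locations, so it should be run before replication or balancing changes the picture.

```python
import re

# Block ID such as blk_-9115985339102075853_6140 (generation stamp optional).
BLOCK_RE = re.compile(r"blk_-?\d+(?:_\d+)?")
# Datanode address such as 10.240.222.218:50010.
ADDR_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}:\d+\b")


def block_locations(fsck_output):
    """Map each block ID in the fsck output to the datanode addresses on its line."""
    locations = {}
    for line in fsck_output.splitlines():
        block = BLOCK_RE.search(line)
        if block:
            locations[block.group(0)] = ADDR_RE.findall(line)
    return locations


def blocks_missing_on(fsck_output, datanode_ip):
    """Return the block IDs that have no replica on the given datanode IP."""
    missing = []
    for block_id, addresses in block_locations(fsck_output).items():
        hosts = {address.split(":")[0] for address in addresses}
        if datanode_ip not in hosts:
            missing.append(block_id)
    return missing
```

For the node in this thread, the fsck output for the job's input path would be passed in with `datanode_ip="10.240.222.218"`; any block IDs returned would have had to be read from another node.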

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 3, '11 at 11:24a
active: Jun 5, '11 at 10:09a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
