Hi all,

I am running a series of jobs one after another. The 4th job fails in
the reduce phase: progress reaches map 100%, reduce 99%, and then the
following message appears:

10/04/01 01:04:15 INFO mapred.JobClient: Task Id :
attempt_201003240138_0110_r_000018_1, Status : FAILED
Task attempt_201003240138_0110_r_000018_1 failed to report status for 602
seconds. Killing!

Hadoop retries the task several times, but each attempt fails with a similar
message. I couldn't learn anything more from the error message, so I wanted
to look at the logs (in the default directory, ${HADOOP_HOME}/logs). But I
can't find any files that match the timestamp of the job, and I did not find
the history and userlogs directories in the logs folder either. Should I look
somewhere else for the logs? What could be the possible causes of the above error?

I am using Hadoop 0.20.2 and I am running it on a cluster with 14
nodes.

Thank you.

Regards,
Raghava.


  • Raghava Mutharaju at Apr 3, 2010 at 4:16 am
    Hi all,

    I have found the log files on the DataNodes. I have checked the
    userlogs, but they do not contain any exception related to the error I
    mentioned in my previous email (repeated here):

    10/04/01 01:04:15 INFO mapred.JobClient: Task Id :
    attempt_201003240138_0110_r_000018_1, Status : FAILED
    Task attempt_201003240138_0110_r_000018_1 failed to report status for 602
    seconds. Killing!

    I have also run some tests with the jobs in a different order. Whichever
    job runs after the 3rd job fails at reduce 99% with the above message
    (with different attempt IDs, of course). I guess the number of input
    files does not matter; at that point there are 130 input files, which are
    taken as input for the 4th job.

    I am at a loss as to how to proceed. Any pointers would be welcome :)

    Thank you.

    Regards,
    Raghava.
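For readers decoding the message quoted in this thread, here is a minimal Java sketch that pulls apart the attempt ID and the timeout line. It uses only the standard library. The class name `TaskAttemptInfo` and its methods are hypothetical helpers, not part of Hadoop. The sketch also assumes the cluster runs with the default `mapred.task.timeout` of 600000 ms, which the thread does not confirm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the JobClient output quoted in this thread.
public class TaskAttemptInfo {
    // Layout: attempt_<jobtracker start time>_<job number>_<m|r>_<task number>_<attempt number>
    private static final Pattern ATTEMPT_ID =
        Pattern.compile("attempt_(\\d+)_(\\d+)_([mr])_(\\d+)_(\\d+)");

    // Matches the line printed when a task is killed for not reporting status.
    private static final Pattern TIMEOUT_LINE =
        Pattern.compile("failed to report status for (\\d+) seconds");

    // Assumption: default mapred.task.timeout (milliseconds); the cluster may override it.
    static final long ASSUMED_TIMEOUT_MS = 600_000L;

    // Turns an attempt ID into a readable description.
    static String describe(String attemptId) {
        Matcher m = ATTEMPT_ID.matcher(attemptId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a task attempt ID: " + attemptId);
        }
        String kind = m.group(3).equals("r") ? "reduce" : "map";
        // Attempt numbers start at 0, so attempt 1 is the first retry.
        return String.format("job %s_%s, %s task %d, attempt %d",
            m.group(1), m.group(2), kind,
            Integer.parseInt(m.group(4)), Integer.parseInt(m.group(5)));
    }

    // Returns the number of silent seconds from the message, or -1 if absent.
    static long silentSeconds(String line) {
        Matcher m = TIMEOUT_LINE.matcher(line);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    // True when the silent period reaches the assumed timeout.
    static boolean exceedsTimeout(long seconds) {
        return seconds * 1000L >= ASSUMED_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        String id = "attempt_201003240138_0110_r_000018_1";
        String line = "Task " + id + " failed to report status for 602 seconds. Killing!";
        System.out.println(describe(id));
        System.out.println("silent for " + silentSeconds(line) + " s, past assumed timeout: "
            + exceedsTimeout(silentSeconds(line)));
    }
}
```

Read this way, the message says that reduce task 18 of job 0110 went roughly ten minutes without reading input, writing output, or updating its status. In the 0.20 `mapred` API, a long-running reduce can signal liveness by calling `Reporter.progress()`. Whether that applies here depends on what the reducer is doing, which the thread does not show.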

Posted Apr 1, '10 at 6:25a; last active Apr 3, '10 at 4:16a.