FAQ
I'm seeing this error when I try to run my job.

java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
From what I can find by doing some Google searches, this means the mapred task JVM has crashed. Not many suggestions about what to do about it. Some suggestions about increasing max heap. I tried that, although I don't think that's the issue because it's not a particularly memory intensive process and I've even tried it with a super small input data set of only a few records. Still see the same issue.
Can't find anything else in the logs. I don't think my task even started, because there are no user logs created at all. Seems to fail during Job Setup.

A little more background. This job was working fine for weeks, running hourly, and then failed on Saturday morning and hasn't worked since. Obviously, I looked for something that changed at that point, but no one was working at that time... can't find anything that changed. I tried the job with different input data sets, doesn't seem to matter, unless I run it with no data at all. The job does run with no input data, but if I have even a few input records it fails-doesn't seem to matter which records. I suspected some corruption in HDFS, but I was able to extract the data from HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data set to our TEST cluster and ran the job there... and it WORKED!

Ran one of our other jobs and it failed as well, so it doesn't seem to be job specific either; looks like every job fails the same way.

Did a complete reboot of the cluster-no impact.

We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

Any suggestions on what could be wrong or where to look for more information would be appreciated.



Marc Limotte
Feeva Technology

PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.

Search Discussions

  • Edward Capriolo at Sep 24, 2009 at 2:51 pm

    On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    From what I can find by doing some Google searches, this means the mapred task JVM has crashed.  Not many suggestions about what to do about it.  Some suggestions about increasing max heap.  I tried that, although I don't think that's the issue because it's not a particularly memory intensive process and I've even tried it with a super small input data set of only a few records.  Still see the same issue.

    Can't find anything else in the logs.  I don't think my task even started, because there are no user logs created at all. Seems to fail during Job Setup.

    A little more background.  This job was working fine for weeks, running hourly, and then failed on Saturday morning and hasn't worked since.  Obviously, I looked for something that changed at that point, but no one was working at that time... can't find anything that changed.  I tried the job with different input data sets, doesn't seem to matter, unless I run it with no data at all.  The job does run with no input data, but if I have even a few input records it fails-doesn't seem to matter which records.  I suspected some corruption in HDFS, but I was able to extract the data from HDFS (hadoop dfs -get ...) and the data looks ok.  I also copied this data set to our TEST cluster and ran the job there... and it WORKED!

    Ran one of our other jobs and it failed as well, so it doesn't seem to be job specific either; looks like every job fails the same way.

    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more information would be appreciated.



    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    Just a shot in the dark....

    Did you update java recently

    http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/
  • Marc Limotte at Sep 24, 2009 at 4:58 pm
    Thanks for the suggestion, Edward. I only upgraded the JVM after the problem occurred to see if it would help, but it made no difference.

    Marc

    -----Original Message-----
    From: Edward Capriolo
    Sent: Thursday, September 24, 2009 7:50 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1
    On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    From what I can find by doing some Google searches, this means the mapred task JVM has crashed. Not many suggestions about what to do about it. Some suggestions about increasing max heap. I tried that, although I don't think that's the issue because it's not a particularly memory intensive process and I've even tried it with a super small input data set of only a few records. Still see the same issue.

    Can't find anything else in the logs. I don't think my task even started, because there are no user logs created at all. Seems to fail during Job Setup.

    A little more background. This job was working fine for weeks, running hourly, and then failed on Saturday morning and hasn't worked since. Obviously, I looked for something that changed at that point, but no one was working at that time... can't find anything that changed. I tried the job with different input data sets, doesn't seem to matter, unless I run it with no data at all. The job does run with no input data, but if I have even a few input records it fails-doesn't seem to matter which records. I suspected some corruption in HDFS, but I was able to extract the data from HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data set to our TEST cluster and ran the job there... and it WORKED!

    Ran one of our other jobs and it failed as well, so it doesn't seem to be job specific either; looks like every job fails the same way.

    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more information would be appreciated.



    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    Just a shot in the dark....

    Did you update java recently

    http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
  • Todd Lipcon at Sep 24, 2009 at 5:19 pm
    Hi Marc,

    Exit status 1 usually means some kind of controlled exit by the mapreduce
    child task. Things like JVM crashes usually are indicated by other exit
    codes (134 seems to be the code most commonly reported).

    If you look at the stderr and stdout from your task (in the userlogs/
    directory on the task tracker that ran them) do you see any output?
    Additionally, is there anything in the logs for the task tracker itself?
    That log is hadoop.log.dir/hadoop-<username>-tasktracker*log

    If that log is pretty long, try grepping for WARN, ERROR, or Exception

    -Todd
    On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte wrote:

    Thanks for the suggestion, Edward. I only upgraded the JVM after the
    problem occurred to see if it would help, but it made no difference.

    Marc

    -----Original Message-----
    From: Edward Capriolo
    Sent: Thursday, September 24, 2009 7:50 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1
    On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    From what I can find by doing some Google searches, this means the mapred
    task JVM has crashed. Not many suggestions about what to do about it. Some
    suggestions about increasing max heap. I tried that, although I don't think
    that's the issue because it's not a particularly memory intensive process
    and I've even tried it with a super small input data set of only a few
    records. Still see the same issue.
    Can't find anything else in the logs. I don't think my task even
    started, because there are no user logs created at all. Seems to fail during
    Job Setup.
    A little more background. This job was working fine for weeks, running
    hourly, and then failed on Saturday morning and hasn't worked since.
    Obviously, I looked for something that changed at that point, but no one
    was working at that time... can't find anything that changed. I tried the
    job with different input data sets, doesn't seem to matter, unless I run it
    with no data at all. The job does run with no input data, but if I have
    even a few input records it fails-doesn't seem to matter which records. I
    suspected some corruption in HDFS, but I was able to extract the data from
    HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data
    set to our TEST cluster and ran the job there... and it WORKED!
    Ran one of our other jobs and it failed as well, so it doesn't seem to be
    job specific either; looks like every job fails the same way.
    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more
    information would be appreciated.


    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    Just a shot in the dark....

    Did you update java recently


    http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
  • Marc Limotte at Sep 24, 2009 at 6:24 pm
    Hi Todd.

    No userlogs seem to be created. I'm guessing, because the map task never actually starts.

    I don't see any other errors in the tasktracker log, other than the one I put in the first message ("java.io.IOException: Task process exit with nonzero status of 1..."). I've included the output from one of the nodes' tasktracker logs below.

    Any other suggestions?

    Marc

    2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's state:UNASSIGNED
    2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_m_000003_0
    2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-145
    18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-1451805198 spawned.
    2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_m_-1451805198 exited. Number of t
    asks it ran: 02009-09-24 18:15:37,512 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 done; removing files.
    2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_r
    _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_r_000001_0
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_r_7885
    020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_r_788502072 spawned.
    2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_r_788502072 exited. Number of tas
    ks it ran: 0
    2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000001_0 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24 18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000001_0 done; removing files.
    2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_m
    _000002_1 task's state:UNASSIGNED
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-2120349138
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-2120349138 spawned.
    2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0
    2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 done; removing files.
    2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's state:UNASSIGNED
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_r_-1417908695
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_r_-1417908695 spawned.
    2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000000_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0
    2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000000_1 done; removing files.
    2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
    2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200909221656_0006


    -----Original Message-----
    From: Todd Lipcon
    Sent: Thursday, September 24, 2009 10:19 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1

    Hi Marc,

    Exit status 1 usually means some kind of controlled exit by the mapreduce
    child task. Things like JVM crashes usually are indicated by other exit
    codes (134 seems to be the code most commonly reported).

    If you look at the stderr and stdout from your task (in the userlogs/
    directory on the task tracker that ran them) do you see any output?
    Additionally, is there anything in the logs for the task tracker itself?
    That log is hadoop.log.dir/hadoop-<username>-tasktracker*log

    If that log is pretty long, try grepping for WARN, ERROR, or Exception

    -Todd
    On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte wrote:

    Thanks for the suggestion, Edward. I only upgraded the JVM after the
    problem occurred to see if it would help, but it made no difference.

    Marc

    -----Original Message-----
    From: Edward Capriolo
    Sent: Thursday, September 24, 2009 7:50 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1
    On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    From what I can find by doing some Google searches, this means the mapred
    task JVM has crashed. Not many suggestions about what to do about it. Some
    suggestions about increasing max heap. I tried that, although I don't think
    that's the issue because it's not a particularly memory intensive process
    and I've even tried it with a super small input data set of only a few
    records. Still see the same issue.
    Can't find anything else in the logs. I don't think my task even
    started, because there are no user logs created at all. Seems to fail during
    Job Setup.
    A little more background. This job was working fine for weeks, running
    hourly, and then failed on Saturday morning and hasn't worked since.
    Obviously, I looked for something that changed at that point, but no one
    was working at that time... can't find anything that changed. I tried the
    job with different input data sets, doesn't seem to matter, unless I run it
    with no data at all. The job does run with no input data, but if I have
    even a few input records it fails-doesn't seem to matter which records. I
    suspected some corruption in HDFS, but I was able to extract the data from
    HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data
    set to our TEST cluster and ran the job there... and it WORKED!
    Ran one of our other jobs and it failed as well, so it doesn't seem to be
    job specific either; looks like every job fails the same way.
    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more
    information would be appreciated.


    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    Just a shot in the dark....

    Did you update java recently


    http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
  • Todd Lipcon at Sep 24, 2009 at 6:28 pm
    Odd...

    Try bumping up the logs to debug level on that tasktracker, see what you can
    determine?

    You could also strace -f -p <tasktracker pid> -o /tmp/tt_log and then grep
    through those logs later to see what might be going on.

    -Todd
    On Thu, Sep 24, 2009 at 11:24 AM, Marc Limotte wrote:

    Hi Todd.

    No userlogs seem to be created. I'm guessing, because the map task never
    actually starts.

    I don't see any other errors in the tasktracker log, other than the one I
    put in the first message ("java.io.IOException: Task process exit with
    nonzero status of 1..."). I've included the output from one of the nodes'
    tasktracker logs below.

    Any other suggestions?

    Marc

    2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's
    state:UNASSIGNED
    2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_m_000003_0
    2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_m_-145
    18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager:
    JVM Runner jvm_200909221656_0006_m_-1451805198 spawned.
    2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-1451805198 exited. Number of t
    asks it ran: 02009-09-24 18:15:37,512 WARN
    org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0
    Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000003_0 done; removing files.
    2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r
    _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO
    org.apache.hadoop.mapred.TaskTracker: Trying to launch :
    attempt_200909221656_0006_r_000001_0
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_r_7885
    020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_r_788502072 spawned.
    2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_788502072 exited. Number of tas
    ks it ran: 0
    2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at
    org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24
    18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 done; removing files.
    2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m
    _000002_1 task's state:UNASSIGNED
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In
    JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-2120349138
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_m_-2120349138 spawned.
    2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0
    2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 done; removing files.
    2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's
    state:UNASSIGNED
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In
    JvmRunner constructed JVM ID: jvm_200909221656_0006_r_-1417908695
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_r_-1417908695 spawned.
    2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0
    2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 done; removing files.
    2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received
    'KillJobAction' for job: job_200909221656_0006


    -----Original Message-----
    From: Todd Lipcon
    Sent: Thursday, September 24, 2009 10:19 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1

    Hi Marc,

    Exit status 1 usually means some kind of controlled exit by the mapreduce
    child task. Things like JVM crashes usually are indicated by other exit
    codes (134 seems to be the code most commonly reported).

    If you look at the stderr and stdout from your task (in the userlogs/
    directory on the task tracker that ran them) do you see any output?
    Additionally, is there anything in the logs for the task tracker itself?
    That log is hadoop.log.dir/hadoop-<username>-tasktracker*log

    If that log is pretty long, try grepping for WARN, ERROR, or Exception

    -Todd
    On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte wrote:

    Thanks for the suggestion, Edward. I only upgraded the JVM after the
    problem occurred to see if it would help, but it made no difference.

    Marc

    -----Original Message-----
    From: Edward Capriolo
    Sent: Thursday, September 24, 2009 7:50 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1
    On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    From what I can find by doing some Google searches, this means the
    mapred
    task JVM has crashed. Not many suggestions about what to do about it. Some
    suggestions about increasing max heap. I tried that, although I don't think
    that's the issue because it's not a particularly memory intensive process
    and I've even tried it with a super small input data set of only a few
    records. Still see the same issue.
    Can't find anything else in the logs. I don't think my task even
    started, because there are no user logs created at all. Seems to fail during
    Job Setup.
    A little more background. This job was working fine for weeks, running
    hourly, and then failed on Saturday morning and hasn't worked since.
    Obviously, I looked for something that changed at that point, but no one
    was working at that time... can't find anything that changed. I tried the
    job with different input data sets, doesn't seem to matter, unless I run it
    with no data at all. The job does run with no input data, but if I have
    even a few input records it fails-doesn't seem to matter which records. I
    suspected some corruption in HDFS, but I was able to extract the data from
    HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data
    set to our TEST cluster and ran the job there... and it WORKED!
    Ran one of our other jobs and it failed as well, so it doesn't seem to
    be
    job specific either; looks like every job fails the same way.
    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more
    information would be appreciated.


    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT
    FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A
    COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    Just a shot in the dark....

    Did you update java recently


    http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A
    COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
  • Marc Limotte at Sep 24, 2009 at 9:19 pm
    Added DEBUG, but don't see anything interesting. The only new tasktracker log entries are about receiving a heartbeat from the JobTracker, or about cleaning up the task afterward.

    Tried the strace. It produces over 6mm lines of output. Not sure what I should be looking for.

    I'm thinking I might try the Cloudera Hadoop 0.20.0 distribution and see if the behavior is any different.

    Marc

    -----Original Message-----
    From: Todd Lipcon
    Sent: Thursday, September 24, 2009 11:28 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1

    Odd...

    Try bumping up the logs to debug level on that tasktracker, see what you can
    determine?

    You could also strace -f -p <tasktracker pid> -o /tmp/tt_log and then grep
    through those logs later to see what might be going on.

    -Todd
    On Thu, Sep 24, 2009 at 11:24 AM, Marc Limotte wrote:

    Hi Todd.

    No userlogs seem to be created. I'm guessing, because the map task never
    actually starts.

    I don't see any other errors in the tasktracker log, other than the one I
    put in the first message ("java.io.IOException: Task process exit with
    nonzero status of 1..."). I've included the output from one of the nodes'
    tasktracker logs below.

    Any other suggestions?

    Marc

    2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's
    state:UNASSIGNED
    2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_m_000003_0
    2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_m_-145
    18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager:
    JVM Runner jvm_200909221656_0006_m_-1451805198 spawned.
    2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-1451805198 exited. Number of t
    asks it ran: 02009-09-24 18:15:37,512 WARN
    org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0
    Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000003_0 done; removing files.
    2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r
    _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO
    org.apache.hadoop.mapred.TaskTracker: Trying to launch :
    attempt_200909221656_0006_r_000001_0
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_r_7885
    020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_r_788502072 spawned.
    2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_788502072 exited. Number of tas
    ks it ran: 0
    2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at
    org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24
    18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 done; removing files.
    2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m
    _000002_1 task's state:UNASSIGNED
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In
    JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-2120349138
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_m_-2120349138 spawned.
    2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0
    2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 done; removing files.
    2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's
    state:UNASSIGNED
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In
    JvmRunner constructed JVM ID: jvm_200909221656_0006_r_-1417908695
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_r_-1417908695 spawned.
    2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0
    2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 done; removing files.
    2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received
    'KillJobAction' for job: job_200909221656_0006


    -----Original Message-----
    From: Todd Lipcon
    Sent: Thursday, September 24, 2009 10:19 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1

    Hi Marc,

    Exit status 1 usually means some kind of controlled exit by the mapreduce
    child task. Things like JVM crashes usually are indicated by other exit
    codes (134 seems to be the code most commonly reported).

    If you look at the stderr and stdout from your task (in the userlogs/
    directory on the task tracker that ran them) do you see any output?
    Additionally, is there anything in the logs for the task tracker itself?
    That log is hadoop.log.dir/hadoop-<username>-tasktracker*log

    If that log is pretty long, try grepping for WARN, ERROR, or Exception

    -Todd
    On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte wrote:

    Thanks for the suggestion, Edward. I only upgraded the JVM after the
    problem occurred to see if it would help, but it made no difference.

    Marc

    -----Original Message-----
    From: Edward Capriolo
    Sent: Thursday, September 24, 2009 7:50 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1
    On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    From what I can find by doing some Google searches, this means the
    mapred
    task JVM has crashed. Not many suggestions about what to do about it. Some
    suggestions about increasing max heap. I tried that, although I don't think
    that's the issue because it's not a particularly memory intensive process
    and I've even tried it with a super small input data set of only a few
    records. Still see the same issue.
    Can't find anything else in the logs. I don't think my task even
    started, because there are no user logs created at all. Seems to fail during
    Job Setup.
    A little more background. This job was working fine for weeks, running
    hourly, and then failed on Saturday morning and hasn't worked since.
    Obviously, I looked for something that changed at that point, but no one
    was working at that time... can't find anything that changed. I tried the
    job with different input data sets, doesn't seem to matter, unless I run it
    with no data at all. The job does run with no input data, but if I have
    even a few input records it fails-doesn't seem to matter which records. I
    suspected some corruption in HDFS, but I was able to extract the data from
    HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data
    set to our TEST cluster and ran the job there... and it WORKED!
    Ran one of our other jobs and it failed as well, so it doesn't seem to
    be
    job specific either; looks like every job fails the same way.
    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more
    information would be appreciated.


    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT
    FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A
    COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    Just a shot in the dark....

    Did you update java recently


    http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A
    COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
  • Marc Limotte at Sep 24, 2009 at 10:54 pm
    One more clue.

    If I change "mapred.job.tracker" to "local" on this cluster, then the I can run the job successfully. I guess in this case it doesn't have to launch the child JVM, which is the thing that is failing.


    Marc

    -----Original Message-----
    From: Marc Limotte
    Sent: Thursday, September 24, 2009 2:19 PM
    To: common-user@hadoop.apache.org
    Cc: Deept Kumar
    Subject: RE: Task process exit with nonzero status of 1

    Added DEBUG, but don't see anything interesting. The only new tasktracker log entries are about receiving a heartbeat from the JobTracker, or about cleaning up the task afterward.

    Tried the strace. It produces over 6mm lines of output. Not sure what I should be looking for.

    I'm thinking I might try the Cloudera Hadoop 0.20.0 distribution and see if the behavior is any different.

    Marc

    -----Original Message-----
    From: Todd Lipcon
    Sent: Thursday, September 24, 2009 11:28 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1

    Odd...

    Try bumping up the logs to debug level on that tasktracker, see what you can
    determine?

    You could also strace -f -p <tasktracker pid> -o /tmp/tt_log and then grep
    through those logs later to see what might be going on.

    -Todd
    On Thu, Sep 24, 2009 at 11:24 AM, Marc Limotte wrote:

    Hi Todd.

    No userlogs seem to be created. I'm guessing, because the map task never
    actually starts.

    I don't see any other errors in the tasktracker log, other than the one I
    put in the first message ("java.io.IOException: Task process exit with
    nonzero status of 1..."). I've included the output from one of the nodes'
    tasktracker logs below.

    Any other suggestions?

    Marc

    2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's
    state:UNASSIGNED
    2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_m_000003_0
    2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_m_-145
    18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager:
    JVM Runner jvm_200909221656_0006_m_-1451805198 spawned.
    2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-1451805198 exited. Number of t
    asks it ran: 02009-09-24 18:15:37,512 WARN
    org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0
    Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000003_0 done; removing files.
    2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r
    _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO
    org.apache.hadoop.mapred.TaskTracker: Trying to launch :
    attempt_200909221656_0006_r_000001_0
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_r_7885
    020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_r_788502072 spawned.
    2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_788502072 exited. Number of tas
    ks it ran: 0
    2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at
    org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24
    18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 done; removing files.
    2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m
    _000002_1 task's state:UNASSIGNED
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In
    JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-2120349138
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_m_-2120349138 spawned.
    2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0
    2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 done; removing files.
    2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's
    state:UNASSIGNED
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In
    JvmRunner constructed JVM ID: jvm_200909221656_0006_r_-1417908695
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_r_-1417908695 spawned.
    2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0
    2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 done; removing files.
    2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received
    'KillJobAction' for job: job_200909221656_0006


    -----Original Message-----
    From: Todd Lipcon
    Sent: Thursday, September 24, 2009 10:19 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1

    Hi Marc,

    Exit status 1 usually means some kind of controlled exit by the mapreduce
    child task. Things like JVM crashes usually are indicated by other exit
    codes (134 seems to be the code most commonly reported).

    If you look at the stderr and stdout from your task (in the userlogs/
    directory on the task tracker that ran them) do you see any output?
    Additionally, is there anything in the logs for the task tracker itself?
    That log is hadoop.log.dir/hadoop-<username>-tasktracker*log

    If that log is pretty long, try grepping for WARN, ERROR, or Exception

    -Todd
    On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte wrote:

    Thanks for the suggestion, Edward. I only upgraded the JVM after the
    problem occurred to see if it would help, but it made no difference.

    Marc

    -----Original Message-----
    From: Edward Capriolo
    Sent: Thursday, September 24, 2009 7:50 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1
    On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    From what I can find by doing some Google searches, this means the
    mapred
    task JVM has crashed. Not many suggestions about what to do about it. Some
    suggestions about increasing max heap. I tried that, although I don't think
    that's the issue because it's not a particularly memory intensive process
    and I've even tried it with a super small input data set of only a few
    records. Still see the same issue.
    Can't find anything else in the logs. I don't think my task even
    started, because there are no user logs created at all. Seems to fail during
    Job Setup.
    A little more background. This job was working fine for weeks, running
    hourly, and then failed on Saturday morning and hasn't worked since.
    Obviously, I looked for something that changed at that point, but no one
    was working at that time... can't find anything that changed. I tried the
    job with different input data sets, doesn't seem to matter, unless I run it
    with no data at all. The job does run with no input data, but if I have
    even a few input records it fails-doesn't seem to matter which records. I
    suspected some corruption in HDFS, but I was able to extract the data from
    HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data
    set to our TEST cluster and ran the job there... and it WORKED!
    Ran one of our other jobs and it failed as well, so it doesn't seem to
    be
    job specific either; looks like every job fails the same way.
    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more
    information would be appreciated.


    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT
    FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A
    COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    Just a shot in the dark....

    Did you update java recently


    http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A
    COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
  • Vinod KV at Sep 25, 2009 at 3:23 am
    Marc,

    Looking at the logs, TaskTracker indeed did try to launch the JVM but
    failed immediately. So either it is related to the JVM or the log files,
    given your statement that your code used to work properly before. Looks
    like you came out blank w.r.t JVM Can you try and see if the
    userlogs/tasklogs directory has permissions for the user running the jvm
    to write files to? User running the JVM is same as the one that runs TT,
    unless you use LinuxTaskController.

    If that doesn't take you anywhere too, can you dump your TT
    configuration, may be after making anonymous any site specific
    information like hosts/ports, if you wish?

    Vinod

    Marc Limotte wrote:
    One more clue.

    If I change "mapred.job.tracker" to "local" on this cluster, then the I can run the job successfully. I guess in this case it doesn't have to launch the child JVM, which is the thing that is failing.


    Marc

    -----Original Message-----
    From: Marc Limotte
    Sent: Thursday, September 24, 2009 2:19 PM
    To: common-user@hadoop.apache.org
    Cc: Deept Kumar
    Subject: RE: Task process exit with nonzero status of 1

    Added DEBUG, but don't see anything interesting. The only new tasktracker log entries are about receiving a heartbeat from the JobTracker, or about cleaning up the task afterward.

    Tried the strace. It produces over 6mm lines of output. Not sure what I should be looking for.

    I'm thinking I might try the Cloudera Hadoop 0.20.0 distribution and see if the behavior is any different.

    Marc

    -----Original Message-----
    From: Todd Lipcon
    Sent: Thursday, September 24, 2009 11:28 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1

    Odd...

    Try bumping up the logs to debug level on that tasktracker, see what you can
    determine?

    You could also strace -f -p <tasktracker pid> -o /tmp/tt_log and then grep
    through those logs later to see what might be going on.

    -Todd

    On Thu, Sep 24, 2009 at 11:24 AM, Marc Limotte wrote:

    Hi Todd.

    No userlogs seem to be created. I'm guessing, because the map task never
    actually starts.

    I don't see any other errors in the tasktracker log, other than the one I
    put in the first message ("java.io.IOException: Task process exit with
    nonzero status of 1..."). I've included the output from one of the nodes'
    tasktracker logs below.

    Any other suggestions?

    Marc

    2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's
    state:UNASSIGNED
    2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_m_000003_0
    2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_m_-145
    18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager:
    JVM Runner jvm_200909221656_0006_m_-1451805198 spawned.
    2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-1451805198 exited. Number of t
    asks it ran: 02009-09-24 18:15:37,512 WARN
    org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0
    Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000003_0 done; removing files.
    2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r
    _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO
    org.apache.hadoop.mapred.TaskTracker: Trying to launch :
    attempt_200909221656_0006_r_000001_0
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_r_7885
    020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_r_788502072 spawned.
    2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_788502072 exited. Number of tas
    ks it ran: 0
    2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at
    org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24
    18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 done; removing files.
    2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m
    _000002_1 task's state:UNASSIGNED
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In
    JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-2120349138
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_m_-2120349138 spawned.
    2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0
    2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 done; removing files.
    2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's
    state:UNASSIGNED
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying
    to launch : attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In
    JvmRunner constructed JVM ID: jvm_200909221656_0006_r_-1417908695
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_r_-1417908695 spawned.
    2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0
    2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 done; removing files.
    2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker:
    addFreeSlot : current free slots : 2
    2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received
    'KillJobAction' for job: job_200909221656_0006
  • Koji Noguchi at Sep 24, 2009 at 6:39 pm

    A little more background. This job was working fine for weeks, running
    hourly, and then failed on Saturday morning and hasn't worked since.
    Any chance that ulimit (mapred.child.ulimit) got enabled?

    Koji

    On 9/24/09 11:24 AM, "Marc Limotte" wrote:

    Hi Todd.

    No userlogs seem to be created. I'm guessing, because the map task never
    actually starts.

    I don't see any other errors in the tasktracker log, other than the one I put
    in the first message ("java.io.IOException: Task process exit with nonzero
    status of 1..."). I've included the output from one of the nodes' tasktracker
    logs below.

    Any other suggestions?

    Marc

    2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's
    state:UNASSIGNED
    2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
    launch : attempt_200909221656_0006_m_000003_0
    2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_m_-145
    18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_m_-1451805198 spawned.
    2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-1451805198 exited. Number of t
    asks it ran: 02009-09-24 18:15:37,512 WARN
    org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0
    Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000003_0 done; removing files.
    2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
    : current free slots : 2
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r
    _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO
    org.apache.hadoop.mapred.TaskTracker: Trying to launch :
    attempt_200909221656_0006_r_000001_0
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_r_7885
    020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_r_788502072 spawned.
    2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_788502072 exited. Number of tas
    ks it ran: 0
    2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at
    org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24
    18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 done; removing files.
    2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
    : current free slots : 2
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m
    _000002_1 task's state:UNASSIGNED
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
    launch : attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner
    constructed JVM ID: jvm_200909221656_0006_m_-2120349138
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
    jvm_200909221656_0006_m_-2120349138 spawned.
    2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0
    2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 done; removing files.
    2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
    : current free slots : 2
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's
    state:UNASSIGNED
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
    launch : attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner
    constructed JVM ID: jvm_200909221656_0006_r_-1417908695
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
    jvm_200909221656_0006_r_-1417908695 spawned.
    2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0
    2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 done; removing files.
    2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
    : current free slots : 2
    2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received
    'KillJobAction' for job: job_200909221656_0006


    -----Original Message-----
    From: Todd Lipcon
    Sent: Thursday, September 24, 2009 10:19 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1

    Hi Marc,

    Exit status 1 usually means some kind of controlled exit by the mapreduce
    child task. Things like JVM crashes usually are indicated by other exit
    codes (134 seems to be the code most commonly reported).

    If you look at the stderr and stdout from your task (in the userlogs/
    directory on the task tracker that ran them) do you see any output?
    Additionally, is there anything in the logs for the task tracker itself?
    That log is hadoop.log.dir/hadoop-<username>-tasktracker*log

    If that log is pretty long, try grepping for WARN, ERROR, or Exception

    -Todd
    On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte wrote:

    Thanks for the suggestion, Edward. I only upgraded the JVM after the
    problem occurred to see if it would help, but it made no difference.

    Marc

    -----Original Message-----
    From: Edward Capriolo
    Sent: Thursday, September 24, 2009 7:50 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1
    On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    From what I can find by doing some Google searches, this means the mapred
    task JVM has crashed. Not many suggestions about what to do about it. Some
    suggestions about increasing max heap. I tried that, although I don't think
    that's the issue because it's not a particularly memory intensive process
    and I've even tried it with a super small input data set of only a few
    records. Still see the same issue.
    Can't find anything else in the logs. I don't think my task even
    started, because there are no user logs created at all. Seems to fail during
    Job Setup.
    A little more background. This job was working fine for weeks, running
    hourly, and then failed on Saturday morning and hasn't worked since.
    Obviously, I looked for something that changed at that point, but no one
    was working at that time... can't find anything that changed. I tried the
    job with different input data sets, doesn't seem to matter, unless I run it
    with no data at all. The job does run with no input data, but if I have
    even a few input records it fails-doesn't seem to matter which records. I
    suspected some corruption in HDFS, but I was able to extract the data from
    HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data
    set to our TEST cluster and ran the job there... and it WORKED!
    Ran one of our other jobs and it failed as well, so it doesn't seem to be
    job specific either; looks like every job fails the same way.
    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more
    information would be appreciated.


    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    Just a shot in the dark....

    Did you update java recently


    http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-m
    achine/

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY
    THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED.
    PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE
    THIS MESSAGE FROM YOUR SYSTEM.
  • Marc Limotte at Sep 24, 2009 at 9:12 pm
    Hi Koji,

    Thanks for the suggestion. We have not set mapred.child.ulimit in our hadoop conf files. And I verified that it was not set in the logged job.conf. Don't see any limits set at the OS level, either.

    Marc

    -----Original Message-----
    From: Koji Noguchi
    Sent: Thursday, September 24, 2009 11:37 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1
    A little more background. This job was working fine for weeks, running
    hourly, and then failed on Saturday morning and hasn't worked since.
    Any chance that ulimit (mapred.child.ulimit) got enabled?

    Koji

    On 9/24/09 11:24 AM, "Marc Limotte" wrote:

    Hi Todd.

    No userlogs seem to be created. I'm guessing, because the map task never
    actually starts.

    I don't see any other errors in the tasktracker log, other than the one I put
    in the first message ("java.io.IOException: Task process exit with nonzero
    status of 1..."). I've included the output from one of the nodes' tasktracker
    logs below.

    Any other suggestions?

    Marc

    2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's
    state:UNASSIGNED
    2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
    launch : attempt_200909221656_0006_m_000003_0
    2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_m_-145
    18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_m_-1451805198 spawned.
    2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-1451805198 exited. Number of t
    asks it ran: 02009-09-24 18:15:37,512 WARN
    org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0
    Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000003_0 done; removing files.
    2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
    : current free slots : 2
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r
    _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO
    org.apache.hadoop.mapred.TaskTracker: Trying to launch :
    attempt_200909221656_0006_r_000001_0
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO
    org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID:
    jvm_200909221656_0006_r_7885
    020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM
    Runner jvm_200909221656_0006_r_788502072 spawned.
    2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_788502072 exited. Number of tas
    ks it ran: 0
    2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at
    org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24
    18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000001_0 done; removing files.
    2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
    : current free slots : 2
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_m
    _000002_1 task's state:UNASSIGNED
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
    launch : attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner
    constructed JVM ID: jvm_200909221656_0006_m_-2120349138
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
    jvm_200909221656_0006_m_-2120349138 spawned.
    2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0
    2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_m_000002_1 done; removing files.
    2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
    : current free slots : 2
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker:
    LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's
    state:UNASSIGNED
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
    launch : attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In
    TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner
    constructed JVM ID: jvm_200909221656_0006_r_-1417908695
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
    jvm_200909221656_0006_r_-1417908695 spawned.
    2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM :
    jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0
    2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner:
    attempt_200909221656_0006_r_000000_1 done; removing files.
    2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
    : current free slots : 2
    2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received
    'KillJobAction' for job: job_200909221656_0006


    -----Original Message-----
    From: Todd Lipcon
    Sent: Thursday, September 24, 2009 10:19 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1

    Hi Marc,

    Exit status 1 usually means some kind of controlled exit by the mapreduce
    child task. Things like JVM crashes usually are indicated by other exit
    codes (134 seems to be the code most commonly reported).

    If you look at the stderr and stdout from your task (in the userlogs/
    directory on the task tracker that ran them) do you see any output?
    Additionally, is there anything in the logs for the task tracker itself?
    That log is hadoop.log.dir/hadoop-<username>-tasktracker*log

    If that log is pretty long, try grepping for WARN, ERROR, or Exception

    -Todd
    On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte wrote:

    Thanks for the suggestion, Edward. I only upgraded the JVM after the
    problem occurred to see if it would help, but it made no difference.

    Marc

    -----Original Message-----
    From: Edward Capriolo
    Sent: Thursday, September 24, 2009 7:50 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1
    On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    From what I can find by doing some Google searches, this means the mapred
    task JVM has crashed. Not many suggestions about what to do about it. Some
    suggestions about increasing max heap. I tried that, although I don't think
    that's the issue because it's not a particularly memory intensive process
    and I've even tried it with a super small input data set of only a few
    records. Still see the same issue.
    Can't find anything else in the logs. I don't think my task even
    started, because there are no user logs created at all. Seems to fail during
    Job Setup.
    A little more background. This job was working fine for weeks, running
    hourly, and then failed on Saturday morning and hasn't worked since.
    Obviously, I looked for something that changed at that point, but no one
    was working at that time... can't find anything that changed. I tried the
    job with different input data sets, doesn't seem to matter, unless I run it
    with no data at all. The job does run with no input data, but if I have
    even a few input records it fails-doesn't seem to matter which records. I
    suspected some corruption in HDFS, but I was able to extract the data from
    HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data
    set to our TEST cluster and ran the job there... and it WORKED!
    Ran one of our other jobs and it failed as well, so it doesn't seem to be
    job specific either; looks like every job fails the same way.
    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more
    information would be appreciated.


    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    Just a shot in the dark....

    Did you update java recently


    http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-m
    achine/

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY
    THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED.
    PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE
    THIS MESSAGE FROM YOUR SYSTEM.

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
  • Feng, Ao at Oct 9, 2009 at 5:48 pm
    I probably know what the problem it, as we are encountering the same issue on our prod cluster. Every once a while jobs start failing on the same task trackers, and the only error message is this exit status 1.

    Go to the userlogs directory on the host where your tasks fail, and verify if there are 31,999 directories all looking like attempt_... Once you get to that point, JVM would run out of file descriptors, as it tries to create the 32,000 one. I confirmed that cleaning up the userlogs directory solves the problem... temporarily.

    So my questions are:

    1. Where is the 32,000 limit imposed, and how do we change it?
    2. In the Hadoop configuration, is there any parameter to specify when to automatically delete those old user logs (like after a month)?

    As we are running 0.19, I was hoping 0.20 has fixed it, but seems it is not the case. Can someone file a bug report/feature request, if there is no elegant solution at this time?

    Thanks,
    Ao

    -----Original Message-----
    From: Marc Limotte
    Sent: Thursday, September 24, 2009 11:24 AM
    To: common-user@hadoop.apache.org
    Subject: RE: Task process exit with nonzero status of 1

    Hi Todd.

    No userlogs seem to be created. I'm guessing, because the map task never actually starts.

    I don't see any other errors in the tasktracker log, other than the one I put in the first message ("java.io.IOException: Task process exit with nonzero status of 1..."). I've included the output from one of the nodes' tasktracker logs below.

    Any other suggestions?

    Marc

    2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's state:UNASSIGNED
    2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_m_000003_0
    2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-145
    18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-1451805198 spawned.
    2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_m_-1451805198 exited. Number of t
    asks it ran: 02009-09-24 18:15:37,512 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 done; removing files.
    2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_r
    _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_r_000001_0
    2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch
    attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_r_7885
    020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_r_788502072 spawned.
    2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_r_788502072 exited. Number of tas
    ks it ran: 0
    2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000001_0 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24 18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000001_0 done; removing files.
    2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_m
    _000002_1 task's state:UNASSIGNED
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_m_000002_1
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-2120349138
    2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-2120349138 spawned.
    2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0
    2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 done; removing files.
    2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's state:UNASSIGNED
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_r_000000_1
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_r_-1417908695
    2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_r_-1417908695 spawned.
    2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000000_1 Child Error
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0
    2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000000_1 done; removing files.
    2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
    2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200909221656_0006


    -----Original Message-----
    From: Todd Lipcon
    Sent: Thursday, September 24, 2009 10:19 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1

    Hi Marc,

    Exit status 1 usually means some kind of controlled exit by the mapreduce
    child task. Things like JVM crashes usually are indicated by other exit
    codes (134 seems to be the code most commonly reported).

    If you look at the stderr and stdout from your task (in the userlogs/
    directory on the task tracker that ran them) do you see any output?
    Additionally, is there anything in the logs for the task tracker itself?
    That log is hadoop.log.dir/hadoop-<username>-tasktracker*log

    If that log is pretty long, try grepping for WARN, ERROR, or Exception

    -Todd
    On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte wrote:

    Thanks for the suggestion, Edward. I only upgraded the JVM after the
    problem occurred to see if it would help, but it made no difference.

    Marc

    -----Original Message-----
    From: Edward Capriolo
    Sent: Thursday, September 24, 2009 7:50 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Task process exit with nonzero status of 1
    On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    From what I can find by doing some Google searches, this means the mapred
    task JVM has crashed. Not many suggestions about what to do about it. Some
    suggestions about increasing max heap. I tried that, although I don't think
    that's the issue because it's not a particularly memory intensive process
    and I've even tried it with a super small input data set of only a few
    records. Still see the same issue.
    Can't find anything else in the logs. I don't think my task even
    started, because there are no user logs created at all. Seems to fail during
    Job Setup.
    A little more background. This job was working fine for weeks, running
    hourly, and then failed on Saturday morning and hasn't worked since.
    Obviously, I looked for something that changed at that point, but no one
    was working at that time... can't find anything that changed. I tried the
    job with different input data sets, doesn't seem to matter, unless I run it
    with no data at all. The job does run with no input data, but if I have
    even a few input records it fails-doesn't seem to matter which records. I
    suspected some corruption in HDFS, but I was able to extract the data from
    HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data
    set to our TEST cluster and ran the job there... and it WORKED!
    Ran one of our other jobs and it failed as well, so it doesn't seem to be
    job specific either; looks like every job fails the same way.
    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more
    information would be appreciated.


    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    Just a shot in the dark....

    Did you update java recently


    http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
    ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
    PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
    DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
    PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
    PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
  • Frank Singleton at Oct 9, 2009 at 6:27 pm

    Feng, Ao wrote:
    I probably know what the problem it, as we are encountering the same issue on our prod cluster. Every once a while jobs start failing on the same task trackers, and the only error message is this exit status 1.

    Go to the userlogs directory on the host where your tasks fail, and verify if there are 31,999 directories all looking like attempt_... Once you get to that point, JVM would run out of file descriptors, as it tries to create the 32,000 one. I confirmed that cleaning up the userlogs directory solves the problem... temporarily.

    So my questions are:

    1. Where is the 32,000 limit imposed, and how do we change it?
    >

    As far as ext3 file system capabilities are concerned,

    http://en.wikipedia.org/wiki/Ext3

    Specifically

    <quote>
    "There is a limit of 31998 sub-directories per one directory, stemming from its limit of 32000 links per inode"
    </quote>


    There is actually a funny story behind my personal experience with this (which I shall shorten for
    brevity)

    After I typed "ls <tab>" (to get the list of files/directories via bash completion) one day in a directory,
    the system came back (after a while) and said (from memory),

    Display all 31998 possibilities? (y or n)

    Hmm, where have I seen a number like (or close to) that before ?

    Cheers / Frank
  • Marc Limotte at Oct 27, 2009 at 10:51 pm
    Just an FYI, found the solution to this problem.

    Apparently, it's an OS limit on the number of sub-directories that can be created in another directory. In this case, we had 31998 sub-directories under hadoop/userlogs/, so any new tasks would fail in Job Setup.
    From the unix command line, mkdir fails as well:
    $ mkdir hadoop/userlogs/testdir
    mkdir: cannot create directory `hadoop/userlogs/testdir': Too many links

    Difficult to track down because the Hadoop error message gives no hint whatsoever. And normally, you'd look in the userlog itself for more info, but in this case the userlog couldn't be created.

    Marc


    -----Original Message-----
    From: Marc Limotte
    Sent: Wednesday, September 23, 2009 11:06 AM
    To: 'core-user@hadoop.apache.org'
    Subject: Task process exit with nonzero status of 1

    I'm seeing this error when I try to run my job.

    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
    From what I can find by doing some Google searches, this means the mapred task JVM has crashed. Not many suggestions about what to do about it. Some suggestions about increasing max heap. I tried that, although I don't think that's the issue because it's not a particularly memory intensive process and I've even tried it with a super small input data set of only a few records. Still see the same issue.
    Can't find anything else in the logs. I don't think my task even started, because there are no user logs created at all. Seems to fail during Job Setup.

    A little more background. This job was working fine for weeks, running hourly, and then failed on Saturday morning and hasn't worked since. Obviously, I looked for something that changed at that point, but no one was working at that time... can't find anything that changed. I tried the job with different input data sets, doesn't seem to matter, unless I run it with no data at all. The job does run with no input data, but if I have even a few input records it fails-doesn't seem to matter which records. I suspected some corruption in HDFS, but I was able to extract the data from HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data set to our TEST cluster and ran the job there... and it WORKED!

    Ran one of our other jobs and it failed as well, so it doesn't seem to be job specific either; looks like every job fails the same way.

    Did a complete reboot of the cluster-no impact.

    We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

    Any suggestions on what could be wrong or where to look for more information would be appreciated.



    Marc Limotte
    Feeva Technology

    PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupcommon-user @
categorieshadoop
postedSep 23, '09 at 6:06p
activeOct 27, '09 at 10:51p
posts14
users7
websitehadoop.apache.org...
irc#hadoop

People

Translate

site design / logo © 2022 Grokbase