One more clue.

If I change "mapred.job.tracker" to "local" on this cluster, then I can run the job successfully. I guess in this case it doesn't have to launch a child JVM, which is the part that is failing.
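In Hadoop 0.20, setting mapred.job.tracker to local makes the client run the whole job in-process through LocalJobRunner, so no separate child task JVM is launched, which fits with local mode sidestepping the failure. A minimal sketch, assuming you want the override as a standalone file in Hadoop's <configuration>/<property> format (the helper name hadoop_conf_xml is hypothetical, and how the file is handed to the job is left out):

```python
import xml.etree.ElementTree as ET


def hadoop_conf_xml(props: dict[str, str]) -> str:
    """Render a Hadoop-style <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str with no <?xml ...?> declaration
    return ET.tostring(root, encoding="unicode")


# "local" tells the 0.20-era client to run the job in-process (LocalJobRunner)
# instead of submitting it to the JobTracker, so no child task JVMs are spawned.
print(hadoop_conf_xml({"mapred.job.tracker": "local"}))
```

Jobs launched through ToolRunner can usually get the same effect without a file, via the generic options -D mapred.job.tracker=local or -jt local.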


Marc

-----Original Message-----
From: Marc Limotte
Sent: Thursday, September 24, 2009 2:19 PM
To: [email protected]
Cc: Deept Kumar
Subject: RE: Task process exit with nonzero status of 1

Added DEBUG, but don't see anything interesting. The only new tasktracker log entries are about receiving a heartbeat from the JobTracker, or about cleaning up the task afterward.

Tried the strace. It produced over 6 million lines of output. Not sure what I should be looking for.

I'm thinking I might try the Cloudera Hadoop 0.20.0 distribution and see if the behavior is any different.

Marc

-----Original Message-----
From: Todd Lipcon
Sent: Thursday, September 24, 2009 11:28 AM
To: [email protected]
Subject: Re: Task process exit with nonzero status of 1

Odd...

Try bumping up the logs to debug level on that tasktracker, see what you can
determine?

You could also strace -f -p <tasktracker pid> -o /tmp/tt_log and then grep
through those logs later to see what might be going on.
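Because a full strace -f of the tasktracker is very noisy, one way to narrow it down is to keep only process-lifecycle events and stderr writes: the execve of the child java command (with its arguments), its exit_group/_exit call, strace's own "+++ exited with N +++" notices (printed by newer strace releases), and write(2, ...) calls, which may carry the child's last error message. A rough filter, assuming the -o output format where each line starts with the PID; strace can split one syscall across "<unfinished ...>" and "resumed" lines, and this simple line filter does not stitch those back together:

```python
import re
import sys
from typing import Iterable, Iterator

# Events that usually matter when a child process dies right at startup.
INTERESTING = re.compile(
    r"\bexecve\("                         # program launch, including argv
    r"|\bexit_group\("                    # process exit with a status code
    r"|\b_exit\("
    r"|\+\+\+ (?:exited with|killed by)"  # strace's own exit/signal notices
    r"|\bwrite\(2,"                       # data written to stderr (fd 2)
)


def filter_strace(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the strace output lines that match INTERESTING."""
    for line in lines:
        if INTERESTING.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python filter_strace.py /tmp/tt_log
    with open(sys.argv[1], errors="replace") as f:
        for hit in filter_strace(f):
            print(hit)
```

The PID at the start of each matching line links a child's execve to its exit, which helps pick the failing java process out of the tasktracker's many threads.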

-Todd
On Thu, Sep 24, 2009 at 11:24 AM, Marc Limotte wrote:

Hi Todd.

No userlogs seem to be created. I'm guessing that's because the map task never
actually starts.

I don't see any other errors in the tasktracker log, other than the one I
put in the first message ("java.io.IOException: Task process exit with
nonzero status of 1..."). I've included the output from one of the nodes'
tasktracker logs below.

Any other suggestions?

Marc

2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's state:UNASSIGNED
2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_m_000003_0
2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_m_000003_0
2009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-1451805198
2009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-1451805198 spawned.
2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_m_-1451805198 exited. Number of tasks it ran: 0
2009-09-24 18:15:37,512 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 done; removing files.
2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000001_0 task's state:UNASSIGNED
2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_r_000001_0
2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_r_000001_0
2009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_r_788502072
2009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_r_788502072 spawned.
2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_r_788502072 exited. Number of tasks it ran: 0
2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000001_0 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
2009-09-24 18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000001_0 done; removing files.
2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000002_1 task's state:UNASSIGNED
2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_m_000002_1
2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_m_000002_1
2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-2120349138
2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-2120349138 spawned.
2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0
2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 done; removing files.
2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's state:UNASSIGNED
2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_r_000000_1
2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_r_000000_1
2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_r_-1417908695
2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_r_-1417908695 spawned.
2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000000_1 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0
2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000000_1 done; removing files.
2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200909221656_0006
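One detail stands out in this log: by its own timestamps, each child JVM is reported as exited 26 to 28 ms after it was spawned, having run 0 tasks. That is consistent with the child failing almost immediately at startup rather than inside the map or reduce code. A small sketch that computes those lifetimes, assuming one log entry per line and the 0.20 JvmManager message wording shown above (jvm_lifetimes_ms is a hypothetical helper):

```python
import re
from datetime import datetime

# Timestamp format used by the tasktracker log, e.g. "2009-09-24 18:15:37,483".
TS = r"(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d,\d{3})"
SPAWNED = re.compile(TS + r" INFO \S+JvmManager: JVM Runner (\S+) spawned\.")
EXITED = re.compile(TS + r" INFO \S+JvmManager: JVM : (\S+) exited\.")


def parse_ts(s: str) -> datetime:
    # %f accepts the 3-digit millisecond field after the comma
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f")


def jvm_lifetimes_ms(lines):
    """Return {jvm_id: milliseconds between its 'spawned' and 'exited' lines}."""
    spawned, lifetimes = {}, {}
    for line in lines:
        if m := SPAWNED.search(line):
            spawned[m.group(2)] = parse_ts(m.group(1))
        elif m := EXITED.search(line):
            start = spawned.get(m.group(2))
            if start is not None:  # ignore exits whose spawn line we never saw
                delta = parse_ts(m.group(1)) - start
                lifetimes[m.group(2)] = round(delta.total_seconds() * 1000)
    return lifetimes
```

Fed the entries above one per line, it should give 28, 26, 26 and 27 ms for the four JVMs, in log order.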


-----Original Message-----
From: Todd Lipcon
Sent: Thursday, September 24, 2009 10:19 AM
To: [email protected]
Subject: Re: Task process exit with nonzero status of 1

Hi Marc,

Exit status 1 usually means some kind of controlled exit by the mapreduce
child task. Things like JVM crashes usually are indicated by other exit
codes (134 seems to be the code most commonly reported).
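The 134 fits the usual Unix convention for a child killed by a signal: the reported status is 128 plus the signal number, so 134 is 128 + 6, SIGABRT, which is how a HotSpot JVM typically terminates after a fatal error. An exit status of 1, by contrast, comes from an ordinary exit call. A sketch of that decoding, assuming the status follows the shell convention (a program can also legitimately exit with a code above 128, so this is only a heuristic; describe_exit_status is a hypothetical name):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a child's exit status using the shell's 128 + signal convention."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:  # not a signal number known on this platform
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


print(describe_exit_status(1))    # exited with code 1
print(describe_exit_status(134))  # killed by SIGABRT
```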

If you look at the stderr and stdout from your task (in the userlogs/
directory on the task tracker that ran them) do you see any output?
Additionally, is there anything in the logs for the task tracker itself?
That log is hadoop.log.dir/hadoop-<username>-tasktracker*log

If that log is pretty long, try grepping for WARN, ERROR, or Exception
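The same WARN/ERROR/Exception search, sketched in Python for the case where several rotated tasktracker logs need to be scanned at once. It assumes log_dir is the value of hadoop.log.dir and reuses the hadoop-<username>-tasktracker*log naming from above; grep_tasktracker_logs is a hypothetical helper:

```python
import glob
import os
import re

# Same filter as the grep suggestion: lines mentioning WARN, ERROR or Exception.
PATTERN = re.compile(r"WARN|ERROR|Exception")


def grep_tasktracker_logs(log_dir: str, username: str):
    """Yield (path, line_no, line) for matching lines in the tasktracker logs."""
    pattern = os.path.join(log_dir, f"hadoop-{username}-tasktracker*log")
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as f:
            for line_no, line in enumerate(f, start=1):
                if PATTERN.search(line):
                    yield path, line_no, line.rstrip("\n")
```

Keeping the line number makes it easy to jump back into the full log and read the INFO lines just before each hit.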

-Todd
On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte wrote:

Thanks for the suggestion, Edward. I only upgraded the JVM after the
problem occurred to see if it would help, but it made no difference.

Marc

-----Original Message-----
From: Edward Capriolo
Sent: Thursday, September 24, 2009 7:50 AM
To: [email protected]
Subject: Re: Task process exit with nonzero status of 1
On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte wrote:
I'm seeing this error when I try to run my job.

java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

From what I can find by doing some Google searches, this means the mapred
task JVM has crashed. There aren't many suggestions about what to do about
it, other than some about increasing the max heap. I tried that, although I
don't think that's the issue, because it's not a particularly memory-intensive
process and I've even tried it with a super small input data set of only a
few records. I still see the same issue.

I can't find anything else in the logs. I don't think my task even started,
because there are no user logs created at all. It seems to fail during Job
Setup.

A little more background: this job was working fine for weeks, running
hourly, and then failed on Saturday morning and hasn't worked since.
Obviously, I looked for something that changed at that point, but no one
was working at that time... I can't find anything that changed. I tried the
job with different input data sets; it doesn't seem to matter, unless I run
it with no data at all. The job does run with no input data, but if I have
even a few input records it fails, and it doesn't seem to matter which
records. I suspected some corruption in HDFS, but I was able to extract the
data from HDFS (hadoop dfs -get ...) and the data looks OK. I also copied
this data set to our TEST cluster and ran the job there... and it WORKED!

I ran one of our other jobs and it failed as well, so it doesn't seem to be
job-specific either; it looks like every job fails the same way.

I did a complete reboot of the cluster, with no impact.

We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.

Any suggestions on what could be wrong or where to look for more
information would be appreciated.


Marc Limotte
Feeva Technology

PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
Just a shot in the dark...

Did you update Java recently?


http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/

Discussion Overview
group: common-user @
categories: hadoop
posted: Sep 23, '09 at 6:06p
active: Oct 27, '09 at 10:51p
posts: 14
users: 7
website: hadoop.apache.org...
irc: #hadoop