One of my colleagues has noticed this problem for a while, and now it's
biting me. Jobs seem to be failing before ever really starting. It seems
to be limited (so far) to running in pseudo-distributed mode, since that's
where he saw the problem and where I'm now seeing it; it hasn't come up on
our cluster (yet).

So here's what happens:

$ java -classpath $MY_CLASSPATH MyLauncherClass -conf my-config.xml -D extra.properties=extravalues
...
launcher output
...
11/08/26 10:35:54 INFO input.FileInputFormat: Total input paths to process : 2
11/08/26 10:35:54 INFO mapred.JobClient: Running job: job_201108261034_0001
11/08/26 10:35:55 INFO mapred.JobClient: map 0% reduce 0%

and it just sits there. If I look at the jobtracker's web view the number
of submissions increments, but nothing shows up as a running, completed,
failed, or retired job. If I use the command line probe I find

$ hadoop job -list
1 jobs currently running
JobId State StartTime UserName Priority SchedulingInfo
job_201108261034_0001 4 1314369354247 hdfs NORMAL NA

If I try to kill this job, nothing happens; it remains in the list with
state 4 (failed?). I've tried telling the mapper JVM to suspend so I can
find it in netstat and attach a debugger from IDEA, but it seems that the
job never gets to the point of even spinning up a JVM to run the mapper.
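For reference, the usual way to get the task JVMs to suspend for a remote debugger is to pass JDWP options through the standard mapred.child.java.opts property; a rough, untested sketch from the launcher side, with the class name and port chosen purely for illustration:

import org.apache.hadoop.mapred.JobConf;

public class SuspendTaskJvms {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Each task JVM starts a JDWP agent and waits (suspend=y) for a debugger
        // to attach on port 5005 before running any user code; the port is arbitrary.
        conf.set("mapred.child.java.opts",
                 "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005");
        System.out.println(conf.get("mapred.child.java.opts"));
    }
}

Of course, as described above, the suspend never triggers here because no task JVM is ever launched.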

Any ideas what might be going wrong? Thanks.


  • Ramya Sunil at Aug 26, 2011 at 6:47 pm
    Hi John,

    How many tasktrackers do you have? Can you check if your tasktrackers are
    running and the total available map and reduce capacity in your cluster?
    Can you also post the configuration of the scheduler you are using? You
    might also want to check the jobtracker logs. It would help in further
    debugging.
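
    For instance, something like this rough, untested sketch (old JobClient API;
    the class name is just a placeholder) would print the tracker count and the
    slot capacity the jobtracker sees:

    import org.apache.hadoop.mapred.ClusterStatus;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ClusterCapacityCheck {
        public static void main(String[] args) throws Exception {
            // Picks up mapred-site.xml from the classpath, like the launcher does
            JobClient client = new JobClient(new JobConf());
            ClusterStatus status = client.getClusterStatus();
            System.out.println("tasktrackers : " + status.getTaskTrackers());
            System.out.println("map slots    : " + status.getMapTasks() + " running / "
                    + status.getMaxMapTasks() + " total");
            System.out.println("reduce slots : " + status.getReduceTasks() + " running / "
                    + status.getMaxReduceTasks() + " total");
        }
    }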

    Thanks
    Ramya
  • John Armstrong at Aug 26, 2011 at 6:51 pm

    On Fri, 26 Aug 2011 11:46:42 -0700, Ramya Sunil wrote:
    How many tasktrackers do you have? Can you check if your tasktrackers are
    running and the total available map and reduce capacity in your cluster?
    In pseudo-distributed there's one tasktracker, which is running, and the
    total map and reduce capacity is reported by the jobtracker at 6 slots
    each.
    Can you also post the configuration of the scheduler you are using? You
    might also want to check the jobtracker logs. It would help in further
    debugging.
    Any ideas what I should be looking for that could cause a job to list as
    failed before launching any task JVMs and without reporting back to the
    launcher that it's failed? Am I correct in interpreting "state 4" as
    "failure"?
  • Ramya Sunil at Aug 26, 2011 at 7:21 pm

    State "4" indicates that the job is still in the PREP state and not a job
    failure. We have seen these kind of errors when either the cluster does not
    have tasktrackers to run the tasks or when the queue to which the job is
    submitted does not have sufficient capacity.
    In the logs, if you are able to see "Adding task (MAP/REDUCE)
    <attemptID>...for tracker 'tracker_<TT_hostname>'", that means the task was
    scheduled to be run on the TT. One can then look at the TT logs to check why
    the tasks did not begin execution.
    If you do not see this log message, that implies the cluster does not have
    enough resources due to which JT is unable to schedule the tasks.
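
    For reference, the numeric states printed by "hadoop job -list" map to the
    constants in org.apache.hadoop.mapred.JobStatus in the 0.20-era API
    (RUNNING=1, SUCCEEDED=2, FAILED=3, PREP=4, KILLED=5). A minimal sketch of
    checking a job's state from Java, with the job ID hard-coded purely for
    illustration:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.JobStatus;
    import org.apache.hadoop.mapred.RunningJob;

    public class JobStateProbe {
        public static void main(String[] args) throws Exception {
            JobClient client = new JobClient(new JobConf());
            RunningJob job = client.getJob(JobID.forName("job_201108261034_0001"));
            if (job != null) {
                int state = job.getJobState();
                // State 4 means the job is still being prepared, not that it failed
                System.out.println("state=" + state + ", PREP? " + (state == JobStatus.PREP));
            }
        }
    }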

    Thanks
    Ramya
  • John Armstrong at Aug 26, 2011 at 8:19 pm

    On Fri, 26 Aug 2011 12:20:47 -0700, Ramya Sunil wrote:
    Can you also post the configuration of the scheduler you are using? You
    might also want to check the jobtracker logs. It would help in further
    debugging.
    Where would I find the scheduler configuration? I haven't changed it, so
    I assume I'm using the default.
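
    (For what it's worth, the scheduler is selected by the
    mapred.jobtracker.taskScheduler property in mapred-site.xml; when it is
    unset, the stock FIFO JobQueueTaskScheduler is used. A rough sketch of
    printing what the loaded configuration resolves it to:)

    import org.apache.hadoop.mapred.JobConf;

    public class SchedulerCheck {
        public static void main(String[] args) {
            // Reads mapred-default.xml / mapred-site.xml from the classpath
            JobConf conf = new JobConf();
            // Falls back to the stock FIFO scheduler if the property is unset
            System.out.println(conf.get("mapred.jobtracker.taskScheduler",
                    "org.apache.hadoop.mapred.JobQueueTaskScheduler"));
        }
    }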

    This is what I see in the jobtracker logs when I submit the job:

    2011-08-26 16:11:19,164 INFO org.apache.hadoop.mapred.JobTracker: Job
    job_201108261610_0001 added successfully for user 'hdfs' to queue 'default'
    2011-08-26 16:11:19,164 INFO org.apache.hadoop.mapred.JobTracker:
    Initializing job_201108261610_0001
    2011-08-26 16:11:19,164 INFO org.apache.hadoop.mapred.JobInProgress:
    Initializing job_201108261610_0001
    2011-08-26 16:11:19,165 INFO org.apache.hadoop.mapred.AuditLogger:
    USER=hdfs IP=127.0.0.1 OPERATION=SUBMIT_JOB TARGET=job_201108261610_0001 RESULT=SUCCESS

    Nothing shows up in the tasktracker logs when I submit the job.
    State "4" indicates that the job is still in the PREP state and not a job
    failure. We have seen these kind of errors when either the cluster does not
    have tasktrackers to run the tasks or when the queue to which the job is
    submitted does not have sufficient capacity.
    So it's possible something has gone wrong with the job queue? Is it
    possible something's stuck in there? How would I find it/clean it out?
    If you do not see this log message, it implies the cluster does not have
    enough resources, so the JT is unable to schedule the tasks.
    I do see this line in the TaskTracker logs; it might have something to do
    with the problem, but I have no idea how to fix it.

    2011-08-26 16:14:41,966 WARN org.apache.hadoop.mapred.TaskTracker:
    TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is
    disabled.

    Thanks for the pointers.
