Job never ends despite all tasks being done

Guillaume
Jun 13, 2012 at 1:26 pm
Hi all,

I am currently using Hadoop CDH3u3 on a 16-node cluster.
Last night, I launched a job that should take a long time (about 10
hours) on this cluster. I used the nohup command because my SSH session
might be disconnected, which did in fact happen. (I don't think that is
related: I tested that scenario with a smaller job, and it completed
correctly.)
This morning, when I came back to my screen, the job had not finished,
even though all maps and reduces were done. Strangely, the JobTracker
web UI main page shows map finished 13399/14400 and reduce 143/143 for
my job, but clicking the job link shows 14400 maps and 143 reduces
completed, i.e. all of the job's tasks. The status still shows Running,
and I suspect Hadoop will never consider the job done, but I don't know
why, nor where to look.

Does anybody have an idea, or any experience that could help?

Thanks.
Guillaume Eynard.

6 responses

  • Harsh J at Jun 13, 2012 at 3:08 pm
    Hi,

    On the page for this running job, is the "Job Cleanup" task running?
    You can usually reach the Job Cleanup task page via:

    http://hostname:50030/jobtasks.jsp?jobid=JOBID&type=cleanup&pagenum=1
    --
    Harsh J
  • Guillaume at Jun 13, 2012 at 3:47 pm
    Thank you for your answer.

    I had to clean up the job and completely forgot to save anything, but
    I'm fairly sure the status was 'Running' and Job Cleanup was
    'Pending', as if no signal had been sent, or nobody had detected that
    all tasks were done.
  • Harsh J at Jun 13, 2012 at 3:53 pm
    Hi,

    OK, can you send us the output of "grep <Job ID (minus the "job_"
    part)> <JT Log>" via pastebin.com or a similar site?
    --
    Harsh J
  • Guillaume at Jun 13, 2012 at 4:09 pm
    I will try to do that.

    I have looked at the JobTracker and NameNode logs, and I can see no
    JOB_CLEANUP task launched in the JT logs.

    Moreover, the JT logs end at about 21:19, while the NN logs continue
    until 21:51, with the job's tasks writing to HDFS...

    I will get the log files and share them with you.
  • Guillaume at Jun 13, 2012 at 4:29 pm
    Since the grep output is much too large, I put the log files directly
    on Dropbox. Here are the links to the files:

    http://dl.dropbox.com/u/84911303/boulot/hadoop-hadoop-jobtracker-job-tracker.internal.saga.cnes.log.2012-06-12
    http://dl.dropbox.com/u/84911303/boulot/hadoop-hadoop-namenode-name-node.internal.saga.cnes.log.2012-06-12
  • Guillaume at Jun 14, 2012 at 1:43 pm
    Any other suggestions?

    I will probably run the test again next week. Maybe I will have more
    information then, or maybe it will work correctly this time...
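
Harsh J's first check was whether the job's "Job Cleanup" task was running. The helper below is a minimal Python sketch that builds the jobtasks.jsp URL quoted in the thread for a given job. The function name, the example host name and the example job ID are hypothetical, and port 50030 is assumed to be the JobTracker web UI port (the Hadoop 1.x / CDH3 default, as in the quoted URL).

```python
from urllib.parse import urlencode


def cleanup_tasks_url(jobtracker_host: str, job_id: str, port: int = 50030) -> str:
    """Build the JobTracker web UI URL that lists a job's cleanup task(s).

    Follows the jobtasks.jsp pattern quoted in the thread; the default port
    is an assumption (the usual Hadoop 1.x JobTracker web UI port).
    """
    # type=cleanup selects the cleanup task list; pagenum=1 is its first page
    query = urlencode({"jobid": job_id, "type": "cleanup", "pagenum": 1})
    return f"http://{jobtracker_host}:{port}/jobtasks.jsp?{query}"


if __name__ == "__main__":
    # Hypothetical host and job ID, for illustration only
    print(cleanup_tasks_url("jobtracker.example.com", "job_201206121200_0001"))
```

Opening the resulting page should show the cleanup task and its state, for example the 'Pending' state Guillaume later recalled.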

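Harsh J then asked for the JobTracker log lines matching the job ID without its "job_" prefix. A rough Python equivalent of that grep is sketched below; the script and function names and the example job ID are hypothetical. Dropping the prefix is what lets the search also catch task and attempt IDs, which embed the same suffix (for example task_201206121200_0001_m_000000).

```python
import sys


def lines_for_job(log_lines, job_id):
    """Return the log lines that mention a job, like `grep <id> <JT log>`.

    The "job_" prefix is stripped first, so lines that mention the job's
    task or attempt IDs (which share the same suffix) are kept as well.
    """
    key = job_id[len("job_"):] if job_id.startswith("job_") else job_id
    return [line for line in log_lines if key in line]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python grep_job.py <JT log file> <job ID>
    log_path, job_id = sys.argv[1], sys.argv[2]
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in lines_for_job(log, job_id):
            sys.stdout.write(line)
```

A plain grep does the same job; the sketch only makes the prefix stripping explicit.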

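Guillaume noticed that the JobTracker log stopped around 21:19 while the NameNode log continued until 21:51. The sketch below compares the last timestamps of two log files to measure such a gap. It assumes each entry starts with a log4j ISO8601 timestamp such as 2012-06-12 21:19:03,123 (the usual Hadoop log4j layout); lines without one, such as stack-trace continuations, are skipped. The function names and script usage are hypothetical.

```python
import re
import sys
from datetime import datetime

# Assumption: entries begin with a log4j ISO8601 timestamp such as
# "2012-06-12 21:19:03,123"; only the part up to the seconds is parsed.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def last_timestamp(lines):
    """Return the datetime of the last timestamped line, or None if there is none."""
    last = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:  # continuation lines (stack traces, wrapped messages) are skipped
            last = match.group(1)
    return datetime.strptime(last, "%Y-%m-%d %H:%M:%S") if last else None


def log_gap(jt_lines, nn_lines):
    """Return NameNode end minus JobTracker end as a timedelta (positive if NN ends later)."""
    jt_end = last_timestamp(jt_lines)
    nn_end = last_timestamp(nn_lines)
    if jt_end is None or nn_end is None:
        return None
    return nn_end - jt_end


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python log_gap.py <JT log> <NN log>
    with open(sys.argv[1], errors="replace") as jt, open(sys.argv[2], errors="replace") as nn:
        print("NameNode log continues after JobTracker log by:", log_gap(jt, nn))
```

A large positive gap while tasks are still writing to HDFS is consistent with what Guillaume described: the JobTracker stopped logging while the job's tasks kept running.
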
2 users in discussion

Guillaume: 5 posts Harsh J: 2 posts