I am currently running Hadoop CDH3u3 on a 16-node cluster.
Last night, I launched a long-running job (about 10 hours) on this
cluster, using the nohup command because my ssh session might get
disconnected, which it actually did. (I don't think the disconnection
is the cause, because I tested that scenario with a smaller job and
it finished correctly.)
This morning, when I got back to my screen, the job had not ended,
even though all maps and reduces were finished. The weird thing: on
the main screen of the Hadoop JobTracker web GUI, my job shows maps
completed 13399/14400 and reduces 143/143. But when I click on the job
link, it shows 14400 maps and 143 reduces completed, i.e. all tasks of
my job. The status still shows as running, and I suspect Hadoop will
never consider the job done, but I don't know why, and I don't know
where to look.
Does anybody have any idea, or any experience that could help?