I'm experiencing hung reducers, with the following symptoms:
Task Logs: 'task_200807230647_0008_r_000009_1'
stdout logs
stderr logs
syslog logs
red.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
2008-07-24 07:56:11,064 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-07-24 07:56:16,073 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
2008-07-24 07:56:16,074 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-07-24 07:56:16,074 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
2008-07-24 07:56:16,074 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-07-24 07:56:21,083 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
2008-07-24 07:56:21,084 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-07-24 07:56:21,084 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
2008-07-24 07:56:21,084 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-07-24 07:56:26,093 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
2008-07-24 07:56:26,094 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-07-24 07:56:26,094 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
2008-07-24 07:56:26,094 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-07-24 07:56:31,103 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
2008-07-24 07:56:31,104 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-07-24 07:56:31,104 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
2008-07-24 07:56:31,104 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-07-24 07:56:36,113 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
2008-07-24 07:56:36,114 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-07-24 07:56:36,114 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
2008-07-24 07:56:36,114 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-07-24 07:56:41,123 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Need 6 map output(s)
2008-07-24 07:56:41,126 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-07-24 07:56:41,126 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Got 0 known map output location(s); scheduling...
2008-07-24 07:56:41,126 INFO org.apache.hadoop.mapred.ReduceTask: task_200807230647_0008_r_000009_1 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
Note that it needs 6 map outputs and all map tasks have finished, yet it
just hangs there.
The second, speculative attempt of that reduce task needs 14 map outputs and
logs the same messages :(
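For anyone who wants to check their own task logs for the same pattern, here is a minimal sketch in Python. It assumes the syslog entries look like the ones quoted above ("Need N map output(s)" and "Got N known map output location(s)"); the function name and the min_repeats threshold are made up for illustration and are not part of Hadoop.

```python
import re
from collections import defaultdict

# Reduce task attempt ids as they appear in the log above,
# e.g. task_200807230647_0008_r_000009_1.
TASK = r"(task_\d+_\d+_r_\d+_\d+)"
NEED_RE = re.compile(TASK + r" Need (\d+) map output\(s\)")
KNOWN_RE = re.compile(TASK + r" Got (\d+) known map output location\(s\)")


def find_stalled_reducers(log_text, min_repeats=3):
    """Return {task_id: outputs_needed} for reducers that look stuck.

    A reducer is reported when its last `min_repeats` "Need N" entries all
    show the same non-zero N and its last `min_repeats` "known map output
    location(s)" entries are all 0. This is a heuristic, not a diagnosis.
    """
    # Collapse whitespace so that line-wrapped log entries still match.
    text = " ".join(log_text.split())

    needs = defaultdict(list)
    known = defaultdict(list)
    for task, count in NEED_RE.findall(text):
        needs[task].append(int(count))
    for task, count in KNOWN_RE.findall(text):
        known[task].append(int(count))

    stalled = {}
    for task, values in needs.items():
        recent_need = values[-min_repeats:]
        recent_known = known[task][-min_repeats:]
        if (len(recent_need) == min_repeats
                and len(set(recent_need)) == 1
                and recent_need[0] > 0
                and len(recent_known) == min_repeats
                and not any(recent_known)):
            stalled[task] = recent_need[0]
    return stalled
```

Calling find_stalled_reducers(open(path).read()) on a saved syslog, where path is wherever you stored the log, returns a dict mapping each suspicious reduce attempt to the number of map outputs it is still waiting for.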
Other observations:
Killing the reduce task via job -killtask just gets it restarted on
the same node, and curiously the new attempt gets jammed at the same position
(6/14 maps needed).
The only remedy seems to be a complete restart of the cluster followed by
reprocessing. That gets really tedious with jobs that took a day to
process the first time :(
Andreas