FAQ
I'm having my job fail on me just before finishing, and after
processing for a few hours. All the relevant log info I can find is
below, and the only thing I can see as potentially being a problem is
that the available space on one of the devices is pretty low.
Additionally, 2 map jobs failed because of malformed input, but I
would think that hadoop would just ignore those two jobs when trying
to complete the reduce phase. Does anyone recognize anything in
these logs that would help me identify the problem?

I had a task fail on me with the following message:

07/09/24 15:12:15 INFO mapred.JobClient: Task Id :
task_0001_m_004154_1, Status : FAILED
07/09/24 15:12:20 INFO mapred.JobClient: Task Id :
task_0001_m_010939_1, Status : FAILED
07/09/24 15:12:24 INFO mapred.JobClient: Task Id :
task_0001_m_004154_2, Status : FAILED
07/09/24 15:12:31 INFO mapred.JobClient: Task Id :
task_0001_m_010939_2, Status : FAILED
07/09/24 15:12:35 INFO mapred.JobClient: map 100% reduce 100%
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:
604)
at SearchCount.main(SearchCount.java:168)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke
(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke
(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.util.ProgramDriver
$ProgramDescription.invoke(ProgramDriver.java:69)
at org.apache.hadoop.util.ProgramDriver.driver
(ProgramDriver.java:140)
at SearchDriver.main(SearchDriver.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke
(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke
(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

The logs for all of the tasktrackers end in

2007-09-24 15:13:34,205 WARN org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 50050, call getMapCompletionEvents(job_0001, 17180, 50)
from 17.231.8.30:63506: output error
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen
(SocketChannelImpl.java:125)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:
294)
at
org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer
(SocketChannelOutputStream.java:108)
at org.apache.hadoop.ipc.SocketChannelOutputStream.write
(SocketChannelOutputStream.java:89)
at java.io.BufferedOutputStream.flushBuffer
(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush
(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:592)

with only the Server handler number differing. Finally, the logs
from the jobtracker app for the failed reduce jobs show:

2007-09-24 15:12:36,401 WARN org.apache.hadoop.mapred.ReduceTask:
task_0001_r_000000_0 copy failed: task_0001_m_017073_0 from 17.231.8.33
2007-09-24 15:12:36,403 WARN org.apache.hadoop.mapred.ReduceTask:
java.io.IOException: Server returned HTTP response code: 500 for URL:
http://17.231.8.33:50060/mapOutput?map=task_0001_m_017073_0&reduce=0
at sun.net.www.protocol.http.HttpURLConnection.getInputStream
(HttpURLConnection.java:1152)
at org.apache.hadoop.mapred.MapOutputLocation.getFile
(MapOutputLocation.java:206)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier
$MapOutputCopier.copyOutput(ReduceTask.java:680)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier
$MapOutputCopier.run(ReduceTask.java:641)

Though they actually continue a few lines after that:

2007-09-24 15:12:37,393 INFO org.apache.hadoop.mapred.ReduceTask:
task_0001_r_000000_0 Got 0 new map outputs from tasktracker and 0 map
outputs from previous failures
2007-09-24 15:12:37,393 INFO org.apache.hadoop.mapred.ReduceTask:
task_0001_r_000000_0 Got 88 known map output location(s); scheduling...
2007-09-24 15:12:37,394 INFO org.apache.hadoop.mapred.ReduceTask:
task_0001_r_000000_0 Scheduled 2 of 88 known outputs (0 slow hosts
and 86 dup hosts)

Thanks.

Ross Boucher
boucher@apple.com

Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 2 of 3 | next ›
Discussion Overview
groupcommon-user @
categorieshadoop
postedSep 19, '07 at 4:11p
activeSep 25, '07 at 7:07a
posts3
users2
websitehadoop.apache.org...
irc#hadoop

2 users in discussion

Ross Boucher: 2 posts Devaraj Das: 1 post

People

Translate

site design / logo © 2022 Grokbase