Dear all,

I use Hadoop and run a MapReduce job in our project. The job's data size
is very large (we need to write about 100 GB of data in the reduce step),
and some reducers (but not all of them) failed during the reduce step, so
part of the data was never written to HDFS.

When I checked the errors, I found several different types, and I am sure
they are not caused by bugs in our program; they may come from latent bugs
in Hadoop itself. I have attached the error logs below; can anybody
explain what causes them?

Any suggestions are really appreciated.
*The first type of error logs:*

2008-08-12 17:53:18,553 INFO org.apache.hadoop.fs.FileSystem: Initialized InMemoryFileSystem: ramfs://mapoutput1609592259/taskTracker/jobcache/job_200808121449_0113/task_200808121449_0113_r_000021_1/output/map_22.out-4 of size (in bytes): 78643200

2008-08-12 17:53:18,554 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: java.lang.IllegalStateException: Shutdown in progress
at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
at java.lang.Runtime.addShutdownHook(Runtime.java:192)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1293)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:203)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:831)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:764)

2008-08-12 17:53:18,576 WARN org.apache.hadoop.mapred.TaskRunner: Parent died. Exiting task_200808121449_0113_r_000021_1
2008-08-12 17:53:18,597 INFO org.apache.hadoop.mapred.ReduceTask: task_200808121449_0113_r_000021_1: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-08-12 17:53:18,597 INFO org.apache.hadoop.mapred.ReduceTask: task_200808121449_0113_r_000021_1 Got 14 known map output location(s); scheduling...
2008-08-12 17:53:18,597 INFO org.apache.hadoop.mapred.ReduceTask: task_200808121449_0113_r_000021_1 Scheduled 1 of 14 known outputs (0 slow hosts and 13 dup hosts)
2008-08-12 17:53:18,597 INFO org.apache.hadoop.mapred.ReduceTask: task_200808121449_0113_r_000021_1 Copying task_200808121449_0113_m_000024_0 output from linux-64z6.site.
2008-08-12 17:53:18,597 INFO org.apache.hadoop.mapred.ReduceTask: Task task_200808121449_0113_r_000021_1: Failed fetch #1 from task_200808121449_0113_m_000022_0
2008-08-12 17:53:18,597 WARN org.apache.hadoop.mapred.ReduceTask: task_200808121449_0113_r_000021_1 adding host Kevin.localdomain to penalty box, next contact in 4 seconds
2008-08-12 17:53:18,597 INFO org.apache.hadoop.mapred.ReduceTask: task_200808121449_0113_r_000021_1 Need 16 map output(s)
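One detail I noticed about this first error: Runtime.addShutdownHook throws exactly this IllegalStateException("Shutdown in progress") whenever it is called while the JVM is already shutting down, and the trace shows FileSystem$Cache.get trying to register a shutdown hook. Combined with the "Parent died. Exiting ..." warning, this looks like the child JVM was already exiting while the copier thread was still fetching map outputs. The symptom reproduces in plain Java with no Hadoop involved (the class name here is made up for the demo):

```java
// ShutdownHookDemo: reproduce IllegalStateException("Shutdown in progress")
// by registering a shutdown hook while the JVM is already shutting down.
public class ShutdownHookDemo {
    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // This hook runs during JVM shutdown; registering ANOTHER hook
            // at this point is illegal, which is what FileSystem$Cache.get
            // tripped over in the log above.
            try {
                Runtime.getRuntime().addShutdownHook(new Thread());
            } catch (IllegalStateException e) {
                System.out.println("Caught: " + e.getMessage());
            }
        }));
        // main returns normally; the hook above then runs during shutdown
        // and prints "Caught: Shutdown in progress".
    }
}
```

So this error by itself may just be a secondary symptom of the JVM going down, not the root cause.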

*Then the second type of errors:*

2008-08-12 17:59:22,945 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: java.lang.NullPointerException
at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryOutputStream.close(InMemoryFileSystem.java:173)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:332)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:185)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:815)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:764)

*The third type of errors:*

*stderr logs*
Exception closing file /map/matrix/similarity/_temporary/_task_200808121449_0113_r_000029_1/part-00029
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write to file /map/matrix/similarity/_temporary/_task_200808121449_0113_r_000029_1/part-00029 by DFSClient_task_200808121449_0113_r_000029_1
at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:332)
at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

at org.apache.hadoop.ipc.Client.call(Client.java:557)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at org.apache.hadoop.dfs.$Proxy1.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.dfs.$Proxy1.complete(Unknown Source)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2655)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:2576)
at org.apache.hadoop.dfs.DFSClient.close(DFSClient.java:220)
at org.apache.hadoop.dfs.DistributedFileSystem.close(DistributedFileSystem.java:214)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1324)
at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:224)
at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:209)

------------------------------


*syslog logs*
(... some INFO log lines from fetching the intermediate map outputs omitted ...)

2008-08-12 17:48:30,761 INFO org.apache.hadoop.mapred.ReduceTask: task_200808121449_0113_r_000029_1 Merge of the 3 files in InMemoryFileSystem complete. Local file is /home/guangfeng/bin/hadoop-0.17.1/tmp/hadoop-guangfeng/mapred/local/taskTracker/jobcache/job_200808121449_0113/task_200808121449_0113_r_000029_1/output/map_10.out
2008-08-12 17:48:30,912 INFO com.izenesoft.clustering.ap.similarity.SimilarityReducer: The file similarity7511 exist. its size: 8055936
2008-08-12 17:48:33,560 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Filesystem closed
at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:201)
at org.apache.hadoop.dfs.DFSClient.access$600(DFSClient.java:58)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2364)
at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:58)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:39)
at java.io.DataOutputStream.writeInt(DataOutputStream.java:183)
at java.io.DataOutputStream.writeFloat(DataOutputStream.java:225)
at com.izenesoft.clustering.ap.similarity.SimilarityReducer.reduce(SimilarityReducer.java:80)
at com.izenesoft.clustering.ap.similarity.SimilarityReducer.reduce(SimilarityReducer.java:1)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
------------------------------
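My current guess about the "Filesystem closed" error, for whoever looks at the logs: FileSystem.get() returns a cached instance that is shared by everything running in the same JVM, and FileSystem$ClientFinalizer (a shutdown hook, visible in the stderr trace above) closes all cached instances via FileSystem.closeAll. So once the JVM starts shutting down (or any code calls close() on the shared instance), writes still in flight fail with "Filesystem closed". Here is a toy sketch of that shared-handle problem in plain Java; SharedFs and the cache are made up for illustration, not Hadoop classes:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cached filesystem handle. Hadoop caches
// FileSystem instances per scheme/authority, so two callers asking for
// the same URI receive the SAME object.
class SharedFs {
    private boolean open = true;

    void write(String data) throws IOException {
        if (!open) {
            // Same symptom as DFSClient.checkOpen in the syslog trace.
            throw new IOException("Filesystem closed");
        }
        // ... write data ...
    }

    void close() { open = false; }
}

public class SharedCacheDemo {
    private static final Map<String, SharedFs> CACHE = new HashMap<>();

    static SharedFs get(String uri) {
        return CACHE.computeIfAbsent(uri, u -> new SharedFs());
    }

    public static void main(String[] args) {
        SharedFs fsA = get("hdfs://namenode");  // "task A" obtains the handle
        SharedFs fsB = get("hdfs://namenode");  // "task B" gets the SAME cached instance

        fsA.close();  // task A (or a shutdown hook) closes the shared handle

        try {
            fsB.write("part-00029"); // task B's in-flight write now fails
        } catch (IOException e) {
            System.out.println("Caught: " + e.getMessage());  // prints "Caught: Filesystem closed"
        }
    }
}
```

If this reading is right, nothing in user code should ever call close() on the shared FileSystem; but it still doesn't explain why the child's parent died in the first place.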



--
Guangfeng Jin

Software Engineer

Posted to the Hadoop common-user list (hadoop.apache.org), Aug 12, 2008; 1 post, by 晋光峰 (Guangfeng Jin).