All,
I am encountering the following out-of-memory error during the reduce phase of a large job.
Map output copy failure : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1669)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1529)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1378)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1310)
I tried increasing the available memory via mapred.child.java.opts, but that only helps a little; the reduce tasks eventually fail again. Here are some relevant job configuration details:
1. The input to the mappers is about 2.5 TB (LZO compressed). The mappers filter out a small percentage of the input (less than 1%).
2. I am currently using 12 reducers, and I can't increase this count by much because I need to leave reduce slots available for other users.
3. mapred.child.java.opts --> -Xms512M -Xmx1536M -XX:+UseSerialGC
4. mapred.job.shuffle.input.buffer.percent --> 0.70
5. mapred.job.shuffle.merge.percent --> 0.66
6. mapred.inmem.merge.threshold --> 1000
7. There are nearly 5000 mappers, which are configured to produce LZO-compressed output. The logs indicate that the map outputs range between 0.3 GB and 0.8 GB.
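If I'm reading the semantics of the shuffle settings correctly (please correct me if not), items 3-5 imply a per-reducer in-memory budget of roughly:

    in-memory shuffle buffer ~= 0.70 x 1536 MB ~= 1075 MB
    in-memory merge trigger  ~= 0.66 x 1075 MB ~=  710 MB

so, as I understand it, each reduce task buffers fetched map outputs in memory up to about 1075 MB and only starts merging them to disk once usage crosses about 710 MB.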
Does anything here seem amiss? I'd appreciate any input on which settings to try. I can try lower values for the shuffle input buffer percent and the merge percent, as sketched below. Given that the job runs for about 7-8 hours before crashing, I would like to make informed choices if possible.
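For concreteness, here is roughly how I would set the lowered values in the job driver. The class name is just a placeholder and the 0.50 values are only a first guess, not something I've validated:

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJobDriver.class);  // placeholder driver class
    // Use a smaller fraction of the reducer heap for holding fetched map outputs in memory.
    conf.set("mapred.job.shuffle.input.buffer.percent", "0.50");
    // Trigger the in-memory merge earlier so the buffer is drained to disk sooner.
    conf.set("mapred.job.shuffle.merge.percent", "0.50");
    // Keep the existing heap settings for the child JVMs.
    conf.set("mapred.child.java.opts", "-Xms512M -Xmx1536M -XX:+UseSerialGC");

I could also pass the same properties with -D at submission time if the driver supports generic options.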
Thanks.
~ Niranjan.