Hello,

I'm currently developing a map/reduce program that emits a fair number of maps per input record (around 50-100), and I'm getting OutOfMemory errors:

2008-09-06 15:28:08,993 ERROR org.apache.hadoop.mapred.pipes.BinaryProtocol: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$BlockingBuffer.reset(MapTask.java:564)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:440)
at org.apache.hadoop.mapred.pipes.OutputHandler.output(OutputHandler.java:55)
at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:117)


The error is reproducible and occurs at the same percentage of progress every time; when I emit fewer maps per input record, the problem goes away.

I have tried editing conf/hadoop-env.sh to increase HADOOP_HEAPSIZE to 2000 MB and to set `export HADOOP_TASKTRACKER_OPTS="-Xms32m -Xmx2048m"`, but the problem persists at the exact same place.

My use case doesn't seem particularly unusual; is this a common problem, and if so, what are the usual ways to work around it?

Thanks in advance for a response!

Regards,

Leon Mergen


  • Leon Mergen at Sep 6, 2008 at 4:36 pm
    Hello,
    > I'm currently developing a map/reduce program that emits a fair number
    > of maps per input record (around 50-100), and I'm getting OutOfMemory
    > errors:

    Sorry for the noise; I found out I had to set the mapred.child.java.opts JobConf parameter to "-Xmx512m" to make 512 MB of heap space available to the map processes (see the sketch at the end of this post).

    However, I was wondering: are these hard architectural limits? Say that I wanted to emit 25,000 maps for a single input record, would that mean that I would require huge amounts of (virtual) memory? In other words, what exactly is the reason that increasing the number of emitted maps per input record causes an OutOfMemoryError?

    Regards,

    Leon Mergen
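
    A minimal sketch of that configuration change, assuming the old-style JobConf API of the Hadoop 0.17/0.18 era; the driver class name is hypothetical, only the heap-related setting is shown, and the same property can also be set in hadoop-site.xml:

        import org.apache.hadoop.mapred.JobConf;

        public class ChildHeapExample {
            public static void main(String[] args) {
                // Hypothetical driver class, used only to illustrate the setting.
                JobConf conf = new JobConf(ChildHeapExample.class);
                // Each map/reduce child JVM spawned by the TaskTracker is started
                // with these JVM options, so -Xmx512m gives it a 512 MB max heap.
                conf.set("mapred.child.java.opts", "-Xmx512m");
                // ... configure mapper, reducer, and input/output paths here, then
                // submit the job, e.g. with JobClient.runJob(conf).
            }
        }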
  • Chris Douglas at Sep 6, 2008 at 9:58 pm
    From the stack trace you provided, your OOM is probably due to
    HADOOP-3931, which is fixed in 0.17.2. It occurs when the deserialized
    key in an output record exactly fills the serialization buffer that
    collects map outputs, causing an allocation as large as the size of
    that buffer. It causes an extra spill, an OOM exception if the task
    JVM's max heap size is too small to mask the bug, and a skipped
    combiner pass if you've defined one, but it won't drop records.
    > However, I was wondering: are these hard architectural limits? Say
    > that I wanted to emit 25,000 maps for a single input record, would
    > that mean that I would require huge amounts of (virtual) memory? In
    > other words, what exactly is the reason that increasing the number
    > of emitted maps per input record causes an OutOfMemoryError?

    Do you mean the number of output records per input record in the map?
    The memory allocated for collecting records out of the map is (mostly)
    fixed at the size defined in io.sort.mb. The ratio of input records to
    output records does not affect the collection and sort. The number of
    output records can sometimes influence the memory requirements, but
    not significantly (see the sketch below). -C
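
    A minimal sketch of how that collection buffer is sized, again assuming the old JobConf API; the class name and the 100 MB value are illustrative only:

        import org.apache.hadoop.mapred.JobConf;

        public class SortBufferExample {
            public static void main(String[] args) {
                JobConf conf = new JobConf(SortBufferExample.class);
                // io.sort.mb fixes the size, in megabytes, of the in-memory buffer
                // used to collect and sort map output before it is spilled to disk.
                // Its size does not depend on how many output records each input
                // record produces, which is why the output ratio barely affects
                // the map task's memory requirements.
                conf.setInt("io.sort.mb", 100);
            }
        }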
  • Leon Mergen at Sep 7, 2008 at 11:24 am
    Hello Chris,
    > From the stack trace you provided, your OOM is probably due to
    > HADOOP-3931, which is fixed in 0.17.2. It occurs when the deserialized
    > key in an output record exactly fills the serialization buffer that
    > collects map outputs, causing an allocation as large as the size of
    > that buffer. It causes an extra spill, an OOM exception if the task
    > JVM's max heap size is too small to mask the bug, and a skipped
    > combiner pass if you've defined one, but it won't drop records.

    OK, thanks for that information. I guess that means I will have to upgrade. :-)
    > > However, I was wondering: are these hard architectural limits? Say
    > > that I wanted to emit 25,000 maps for a single input record, would
    > > that mean that I would require huge amounts of (virtual) memory? In
    > > other words, what exactly is the reason that increasing the number
    > > of emitted maps per input record causes an OutOfMemoryError?

    > Do you mean the number of output records per input record in the map?
    > The memory allocated for collecting records out of the map is (mostly)
    > fixed at the size defined in io.sort.mb. The ratio of input records to
    > output records does not affect the collection and sort. The number of
    > output records can sometimes influence the memory requirements, but
    > not significantly. -C

    OK, so I should not have to worry about this too much! Thanks for the reply and the information!

    Regards,

    Leon Mergen

Discussion Overview
group: common-user
categories: hadoop
posted: Sep 6, '08 at 3:44p
active: Sep 7, '08 at 11:24a
posts: 4
users: 2 (Leon Mergen: 3 posts, Chris Douglas: 1 post)
website: hadoop.apache.org...
irc: #hadoop
