java.lang.OutOfMemoryError: GC overhead limit exceeded
Greetings,

I'm running into a brain-numbing problem on Elastic MapReduce. I'm
running a decent-size task (22,000 mappers, a ton of GZipped input
blocks, ~1TB of data) on 40 c1.xlarge nodes (7 GB RAM, ~8 "cores").

I get failures randomly --- sometimes at the end of my 6-step process,
sometimes at the first reducer phase, sometimes in the mapper. It
seems to fail in multiple areas. Mostly in the reducers. Any ideas?

Here are the settings I've changed:
-Xmx400m
6 max mappers
1 max reducer
1GB swap partition
mapred.job.reuse.jvm.num.tasks=50
mapred.reduce.parallel.copies=3


java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.nio.CharBuffer.wrap(CharBuffer.java:350)
at java.nio.CharBuffer.wrap(CharBuffer.java:373)
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
at java.lang.StringCoding.decode(StringCoding.java:173)
at java.lang.String.<init>(String.java:443)
at java.lang.String.<init>(String.java:515)
at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:116)
at cascading.tuple.TupleInputStream.readString(TupleInputStream.java:144)
at cascading.tuple.TupleInputStream.readType(TupleInputStream.java:154)
at cascading.tuple.TupleInputStream.getNextElement(TupleInputStream.java:101)
at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:75)
at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:33)
at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34)
at cascading.tuple.hadoop.DeserializerComparator.compareTuples(DeserializerComparator.java:142)
at cascading.tuple.hadoop.GroupingSortingComparator.compare(GroupingSortingComparator.java:55)
at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2645)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2586)
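
For anyone wanting to set these knobs programmatically rather than through EMR
bootstrap args, here is a minimal sketch using the old org.apache.hadoop.mapred
API. The property names are the standard Hadoop 0.20 ones for the settings
listed above; note that the per-node mapper/reducer maximums are TaskTracker
settings (normally fixed at cluster start) and the swap partition is an
OS-level change, so they appear here only to name the properties.

// A sketch only: applying the settings from the report above via JobConf.
// The job around this would be the actual Cascading/EMR job, not shown here.
import org.apache.hadoop.mapred.JobConf;

public class ReportedSettings {
  public static JobConf apply(JobConf conf) {
    conf.set("mapred.child.java.opts", "-Xmx400m");            // task heap
    conf.setInt("mapred.job.reuse.jvm.num.tasks", 50);         // JVM reuse
    conf.setInt("mapred.reduce.parallel.copies", 3);           // shuffle copier threads
    // Per-node slot limits; read by the TaskTracker at startup, listed for completeness.
    conf.setInt("mapred.tasktracker.map.tasks.maximum", 6);    // 6 max mappers
    conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 1); // 1 max reducer
    return conf;
  }
}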

--
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


  • Bradford Stephens at Sep 26, 2010 at 8:01 am
    I'm going to try running it on high-RAM boxes with -Xmx4096m or so, and see
    if that helps.

  • Bradford Stephens at Sep 26, 2010 at 10:31 am
    Nope, that didn't seem to help.

  • Ted Yu at Sep 26, 2010 at 1:47 pm
    Have you tried lowering mapred.job.reuse.jvm.num.tasks?
  • Bradford Stephens at Sep 26, 2010 at 8:20 pm
    Hrm.... no. I've lowered it to -1, but I can try 1 again.
  • Ted Yu at Sep 26, 2010 at 9:36 pm
    -1 means there is no limit on reuse.
    At the same time, you can generate a heap dump from the OOME and analyze it
    with YourKit, etc.

    Cheers
  • Chris K Wensel at Sep 26, 2010 at 3:11 pm
    fwiw

    I run m2.xlarge slaves, using the default mappers/reducers (4/2, I think).

    with swap
    --bootstrap-action s3://elasticmapreduce/bootstrap-actions/create-swap-file.rb --args "-E,/mnt/swap,1000"

    Historically I've run this property with no issues, but I should probably re-research the GC setting (comments please):
    "mapred.child.java.opts", "-server -Xmx2000m -XX:+UseParallelOldGC"

    I haven't co-installed Ganglia to look at utilization lately, but more than 4 mappers or more than 2 reducers has always given me headaches.

    ckw
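
    As a rough sketch of how Chris's mapred.child.java.opts value could be passed
    through a Cascading job (assuming Cascading 1.x, where FlowConnector accepts a
    properties map; the class name here is illustrative):

    import java.util.Properties;
    import cascading.flow.FlowConnector;

    public class ChildOptsSketch {
      // Hands Hadoop the child JVM options quoted above when the flow is connected.
      public static FlowConnector connector() {
        Properties properties = new Properties();
        properties.put("mapred.child.java.opts",
            "-server -Xmx2000m -XX:+UseParallelOldGC");
        return new FlowConnector(properties);
      }
    }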
  • Ted Dunning at Sep 26, 2010 at 4:36 pm
    The old GC routinely gives me lower performance than modern GC. The default
    is now quite good for batch programs.
    On Sun, Sep 26, 2010 at 8:10 AM, Chris K Wensel wrote:

    historically i'v run this property with no issues, but should probably
    re-research the gc setting (comments please)
    "mapred.child.java.opts", "-server -Xmx2000m -XX:+UseParallelOldGC"
  • Ted Dunning at Sep 26, 2010 at 4:37 pm
    My feeling is that you have some kind of leak going on in your mappers or
    reducers, and that reducing the number of times the JVM is re-used would
    improve matters.

    GC overhead limit indicates that your (tiny) heap is full and collection is
    not reducing that.
    On Sun, Sep 26, 2010 at 12:55 AM, Bradford Stephens wrote:

    mapred.job.reuse.jvm.num.tasks=50
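
    A minimal sketch of what that change looks like on the job configuration
    (1 means every JVM runs exactly one task; -1, as used later in the thread,
    means unlimited reuse):

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuseSketch {
      // Disable JVM reuse so a leak in one task cannot accumulate across tasks.
      public static void disableReuse(JobConf conf) {
        conf.setInt("mapred.job.reuse.jvm.num.tasks", 1);
      }
    }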
  • Bradford Stephens at Sep 26, 2010 at 11:47 pm
    Sadly, making Chris's changes didn't help.

    Here's the Cascading code; it's pretty simple, but it uses the new
    "combiner"-like functionality:

    http://pastebin.com/ccvDmLSX


  • Chris K Wensel at Sep 27, 2010 at 12:10 am
    Try using a lower threshold value (the number of values in the LRU to cache); this is the tradeoff of this approach.

    ckw
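
    The threshold Chris is referring to caps a map-side LRU cache of partial
    aggregates: a smaller value means less heap per task but more partial results
    pushed to the reducers. A library-agnostic sketch of that tradeoff (this is
    the idea, not the actual Cascading implementation):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // An LRU cache bounded by a threshold; in this sketch the eldest entry is
    // simply evicted, whereas a real map-side combiner would emit it downstream
    // before dropping it from memory.
    public class BoundedPartialCache<K, V> extends LinkedHashMap<K, V> {
      private final int threshold;

      public BoundedPartialCache(int threshold) {
        super(16, 0.75f, true); // access-order, i.e. least-recently-used eviction
        this.threshold = threshold;
      }

      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > threshold; // lower threshold -> less memory, more flushes
      }
    }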
  • Bradford Stephens at Sep 27, 2010 at 12:38 am
    Yup, I've turned it down to 1,000. Let's see if that helps!
  • Alex Kozlov at Sep 27, 2010 at 12:41 am
    Hi Bradford,

    Sometimes the reducers do not handle merging large chunks of data too well.
    How many reducers do you have? Try increasing the number of reducers (you can
    always merge the data later if you are worried about too many output files).
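
    For what it's worth, raising the reducer count is a one-liner on the job; the
    value below is purely illustrative (for example, roughly two reducers per node
    on a 40-node cluster):

    import org.apache.hadoop.mapred.JobConf;

    public class ReducerCountSketch {
      // More reducers means each one merges and sorts a smaller slice of the data.
      public static void widenReduce(JobConf conf) {
        conf.setNumReduceTasks(80); // illustrative: ~2 per node on 40 nodes
      }
    }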

    --
    Alex Kozlov
    Solutions Architect
    Cloudera, Inc
    twitter: alexvk2009

    Hadoop World 2010, October 12, New York City - Register now:
    http://www.cloudera.com/company/press-center/hadoop-world-nyc/

  • Bradford Stephens at Sep 27, 2010 at 1:02 am
    One of the problems with this data set is that I'm grouping by a
    category that has only, say, 20 different values. Then I'm doing a
    unique count of Facebook user IDs per group. I imagine that's not
    pleasant for the reducers.
  • Ted Dunning at Sep 27, 2010 at 2:01 am
    If there are combiners, the reducers shouldn't get any lists longer than a
    small multiple of the number of maps.
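
    To make that concrete, here is a hedged sketch of a combiner for this kind of
    distinct-count job: the mapper is assumed to emit (category, userId) pairs,
    and the combiner drops duplicate IDs within each combine pass, so a reducer
    sees roughly a small multiple of the number of maps' worth of distinct IDs per
    category rather than every raw record. The class and field names are
    hypothetical, not the code from the pastebin.

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Combiner: forwards each (category, userId) pair at most once per combine pass.
    public class DistinctUserCombiner extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      public void reduce(Text category, Iterator<Text> userIds,
                         OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        Set<String> seen = new HashSet<String>();
        while (userIds.hasNext()) {
          String id = userIds.next().toString();
          if (seen.add(id)) {
            output.collect(category, new Text(id));
          }
        }
      }
    }

    It would be registered with conf.setCombinerClass(DistinctUserCombiner.class);
    the reducer still has to de-duplicate across maps before counting.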
  • Bradford Stephens at Sep 27, 2010 at 9:46 am
    It turned out to be a deployment issue: an old version was being run. Ted's
    and Chris's suggestions were spot-on.

    I can't believe how BRILLIANT these combiners from Cascading are. It's
    cut my processing time down from 20 hours to 50 minutes. AND I cut out
    about 80% of my hand-crafted code.

    Bravo. I look smart now. (Almost).

    -B
  • Vitaliy Semochkin at Sep 27, 2010 at 9:21 am
    Hi,

    "[..]if more than 98% of the total time is spent in garbage collection
    and less than 2% of the heap is recovered, an OutOfMemoryError will be
    thrown. This feature is designed to prevent applications from running
    for an extended period of time while making little or no progress
    because the heap is too small. If necessary, this feature can be
    disabled by adding the option -XX:-UseGCOverheadLimit to the command
    line."

    This is what often happens in MapReduce operations when you process a lot of data.
    I recommend trying:
    <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m -XX:-UseGCOverheadLimit</value>
    </property>


    Also, from my personal experience, when processing a lot of data it is often
    much cheaper to kill the JVM than to wait for GC. For that reason, if you have
    a lot of BIG tasks rather than tons of small tasks, do not reuse the JVM:
    killing the JVM and starting it again is often much cheaper than trying to GC
    1GB of RAM (I don't know why, it just turned out that way in my tests).
    <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
    </property>

    Regards,
    Vitaliy S

  • Bharath Mundlapudi at Sep 27, 2010 at 6:25 pm
    A couple of things you can try:
    1. Increase the heap size for the tasks.

    2. Since your OOM is happening randomly, try setting -XX:+HeapDumpOnOutOfMemoryError in your child JVM parameters. At the least, the heap dump analysis will show why your heap is growing: is it due to a leak, or do you need to increase the heap size for your mappers or reducers?

    3. Another possible cause is poor JVM GC tuning. Sometimes the defaults can't keep up with the garbage being created; this needs some GC tuning.
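
    As a sketch of point 2, the flag would go into the child JVM options alongside
    the heap size; the dump path below is illustrative and should point at a
    volume with free space (e.g. /mnt on EMR):

    import org.apache.hadoop.mapred.JobConf;

    public class HeapDumpSketch {
      // Dump the heap on OutOfMemoryError so the failing task can be analyzed offline.
      public static void enable(JobConf conf) {
        conf.set("mapred.child.java.opts",
            "-Xmx400m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/dumps");
      }
    }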

    -Bharath




