FAQ
I'm seeing this error in my TaskTracker's log:

FATAL org.apache.hadoop.mapred.TaskTracker:
Task: attempt_201007160344_0001_m_000005_1
- Killed : GC overhead limit exceeded

More detail from my task's log:

FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.regex.Pattern$BitClass.(Pattern.java:2190)
at java.util.regex.Pattern.sequence(Pattern.java:1818)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.(Pattern.java:823)
at java.lang.String.replaceAll(String.java:2189)
at com.synopsys.hadoop.RecordParser.parse(MyRecordParser.java:71)
at com.synopsys.hadoop.ComputeSeqMapper.map(MyMapper.java:106)
at com.synopsys.hadoop.ComputeSeqMapper.map(MyMapper.java:35)

Any ideas where to look further? I don't see anything wrong code-wise.

Line 106 of MyMapper.java is in my map() and calls:

parser.parse(line.toString());

which calls line 71 of MyRecordParser.java, which is basically this:

public class RecordParser {
.....
    int curFieldCount = 0;
    String[] values = {};
.....
    public void parse(String record) {
        values = record.split("\t");
        // strip preceding/trailing spaces
        for (int i = 0; i < values.length; i++) {
            values[i] = values[i].replaceAll("^[\\s]*", "");
            values[i] = values[i].replaceAll("[\\s]*$", "");
        }
        curFieldCount = values.length;
    }
}
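Worth noting (my inference from the stack traces, not something the thread states outright): String.replaceAll() and String.split() compile a fresh Pattern on every call, so this loop runs Pattern.compile() twice per field of every record. A minimal sketch of the difference, with hypothetical names:

```java
import java.util.regex.Pattern;

public class TrimDemo {
    // Compiled once and reused; Pattern objects are immutable and thread-safe
    static final Pattern EDGE_SPACE = Pattern.compile("^\\s*|\\s*$");

    // What the parse() loop above effectively does: two implicit
    // Pattern.compile() calls per field
    static String trimSlow(String s) {
        return s.replaceAll("^[\\s]*", "").replaceAll("[\\s]*$", "");
    }

    // Same result with zero compilations per call
    static String trimFast(String s) {
        return EDGE_SPACE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(trimSlow("  value\t")); // value
        System.out.println(trimFast("  value\t")); // value
    }
}
```

If only leading/trailing whitespace needs stripping, String.trim() would also avoid regexes entirely.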

Alan


  • Some Body at Jul 16, 2010 at 12:44 pm
    I tried again and connected to my TaskTracker via JMX, but I still don't see what's wrong.
    Here's the log; it was spilling records, then ran out of memory:

    2010-07-16 05:27:04,295 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
    2010-07-16 05:27:04,295 INFO org.apache.hadoop.mapred.MapTask: bufstart = 19279; bufend = 159403424; bufvoid = 199229440
    2010-07-16 05:27:04,295 INFO org.apache.hadoop.mapred.MapTask: kvstart = 205295; kvend = 342047; length = 655360
    2010-07-16 05:27:14,118 INFO org.apache.hadoop.mapred.MapTask: Finished spill 35
    2010-07-16 05:28:13,294 FATAL org.apache.hadoop.mapred.TaskTracker:
    Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.regex.Pattern.compile(Pattern.java:823)
    at java.lang.String.split(String.java:2292)
    at java.lang.String.split(String.java:2334)
    at com.synopsys.hadoop.ComputeSeqMapper.map(MyMapper.java:143)
    at com.synopsys.hadoop.ComputeSeqMapper.map(MyMapper.java:35)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    2010-07-16 05:28:16,473 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=CLEANUP, sessionId=
    2010-07-16 05:28:16,588 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task

    See the attached graph of the tasks memory usage via jconsole.
    Note: ignore the 9 hour offset from the log and the image, I'm in a different time zone than the cluster.

    Alan
  • Some Body at Jul 16, 2010 at 1:21 pm
    Guess attachments are stripped.

    Here's the memory graph: http://tinyurl.com/37g3hmu
    Here's the VM Summary: http://tinyurl.com/36wqzjq

    Alan
  • Ted Yu at Jul 17, 2010 at 5:30 am
    Have you tried increasing memory beyond 1GB for your map task?

    I think you have noticed that both OOMEs came from Pattern.compile().

    Please take a look at
    http://www.docjar.com/html/api/java/lang/String.java.html

    I would suggest pre-compiling the three patterns when setting up your mapper
    - basically write your own split() and replaceAll().

    I recently did something similar. You can see the performance
    improvement from the customization here:
    https://issues.apache.org/jira/browse/MAPREDUCE-1946

    Cheers
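A sketch of that suggestion (class and method names are mine, not from the thread): compile each pattern once into a static field and reuse it, so the per-record path does no regex compilation at all:

```java
import java.util.regex.Pattern;

public class PrecompiledParser {
    // Compiled once per JVM; shared safely by all mapper instances
    static final Pattern TAB   = Pattern.compile("\t");
    static final Pattern SPACE = Pattern.compile("^\\s*|\\s*$");

    // Equivalent of the original parse(): split on tabs and strip edge
    // whitespace, but with no per-record Pattern.compile()
    static String[] parse(String record) {
        String[] values = TAB.split(record);
        for (int i = 0; i < values.length; i++) {
            values[i] = SPACE.matcher(values[i]).replaceAll("");
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", parse(" a \tb\t c"))); // a|b|c
    }
}
```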
  • Alan Miller at Jul 18, 2010 at 11:14 am
    Thanks Ted,

    One or both suggestions remedied the problem. I'm not seeing that error
    anymore.

    In my Driver class I used:
    config.set("mapred.child.java.opts", "-Xmx2048m -Xincgc");
    I also altered my mapred-site.xml and set:
    io.file.buffer.size 65536
    io.sort.factor 32
    io.sort.mb 320
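For reference, those settings go into mapred-site.xml in the usual property form (values copied from above; the thread doesn't isolate whether these or the heap increase actually cured the OOME):

```xml
<configuration>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>32</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>320</value>
  </property>
</configuration>
```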

    For the 2nd suggestion: I'm a Java novice, so I'm not sure if this
    actually does what you intended.

    I moved the 3 Patterns outside my map() and changed the logic to this:

    public class MyMapper extends Mapper<Object, Text, Text, Text> {

        Pattern tabPattern = Pattern.compile("\t");
        Pattern eolPattern = Pattern.compile("\n");
        Pattern spacePattern = Pattern.compile("(^[\\s]*)|([\\s]*$)");

        public void map(Object key, Text value, Context context) {
            for (String line : eolPattern.split(value.toString())) {
                ....
                String[] values = tabPattern.split(line);
                for (int i = 0; i < values.length; i++) {
                    values[i] = spacePattern.matcher(values[i]).replaceAll("");
                }
                parser.setvals(values);
                ....
            }
        }
    }

    Alan
  • Ted Yu at Jul 18, 2010 at 7:07 pm
    That's what I suggested.
    You can actually declare tabPattern, etc. as static variables of your
    Mapper class.
    You can also lower -Xmx to give other processes on the same node more
    memory.

    Cheers

Discussion Overview
Group: common-user @ hadoop
Posted: Jul 16, '10 at 12:12p
Active: Jul 18, '10 at 7:07p
Posts: 6
Users: 2 (Alan Miller: 4 posts, Ted Yu: 2 posts)
Website: hadoop.apache.org...
IRC: #hadoop
