FAQ
I am running a Hadoop job written in Pig. It fails with an out-of-memory error because a UDF consumes a lot of memory: it loads a big file. What settings avoid the following OutOfMemoryError? I guess that simply giving Pig a big heap (java -XmxBIGmemory org.apache.pig.Main ...) won't work.

Error message:

java.lang.OutOfMemoryError: Java heap space
at java.util.regex.Pattern.compile(Pattern.java:1451)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:823)
at java.lang.String.split(String.java:2293)
at java.lang.String.split(String.java:2335)
at UDF.load(Unknown Source)
at UDF.load(Unknown Source)
at UDF.exec(Unknown Source)
at UDF.exec(Unknown Source)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:278)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child.main(Child.java:155)

Thanks!
Michael
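
One thing worth noting from the trace itself: Pattern.compile is reached through String.split inside UDF.load, which means the UDF recompiles the delimiter regex for every line of the big file it buffers. Raising the task heap (see the replies below) is the direct fix, but precompiling the pattern once also cuts per-line garbage. A minimal sketch of that idea, assuming a hypothetical tab-delimited lookup file (the real UDF source is not shown above):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Pattern;

    // Hypothetical reconstruction of the UDF's file-loading step.
    // The delimiter pattern is compiled once per JVM instead of once
    // per line, and the file is streamed line by line rather than
    // read whole.
    public class BigFileLookup {

        // String.split() would recompile this on every call.
        private static final Pattern TAB = Pattern.compile("\t");

        private final Map<String, String> lookup =
                new HashMap<String, String>();

        public void load(String path) throws IOException {
            BufferedReader in = new BufferedReader(new FileReader(path));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] fields = TAB.split(line);
                    if (fields.length >= 2) {
                        lookup.put(fields[0], fields[1]);
                    }
                }
            } finally {
                in.close();
            }
        }
    }

If the lookup table itself is simply larger than the task heap, no amount of pattern reuse will save it; the replies below cover raising the child JVM's heap.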


  • Jeff Zhang at Feb 23, 2010 at 2:13 am
    Hi Jiang,

    You should set the property *mapred.child.java.opts* in mapred-site.xml to
    increase the memory available to each task, as follows:

    <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
    </property>

    and then restart your Hadoop cluster.
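
    Restarting changes the cluster-wide default. Depending on your Pig
    version, you may also be able to raise the heap for a single job by
    passing the property on the command line (treat the exact mechanism
    as something to verify against your release; myscript.pig is a
    placeholder):

        pig -Dmapred.child.java.opts=-Xmx1024m myscript.pig

    Note that -Xmx on the Pig client itself (java -Xmx...
    org.apache.pig.Main) only sizes the front-end JVM; the UDF runs inside
    the map/reduce child JVMs, which is why mapred.child.java.opts is the
    knob that matters.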



    --
    Best Regards

    Jeff Zhang
  • Jiang licht at Feb 23, 2010 at 2:37 am
    Thanks, Jeff. I also just found this setting, and it solved my problem. BTW, so many settings to play with :)


    Michael

  • Ankur C. Goel at Feb 23, 2010 at 8:13 am
    Yeah! Wait till you stumble across the need to adjust shuffle/reduce buffers, reuse JVMs, sort factor, copier threads ...
    :-)
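
    For reference, the knobs above map to these Hadoop 0.20-era property
    names (a sketch for orientation, not recommended values; double-check
    against your release's mapred-default.xml):

        io.sort.factor                            sort factor (streams merged at once)
        io.sort.mb                                sort buffer size in MB
        mapred.job.reuse.jvm.num.tasks            JVM reuse (-1 = unlimited)
        mapred.reduce.parallel.copies             copier threads in the shuffle
        mapred.job.shuffle.input.buffer.percent   shuffle buffer, fraction of heap
        mapred.job.reduce.input.buffer.percent    reduce-side buffer, fraction of heap

    All of these go in mapred-site.xml in the same <property> form shown
    earlier in the thread for mapred.child.java.opts.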

  • Jiang licht at Feb 23, 2010 at 5:31 pm
    Hm, I'm wondering whether there are any case studies on how people handle memory-related issues posted somewhere as good references?

    Thanks,

    Michael

