Pig user mailing list, June 2011: workaround for java.lang.OutOfMemoryError: Java heap space?
I have a Pig script that works well on small test data sets but fails when run over realistic-sized data. The logs show:
INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201106061024_0331 has failed!

job_201106061024_0331 CitedItemsGrpByDocId,DedupTCPerDocId GROUP_BY,COMBINER Message: Job failed!

attempt_201106061024_0331_m_000198_0 […] Error: java.lang.OutOfMemoryError: Java heap space
The same error occurs for all attempts at a few of the other (many) map tasks for this job.

I believe this job corresponds to these lines in my Pig script:

CitedItemsGrpByDocId = group CitedItems by citeddocid;
-- For each cited document, count the distinct documents that cite it.
DedupTCPerDocId = foreach CitedItemsGrpByDocId {
    CitingDocids = CitedItems.citingdocid;
    UniqCitingDocids = distinct CitingDocids;
    generate group, COUNT(UniqCitingDocids) as tc;
};

I tried increasing the heap size in mapred.child.java.opts, but then the job failed during setup with:
Error occurred during initialization of VM
Could not reserve enough space for object heap

Are there Hadoop or Pig job configuration parameters I can set to get around this? Or is there a Pig Latin circumlocution, or a better way to express what I want, that is less memory-hungry?

Thanks in advance,

Will

William F Dowling
Sr Technical Specialist, Software Engineering


  • Thejas M Nair at Jun 10, 2011 at 6:51 pm
    I have seen this happen when there is a very large number of distinct values
    for some group keys. When the combiner is used, the input records for the
    reduce task already contain partial distinct bags, and these large records
    can cause MapReduce to run out of memory while loading them.

    You can modify the query as described in comment #1 on
    https://issues.apache.org/jira/browse/PIG-1846

    Or you can add the following line to your script to disable the combiner:

    set pig.exec.nocombiner true;

    Thanks,
    Thejas

  • William Dowling at Jun 10, 2011 at 7:57 pm
    Thank you, Thejas! Turning off the combiner let the job run to completion. Next I can try the two-level approach to gauge the performance penalty.
    Kind regards,
    Will

    William F Dowling
    Sr Technical Specialist, Software Engineering
    Thomson Reuters

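The two-level rewrite that Thejas points to in PIG-1846, and that Will plans to try, could look roughly like the sketch below. It assumes the general idea is to remove duplicate (citeddocid, citingdocid) pairs with a top-level DISTINCT first and only then group and count; it is not a verbatim copy of the JIRA comment. The load path, schema and store location are hypothetical placeholders for however CitedItems is actually produced and saved.

```pig
-- Hypothetical input: path and schema stand in for however CitedItems
-- is really loaded in the original script.
CitedItems = load 'cited_items' as (citeddocid:chararray, citingdocid:chararray);

-- Level 1: keep one row per (cited, citing) pair. A top-level DISTINCT
-- keys on the whole pair, so the records it shuffles stay small instead
-- of carrying a bag of citing ids per cited document.
CitedPairs = foreach CitedItems generate citeddocid, citingdocid;
UniqCitedPairs = distinct CitedPairs;

-- Level 2: every remaining row is a unique pair, so counting citing ids
-- per cited document gives the number of distinct citing documents.
-- COUNT skips null citingdocid values, as in the nested version.
-- COUNT is algebraic, so Pig can still apply the combiner here.
UniqPairsGrpByDocId = group UniqCitedPairs by citeddocid;
DedupTCPerDocId = foreach UniqPairsGrpByDocId
    generate group, COUNT(UniqCitedPairs.citingdocid) as tc;

-- Hypothetical output location.
store DedupTCPerDocId into 'dedup_tc_per_docid';
```

Unlike set pig.exec.nocombiner true, this keeps the combiner for the counting step at the cost of an extra MapReduce job for the DISTINCT; which variant is faster on a given data set would need to be measured.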
Discussion Overview
group: user@pig.apache.org
categories: pig, hadoop
posted: Jun 10, '11 at 6:16p
active: Jun 10, '11 at 7:57p
posts: 3
users: 2
website: pig.apache.org

2 users in discussion

William Dowling: 2 posts; Thejas M Nair: 1 post


site design / logo © 2022 Grokbase