Maps are failing if combiner is enabled
---------------------------------------

Key: PIG-1803
URL: https://issues.apache.org/jira/browse/PIG-1803
Project: Pig
Issue Type: Bug
Reporter: Alex Rovner
Fix For: 0.7.0


We constantly hit Java heap space errors in our map tasks when the combiner is enabled on our jobs.

Configs:
pig.cachedbag.memusage=20
io.sort.mb=300
pig.exec.nocombiner=false
mapred.child.java.opts=-Xmx750m

Sample job:
{noformat}
A = LOAD '$INPUT' USING com.contextweb.pig.CWHeaderLoader('$WORK_DIR/schema/rpt.xml');
AA = foreach A GENERATE checkPointStart, PublisherId, TagId,
ContextCategoryId,Impressions, Clicks, Actions;

DESCRIBE AA;

B = GROUP AA BY (checkPointStart, PublisherId, TagId,
ContextCategoryId);

result = FOREACH B GENERATE group, SUM(AA.Impressions) as Impressions, SUM(AA.Clicks) as Clicks, SUM(AA.Actions) as Actions;

DESCRIBE result;

STORE result INTO '$OUTPUT' USING com.contextweb.pig.CWHeaderStore();
{noformat}

Mapper Error Log:
2011-01-12 18:43:22,084 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:549)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
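
The trace fails inside the MapOutputBuffer constructor, so it may help to see how much of the task heap the sort buffer is asked to take. The Python sketch below only does that arithmetic with the values from the Configs list. It assumes the buffer is sized at roughly io.sort.mb megabytes in total, which is how Hadoop 0.20-era MapTask behaves but may differ in other versions, and parse_xmx_mb is a hypothetical helper written for this example. It does not identify the root cause of the failure.

```python
# Rough heap budget for one map task, using the values from the Configs list.
# Assumption: the map-side sort buffer takes about io.sort.mb MiB in total.


def parse_xmx_mb(java_opts: str) -> int:
    """Return the -Xmx value in MiB from a JVM options string (m or g suffix only)."""
    for token in java_opts.split():
        if token.startswith("-Xmx"):
            value = token[len("-Xmx"):].lower()
            if value.endswith("g"):
                return int(value[:-1]) * 1024
            if value.endswith("m"):
                return int(value[:-1])
            raise ValueError(f"unsupported -Xmx suffix in {token!r}")
    raise ValueError("no -Xmx flag found")


heap_mb = parse_xmx_mb("-Xmx750m")   # mapred.child.java.opts
sort_buffer_mb = 300                 # io.sort.mb
remaining_mb = heap_mb - sort_buffer_mb
sort_share = sort_buffer_mb / heap_mb

print(f"heap: {heap_mb} MiB, sort buffer: {sort_buffer_mb} MiB "
      f"({sort_share:.0%}), left for everything else: {remaining_mb} MiB")
```

Under these assumptions the sort buffer alone accounts for 40% of the 750 MiB heap, leaving about 450 MiB for the JVM, Pig's own structures, and the loader.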


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Thejas M Nair (JIRA) at Jan 13, 2011 at 12:28 am
    [ https://issues.apache.org/jira/browse/PIG-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981041#action_12981041 ]

    Thejas M Nair commented on PIG-1803:
    ------------------------------------

    Pig 0.8 has some fixes for memory leaks and memory management. Can you try the same query with 0.8 as well?

  • Olga Natkovich (JIRA) at Jan 13, 2011 at 12:30 am
    [ https://issues.apache.org/jira/browse/PIG-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981042#action_12981042 ]

    Olga Natkovich commented on PIG-1803:
    -------------------------------------

    There are two things:

    (1) Have you tried Pig 0.8? We have made quite a bit of progress on memory utilization there.
    (2) There is a bug in Hadoop that causes memory overuse when the combiner is used. I don't believe it has been addressed. Thejas, do you remember what the JIRA number is for MR?
  • Thejas M Nair (JIRA) at Jan 13, 2011 at 12:52 am
    [ https://issues.apache.org/jira/browse/PIG-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981052#action_12981052 ]

    Thejas M Nair commented on PIG-1803:
    ------------------------------------

    bq. (2) There is a bug in Hadoop that causes memory overuse when combiner is used. I don't believe it has been addressed. Thejas, do you remember what JIRA number is for MR?
    HADOOP-5494 was causing out-of-memory errors in the reduce, not in the map. That happens when large records are being combined, as in the case of a group followed by a distinct in a nested foreach.


  • Olga Natkovich (JIRA) at Feb 17, 2011 at 8:51 pm
    [ https://issues.apache.org/jira/browse/PIG-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996062#comment-12996062 ]

    Olga Natkovich commented on PIG-1803:
    -------------------------------------

    Alex, did you get a chance to check whether your script works with the latest code on the Pig 0.8 branch or Pig 0.9? We will be releasing Pig 0.8.1, which will address the problem that Thejas fixed.

    If this does work for you, would you be able to move to 0.8? We do not plan to backport the fix to Pig 0.7, but you could apply the patch and see if it works as is or with small tweaks.

    Please let us know how you want to proceed and whether we can close this ticket. Thanks.
  • Alex Rovner (JIRA) at Feb 17, 2011 at 9:53 pm
    [ https://issues.apache.org/jira/browse/PIG-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alex Rovner resolved PIG-1803.
    ------------------------------

    Resolution: Won't Fix

    This issue is fixed in Pig 0.8.
  • Alex Rovner (JIRA) at Feb 17, 2011 at 9:53 pm
    [ https://issues.apache.org/jira/browse/PIG-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996110#comment-12996110 ]

    Alex Rovner commented on PIG-1803:
    ----------------------------------

    Tried it on 0.8 and it works great!

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Jan 12, '11 at 11:58p
active: Feb 17, '11 at 9:53p
posts: 7
users: 1
website: pig.apache.org
