Mapside cogroup runs out of memory
----------------------------------

Key: PIG-1395
URL: https://issues.apache.org/jira/browse/PIG-1395
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Fix For: 0.8.0


In a particular scenario, when a relation does not have many tuples with the same key (i.e. there aren't many repeating keys), map tasks doing cogroup fail with a GC overhead exception.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Ashutosh Chauhan (JIRA) at Apr 26, 2010 at 10:43 pm
    [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Ashutosh Chauhan updated PIG-1395:
    ----------------------------------

    Attachment: cogrp_mem.patch

    While doing cogroup, we first put tuples from all the relations in a heap, then drain the heap and generate output tuples as appropriate. We need to look ahead at least one tuple from every relation before generating an output tuple, to be sure that we have all the tuples belonging to a key. Currently, we look too far ahead, and tuples start to accumulate in the heap faster than we drain them. At a certain point we already have enough information to generate the output tuple, instead of waiting and putting another tuple in the heap. This patch generates the output tuple at that point.
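
The heap-and-lookahead idea described above can be illustrated with a minimal sketch. This is not the code from cogrp_mem.patch and does not use Pig's classes; the function name `merge_cogroup` and the representation of a relation as a key-sorted list of `(key, value)` pairs are assumptions made for illustration only. The sketch keeps exactly one lookahead tuple per relation in the heap and emits a key as soon as the smallest key left in the heap differs from it, so the heap never holds more entries than there are relations.

```python
import heapq

_EXHAUSTED = object()  # sentinel marking a relation with no more tuples


def merge_cogroup(relations):
    """Cogroup key-sorted relations, yielding (key, [bag_0, ..., bag_n-1]).

    Hypothetical illustration: each relation is an iterable of
    (key, value) pairs already sorted by key.
    """
    iters = [iter(r) for r in relations]
    heap = []  # entries are (key, relation_index, value)

    def push_next(idx):
        # Pull the single lookahead tuple for relation idx into the heap.
        item = next(iters[idx], _EXHAUSTED)
        if item is not _EXHAUSTED:
            key, value = item
            heapq.heappush(heap, (key, idx, value))

    for idx in range(len(iters)):
        push_next(idx)

    while heap:
        current_key = heap[0][0]
        bags = [[] for _ in iters]
        # Drain every tuple carrying current_key. Each pop is followed by
        # one push from the same relation, so the heap holds at most one
        # tuple per relation at any time.
        while heap and heap[0][0] == current_key:
            _, idx, value = heapq.heappop(heap)
            bags[idx].append(value)
            push_next(idx)
        # The smallest lookahead key now differs from current_key, so no
        # relation can still contribute to it: emit immediately.
        yield current_key, bags


if __name__ == "__main__":
    a = [(1, "a1"), (2, "a2"), (2, "a3"), (4, "a4")]
    b = [(2, "b2"), (3, "b3"), (4, "b4")]
    for key, bags in merge_cogroup([a, b]):
        print(key, bags)
```

Because the inputs are sorted, seeing a larger key at the top of the heap is enough to know the current key is complete; waiting for further tuples only grows the heap, which is the accumulation the message above describes.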
  • Ashutosh Chauhan (JIRA) at Apr 26, 2010 at 10:45 pm
    [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Ashutosh Chauhan updated PIG-1395:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at Apr 27, 2010 at 4:07 am
    [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861223#action_12861223 ]

    Hadoop QA commented on PIG-1395:
    --------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12442908/cogrp_mem.patch
    against trunk revision 937570.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/302/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/302/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/302/console

    This message is automatically generated.
  • Pradeep Kamath (JIRA) at Apr 27, 2010 at 11:07 pm
    [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861611#action_12861611 ]

    Pradeep Kamath commented on PIG-1395:
    -------------------------------------

    +1. The code comment should be updated to reflect the nature of the comparison in the code; currently the comment and the code seem to differ. Otherwise the change looks good.
  • Ashutosh Chauhan (JIRA) at Apr 27, 2010 at 11:59 pm
    [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Ashutosh Chauhan updated PIG-1395:
    ----------------------------------

    Status: Resolved (was: Patch Available)
    Resolution: Fixed

    Patch checked in with updated comment.

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Apr 26, '10 at 10:41p
active: Apr 27, '10 at 11:59p
posts: 6
users: 1
website: pig.apache.org
