PERFORMANCE: streaming data to the UDFs in foreach
--------------------------------------------------

Key: PIG-844
URL: https://issues.apache.org/jira/browse/PIG-844
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich


Currently, Pig collects the data passed to a UDF into a bag before invoking the UDF. This can make the process use more memory than it needs; in many cases it would be better to push the data to the UDF one tuple at a time.

When the combiner is invoked, this may matter less. However, for non-algebraic UDFs, and for other cases where the combiner can't be used, streaming could significantly reduce memory use.

Another possible use case is data that is already grouped when it enters Pig, so that it does not need to be grouped again.

How this will affect the UDF interface needs further discussion.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
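
To make the memory argument in the issue concrete, here is a minimal sketch in plain Java that contrasts the two calling styles. It does not use Pig's classes: `BagUdf`, `StreamingUdf`, `runWithBag`, and `runStreaming` are hypothetical names invented for this illustration, and a `List` stands in for a bag. The bag path holds every value of a group in memory before the UDF runs, while the streaming path holds only the UDF's running state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StreamingUdfSketch {

    /** Hypothetical bag-style UDF: receives the whole group at once. */
    interface BagUdf<T, R> {
        R exec(List<T> bag);
    }

    /** Hypothetical streaming UDF: receives one value at a time. */
    interface StreamingUdf<T, R> {
        void accept(T value);
        R result();
    }

    /** Sum written against the bag-style interface. */
    static final class BagSum implements BagUdf<Long, Long> {
        public Long exec(List<Long> bag) {
            long total = 0;
            for (long v : bag) {
                total += v;
            }
            return total;
        }
    }

    /** Same sum written against the streaming interface; state is one long. */
    static final class StreamingSum implements StreamingUdf<Long, Long> {
        private long total = 0;

        public void accept(Long value) {
            total += value;
        }

        public Long result() {
            return total;
        }
    }

    /** Bag path: materialize every value of the group, then call the UDF once. */
    static <T, R> R runWithBag(Iterator<T> group, BagUdf<T, R> udf) {
        List<T> bag = new ArrayList<>();
        while (group.hasNext()) {
            bag.add(group.next());
        }
        return udf.exec(bag);
    }

    /** Streaming path: hand each value to the UDF as it arrives; nothing is buffered. */
    static <T, R> R runStreaming(Iterator<T> group, StreamingUdf<T, R> udf) {
        while (group.hasNext()) {
            udf.accept(group.next());
        }
        return udf.result();
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L, 4L);
        System.out.println(runWithBag(values.iterator(), new BagSum()));
        System.out.println(runStreaming(values.iterator(), new StreamingSum()));
    }
}
```

Both paths compute the same result; the difference is only in how much of the group must be resident at once. A UDF that genuinely needs random access to the whole group would still need the bag form.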


  • Olga Natkovich (JIRA) at Nov 23, 2009 at 5:27 pm
    [ https://issues.apache.org/jira/browse/PIG-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich resolved PIG-844.
    --------------------------------


    The accumulate interface took care of this.
  • Daniel Dai (JIRA) at May 14, 2010 at 6:47 am
    [ https://issues.apache.org/jira/browse/PIG-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Daniel Dai closed PIG-844.
    --------------------------

    Fix For: 0.7.0

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Jun 11, '09 at 11:53p
active: May 14, '10 at 6:47a
posts: 3
users: 1
website: pig.apache.org
