Limit can not push in front of ForEach with flatten
---------------------------------------------------

Key: PIG-362
URL: https://issues.apache.org/jira/browse/PIG-362
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: types_branch


Currently, the logical optimizer pushes Limit in front of ForEach with flatten. This rests on the assumption that a ForEach with flatten always yields at least as many records as it receives. However, that assumption is false: if an input tuple contains an empty bag, flattening it produces 0 output records, which is fewer than the number of input records.

At optimization time we have no way to know whether the input contains an empty bag, so the only solution is not to push Limit in front of ForEach with flatten.
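
To make the failure concrete, here is a minimal Python sketch of the two plans. It is not Pig code: `foreach_flatten`, `limit`, and the sample data are hypothetical stand-ins for the logical operators, written only to model the behavior described above (an empty bag flattens to zero records).

```python
from itertools import islice


def foreach_flatten(records):
    """Stand-in for ForEach with flatten: one output record per bag element.

    A record whose bag is empty contributes no output records at all.
    """
    for key, bag in records:
        for item in bag:
            yield (key, item)


def limit(records, n):
    """Stand-in for Limit: pass through at most n records."""
    return islice(records, n)


# Hypothetical input: the first two tuples carry empty bags.
data = [("a", []), ("b", []), ("c", [1, 2]), ("d", [3])]

# Plan as the user wrote it: flatten first, then limit.
as_written = list(limit(foreach_flatten(data), 2))

# Plan after the optimizer pushes Limit in front of ForEach.
pushed = list(foreach_flatten(limit(data, 2)))

print(as_written)  # [('c', 1), ('c', 2)]
print(pushed)      # []
```

With this input the pushed-down plan returns no records even though the original plan returns two, so the two plans are not equivalent.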

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • Daniel Dai (JIRA) at Aug 15, 2008 at 12:08 am
    [ https://issues.apache.org/jira/browse/PIG-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Daniel Dai updated PIG-362:
    ---------------------------

    Attachment: PIG-362.patch
  • Olga Natkovich (JIRA) at Aug 15, 2008 at 4:48 pm
    [ https://issues.apache.org/jira/browse/PIG-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622918#action_12622918 ]

    Olga Natkovich commented on PIG-362:
    ------------------------------------

    Daniel, could you add a unit test for this to your patch? Thanks.
  • Daniel Dai (JIRA) at Aug 15, 2008 at 9:40 pm
    [ https://issues.apache.org/jira/browse/PIG-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Daniel Dai updated PIG-362:
    ---------------------------

    Attachment: PIG-362-2.patch

    Include test case
  • Olga Natkovich (JIRA) at Aug 18, 2008 at 11:30 pm
    [ https://issues.apache.org/jira/browse/PIG-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-362:
    -------------------------------

    Status: Patch Available (was: Open)
  • Olga Natkovich (JIRA) at Aug 19, 2008 at 12:10 am
    [ https://issues.apache.org/jira/browse/PIG-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-362:
    -------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    Patch committed. Thanks, Daniel, for contributing!

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Aug 6, '08 at 10:55p
active: Aug 19, '08 at 12:10a
posts: 6
users: 1
website: pig.apache.org
