Self join with implicit split has the join output in wrong order
----------------------------------------------------------------

Key: PIG-429
URL: https://issues.apache.org/jira/browse/PIG-429
Project: Pig
Issue Type: Bug
Affects Versions: types_branch
Reporter: Pradeep Kamath
Fix For: types_branch


Query:
{code}
A = load 'st10k' split by 'file';
B = filter A by $1 > 25;
D = join A by $0, B by $0;
dump D;
{code}

In the output, the columns from B appear first and the columns from A second. On closer examination of the code, the ImplicitSplitInserter class adds a split and two split-output operators to the plan and tries to connect the successors of the load operator to them. It does this by iterating over the load's successors, disconnecting each one from the load, and connecting it to a split output instead. However, the order in which it visits the successors is NOT the same as the order in which cogroup (join) expects its inputs. Hence the discrepancy.
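The rewiring problem described above can be illustrated with a toy model. This is a sketch, not Pig's code: `Op`, `connect`, `disconnect` and `replaceInput` are hypothetical stand-ins for the logical plan and its edge operations, and the model assumes that reconnecting an edge appends the new input to the end of the consumer's input list. Under that assumption, disconnecting the join from the load and reconnecting it to a split output moves A behind B in the join's input list, which matches the reported symptom. Replacing the input at the same position keeps A first. This shows one way to preserve input order; it is not necessarily what PIG-429.patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the PIG-429 rewiring problem. Op, connect, disconnect and
// replaceInput are hypothetical stand-ins, not Pig's real plan classes.
public class ImplicitSplitOrderDemo {

    // An operator whose inputs are kept in order; a join treats inputs.get(i)
    // as its i-th relation, so this order fixes the output column order.
    static final class Op {
        final String name;
        final List<Op> inputs = new ArrayList<>();
        final List<Op> outputs = new ArrayList<>();
        Op(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    // Adds an edge; the new input is appended to the END of to.inputs.
    static void connect(Op from, Op to) {
        from.outputs.add(to);
        to.inputs.add(from);
    }

    // Removes an edge from both endpoint lists.
    static void disconnect(Op from, Op to) {
        from.outputs.remove(to);
        to.inputs.remove(from);
    }

    // Swaps oldIn for newIn at the SAME position in consumer.inputs.
    static void replaceInput(Op consumer, Op oldIn, Op newIn) {
        int pos = consumer.inputs.indexOf(oldIn);
        consumer.inputs.set(pos, newIn);
        oldIn.outputs.remove(consumer);
        newIn.outputs.add(consumer);
    }

    // Builds A = load; B = filter A; D = join A, B; then inserts a split
    // with one split output per successor of the load.
    static List<Op> joinInputsAfterSplit(boolean preserveOrder) {
        Op load = new Op("A(load)");
        Op filter = new Op("B(filter)");
        Op join = new Op("join");
        connect(load, filter);
        connect(load, join);    // join input 0 should be A
        connect(filter, join);  // join input 1 should be B

        Op split = new Op("split");
        List<Op> successors = new ArrayList<>(load.outputs); // copy before mutating
        connect(load, split);
        for (int i = 0; i < successors.size(); i++) {
            Op succ = successors.get(i);
            Op splitOut = new Op("A(splitOutput" + i + ")");
            connect(split, splitOut);
            if (preserveOrder) {
                replaceInput(succ, load, splitOut);
            } else {
                disconnect(load, succ);   // A leaves position 0 ...
                connect(splitOut, succ);  // ... and re-enters at the end
            }
        }
        return join.inputs;
    }

    public static void main(String[] args) {
        System.out.println("disconnect/connect: " + joinInputsAfterSplit(false));
        System.out.println("in-place replace:   " + joinInputsAfterSplit(true));
    }
}
```

With the disconnect/connect path the join's inputs come out as `[B(filter), A(splitOutput1)]`; with the in-place replacement they stay `[A(splitOutput1), B(filter)]`. The copy of `load.outputs` before the loop avoids mutating the list being iterated.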


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Pradeep Kamath (JIRA) at Sep 13, 2008 at 5:15 am
    [ https://issues.apache.org/jira/browse/PIG-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Pradeep Kamath updated PIG-429:
    -------------------------------

    Patch Info: [Patch Available]
  • Pradeep Kamath (JIRA) at Sep 13, 2008 at 5:15 am
    [ https://issues.apache.org/jira/browse/PIG-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Pradeep Kamath updated PIG-429:
    -------------------------------

    Attachment: PIG-429.patch

    Patch for the issue; passes all unit tests.
  • Olga Natkovich (JIRA) at Sep 13, 2008 at 4:53 pm
    [ https://issues.apache.org/jira/browse/PIG-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich resolved PIG-429.
    --------------------------------

    Resolution: Fixed

    Patch committed. Thanks, Pradeep, for contributing!

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Sep 12, '08 at 11:00p
active: Sep 13, '08 at 4:53p
posts: 4
users: 1
website: pig.apache.org
