Grokbase: Pig user group, October 2010
Joins with OR condition
What's the best way to do something like this in Pig:

JOIN A with B where (A.property1 = B.property1 OR A.property2 = B.property2) ?

Thanks,
-Rakesh


  • Dmitriy Ryaboy at Oct 18, 2010 at 3:19 am
    Two joins, followed by a full outer join of the results, and a selection
    pass?
    It's not pretty, but it'll work...

  • Thejas M Nair at Oct 27, 2010 at 6:07 pm
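    I don't understand the solution proposed by Dmitriy using 3 joins. But it can be done using two joins and a union, as follows:

    J1 = join A by prop1, B by prop1;
    J2 = join A by prop2, B by prop2;

    -- this filter prevents joined rows where both prop1 and prop2 match from being counted twice:
    -- rows of J2 whose prop1 values also match are already in J1, so they are dropped from J2
    -- (the null checks keep rows with a null prop1, which can never appear in J1)
    J2_fil = filter J2 by (A::prop1 is null) or (B::prop1 is null) or (A::prop1 != B::prop1);
    JoinP1OrP2 = union J1, J2_fil;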


    -Thejas
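
For small inputs, the OR condition can also be written directly as a cross product followed by a filter. This is only a sketch for checking the result of the two-join approach, not a replacement for it: CROSS materializes every (A, B) pair, so it is likely to be far too expensive on large relations. The file names and schemas below are assumptions made for illustration; they do not come from the thread.

```pig
-- Hypothetical inputs: file names and schemas are assumptions for illustration.
A = LOAD 'a.tsv' AS (id:chararray, prop1:chararray, prop2:chararray);
B = LOAD 'b.tsv' AS (id:chararray, prop1:chararray, prop2:chararray);

-- Build every (A, B) pair, then keep a pair when either property matches.
AxB = CROSS A, B;
Ref = FILTER AxB BY (A::prop1 == B::prop1) OR (A::prop2 == B::prop2);

DUMP Ref;
```

On a small sample, Ref should contain each matching pair exactly once, which is the result the two joins plus union are meant to reproduce.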



  • Rakesh kothari at Oct 27, 2010 at 9:20 pm
    Yes. I did that as well.

    Thanks,
    -Rakesh


Discussion Overview
group: user @
categories: pig, hadoop
posted: Oct 18, '10 at 12:03a
active: Oct 27, '10 at 9:20p
posts: 4
users: 3
website: pig.apache.org
