Grokbase Groups Pig user May 2011
FAQ
This is more of a "how can I do this" question.
Imagine you have an sql query like the following:
select a.f1, a.f2, a.f3, b.f1, b.f2, c.f1, c.f2, d.f1, d.f2
from tableA a, tableB b, tableC c, tableD d
where
a.f1 = b.f1
and b.f2 = d.f1
and c.f2 = b.f1
/* etc.... */
What's the best way to replicate that query in Pig?
It's obvious to me that a simple thing could be done with a simple join.
That is, if you just had
where a.f1 = b.f1
You can just JOIN and then foreach generate.
But with more than one clause in the where, I'm not sure of a good approach...
-Mark

Search Discussions

  • Dmitriy Ryaboy at May 5, 2011 at 10:12 pm
    If you are joining on the same field, you can do that in one statement:

    x = join a on f1, b on f1, c on f2, d on f3...;

    If however it's more of a star-like join, you have to do several joins:

    x = join a on f1, b on f1;
    xx = join x on b::f2, c on f2;
    ....

    D
    On Thu, May 5, 2011 at 12:06 PM, Mark Laczin wrote:
    This is more of a "how can I do this" question.
    Imagine you have an sql query like the following:
    select a.f1, a.f2, a.f3, b.f1, b.f2, c.f1, c.f2, d.f1, d.f2
    from tableA a, tableB b, tableC c, tableD d
    where
    a.f1 = b.f1
    and b.f2 = d.f1
    and c.f2 = b.f1
    /* etc.... */
    What's the best way to replicate that query in Pig?
    It's obvious to me that a simple thing could be done with a simple join.
    That is, if you just had
    where a.f1 = b.f1
    You can just JOIN and then foreach generate.
    But with more than one clause in the where, I'm not sure of a good approach...
    -Mark
  • Mark Laczin at May 9, 2011 at 2:32 pm
    Can the same pattern be used to deal with OR's in a WHERE?

    That is,
    SELECT somestuff, somemorestuff
    FROM data1 d1, data2 d2
    WHERE (d1.d = "somestring" OR d1.d = d2.d) AND (d1.a = d2.a OR d1.b = d2.c)

    ?

    Quite an odd situation, I know.

    -Mark
    On Thu, May 5, 2011 at 6:12 PM, Dmitriy Ryaboy wrote:
    If you are joining on the same field, you can do that in one statement:

    x = join a on f1, b on f1, c on f2, d on f3...;

    If however it's more of a star-like join, you have to do several joins:

    x = join a on f1, b on f1;
    xx = join x on b::f2, c on f2;
    ....

    D
    On Thu, May 5, 2011 at 12:06 PM, Mark Laczin wrote:
    This is more of a "how can I do this" question.
    Imagine you have an sql query like the following:
    select a.f1, a.f2, a.f3, b.f1, b.f2, c.f1, c.f2, d.f1, d.f2
    from tableA a, tableB b, tableC c, tableD d
    where
    a.f1 = b.f1
    and b.f2 = d.f1
    and c.f2 = b.f1
    /* etc.... */
    What's the best way to replicate that query in Pig?
    It's obvious to me that a simple thing could be done with a simple join.
    That is, if you just had
    where a.f1 = b.f1
    You can just JOIN and then foreach generate.
    But with more than one clause in the where, I'm not sure of a good approach...
    -Mark
  • Dmitriy Ryaboy at May 9, 2011 at 4:28 pm
    My first thought is that you can get there through a combination of
    joins, filters, and unions, but there's probably a better way.


    On Mon, May 9, 2011 at 7:31 AM, Mark Laczin wrote:
    Can the same pattern be used to deal with OR's in a WHERE?

    That is,
    SELECT somestuff, somemorestuff
    FROM data1 d1, data2 d2
    WHERE (d1.d = "somestring" OR d1.d = d2.d) AND (d1.a = d2.a OR d1.b = d2.c)

    ?

    Quite an odd situation, I know.

    -Mark
    On Thu, May 5, 2011 at 6:12 PM, Dmitriy Ryaboy wrote:
    If you are joining on the same field, you can do that in one statement:

    x = join a on f1, b on f1, c on f2, d on f3...;

    If however it's more of a star-like join, you have to do several joins:

    x = join a on f1, b on f1;
    xx = join x on b::f2, c on f2;
    ....

    D
    On Thu, May 5, 2011 at 12:06 PM, Mark Laczin wrote:
    This is more of a "how can I do this" question.
    Imagine you have an sql query like the following:
    select a.f1, a.f2, a.f3, b.f1, b.f2, c.f1, c.f2, d.f1, d.f2
    from tableA a, tableB b, tableC c, tableD d
    where
    a.f1 = b.f1
    and b.f2 = d.f1
    and c.f2 = b.f1
    /* etc.... */
    What's the best way to replicate that query in Pig?
    It's obvious to me that a simple thing could be done with a simple join.
    That is, if you just had
    where a.f1 = b.f1
    You can just JOIN and then foreach generate.
    But with more than one clause in the where, I'm not sure of a good approach...
    -Mark
  • Mark Laczin at May 9, 2011 at 5:06 pm
    I tried (in ultimate CS nerd fashion) to eliminate the OR's using
    DeMorgan's laws, and that might lead me to some refactoring of the
    logic that'll make it work with just sequential joins, but I'm not
    sure how far it'll take me.

    Anyone else have an idea?
    On Mon, May 9, 2011 at 12:28 PM, Dmitriy Ryaboy wrote:
    My first thought is that you can get there through a combination of
    joins, filters, and unions, but there's probably a better way.


    On Mon, May 9, 2011 at 7:31 AM, Mark Laczin wrote:
    Can the same pattern be used to deal with OR's in a WHERE?

    That is,
    SELECT somestuff, somemorestuff
    FROM data1 d1, data2 d2
    WHERE (d1.d = "somestring" OR d1.d = d2.d) AND (d1.a = d2.a OR d1.b = d2.c)

    ?

    Quite an odd situation, I know.

    -Mark
    On Thu, May 5, 2011 at 6:12 PM, Dmitriy Ryaboy wrote:
    If you are joining on the same field, you can do that in one statement:

    x = join a on f1, b on f1, c on f2, d on f3...;

    If however it's more of a star-like join, you have to do several joins:

    x = join a on f1, b on f1;
    xx = join x on b::f2, c on f2;
    ....

    D
    On Thu, May 5, 2011 at 12:06 PM, Mark Laczin wrote:
    This is more of a "how can I do this" question.
    Imagine you have an sql query like the following:
    select a.f1, a.f2, a.f3, b.f1, b.f2, c.f1, c.f2, d.f1, d.f2
    from tableA a, tableB b, tableC c, tableD d
    where
    a.f1 = b.f1
    and b.f2 = d.f1
    and c.f2 = b.f1
    /* etc.... */
    What's the best way to replicate that query in Pig?
    It's obvious to me that a simple thing could be done with a simple join.
    That is, if you just had
    where a.f1 = b.f1
    You can just JOIN and then foreach generate.
    But with more than one clause in the where, I'm not sure of a good approach...
    -Mark

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categoriespig, hadoop
postedMay 5, '11 at 7:06p
activeMay 9, '11 at 5:06p
posts5
users2
websitepig.apache.org

2 users in discussion

Mark Laczin: 3 posts Dmitriy Ryaboy: 2 posts

People

Translate

site design / logo © 2021 Grokbase