Grokbase Groups Pig user April 2010
FAQ
Hello,

can anybody tell me what the LEFT OUTER JOIN produces in case of non matching tuples?
I thought it would produce nulls for the right relation but a later test for IS NULL does not produce the desired solution.

Example:

left_rel has schema (s: bytearray)
right_rel has schema (s: bytearray, p: bytearray)

A = JOIN left_rel BY s LEFT OUTER, right_rel BY s;

So what is the resulting tuple in case that a value for s in 'left_rel' doesn't find a join partner in 'right_rel'?
I thought it would be: (value for s in left_rel, null, null)

But a later FILTER for null values does not produce the right result:

B = FILTER A BY right_rel : : s is null

Can anybody tell me what I got wrong?


Thx in advance,
Alex

Search Discussions

  • Alexander Schätzle at Apr 22, 2010 at 10:37 am
    Consider the following example:

    left_rel (s: bytearray)
    ################
    (a)
    (b)
    (c)
    (d)
    (e)

    right_rel (s: bytearray, p: bytearray)
    ##########################
    (a, 1)
    (b, 2)
    (e, 3)



    A = JOIN left_rel BY s LEFT OUTER, right_rel BY s;
    ###########################################
    (a, a, 1)
    (b, b, 2)
    (c, , )
    (d, , )
    (e, e, 3)

    B = FILTER A BY $1 is null;

    DUMP B; -> produces an ERROR!

    STORE B INTO 'outfile';
    ###################
    a
    b
    c
    d
    e

    But what I expect is:
    c
    d

    I'm very confused at the moment and I absolutely don't understand what's happening!
    Does anybody knows the answer?

    Thx,
    Alex
  • Mridul Muralidharan at Apr 22, 2010 at 12:42 pm
    Hi Alex,

    This is a bug in pig imo where it is pushing the filter before the
    join : when it should not.


    To validate, simply introduce an intermediate store/load pair to see
    right results.
    There probably already is some JIRA similar to this, if yes - please do
    add to that or please do create a new one.

    Someone from pig team should probably resolve this (if you are not on
    latest pig already).



    Regards,
    Mridul

    On Thursday 22 April 2010 03:10 PM, Alexander Schätzle wrote:
    Hello,

    can anybody tell me what the LEFT OUTER JOIN produces in case of non matching tuples?
    I thought it would produce nulls for the right relation but a later test for IS NULL does not produce the desired solution.

    Example:

    left_rel has schema (s: bytearray)
    right_rel has schema (s: bytearray, p: bytearray)

    A = JOIN left_rel BY s LEFT OUTER, right_rel BY s;

    So what is the resulting tuple in case that a value for s in 'left_rel' doesn't find a join partner in 'right_rel'?
    I thought it would be: (value for s in left_rel, null, null)

    But a later FILTER for null values does not produce the right result:

    B = FILTER A BY right_rel : : s is null

    Can anybody tell me what I got wrong?


    Thx in advance,
    Alex

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categoriespig, hadoop
postedApr 22, '10 at 9:41a
activeApr 22, '10 at 12:42p
posts3
users2
websitepig.apache.org

People

Translate

site design / logo © 2022 Grokbase