Grokbase Groups Pig user October 2009
I am new to Pig. I want to do an outer join using Pig, but I cannot get the
result I want. Did I do something wrong?

--1.txt
a 1
b 2
c 3

--2.txt
a aa
c cc


A = LOAD '1.txt' USING PigStorage('\t') as (a1,a2);
B = LOAD '2.txt' USING PigStorage('\t') as (a1,a2);
ret = JOIN A by a1 LEFT OUTER, B BY a1;
STORE ret INTO 'result';

But the result is:

a 1
b 2
c 3

The columns from 2.txt are somehow missing.

Thanks in advance!


  • Dmitriy Ryaboy at Oct 14, 2009 at 7:00 pm
    Make sure A and B contain what you think they contain (use the 'dump'
    command to view them inside the shell).
    Are you sure the input files are delimited by tabs?

    I ran your script and got the expected results:

    grunt> a = load '/tmp/1.txt' as (a1,a2);
    grunt> b = load '/tmp/2.txt' as (a1,a2);
    grunt> ret = JOIN a by a1 LEFT OUTER, b by a1;
    grunt> dump a;
    (a,1)
    (b,2)
    (c,3)
    grunt> dump b;
    (a,aa)
    (c,cc)
    grunt> dump ret;
    (a,1,a,aa)
    (b,2,,)
    (c,3,c,cc)
    grunt> store ret into '/tmp/res';
    grunt> quit

    dvryaboy@abacus:~/src/pig$ cat /tmp/res
    a 1 a aa
    b 2
    c 3 c cc
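
One way to test the delimiter question outside the grunt shell is to look for lines that contain no tab at all. The sketch below is only an illustration: the helper name `lines_missing_tab` and the sample strings are made up for this example, and it assumes every record should hold a tab-separated key and value.

```python
def lines_missing_tab(lines):
    """Return the 1-based numbers of non-empty lines that contain no tab."""
    return [n for n, line in enumerate(lines, start=1)
            if line.strip() and "\t" not in line]

# Hypothetical sample data: the first list uses tabs, the second uses spaces.
tab_delimited = ["a\t1", "b\t2", "c\t3"]
space_delimited = ["a 1", "b 2", "c 3"]

print(lines_missing_tab(tab_delimited))    # []
print(lines_missing_tab(space_delimited))  # [1, 2, 3]
```

To check a real file, the same helper could be fed the lines read from it; any line number it reports is a record that PigStorage('\t') would not split.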


  • Yonggang Qiao at Oct 14, 2009 at 7:29 pm
    You are right. The files are actually delimited by spaces instead of tabs. Thanks!
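
The space delimiter also accounts for the output in the original post. When a space-delimited line is loaded with PigStorage('\t'), the whole line most likely ends up in a1 and a2 is null, so no key in A matches any key in B and the left outer join pads every row of A with nulls. The following is a rough Python simulation of that behavior, not Pig itself; `load`, `left_outer_join`, and `store` are hypothetical helpers written for this two-column case only.

```python
def load(lines, delimiter="\t", width=2):
    """Split each line on the delimiter and pad missing fields with None."""
    rows = []
    for line in lines:
        fields = line.split(delimiter)[:width]
        rows.append(tuple(fields + [None] * (width - len(fields))))
    return rows

def left_outer_join(left, right):
    """Join on the first field; unmatched left rows get None for right columns."""
    out = []
    for l in left:
        matches = [r for r in right if l[0] is not None and r[0] == l[0]]
        if matches:
            out.extend(l + r for r in matches)
        else:
            out.append(l + (None,) * len(right[0]))
    return out

def store(rows, delimiter="\t"):
    """Render rows as delimited text, writing None as an empty field."""
    return [delimiter.join("" if f is None else f for f in row) for row in rows]

# Space-delimited input read with a tab delimiter: nothing matches.
a = load(["a 1", "b 2", "c 3"])
b = load(["a aa", "c cc"])
print(store(left_outer_join(a, b)))
# ['a 1\t\t\t', 'b 2\t\t\t', 'c 3\t\t\t']

# Tab-delimited input: the join behaves as expected.
a = load(["a\t1", "b\t2", "c\t3"])
b = load(["a\taa", "c\tcc"])
print(store(left_outer_join(a, b)))
# ['a\t1\ta\taa', 'b\t2\t\t', 'c\t3\tc\tcc']
```

In the first case the trailing empty fields are invisible when the file is printed, which is why the stored result looked like a copy of 1.txt. In Pig itself, the fix is either to convert the input files to tabs or, assuming exactly one space separates the fields, to load them with PigStorage(' ').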

Discussion Overview
group: user @
categories: pig, hadoop
posted: Oct 14, '09 at 6:48p
active: Oct 14, '09 at 7:29p
posts: 3
users: 2
website: pig.apache.org
