Hi,

Hope there is some simple answer to this. I have a bunch of rows, and for each
row I want to add a column that is derived from some existing columns. I
have a large number of columns in my input tuple, so I don't want to repeat
every name using "AS" when I generate. Is there an easy way to just append a
column to the tuples without having to touch the rest of the tuple in the output?

Here's my example:

grunt> DESCRIBE X;
X: {id: chararray,v1: int,v2: int}

grunt> DUMP X;
(a,3,42)
(b,2,4)
(c,7,32)

I can do this:
grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, id, v1, v2;
grunt> DUMP Y;
(39,a,3,42)
(2,b,2,4)
(25,c,7,32)

But I would prefer not to have to list all the v's. I may have v1, v2, v3,
..., v100.

Of course, this doesn't work:

grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, FLATTEN(X);

What can be done to simplify this? And a related question: what is the schema
after the FOREACH? I wish I could do a DESCRIBE after FOREACH.

Thanks !!


  • Jonathan Coveney at Jan 12, 2011 at 11:16 pm
    FOREACH a GENERATE function(thing), *; should do what you want. The * just throws in all the columns.

    -----Original Message-----
    From: Dexin Wang <wangdexin@gmail.com>
    Date: Wed, 12 Jan 2011 14:51:58
    To: <user@pig.apache.org>
    Reply-To: user@pig.apache.org
    Subject: wild card for all fields in a tuple

  • Alan Gates at Jan 12, 2011 at 11:19 pm
    There isn't a way to do that yet. See https://issues.apache.org/jira/browse/PIG-1693
    for our plans on adding it in the next release.

    Alan.
  • Alan Gates at Jan 12, 2011 at 11:34 pm
    Jonathan is right, you can do all fields in a tuple with *. I was
    thinking of doing all fields in between two fields, which you can't do
    yet.

    Alan.
  • Dexin Wang at Jan 12, 2011 at 11:44 pm
    Yeah, that works great. Thanks, Jonathan and Alan. I can see that the
    all-fields-in-between feature will be really useful in some cases.
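
For reference, the answer above can be sketched against the relation from the question. This is a minimal illustration, not anything posted in the thread: the input file name 'x.tsv' is hypothetical, and the last statement assumes a Pig release (0.9 or later) that includes the range projection tracked in PIG-1693.

```pig
-- Hypothetical input: a tab-separated file 'x.tsv' holding the three rows from the question.
X = LOAD 'x.tsv' AS (id:chararray, v1:int, v2:int);

-- Prepend the derived column; * projects every existing field of X, in order.
Y = FOREACH X GENERATE (v2 - v1) AS diff, *;

-- DESCRIBE works on any alias, including the result of a FOREACH.
DESCRIBE Y;
DUMP Y;

-- Range projection (the "all fields in between" feature from PIG-1693).
-- Assumption: Pig 0.9 or later; older releases reject the .. syntax.
Z = FOREACH X GENERATE id, v1 .. v2;
DESCRIBE Z;
```

DUMP Y should print the same three tuples as the hand-written projection in the question, and DESCRIBE Y answers the schema question directly: diff comes first, followed by the original fields of X.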
