Guys, I know this must be a common use case, but how do you explode and
implode in Pig?

so, I have a file like this...

1, asdf
2, qewrty
3, zcxvb


and I want to apply an explode operation to it:

1, a
1, s
1, d
1, f
2, q
2, e
2, w
2, r
2, t
2, y
3, z
3, c
3, x
3, v
3, b

and after some work... I have this file:

1, aa
1, ss
1, dd
1, ff
2, qq
2, ee
2, ww
2, rr
2, tt
2, yy
3, zz
3, cc
3, xx
3, vv
3, bb


and I want to perform an implode:

1, aassddff
2, qqeewwrrttyy
3, zzccxxvvbb


well, obviously this is a dumb example, but I'd like to do those things. Can
somebody help me with this? I looked in the piggy bank and didn't see
anything that would do this for me.

Thanks!


  • Dmitriy Ryaboy at Feb 21, 2010 at 9:32 am
    Explode:

    exploded = foreach foo generate id, FLATTEN(CharSplit(string));

    -- CharSplit is an EvalFunc<DataBag> that takes a string and splits it into
    -- characters
    -- flatten on a bag creates multiple new rows, one per element in the bag

    imploded = group exploded by id;
    imploded = foreach imploded generate group, BagConcat(exploded.$1);

    -- BagConcat is an EvalFunc<String> that takes a bag of one-field tuples and
    -- returns a string that's a concatenation of all the strings in the tuples in
    -- the bag; exploded.$1 projects the character field so each tuple has one field

    Do note that bags do not guarantee order, so if you have an order
    requirement, you may need to enforce it in BagConcat.

    -D
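
For readers who want to see the row shapes, here is a minimal sketch in plain Python (not Pig) of what the explode and implode steps above do to the sample data. The function names are made up for illustration; `char_split` and `implode` only stand in for the CharSplit and BagConcat UDFs described in the reply, which would have to be written in Java against Pig's EvalFunc. Python lists keep insertion order, which a Pig bag does not promise, so treat the ordering here as an artifact of the model.

```python
from collections import defaultdict


def char_split(s):
    """Stand-in for the CharSplit UDF: a string becomes a bag of one-field tuples."""
    return [(ch,) for ch in s]


def explode(rows):
    """Model of: foreach foo generate id, FLATTEN(CharSplit(string));"""
    return [(row_id, ch) for row_id, s in rows for (ch,) in char_split(s)]


def implode(rows):
    """Model of: group by id, then concatenate each group's values (BagConcat)."""
    bags = defaultdict(list)
    for row_id, value in rows:
        bags[row_id].append(value)
    return [(row_id, "".join(values)) for row_id, values in bags.items()]


foo = [(1, "asdf"), (2, "qewrty"), (3, "zcxvb")]
exploded = explode(foo)
# The "some work" step from the question: double every character.
doubled = [(row_id, ch * 2) for row_id, ch in exploded]
imploded = implode(doubled)
```

Each input row fans out into one row per character, and the group step collects them back under their id before concatenation.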
    On Fri, Feb 19, 2010 at 10:21 AM, hc busy wrote: [original question, as at the top of this thread]
  • Rekha Joshi at Feb 22, 2010 at 4:06 am
    You would need a UDF for this. Please check whether the latest pig-udf.jar already has one.
    Or, since this is a pretty simple one, you can write it yourself: take the tuple, check the type, append the strings, and return the result from your exec() method.

    Cheers,
    /R


  • Hc busy at Feb 22, 2010 at 5:46 pm
    Thanks, Dmitriy and Rekha. So I understand now that flatten on a bag explodes
    it into multiple rows.

    The BagConcat seems to work. Actually, in a simple example using group by, it
    appears that the bag keeps the results in the order they had before entering
    the group by (so if I group after an order by x desc and then dump the table,
    the bag's contents print reversed). For my purposes, though, not having the
    results in order is okay.

    What if, instead of a string for CharSplit, the data I have is this:

    1,a,b,c,d
    2,a,s,d,f

    and I want to explode it into
    1,a
    1,b
    1,c
    1,d
    2,a
    2,s
    2,d
    2,f

    (Sorry, I made a mistake in the original question: the field is not a
    string but a tuple.) I think I may be able to get it into:

    1, (a,b,c,d)
    2, (a,s,d,f)

    but still, I need to explode it into several rows to operate on them
    separately.


  • Dmitriy Ryaboy at Feb 22, 2010 at 5:58 pm
    Same thing -- a UDF to convert a tuple into a bag, then flatten.
    Don't rely on any order you see in bags during testing -- there is
    explicitly no guarantee there; it may change on you from version to version
    and from execution to execution.

    -D
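
As a rough illustration of the tuple case, this plain-Python sketch (again a model, not Pig) turns each row's tuple into a bag of one-field tuples and then flattens it into one row per element. `tuple_to_bag` and `explode_tuple` are hypothetical names standing in for the tuple-to-bag UDF and the FLATTEN step.

```python
def tuple_to_bag(fields):
    """Stand-in for a tuple-to-bag UDF: each field becomes its own one-field tuple."""
    return [(value,) for value in fields]


def explode_tuple(rows):
    """Model of: foreach ... generate id, FLATTEN(<tuple-to-bag UDF>(tuple));"""
    out = []
    for row_id, fields in rows:
        for (value,) in tuple_to_bag(fields):
            out.append((row_id, value))
    return out


rows = [(1, ("a", "b", "c", "d")), (2, ("a", "s", "d", "f"))]
exploded = explode_tuple(rows)
```

In line with the warning about bag order, any check on the result is best done without regard to row order, for example by comparing the rows as a multiset rather than as a list.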
  • Hc busy at Feb 22, 2010 at 6:35 pm
    OK, it sounds like I have a plan. So I need to write a UDF from tuple to
    bag (t2b) and one from bag to tuple (b2t), and then I do

    exploded = foreach foo generate id, FLATTEN(t2b(field1, field2, field3));
    implode = group exploded by id;
    implode = foreach implode generate group, FLATTEN(b2t(exploded.$1));

    to (almost) recover the original table, except that the field order may be
    messed up. Is there a way to write a UDF like flatten that preserves order?


    Thanks!
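
One possible way to keep field order, sketched in plain Python rather than Pig: have the tuple-to-bag step emit each value together with its position, carry the position through the intermediate work, and sort on it when rebuilding the tuple. This is an assumption about how t2b and b2t could be written, not something the thread confirms; the `_indexed` and `_ordered` names are invented for the example.

```python
import random
from collections import defaultdict


def t2b_indexed(fields):
    """Hypothetical t2b variant: emit (position, value) so order can be restored."""
    return [(pos, value) for pos, value in enumerate(fields)]


def explode_indexed(rows):
    """One output row per field, tagged with the row id and the field position."""
    return [(row_id, pos, value)
            for row_id, fields in rows
            for pos, value in t2b_indexed(fields)]


def b2t_ordered(bag):
    """Hypothetical b2t variant: sort by position, then drop the position."""
    return tuple(value for _, value in sorted(bag, key=lambda pv: pv[0]))


def implode_indexed(rows):
    """Group by id, then rebuild each tuple in its original field order."""
    bags = defaultdict(list)
    for row_id, pos, value in rows:
        bags[row_id].append((pos, value))
    return {row_id: b2t_ordered(bag) for row_id, bag in bags.items()}


original = [(1, ("a", "b", "c", "d")), (2, ("a", "s", "d", "f"))]
exploded = explode_indexed(original)
random.shuffle(exploded)  # simulate a bag that arrives in arbitrary order
restored = implode_indexed(exploded)
```

Because the position travels with each value, the shuffle does not matter: the rebuilt tuples come out in their original field order.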



  • Hc busy at Feb 22, 2010 at 7:42 pm
    How do I use this bag? Is there a way for me to specify it in Grunt?

    BagFactory.getInstance().newSortedBag(comparator)

    ?
  • Hc busy at Feb 22, 2010 at 10:41 pm
    I guess this is a known issue, but if I have

    A=load 'data' as (a:int, b:int, c:int);

    I am able to do

    B=foreach A generate (1,2,3);

    but not

    B=foreach A generate a, (b,c);


    I mean, the UDF for this is simple, but why isn't there built-in language
    support for this and for the map/tuple operations I am asking about? Does
    anybody else use this kind of thing?



Discussion Overview
group: user @
categories: pig, hadoop
posted: Feb 19, '10 at 6:21p
active: Feb 22, '10 at 10:41p
posts: 8
users: 3
website: pig.apache.org
