Grokbase Groups Pig user April 2011
No matter what I try, I end up losing the tuples after the initial flatten. I'm using some auto-generated test data with first, last and a concatenation of the two for the key. The script and outputs:

rows = LOAD 'cassandra://Keyspace2/Standard1' USING CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:chararray, value:chararray) } );
dump rows;

(faaaaaaaaazzzzzzeaaa,{(first,faaaaaaaaa),(last,zzzzzzeaaa)})
(jaaaaaaaaazzzlaaaaaa,{(first,jaaaaaaaaa),(last,zzzlaaaaaa)})
(naaaaaaaaazzzzzpaaaa,{(first,naaaaaaaaa),(last,zzzzzpaaaa)})
(uaaaaaaaaazzzzzsaaaa,{(first,uaaaaaaaaa),(last,zzzzzsaaaa)})
(vaaaaaaaaafaaaaaaaaa,{(first,vaaaaaaaaa),(last,faaaaaaaaa)})
(zuaaaaaaaazpaaaaaaaa,{(first,zuaaaaaaaa),(last,zpaaaaaaaa)})
(zuaaaaaaaazzzzhaaaaa,{(first,zuaaaaaaaa),(last,zzzzhaaaaa)})
(zwaaaaaaaaznaaaaaaaa,{(first,zwaaaaaaaa),(last,znaaaaaaaa)})
(zziaaaaaaazfaaaaaaaa,{(first,zziaaaaaaa),(last,zfaaaaaaaa)})
(zzkaaaaaaazzzdaaaaaa,{(first,zzkaaaaaaa),(last,zzzdaaaaaa)})

So far, so good.


columns = foreach rows generate flatten(cols) as (name, value);
dump columns;

()
()
()
()
()
()
()
()
()
()


Not so good.



I've tried multiple combinations with no success. If I just leave the bag schema empty in the initial load, i.e. cols:bag{}, and then leave off the AS clause in the flatten, I get something that looks like a list of tuples. But trying to access $1 in that result gives me an Error 1000 index out of range. So, that's not the ticket either.

What I'd really like is to flatten the bag into a map, but I'm about as successful there as well.

Any help is most welcome. (Cassandra 0.7.4 and Pig 0.8.0)
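
For reference, here is a minimal sketch in plain Python (not Pig) of the two shapes discussed in this post, assuming the rows look exactly like the `dump rows` output above. The helper names `flatten_cols` and `cols_to_map` are hypothetical. The first models what FLATTEN on a bag is expected to yield (one output tuple per bag element), and the second models the map form asked for. It does not explain why the flatten returned empty tuples with CassandraStorage.

```python
# Plain-Python model of the data shapes; this is not Pig and not CassandraStorage.
# Two sample rows copied from the `dump rows` output above.
rows = [
    ("faaaaaaaaazzzzzzeaaa", [("first", "faaaaaaaaa"), ("last", "zzzzzzeaaa")]),
    ("jaaaaaaaaazzzlaaaaaa", [("first", "jaaaaaaaaa"), ("last", "zzzlaaaaaa")]),
]


def flatten_cols(rows):
    """Model of `foreach rows generate flatten(cols)`:
    one (name, value) output tuple per element of each row's bag."""
    return [(name, value) for _key, cols in rows for name, value in cols]


def cols_to_map(rows):
    """Model of the map form: each key paired with a {name: value} dict."""
    return [(key, dict(cols)) for key, cols in rows]


if __name__ == "__main__":
    for t in flatten_cols(rows):
        print(t)
    for t in cols_to_map(rows):
        print(t)
```

With two input rows of two columns each, the flatten model yields four (name, value) tuples rather than the empty tuples shown in the dump.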


  • Jeremy Hanna at Apr 6, 2011 at 10:51 pm
    I'm going to put a UDF up on the pygmalion project hopefully today that will convert that into something more usable. Props to Jacob from infochimps - he and I have been creating UDFs like that lately for use with Cassandra. There's an associated UDF for getting it back into the key, cols form to output to cassandra as well. I'll try to get that pushed tonight but take a look at:
    https://github.com/jeromatron/pygmalion/
    That's where I'll push the code - hopefully that will help.

    What it does is take the data structure returned from Cassandra and let you say "give me the key and the values for these column names" in a bag, so for your example it would return:
    {(faaaaaaaaazzzzzzeaaa,faaaaaaaaa,zzzzzzeaaa)}
    and you could assign names to each field, such as key, first, last, within Pig.

    Anyway, if that helps, look for that soon. It's helping us use the output as tabular data.
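
The transformation described in this post can be sketched in plain Python. The thread does not give the UDF's name or signature, so `pivot_columns` below is a hypothetical stand-in that only models the input and output shapes, not the pygmalion code itself. Returning None for a missing column is also an assumption.

```python
def pivot_columns(key, cols, names):
    """Model of the described UDF: given a row key, its bag of (name, value)
    columns, and the wanted column names, return one flat tuple
    (key, value_for_names[0], value_for_names[1], ...).

    Columns absent from the row become None (an assumption, not from the thread).
    """
    lookup = dict(cols)
    return (key,) + tuple(lookup.get(name) for name in names)


if __name__ == "__main__":
    # First row of the `dump rows` output from the original post.
    key = "faaaaaaaaazzzzzzeaaa"
    cols = [("first", "faaaaaaaaa"), ("last", "zzzzzzeaaa")]
    print(pivot_columns(key, cols, ["first", "last"]))
    # prints ('faaaaaaaaazzzzzzeaaa', 'faaaaaaaaa', 'zzzzzzeaaa')
```

The result has the same field order as the example tuple in the post (key, first, last), which is what makes the data usable as tabular rows.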
  • Bob at Apr 6, 2011 at 11:16 pm
    Honestly, I'd rather have a keyed bag of maps on the initial load, but that'd work too. Is it really that hard to get cassandra data out that you need a UDF to do anything besides an initial dump?
  • Jeremy Hanna at Apr 6, 2011 at 11:20 pm

    On Apr 6, 2011, at 6:16 PM, bob wrote:

    Honestly, I'd rather have a keyed bag of maps on the initial load, but that'd work too. Is it really that hard to get cassandra data out that you need a UDF to do anything besides an initial dump?
    That's what we're doing because it just makes it easier to deal with tabular-like data - we don't have to munge through it quite as much. I'm still pretty low on my pig-fu but others on the list might have better answers on how to deal with that data structure.

Discussion Overview
group: user @
categories: pig, hadoop
posted: Apr 6, '11 at 10:40p
active: Apr 6, '11 at 11:20p
posts: 4
users: 2
website: pig.apache.org

2 users in discussion

Bob: 2 posts Jeremy Hanna: 2 posts
