Grokbase Groups Pig user July 2011
Hello again,

I have a relation with the following schema:

regrouped: {group: (artistid: int,country: int,week: chararray),projected_joined_albums: {key: (artistid: int,country: int,week: chararray),timestamp: long,albumid: int,numtracks: long,reach: int,title_len: long}}

having grouped the projected_joined_albums relation on key.

However, when I store it using the default storage format:

store regrouped into 'dupetest/regrouped';

The resulting file looks like this:

(1000062,83,2011-06-13T00:00:00.000Z)
{(1000062,1308268800,274377251,,1,11),(1000062,1308268800,275105079,,7,13),(1000062,1308268800,270919728,1,67,4)}

The first column is the grouping field ('key'), which is correct.
However, the second column is a bag of *flat* tuples, each having just
the artistid (an integer) as the initial element, where I would have
expected to find the entire 'key' tuple.

The rest of the fields of each tuple are exactly as I would expect
them -- timestamp, albumid, numtracks, reach, title_len.
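
To make the mismatch concrete, here is a minimal sketch in Python (not Pig) that checks whether the first field of each tuple in a stored bag is itself a tuple. The helper names are made up for illustration, and the "expected" row is a hypothetical reconstruction from the schema, not real output. It also assumes that no field value contains commas, parentheses, or braces, since the default text output does not escape them.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside () or {}."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def bag_tuples(bag_text):
    """Return the field list of each tuple in a bag like '{(a,b),(c,d)}'."""
    inner = bag_text.strip()[1:-1]  # drop the enclosing { }
    if not inner:
        return []
    # t[1:-1] drops the enclosing ( ) of each tuple before splitting its fields
    return [split_top_level(t[1:-1]) for t in split_top_level(inner)]


def key_is_nested(fields):
    """True if the first field is itself a tuple, e.g. '(1000062,83,...)'."""
    return fields[0].startswith("(") and fields[0].endswith(")")


# Bag column copied from the stored output quoted above.
observed = ("{(1000062,1308268800,274377251,,1,11),"
            "(1000062,1308268800,275105079,,7,13),"
            "(1000062,1308268800,270919728,1,67,4)}")

# Hypothetical row: what the schema implies the first tuple should look like.
expected = "{((1000062,83,2011-06-13T00:00:00.000Z),1308268800,274377251,,1,11)}"

for name, bag in (("observed", observed), ("expected", expected)):
    for fields in bag_tuples(bag):
        # prints the label, the field count, and whether the key is a nested tuple
        print(name, len(fields), key_is_nested(fields))
```

Both shapes have six top-level fields per tuple, which matches the schema; the only difference is that the observed rows carry a bare artistid where the nested key tuple should be.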

Is this a bug? (Pig 0.8.0 from Cloudera CDH3u0 BTW)

Also, it occurs to me that this may relate to the other question I
posted, about "foreach regrouped" with an inner order-by failing with
the following error:

java.lang.ClassCastException: java.lang.Integer cannot be cast to
org.apache.pig.data.Tuple

Assuming Pig's temp-file version of regrouped looks the same as the
one I got from store, I could see how foreach might fall over, if
it was expecting the first element to be the key tuple but instead got
the artistid!

Thanks again,

Andrew.

  • Daniel Dai at Jul 22, 2011 at 10:51 pm
    Yes, it is strongly recommended to use 0.8.1, in which we fixed quite a few
    important bugs.

    Daniel

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jul 22, '11 at 2:30p
active: Jul 22, '11 at 10:51p
posts: 2
users: 2
website: pig.apache.org
