Grokbase Groups Pig user October 2010
Hi,

Our data contains tuples in which one field is itself a tuple that
contains a bag field. When we access that bag field, we see the
following exception:

java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot
be cast to org.apache.pig.data.DataBag
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:479)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:477)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:336)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:433)
at

We can reproduce the exceptions using the following scripts.

1.
A = LOAD 'test_input' as (a:int, T:(list:{B:(key:int, value:int)}, world:chararray));
describe A;
/*
test_input contains:
12 ({(2,13),(4,5)}, 'hello')
24 ({(8,17),(9,11),(3,4)}, 'world')

and got A's schema as:
A: {a: int,T: (list: {B: (key: int,value: int)},world: chararray)}
*/

B = FOREACH A GENERATE T.list, T.world;
describe B;
/*
got:
B: {list: {B: (key: int,value: int)},world: chararray}
*/

dump B;

2.
......

b = foreach a generate member_id, primary_email, year_born;
c = group b by member_id;
d = foreach c generate group as member_id, b;
e = group d by member_id;
f = foreach e generate group as member_id, d;
g = foreach f generate member_id as A, flatten(d);

h = foreach g generate $0 as A, $1 AS B, $2 AS C;
describe h;
/* We got the following schema:
h: {A: int,B: int,C: {member_id: int,primary_email: chararray,year_born: int}}
*/

h = foreach h generate $0 as A, Swap($1, $2) AS T;
describe h;
/* We use Swap to generate a tuple out of the last two fields and got
the following schema:
h: {A: int,T: (C: {member_id: int,primary_email: chararray,year_born: int},B: int)}
*/
g = foreach h generate A, T.C;
describe g;

g = limit g 15;
dump g;

Is this a known issue?

Best,
Lin
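
For readers less familiar with Pig's nested types, the shape of the data in script 1 can be sketched in plain Python. This is only an illustration under stated assumptions: a Pig tuple is modeled as a Python tuple and a bag as a list of tuples, the helper name `project` is hypothetical, and none of this reflects Pig's actual implementation. It shows what `T.list` and `T.world` would be expected to return given the schema printed by `describe B`.

```python
# Illustrative model only: Pig tuples as Python tuples, Pig bags as lists of tuples.
# Field positions follow the schema
#   A: {a: int, T: (list: {B: (key: int, value: int)}, world: chararray)}
T_FIELDS = {"list": 0, "world": 1}  # positions inside the inner tuple T

rows = [
    (12, ([(2, 13), (4, 5)], "hello")),
    (24, ([(8, 17), (9, 11), (3, 4)], "world")),
]


def project(row):
    """Mimic `FOREACH A GENERATE T.list, T.world` for one row (hypothetical helper)."""
    _a, t = row
    return (t[T_FIELDS["list"]], t[T_FIELDS["world"]])


projected = [project(r) for r in rows]
for out in projected:
    # According to the schema, the first output field is a bag (here a list), not a tuple.
    assert isinstance(out[0], list)
    print(out)
```

The exception in the report indicates that, during this kind of projection, the runtime encountered a tuple where it expected a bag, which is the mismatch the `assert` above checks for in the model.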


  • Thejas M Nair at Oct 14, 2010 at 10:54 pm
    Hi Lin,
    This does not seem to be a known issue. Can you please open a new JIRA?
    FYI, I get a java.lang.NullPointerException when I run query 1
    with Pig 0.7 or trunk.

    Thanks,
    Thejas
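
To retry query 1, the input file from the original post has to be recreated first. A minimal Python sketch follows. It assumes the two top-level fields are separated by a tab (the default delimiter of PigStorage) and copies the text of the complex field exactly as it appears in the post. The archive may have altered the original whitespace, so the layout is an assumption, and `write_test_input` is a hypothetical helper name.

```python
from pathlib import Path

# Rows copied from the comment in script 1. The tab between the two
# top-level fields is an assumption (PigStorage's default delimiter).
ROWS = [
    ("12", "({(2,13),(4,5)}, 'hello')"),
    ("24", "({(8,17),(9,11),(3,4)}, 'world')"),
]


def write_test_input(path="test_input"):
    """Write the two sample rows, one per line, and return the file path."""
    target = Path(path)
    target.write_text("".join(f"{a}\t{t}\n" for a, t in ROWS), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Writes ./test_input and echoes its contents for a quick visual check.
    print(write_test_input().read_text(encoding="utf-8"), end="")
```

Running the script in the directory where Pig is started leaves a `test_input` file that the `LOAD 'test_input'` statement in script 1 can pick up in local mode.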



Discussion Overview
group: user @
categories: pig, hadoop
posted: Oct 12, '10 at 10:38 pm
active: Oct 14, '10 at 10:54 pm
posts: 2
users: 2
website: pig.apache.org
