I'm fairly new to Pig and am having a problem with a Pig script that works fine in local mode but fails in Hadoop mode. I'm using Cloudera CDH2, which includes Pig 0.5.0 and Hadoop 0.20.1.

The line my script fails on is:
flattened = FOREACH joined GENERATE flatten($0) AS (session, feature, action), $3 AS count;

When I dump joined, each tuple looks like the following:

(fWq2XYmhvZWdAZ2FpbGJvcmRlbi5pbmZvLFE2L1RET0k4bUp1UHJRPT01yZ4kx,{(fWq2XYmhvZWdAZ2FpbGJvcmRlbi5pbmZvLFE2L1RET0k4bUp1UHJRPT01yZ4kx,mail_delete,btn)},,)

And here is what I get from describe on joined:

joined: {crossed::group: (dist_sessions::session: bytearray,dist_actions::feature: chararray,dist_actions::action: chararray),crossed::crossed: {dist_sessions::session: bytearray,dist_actions::feature: chararray,dist_actions::action: chararray},counted::id: (session: bytearray,feature: chararray,action: chararray),counted::count: long}

The exception I get is:

Backend error message
---------------------
java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:309)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:418)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:386)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:366)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:238)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

Any help would be appreciated.

Thanks,
Jon

Posted: Feb 22, '10 at 4:20a
Group: user @
Categories: pig, hadoop
Website: pig.apache.org
Author: Jon Armstrong (1 post)
