Another bags(?) related issue.

The code below joins three data files into one.

targetWords = load 'targetWords' as (
    word: chararray, phrases: bag{t: tuple(id: chararray)});
historyWords = load 'historyWords' as (
    word: chararray, phrases: bag{t: tuple(id: chararray)});
searchWords = load 'searchWords' as (
    word: chararray, phrases: bag{t: tuple(id: chararray)});

a = cogroup targetWords by word, historyWords by word, searchWords by word;
b = foreach a generate
    group as word,
    JoinBagsOfBags(searchWords.phrases, historyWords.phrases,
        targetWords.phrases) as phrases;
c = foreach b {
    sorted = order phrases by id;
    unique = distinct sorted;
    generate
        word,
        COUNT(unique) as length,
        unique as phrases;
}
d = order c by word;
store d into 'words';

This runs without error. But when the last FOREACH is changed to remove duplicates before sorting:

c = foreach b {
    unique = distinct phrases;
    sorted = order unique by id;
    generate
        word,
        COUNT(sorted) as length,
        sorted as phrases;
}

Pig fails with this stack trace:

java.io.IOException: Unable to store for alias: 49 [java.lang.String
cannot be cast to org.apache.pig.data.Tuple]
at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:178)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:668)
at org.apache.pig.PigServer.execute(PigServer.java:659)
at org.apache.pig.PigServer.registerQuery(PigServer.java:281)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:439)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
at org.apache.pig.Main.main(Main.java:306)
Caused by: org.apache.pig.backend.executionengine.ExecException:
java.lang.String cannot be cast to org.apache.pig.data.Tuple
... 9 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be
cast to org.apache.pig.data.Tuple
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:279)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSort$SortComparator.getResult(POSort.java:197)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSort$SortComparator.compare(POSort.java:142)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSort$SortComparator.compare(POSort.java:128)
at java.util.Arrays.mergeSort(Arrays.java:1270)
at java.util.Arrays.sort(Arrays.java:1210)
at java.util.Collections.sort(Collections.java:159)
at org.apache.pig.data.SortedDataBag$SortedDataBagIterator.<init>(SortedDataBag.java:93)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSort.getNext(POSort.java:271)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:276)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:366)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:171)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:129)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:181)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:247)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:265)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSort.getNext(POSort.java:253)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.store(POStore.java:137)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:62)
at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:166)
... 8 more

Am I doing something wrong in that code?

Sample data files (one line per file):

3b {(-Nic@X8D3A),(_+N~<^_:^W),(mdo&"wJ6.;),(>`R?:Z'&Uo),(1Srjv8^G"i),(`"A:9c\q6P)}

023 {(;6behhtrfb),(M*`VDirI\I),(Wi%2EbZ$J=),(?_9JR@zQp%),(h:!s+|*\\j)}

5kw {(AgDvp2zsm;),(6Z-T:2TqIv),(jC7`=4+7t9)}
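
For reference, the result that both FOREACH variants are meant to produce can be modeled outside Pig. The following is a minimal Python sketch, not Pig code. It assumes that JoinBagsOfBags (a UDF whose source is not shown in this post) simply concatenates all inner bags into one bag, and that DISTINCT keeps the order of an already sorted bag. The function names and the input data are hypothetical.

```python
def join_bags_of_bags(*bags_of_bags):
    """Assumed behavior of the JoinBagsOfBags UDF: flatten every inner bag
    of every argument into a single bag of tuples."""
    return [t for bags in bags_of_bags for bag in bags for t in bag]


def sort_then_distinct(phrases):
    """Models the first FOREACH: order by id, then distinct.
    Distinct is treated as order-preserving here (an assumption)."""
    ordered = sorted(phrases)
    unique = []
    for t in ordered:
        # In a sorted list, duplicates are adjacent.
        if not unique or unique[-1] != t:
            unique.append(t)
    return len(unique), unique


def distinct_then_sort(phrases):
    """Models the second FOREACH: distinct, then order by id."""
    ordered = sorted(set(phrases))
    return len(ordered), ordered


# Hypothetical cogroup output for one word: each argument is a bag of bags,
# and each id is wrapped in a one-field tuple, as in the schema above.
search = [[("b",), ("a",)]]
history = [[("a",), ("c",)]]
target = []

phrases = join_bags_of_bags(search, history, target)
print(sort_then_distinct(phrases))
print(distinct_then_sort(phrases))
```

Under these assumptions both functions return the same count and the same sorted bag, so the two Pig variants would be expected to store identical output.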

Posted Jan 10, '09 at 1:44p by Daga.
