Hello,
We have a file hierarchy that we want to be accessible with MR/Hive/Pig. That
way everyone can pick their favorite :)
Currently the layout looks like this:
/user/root/data/datepartition1/subpartition2/{sequence file1, sequence
fileN}
I have just installed pig-0.6.0 and am trying to follow the advice here:
http://stackoverflow.com/questions/2423949/storing-data-to-sequencefile-from-apache-pig
REGISTER /opt/pig-0.6.0/contrib/piggybank/java/piggybank.jar;
DEFINE SequenceFileLoader
org.apache.pig.piggybank.storage.SequenceFileLoader();
raw = load 'datafile' USING SequenceFileLoader as (version:chararray,
id:int,date:chararray);
Running this fails with:
2010-04-20 12:10:46,821 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 2999: Unexpected internal error.
org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to
java.lang.Error
[root@rs01 piggybank]# more /root/pig_1271779744816.log
Pig Stack Trace
---------------
ERROR 2999: Unexpected internal error.
org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to
java.lang.Error
java.lang.ClassCastException:
org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to
java.lang.Error
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1440)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:949)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:738)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1036)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:986)
at org.apache.pig.PigServer.registerQuery(PigServer.java:386)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:720)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:352)
So either I have hit a bug or I have done something wrong. It looks like a
bug to me: if Pig cannot cast the error correctly, something is wrong.
Two questions:
1) Can I load all the files in a directory rather than operating on one
file? For example:
raw = load '/datadir/*' USING SequenceFileLoader as (version:chararray,
id:int,date:chararray);
rather than
raw = load '/datafile' USING SequenceFileLoader as (version:chararray,
id:int,date:chararray);
2) PigStorage seems to let me specify a tab delimiter. How does one specify
a tab delimiter with SequenceFileLoader? Or does one have to pass the entire
line to some other Pig component to be tokenized?
Thank you,