Hi everybody,
I'm trying to use vanilla Pig 0.7.0 to generate monthly consolidations
of log files with relatively long lines: 95 fields and growing, of which
I only use 7. To avoid declaring all the fields in the LOAD statement,
I tried to define the schema in my first FOREACH...GENERATE instead, so
the first lines of my script look like this:

    input = LOAD '/tmp/test.log';
    A = FILTER input BY SIZE(*) >= 95;
    B = FOREACH A GENERATE (long)$94, (chararray)$93, (long)$16, (long)$27,
            (long)$23, (int)$2, (int)$3
        AS publisher, associate, site, category,
            story, hits, comments;

As you can guess by now, Pig complains while still parsing:
ERROR 1000: Error during parsing. Invalid alias: category in null
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Invalid alias: associate in null
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:73)
Am I overlooking anything? Should I give up and declare a 95-field
schema? Write a LOAD UDF? Or is there a simpler way to do what I want?
Thank you!
Marcos Rubinelli