getting using Pig.
I need to parse a large JSON file, so I grabbed kimsterv's JSON loader
(https://gist.github.com/601331), compiled it, and successfully tested it on
my laptop with -x local. However, when I try to run it on the edge node of
our dev Hadoop instance, I can't get it to work, even in -x local mode. I get
"org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
create input splits for test.json". I searched the mailing list archives for
this message and found only a mention of it being related to LZO compression
issues. I'm not using any file compression, and the error still occurs when
running in -x local on the edge node of the dev cluster. Is there some
environment variable I'm missing? Maybe a permissions issue I'm unaware of?
Suggestions and theories welcome!
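To test the permissions theory directly, a small check run as the same user on the edge node can confirm whether the JVM can read the file at all. This is a sketch, assuming the intended file is /home/geoffeg/test.json; the class `LocalFileCheck` is hypothetical and uses only `java.nio.file.Files`. A passing result would rule out plain file permissions, but says nothing about how Pig resolves the LOAD path.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic: can the user running the JVM see and read a local file?
public class LocalFileCheck {
    // False if the file is missing, if read access is denied, or if access
    // cannot be determined (e.g. a parent directory is not searchable).
    static boolean isReadable(String p) {
        return Files.isReadable(Paths.get(p));
    }

    // Human-readable summary of the same check, naming the JVM user.
    static String describe(String p) {
        Path path = Paths.get(p);
        String user = System.getProperty("user.name");
        if (!Files.exists(path)) {
            return p + ": not found, or not visible to user " + user;
        }
        if (!Files.isReadable(path)) {
            return p + ": exists but is not readable by user " + user;
        }
        return p + ": readable by user " + user;
    }

    public static void main(String[] args) {
        // Assumed default: the absolute path the Pig script appears to intend.
        String target = args.length > 0 ? args[0] : "/home/geoffeg/test.json";
        System.out.println(describe(target));
    }
}
```

Compile and run it on the edge node with `java LocalFileCheck /home/geoffeg/test.json`, as the same user that launches Pig.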
Hadoop version: Hadoop 0.20.2+737
Pig version: 0.7.0+16 (loader compiled against the Pig 0.7.0 jar)
Command line:
java -cp '/usr/lib/pig/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:libs/*:.' \
  org.apache.pig.Main -v -x local json.pig
Pig script:
REGISTER /home/geoffeg/pig-functions/jsontester.jar;
-- file:// should specify the local FS, remove file:// to specify HDFS
A = LOAD 'file://home/geoffeg/test.json' using
org.geoffeg.hadoop.pig.loader.PigJsonLoader() as ( json: map[] );
B = foreach A generate json#'_keyword';
DUMP B;
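One detail worth ruling out in the LOAD path above, offered as a hypothesis rather than a confirmed diagnosis of ERROR 2118: in a URI, the text right after `file://` is parsed as the authority (host), so `file://home/geoffeg/test.json` does not name the absolute path `/home/geoffeg/test.json`. Hadoop's `Path` is built on `java.net.URI`, so the sketch below uses `java.net.URI` directly to show how each form is split. The class name `UriCheck` is hypothetical, for illustration only.

```java
import java.net.URI;

// Hypothetical sanity check: how java.net.URI splits the two candidate LOAD paths.
public class UriCheck {
    // Host component, or null when the URI has no (or an empty) authority.
    static String host(String s) {
        return URI.create(s).getHost();
    }

    // Path component of the URI.
    static String path(String s) {
        return URI.create(s).getPath();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "file://home/geoffeg/test.json",  // two slashes, as in the script above
            "file:///home/geoffeg/test.json"  // three slashes: empty authority
        };
        for (String c : candidates) {
            System.out.println(c + " -> host=" + host(c) + ", path=" + path(c));
        }
    }
}
```

The two-slash form reports `home` as the host and `/geoffeg/test.json` as the path, while the three-slash form has no host and keeps the full `/home/geoffeg/test.json` path.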
Full error/log:
2011-01-09 22:33:29,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2011-01-09 22:33:30,345 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
2011-01-09 22:33:30,345 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - Map key required for A: $0->[_keyword]
2011-01-09 22:33:30,455 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1814319995/tmp1141533149:org.apache.pig.builtin.BinStorage) - 1-36 Operator Key: 1-36)
2011-01-09 22:33:30,482 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-01-09 22:33:30,482 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-01-09 22:33:30,517 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2011-01-09 22:33:30,522 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-09 22:33:32,520 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-01-09 22:33:32,552 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-01-09 22:33:32,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-01-09 22:33:32,562 [Thread-2] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2011-01-09 22:33:32,692 [Thread-2] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area file:/tmp/hadoop-geoffeg/mapred/staging/geoffeg395595954/.staging/job_local_0001
2011-01-09 22:33:33,054 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-01-09 22:33:33,054 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-01-09 22:33:33,054 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2011-01-09 22:33:33,064 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1814319995/tmp1141533149"
2011-01-09 22:33:33,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : Unable to determine number of records written
2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : Unable to determine number of bytes written
2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2011-01-09 22:33:33,133 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file://home/geoffeg/test.json
2011-01-09 22:33:33,134 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
    at org.apache.pig.PigServer.openIterator(PigServer.java:607)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:545)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:414)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file://home/geoffeg/test.json
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:169)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:270)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1007)
    at org.apache.pig.PigServer.store(PigServer.java:697)
    at org.apache.pig.PigServer.openIterator(PigServer.java:590)
    ... 6 more