Grokbase Groups: Pig user, January 2011
Hello, I'm looking for some clues to help me fix an annoying error I'm
getting using Pig.

I need to parse a large JSON file, so I grabbed kimsterv's JSON loader
(https://gist.github.com/601331), compiled it, and successfully tested it on
my laptop via -x local. However, when I try to run it on the edge node of our
dev Hadoop instance I am unable to get it to work, even if I run it in -x
local. I get
"org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
create input splits for test.json". I looked through the mailing list for
this message, only to find a mention of it being related to LZO compression
issues. I'm not using any file compression, and the error still occurs when
running in -x local on the edge node of the dev cluster. Are there some
environment variables I'm missing? Maybe some permissions issues I'm unaware
of? Suggestions and theories welcome!

Hadoop version: Hadoop 0.20.2+737
Pig version: 0.7.0+16 (compiled against the pig 0.7.0 jar)

Command line:
java -cp '/usr/lib/pig/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:libs/*:.'
org.apache.pig.Main -v -x local json.pig

Pig script:
REGISTER /home/geoffeg/pig-functions/jsontester.jar;
-- file:// should specify the local FS, remove file:// to specify HDFS
A = LOAD 'file://home/geoffeg/test.json' using
org.geoffeg.hadoop.pig.loader.PigJsonLoader() as ( json: map[] );
B = foreach A generate json#'_keyword';
DUMP B;

Full error/log:
2011-01-09 22:33:29,692 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
to hadoop file system at: file:///
2011-01-09 22:33:30,345 [main] INFO
org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned
for A
2011-01-09 22:33:30,345 [main] INFO
org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - Map key required
for A: $0->[_keyword]
2011-01-09 22:33:30,455 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
Store(file:/tmp/temp1814319995/tmp1141533149:org.apache.pig.builtin.BinStorage)
- 1-36 Operator Key: 1-36)
2011-01-09 22:33:30,482 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2011-01-09 22:33:30,482 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2011-01-09 22:33:30,517 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
processName=JobTracker, sessionId=
2011-01-09 22:33:30,522 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-09 22:33:32,520 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2011-01-09 22:33:32,552 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized
2011-01-09 22:33:32,552 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-01-09 22:33:32,562 [Thread-2] WARN org.apache.hadoop.mapred.JobClient
- Use GenericOptionsParser for parsing the arguments. Applications should
implement Tool for the same.
2011-01-09 22:33:32,692 [Thread-2] INFO org.apache.hadoop.mapred.JobClient
- Cleaning up the staging area
file:/tmp/hadoop-geoffeg/mapred/staging/geoffeg395595954/.staging/job_local_0001
2011-01-09 22:33:33,054 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2011-01-09 22:33:33,054 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2011-01-09 22:33:33,054 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map reduce job(s) failed!
2011-01-09 22:33:33,064 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed to produce result in: "file:/tmp/temp1814319995/tmp1141533149"
2011-01-09 22:33:33,064 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Records written : Unable to determine number of records written
2011-01-09 22:33:33,065 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Bytes written : Unable to determine number of bytes written
2011-01-09 22:33:33,065 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Spillable Memory Manager spill count : 0
2011-01-09 22:33:33,065 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Proactive spill count : 0
2011-01-09 22:33:33,065 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2011-01-09 22:33:33,133 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 2997: Unable to recreate exception from backend error:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
create input splits for: file://home/geoffeg/test.json
2011-01-09 22:33:33,134 [main] ERROR org.apache.pig.tools.grunt.Grunt -
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
open iterator for alias B
at org.apache.pig.PigServer.openIterator(PigServer.java:607)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:545)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:414)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997:
Unable to recreate exception from backend error:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
create input splits for: file://home/geoffeg/test.json
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:169)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:270)
at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1007)
at org.apache.pig.PigServer.store(PigServer.java:697)
at org.apache.pig.PigServer.openIterator(PigServer.java:590)
... 6 more

--
Sent from my email client.

  • Daniel Dai at Jan 11, 2011 at 7:27 pm
    I tried the JSON loader you mentioned on 0.7, and it seems to work fine for
    me. I didn't get the error message you mention. Are you still seeing those errors?

    Daniel

  • Joe Crobak at Jan 12, 2011 at 1:44 pm
    A = LOAD 'file://home/geoffeg/test.json' will try to load using a relative
    path. Pig will understand file:/home/geoffeg/test.json or
    file:///home/geoffeg/test.json to load the absolute path. The same goes for a
    file in hdfs://.

    HTH,
    Joe
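
    For example, here is a minimal sketch of the corrected LOAD statements,
    reusing the loader class and path from the script above (it assumes the
    file actually exists on the local filesystem or in HDFS, respectively):

    -- local mode (-x local): absolute path on the local filesystem
    A = LOAD 'file:///home/geoffeg/test.json'
        using org.geoffeg.hadoop.pig.loader.PigJsonLoader() as ( json: map[] );

    -- mapreduce mode: the same file copied into HDFS, with no file:// scheme
    -- A = LOAD '/home/geoffeg/test.json'
    --     using org.geoffeg.hadoop.pig.loader.PigJsonLoader() as ( json: map[] );

    B = foreach A generate json#'_keyword';
    DUMP B;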
  • Geoffrey Gallaway at Jan 12, 2011 at 9:25 pm
    Thanks to Joe and Daniel, I was able to fix this issue.

    It was a combination of ambiguity about file paths (which Joe's message
    helped me confirm) and an error in my Java code that wasn't throwing an
    exception and was failing silently.

    Thanks,
    Geoff
