Grokbase Groups: Pig user, October 2010
Hi all,

Facing a weird problem and wondering if anyone has run into this before. I've been playing with PigServer to programmatically run some simple pig scripts and it does not seem to be connecting to HDFS when I pass in ExecType.MAPREDUCE. I am running in pseudo-distributed mode and have the tasktracker and namenode both running on default ports. When I run scripts by using "pig script.pig" or from the grunt console it connects to hdfs and works fine.

Do I need to specify some additional properties in the PigServer constructor, or construct a custom PigContext? I had assumed that by passing ExecType.MAPREDUCE and using the defaults, everything would be fine.

Would really appreciate any insight or anecdotes of others using PigServer and how they have it set up. Thanks a bunch!

-Zach

Here is the code I'm using:

    PigServer pigServer = new PigServer("mapreduce");
    pigServer.setBatchOn();
    pigServer.registerScript("/Users/zach/Desktop/test.pig");
    List<ExecJob> jobs = pigServer.executeBatch();

and here is the log output:

    0    [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to hadoop file system at: file:///
    622  [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns  - No column pruned for pages
    622  [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns  - No map keys pruned for pages
    659  [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics  - Initializing JVM Metrics with processName=JobTracker, sessionId=
    751  [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - (Name: Store(file:///output:PigStorage) - 1-70 Operator Key: 1-70)
    789  [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  - MR plan size before optimization: 1
    790  [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  - MR plan size after optimization: 1
    815  [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics  - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    822  [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics  - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    822  [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2534 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Setting up single store job
    2582 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics  - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2582 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 1 map-reduce job(s) waiting for submission.
    2590 [Thread-4] WARN  org.apache.hadoop.mapred.JobClient  - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2746 [Thread-4] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics  - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2765 [Thread-4] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics  - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    3083 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 0% complete
    3084 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 100% complete
    3084 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 1 map reduce job(s) failed!
    3085 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher  - There is no log file to write to.
    3085 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher  - Backend error message during job submission
    org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file:///input
            at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
            at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
            at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
            at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
            at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
            at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
            at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
            at java.lang.Thread.run(Thread.java:637)
    Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/input
            at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
            at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
            at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
            at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:258)
            ... 7 more
    3092 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Failed to produce result in: "file:///output"
    3092 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Failed!


  • Jeff Zhang at Oct 28, 2010 at 1:08 am
    You need to put the Hadoop conf files on the classpath, otherwise you
    will always connect to the local file system (a sketch of how to check
    this follows this reply).


    On Wed, Oct 27, 2010 at 4:25 PM, Zach Bailey wrote:



    --
    Best Regards

    Jeff Zhang
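A quick way to check Jeff's point from inside an embedded program is to ask the classloader whether the Hadoop configuration files are actually visible. This is only a minimal diagnostic sketch, assuming a Hadoop 0.20-style layout where the cluster settings live in core-site.xml and mapred-site.xml; the HadoopConfCheck class name is made up for illustration:

    import java.net.URL;

    public class HadoopConfCheck {
        public static void main(String[] args) {
            // If either of these resolves to null, the Hadoop conf directory is not
            // on the classpath, and Pig will fall back to the local file system,
            // which matches the "Connecting to hadoop file system at: file:///"
            // line in the log above.
            ClassLoader cl = HadoopConfCheck.class.getClassLoader();
            URL coreSite = cl.getResource("core-site.xml");
            URL mapredSite = cl.getResource("mapred-site.xml");
            System.out.println("core-site.xml:   " + coreSite);
            System.out.println("mapred-site.xml: " + mapredSite);
        }
    }

Launching the program with the Hadoop conf directory (for example $HADOOP_HOME/conf) appended to the classpath should make both resources resolve, and Pig can then pick up the HDFS and JobTracker addresses from those files.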
  • Harsh J at Oct 28, 2010 at 5:53 am
    Pig needs to know where your HDFS is, doesn't it? :)
    http://pig.apache.org/docs/r0.7.0/setup.html#Embedded+Programs details
    what needs to be set for embedded programs to use Pig. Specifically,
    the $HADOOPDIR part.

    You could also put the conf files into the classpath, as Jeff pointed
    out :) (a sketch of the properties-based approach follows this reply)
    On Thu, Oct 28, 2010 at 4:55 AM, Zach Bailey wrote:



    --
    Harsh J
    www.harshj.com
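If adding the conf directory to the classpath is inconvenient, the addresses can also be handed to PigServer directly through its Properties-taking constructor, which is what the original question was asking about. A minimal sketch, assuming a pseudo-distributed cluster on the usual localhost quickstart ports; the EmbeddedPigExample class name and the port numbers are placeholders, and the fs.default.name / mapred.job.tracker values should be adjusted to match your core-site.xml and mapred-site.xml:

    import java.util.List;
    import java.util.Properties;

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;
    import org.apache.pig.backend.executionengine.ExecJob;

    public class EmbeddedPigExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Point Pig at HDFS and the JobTracker instead of the local-mode defaults.
            props.setProperty("fs.default.name", "hdfs://localhost:9000");
            props.setProperty("mapred.job.tracker", "localhost:9001");

            PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
            pigServer.setBatchOn();
            pigServer.registerScript("/Users/zach/Desktop/test.pig");
            List<ExecJob> jobs = pigServer.executeBatch();
            System.out.println("Jobs run: " + jobs.size());
        }
    }

With the properties set this way, the script's input and output paths resolve against HDFS rather than file:///, so the ERROR 2118 "Input path does not exist: file:/input" failure above should go away once the input path exists in HDFS.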

Discussion Overview
group: user
categories: pig, hadoop
posted: Oct 27, '10 at 11:26p
active: Oct 28, '10 at 5:53a
posts: 3
users: 3
website: pig.apache.org
