Pig and Hadoop Integration Error (Pig user mailing list, August 2010)
Hi,

I am trying to integrate Pig with Hadoop for processing jobs.

I am able to run Pig in local mode, and Hadoop with the streaming API, perfectly.

But when I try to run Pig with Hadoop I get the following error:

Pig Stack Trace
---------------
ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out

org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
at org.apache.pig.PigServer.validate(PigServer.java:930)
at org.apache.pig.PigServer.compileLp(PigServer.java:910)
at org.apache.pig.PigServer.compileLp(PigServer.java:871)
at org.apache.pig.PigServer.compileLp(PigServer.java:852)
at org.apache.pig.PigServer.execute(PigServer.java:816)
at org.apache.pig.PigServer.access$100(PigServer.java:105)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:391)
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
... 16 more
Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
... 24 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
================================================================================

Did anyone get the same error? I think it is related to the connection between Pig and Hadoop.

Can someone tell me how to connect Pig and Hadoop?

Thanks.


  • Jeff Zhang at Aug 27, 2010 at 1:11 am
    Did you put the Hadoop conf on the classpath? It seems you are still
    using the local file system but connecting to Hadoop's JobTracker.
    Make sure you set the correct configuration in core-site.xml,
    hdfs-site.xml, and mapred-site.xml, and put them on the classpath.
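
    For example, one way to get the conf directory onto Pig's classpath (a
    sketch; the conf path below is illustrative, adjust it to your install):

        # point HADOOP_CONF_DIR at the directory holding the *-site.xml files
        # (illustrative path, based on the install location quoted later in this thread)
        export HADOOP_CONF_DIR=/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/conf
        # launch Pig with that directory on the classpath
        java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig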



    --
    Best Regards

    Jeff Zhang
  • Rahul at Aug 27, 2010 at 1:22 am
    Hi Jeff,

    I have put the Hadoop conf on the classpath by setting the $HADOOP_CONF_DIR variable.

    But I have both Pig and Hadoop running on the same machine, so localhost should not make a difference.

    I have used all the default config settings for core-site.xml, hdfs-site.xml, and mapred-site.xml, as per the Hadoop tutorial.

    Please let me know if my understanding is correct.

    I am attaching the conf files as well:
    hdfs-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
    </property>

    <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
    </property>

    </configuration>

    core-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
    <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
    </property>
    </configuration>

    mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
    <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
    </property>

    <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>8</value>
    <description>The maximum number of tasks that will be run simultaneously
    by a task tracker
    </description>
    </property>
    </configuration>
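
    A quick way to confirm a client actually picks up these settings is to
    list HDFS from the shell with the same conf on the classpath (a sketch;
    assumes the hadoop script from this install is on PATH):

        # should list the HDFS root, not the local filesystem
        hadoop fs -ls hdfs://localhost:9000/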

    Please let me know if there is an issue in my configurations. Any input is valuable to me.

    Thanks,
    Rahul
  • Santhosh Srinivasan at Aug 27, 2010 at 1:26 am
    Can you try replacing localhost with the fully qualified name of your host?

    Santhosh


  • Rahul at Aug 27, 2010 at 1:51 am
    Hi Santhosh, I tried with the fully qualified host name as well, but the error remains the same.

    I think it should not be an issue, as both Pig and Hadoop are on the same machine.

    Please let me know if there is some gap in my understanding.

    Thanks,
    Rahul
  • Jeff Zhang at Aug 27, 2010 at 1:30 am
    But according to the error log:
    "Could not validate the output specification for:
    file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out"

    it still tries to access the local file system rather than HDFS.
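
    One way to narrow this down (a sketch): if the command below lists HDFS
    correctly but Pig still resolves the output to a file:/// path, then the
    *-site.xml files are not on Pig's classpath:

        # works => HDFS and its conf are fine; the problem is Pig's classpath
        hadoop fs -ls /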




    --
    Best Regards

    Jeff Zhang
  • Jeff Zhang at Aug 27, 2010 at 1:33 am
    Try putting the Hadoop XML configuration files in the pig/conf folder.
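
    For example (a sketch, using $PIGDIR and $HADOOP_CONF_DIR as already set
    up in this thread):

        # copy the site files next to Pig so they are always on its classpath
        cp $HADOOP_CONF_DIR/core-site.xml \
           $HADOOP_CONF_DIR/hdfs-site.xml \
           $HADOOP_CONF_DIR/mapred-site.xml \
           $PIGDIR/conf/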




    --
    Best Regards

    Jeff Zhang
  • Rahul at Aug 27, 2010 at 1:50 am
    Hi Jeff,

    I transferred the Hadoop conf files to the pig/conf location, but I still get the same error.

    Is the issue with the configuration files or with the HDFS file system?

    Can I test the connection to localhost/127.0.0.1:9001 in some way?
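
    One low-level check (a sketch; assumes nc is available) is to see whether
    anything is listening on port 9001, the JobTracker port from
    mapred-site.xml:

        nc -z localhost 9001 && echo "port 9001 is open"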

    Steps I did:

    1. I initially formatted my local file system using the ./hadoop namenode -format command. I believe this mounts the local file system as HDFS.
    2. Then I configured the Hadoop conf files and ran the ./start-all script.
    3. Started Pig with a custom Pig script, which should read HDFS since I passed HADOOP_CONF_DIR on the classpath.
    The command was: java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig

    Please let me know if these steps miss something.
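
    A quick sanity check on step 3 (a sketch): make sure $HADOOP_CONF_DIR is
    the conf directory itself and that the site files are really inside it:

        echo $HADOOP_CONF_DIR
        # all three should exist for Pig to see the cluster settings
        ls $HADOOP_CONF_DIR/core-site.xml $HADOOP_CONF_DIR/hdfs-site.xml $HADOOP_CONF_DIR/mapred-site.xml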

    Thanks,
    Rahul

  • Jeff Zhang at Aug 27, 2010 at 1:59 am
    Execute the jps command in a shell to see whether the NameNode and
    JobTracker are running correctly.
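
    For a healthy pseudo-distributed setup you would expect jps to list the
    daemon class names (the PIDs below are illustrative):

        $ jps
        4321 NameNode
        4322 DataNode
        4323 SecondaryNameNode
        4324 JobTracker
        4325 TaskTracker
        4326 Jps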


    On Fri, Aug 27, 2010 at 9:49 AM, rahul wrote:
    Hi Jeff,

    I transferred the hadoop conf files to the pig/conf location but still i get the same error.

    Does the issue is with the configuration files or with the hdfs files system ?

    Can test the connection to hdfs(localhost/127.0.0.1:9001) in some way ?

    Steps I did :

    1. I have formatted initially my local file system using the ./hadoop namenode -format command. I believe this mounts the local file system to HDFS.
    2. Then I configured the hadoop conf files and started ./start-all script.
    3. Started Pig with a custom pig script which should read hdfs as I passed the HADOOP_CONF_DIR as parameter.
    The command was java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig

    Please let me know if these step miss something ?

    Thanks,
    Rahul

    On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:

    Try to put the hadoop xml configuration file to pig/conf folder


    On Thu, Aug 26, 2010 at 6:22 PM, rahul wrote:
    Hi Jeff,

    I have set the hadoop conf in class path by setting $HADOOP_CONF_DIR variable.

    But I have both Pig and hadoop running at the same machine, so localhost should not make a difference.

    So I have used all the default config setting for the core-site.xml, hdfs-site.xml, mapred-site.xml, as per the hadoop tutorial.

    Please let me know if my understanding is correct ?

    I am attaching the conf files as well :
hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified at create time.
    </description>
  </property>
</configuration>

core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>

  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>8</value>
    <description>The maximum number of tasks that will be run
    simultaneously by a task tracker.
    </description>
  </property>
</configuration>

Please let me know if there is an issue in my configurations. Any input is valuable to me.

Thanks,
Rahul
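
As a quick sanity check that this configuration actually reaches the running cluster, the Hadoop CLI can be pointed at the same conf directory to list the HDFS root; a sketch, assuming the daemons are up:

$ hadoop --config $HADOOP_CONF_DIR fs -ls /

If this fails with a similar EOFException, the problem sits between the client configuration and the daemons rather than in Pig itself.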
    On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:

Did you put the hadoop conf on the classpath? It seems you are still using the
local file system but connecting to Hadoop's JobTracker.
Make sure you set the correct configuration in core-site.xml,
hdfs-site.xml, and mapred-site.xml, and put them on the classpath.


    On Thu, Aug 26, 2010 at 5:32 PM, rahul wrote:
[...]


--
Best Regards

Jeff Zhang
  • Rahul at Aug 27, 2010 at 2:01 am
Yes, they are running.
    On Aug 26, 2010, at 6:59 PM, Jeff Zhang wrote:

[...]
  • Jeff Zhang at Aug 27, 2010 at 2:08 am
Can you look at the jobtracker log or access the jobtracker web UI?
According to your log, it seems you cannot connect to the jobtracker:

"Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
failed on local exception: java.io.EOFException"
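
The jobtracker log normally lives under the Hadoop logs directory with a name of the form hadoop-<user>-jobtracker-<host>.log; a quick way to watch it while re-running the Pig script, assuming a default layout:

$ tail -f $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log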


    On Fri, Aug 27, 2010 at 10:00 AM, rahul wrote:
Yes, they are running.
[...]


--
Best Regards

Jeff Zhang
  • Rahul at Aug 27, 2010 at 2:13 am
Hi Jeff,

I can connect to the jobtracker web UI at the following URL: http://localhost:50030/jobtracker.jsp

I can also see the jobs which I ran on Hadoop directly through the streaming API.

I also see that it tries to connect to localhost/127.0.0.1:9001, which I have specified in the hadoop conf file,
and I have also tried changing this location to localhost:50030, but the error remains the same.

Can you suggest something further?

Thanks,
Rahul
    On Aug 26, 2010, at 7:07 PM, Jeff Zhang wrote:

[...]
  • Jeff Zhang at Aug 27, 2010 at 2:24 am
Connecting to 9001 is right; that is the jobtracker's IPC port, while 50030
is its HTTP server port.
Also, have you ever tried running the grunt shell?
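
The earlier question about testing the connection can be answered from a shell by probing the IPC port directly; a sketch using standard tools:

$ nc -z localhost 9001 && echo "jobtracker port is open"

A successful probe only proves the port is listening; an EOFException on the RPC call despite an open port often points at a client/server mismatch rather than a network problem.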
    On Thu, Aug 26, 2010 at 7:12 PM, rahul wrote:
[...]


--
Best Regards

Jeff Zhang
  • Rahul at Aug 27, 2010 at 2:50 am
Hi,

I tried the grunt shell as well, but it also does not connect to Hadoop. It throws a warning and runs the job in standalone mode. That is why I tried it using pig.jar.

Do you have any further suggestions?

Rahul
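
For comparison, when grunt does manage to attach to the cluster it announces both endpoints at startup; a sketch of forcing mapreduce mode, with the startup lines roughly as Pig 0.7 prints them:

$ $PIGDIR/bin/pig -x mapreduce
... Connecting to hadoop file system at: hdfs://localhost:9000
... Connecting to map-reduce job tracker at: localhost:9001
grunt>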
    On Aug 26, 2010, at 7:23 PM, Jeff Zhang wrote:

[...]
  • Jeff Zhang at Aug 27, 2010 at 3:17 am
It's weird. I suspect there may be another configuration file on your
class path which overrides your real conf files.
Could you download a fresh Pig release, follow the instructions at
http://hadoop.apache.org/pig/docs/r0.7.0/setup.html, and try again in a
new environment?
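
One way to hunt for such an override is to list the Hadoop site files visible to the JVM, including any bundled inside jars on the classpath; a sketch, assuming the same shell variables as before:

$ ls $HADOOP_CONF_DIR/*-site.xml
$ jar tf $PIGDIR/pig.jar | grep -i 'site.xml'

Note that pig.jar ships with its own bundled Hadoop classes, so a version mismatch between that bundled client and a running Hadoop 0.21 cluster could also produce exactly this kind of EOFException.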


    On Thu, Aug 26, 2010 at 7:49 PM, rahul wrote:
    Hi ,

    I tried the grunt shell as well but that also does not connects to hadoop. It throws a warning and runs the job in standalone mode. So it tried it using the pig.jar.

    Do you have any further suggestion on that ?

    Rahul
    On Aug 26, 2010, at 7:23 PM, Jeff Zhang wrote:

    Connect to 9001 is right,  this is jobtracker's ipc port while 5003
    is its http server port.
    And have you ever try to run the grunt shell ?
    On Thu, Aug 26, 2010 at 7:12 PM, rahul wrote:
    Hi Jeff,

    I can connect to the jobtracker web UI using the following URL : http://localhost:50030/jobtracker.jsp

    And also I can see jobs which I ran directly using the streaming api on hadoop.

    I also see it tries to connect to localhost/127.0.0.1:9001 which I have specified in the hadoop conf file
    and I have also tried changing this location to localhost:50030 but still the error remains the same.

    Can you suggest something further ?

    Thanks,
    Rahul
    On Aug 26, 2010, at 7:07 PM, Jeff Zhang wrote:

    Can you look at the jobtracker log or access jobtracker web ui ?
    It seems you can  not connect to jobtracker according your log

    "Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
    failed on local exception: java.io.EOFException"


    On Fri, Aug 27, 2010 at 10:00 AM, rahul wrote:
    Yes they are running.
    On Aug 26, 2010, at 6:59 PM, Jeff Zhang wrote:

    Execute command jps in shell to see whether namenode and jobtracker is
    running correctly.


    On Fri, Aug 27, 2010 at 9:49 AM, rahul wrote:
    Hi Jeff,

    I transferred the hadoop conf files to the pig/conf location but still i get the same error.

    Does the issue is with the configuration files or with the hdfs files system ?

    Can test the connection to hdfs(localhost/127.0.0.1:9001) in some way ?

    Steps I did :

    1. I have formatted initially my local file system using the ./hadoop namenode -format command. I believe this mounts the local file system to HDFS.
    2. Then I configured the hadoop conf files and started ./start-all script.
    3. Started Pig with a custom pig script which should read hdfs as I passed the HADOOP_CONF_DIR as parameter.
    The command was java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig

    Please let me know if these step miss something ?

    Thanks,
    Rahul

    On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:

    Try to put the hadoop xml configuration file to pig/conf folder


    On Thu, Aug 26, 2010 at 6:22 PM, rahul wrote:
    Hi Jeff,

    I have set the hadoop conf in class path by setting $HADOOP_CONF_DIR variable.

    But I have both Pig and hadoop running at the same machine, so localhost should not make a difference.

    So I have used all the default config setting for the core-site.xml, hdfs-site.xml, mapred-site.xml, as per the hadoop tutorial.

    Please let me know if my understanding is correct ?

    I am attaching the conf files as well :
    hdfs-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system.  A URI whose
    scheme and authority determine the FileSystem implementation.  The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class.  The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
    </property>

    <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
    </property>

    </configuration>

    core-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
    <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
    </property>
    </configuration>

    mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
    <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
    </property>

    <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>8</value>
    <description>The maximum number of tasks that will be run simultaneously by
    a task tracker
    </description>
    </property>
    </configuration>
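
    One hedged observation on the files above: fs.default.name conventionally lives in core-site.xml, and Hadoop clients of this vintage load core-site.xml by default but do not always read hdfs-site.xml, which would explain Pig resolving the output path as file:/// while still reaching the jobtracker on 9001. A core-site.xml carrying both properties might look like:

    <?xml version="1.0"?>
    <configuration>
    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    </property>
    <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
    </property>
    </configuration>
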

    Please let me know if there is an issue in my configurations. Any input is valuable to me.

    Thanks,
    Rahul
    On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:

    Did you put the Hadoop conf on the classpath? It seems you are still using the
    local file system but connecting to Hadoop's JobTracker.
    Make sure you set the correct configuration in core-site.xml,
    hdfs-site.xml, and mapred-site.xml, and put them on the classpath.


    On Thu, Aug 26, 2010 at 5:32 PM, rahul wrote:
    Hi,

    I am trying to integrate Pig with Hadoop for processing jobs.

    I am able to run Pig in local mode and Hadoop with the streaming API perfectly.

    But when I try to run Pig with Hadoop, I get the following error:

    Pig Stack Trace
    ---------------
    ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out

    org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
    at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
    at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
    at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
    at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
    at org.apache.pig.PigServer.validate(PigServer.java:930)
    at org.apache.pig.PigServer.compileLp(PigServer.java:910)
    at org.apache.pig.PigServer.compileLp(PigServer.java:871)
    at org.apache.pig.PigServer.compileLp(PigServer.java:852)
    at org.apache.pig.PigServer.execute(PigServer.java:816)
    at org.apache.pig.PigServer.access$100(PigServer.java:105)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:391)
    Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
    at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
    at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
    at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
    at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
    ... 16 more
    Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
    at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
    at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
    ... 24 more
    Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
    ================================================================================

    Did anyone get the same error? I think it is related to the connection between Pig and Hadoop.

    Can someone tell me how to connect Pig and Hadoop?

    Thanks.


    --
    Best Regards

    Jeff Zhang
  • Rahul at Aug 27, 2010 at 4:17 pm
    Sure, Zhang.

    Thanks for the help.

    -Rahul
    On Aug 26, 2010, at 8:17 PM, Jeff Zhang wrote:

    It's weird. I suspect there may be another configuration file on your
    classpath which overrides your real conf files.
    Could you download a new Pig release, follow the instructions at
    http://hadoop.apache.org/pig/docs/r0.7.0/setup.html, and try it in a new
    environment?
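
    It is worth noting that the hadoop.tmp.dir above points at a Hadoop 0.21.0 install, while Pig releases of this era bundle Hadoop 0.20 client classes; an RPC version mismatch between client and jobtracker commonly surfaces as exactly this kind of EOFException. A fresh-environment attempt along the lines suggested might look like this (versions and paths are assumptions):

    # Unpack a clean Pig release
    tar xzf pig-0.7.0.tar.gz && cd pig-0.7.0

    # Point Pig at the cluster's conf directory
    export PIG_CLASSPATH=/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/conf

    # Launch Grunt in mapreduce mode; the startup banner should report
    # hdfs://localhost:9000 rather than the local file system
    bin/pig -x mapreduce
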


    On Thu, Aug 26, 2010 at 7:49 PM, rahul wrote:
    Hi,

    I tried the Grunt shell as well, but that also does not connect to Hadoop. It throws a warning and runs the job in local mode, so I tried it using pig.jar instead.

    Do you have any further suggestions on that?

    Rahul
