Grokbase Groups Pig user March 2011
First off, I am fairly new to both Pig and Hadoop. I am having some problems
connecting Pig to a local Hadoop cluster. Whenever I try to start up Pig, I get
the following error in the Hadoop namenode logs:



2011-03-21 17:48:17,299 WARN org.apache.hadoop.ipc.Server: Incorrect header
or version mismatch from 127.0.0.1:60928 got version 3 expected version 4



I am using the Cloudera deb repository (CDH3b4), installed according to
https://docs.cloudera.com/display/DOC/CDH3+Installation+Guide. The Hadoop
version is 0.20.2, running in pseudo-distributed mode. I am using Pig 0.8.0,
both the provided tarball and a clone of the 0.8.0 tag compiled locally. Any
help would be appreciated. I am getting the following error in the Pig logs:



Error before Pig is launched

----------------------------

ERROR 2999: Unexpected internal error. Failed to create DataStorage



java.lang.RuntimeException: Failed to create DataStorage
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:213)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:133)
    at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
    at org.apache.pig.PigServer.<init>(PigServer.java:225)
    at org.apache.pig.PigServer.<init>(PigServer.java:214)
    at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
    at org.apache.pig.Main.run(Main.java:462)
    at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:8020 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
    ... 9 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

================================================================================
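Editor's note on that namenode warning: the Hadoop IPC server checks a version field in the connection header and, on a mismatch, logs the warning and closes the socket without replying; the client, blocked in DataInputStream.readInt() waiting for a response, then hits end-of-stream, which is the EOFException in the trace above. A toy sketch of that behavior (not Hadoop's actual wire code; the header layout and names here are illustrative only):

```python
EXPECTED_VERSION = 4  # what the server-side jar expects

def handle_connection_header(header: bytes):
    """Toy server-side check: reject headers whose version byte differs.

    The real server logs "Incorrect header or version mismatch ... got
    version X expected version Y" and closes the socket; nothing is ever
    written back, so the client's readInt() sees end-of-stream (EOF).
    """
    version = header[-1]  # illustrative layout: last byte is the version
    if version != EXPECTED_VERSION:
        return None  # connection dropped; no reply sent
    return b"ok"

# An older client jar speaking version 3 never gets a reply:
assert handle_connection_header(bytes([104, 114, 112, 3])) is None
# A matching jar speaking version 4 is accepted:
assert handle_connection_header(bytes([104, 114, 112, 4])) == b"ok"
```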


  • Josh Devins at Mar 22, 2011 at 8:59 am
    Hey Dan

    This usually means that you have mismatched Hadoop jar versions somewhere. I
    encountered a similar problem with Oozie trying to talk to HDFS. Maybe try
    posting to the Hadoop user list as well. In general, when you run Pig you
    just need the same hadoop-core.jar as on your cluster. On the Pig side, all
    you should need is pig.jar (and piggybank, etc.), and the pre-compiled jar
    should suffice.

    Cheers,

    Josh


  • Dan Hendry at Mar 22, 2011 at 7:07 pm
    Thanks for the info

    I have not yet verified with the Hadoop list, but it looks like the CDH3b4 0.20.2 hadoop-core.jar is incompatible with the hadoop-core.jar that the Pig build script pulls in via Ivy. I was able to solve my problem by building Pig without Hadoop (ant jar-withouthadoop) and then manually including the 'correct' hadoop-core.jar in the classpath. This is a bug, but I don't know enough about the community to say whose; perhaps Cloudera's?
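    Editor's note: for readers hitting the same error, the workaround described above might look like the following. This is a sketch only; the checkout directory, the CDH jar name, and the conf path are assumptions to be adapted to your own install.

    ```shell
    # From a pig-0.8.0 source checkout: build Pig without the bundled Hadoop
    ant jar-withouthadoop

    # Run Pig with the cluster's own hadoop-core.jar and conf directory on
    # the classpath (jar name and paths below are assumed, not verified)
    java -cp pig-withouthadoop.jar:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-CDH3B4.jar:/etc/hadoop/conf \
        org.apache.pig.Main
    ```

    The point of the ordering is that the cluster's hadoop-core.jar, rather than any Hadoop classes bundled into a Pig jar, is what answers the namenode's version handshake.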

    I would like to point out one bug I found in the Pig build.xml.

    The main jar target (buildJar) has the following dependencies:
    <zipfileset src="${ivy.lib.dir}/hadoop-core-${hadoop-core.version}.jar" />
    <zipfileset src="${lib.dir}/${automaton.jarfile}" />
    <zipfileset src="${ivy.lib.dir}/junit-${junit.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/jsch-${jsch.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/jline-${jline.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/jackson-mapper-asl-${jackson.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/jackson-core-asl-${jackson.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/joda-time-${joda-time.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/${guava.jar}" />
    <zipgroupfileset dir="${ivy.lib.dir}" includes="commons*.jar"/>
    <zipgroupfileset dir="${ivy.lib.dir}" includes="log4j*.jar"/>
    <zipgroupfileset dir="${ivy.lib.dir}" includes="jsp-api*.jar"/>

    Yet in the 0.8.0 tag, the non-hadoop target (jar-withouthadoop) has:
    <zipfileset src="${ivy.lib.dir}/junit-${junit.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/jsch-${jsch.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/jline-${jline.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/jackson-mapper-asl-${jackson.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/jackson-core-asl-${jackson.version}.jar" />
    <zipfileset src="${ivy.lib.dir}/joda-time-${joda-time.version}.jar" />
    <zipfileset src="${lib.dir}/${automaton.jarfile}" />

    Should it not be the same, with the exception of the first line? Among other things, the withouthadoop jar is missing the logging dependencies.
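    Editor's note: if the omission is indeed unintentional, the fix would be to add the missing lines from buildJar to the jar-withouthadoop target. A sketch against the 0.8.0 build.xml (it is unclear whether guava is deliberately excluded, so treat each line as a suggestion rather than a confirmed patch):

    ```xml
    <!-- Hypothetical additions to jar-withouthadoop, mirroring buildJar -->
    <zipfileset src="${ivy.lib.dir}/${guava.jar}" />
    <zipgroupfileset dir="${ivy.lib.dir}" includes="commons*.jar"/>
    <zipgroupfileset dir="${ivy.lib.dir}" includes="log4j*.jar"/>
    <zipgroupfileset dir="${ivy.lib.dir}" includes="jsp-api*.jar"/>
    ```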


    Dan

    -----Original Message-----
    From: joshdevins@gmail.com On Behalf Of Josh Devins
    Sent: March-22-11 4:59
    To: user@pig.apache.org
    Subject: Re: Incorrect header or version mismatch

  • Dmitriy Ryaboy at Mar 22, 2011 at 8:12 pm
    Cloudera packages a version of Pig that works with their distribution; you
    are using a different version, which bundles its own Hadoop, so there is a
    conflict. That is to be expected, I think, and not a bug. This is why the
    jar-withouthadoop target is provided.

    D

Discussion Overview
group: user @ pig.apache.org
categories: pig, hadoop
posted: Mar 21, '11 at 9:57p
active: Mar 22, '11 at 8:12p
posts: 4
users: 3
website: pig.apache.org