Our production environment has undergone software upgrades and now I'm working with:

Hadoop 0.20.2-cdh3u0
Apache Pig version 0.8.0-cdh3u0
HBase 0.90.1-cdh3u0

My research indicates that these all OUGHT to play together nicely... I would kill for someone to
publish a compatibility grid for the misc versions.

Anyway, I'm trying to load from HBase:

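-- Columns to load, as space-separated family:qualifier names;
-- '-caching 1000' sets how many rows each HBase scanner call fetches.
visitors = LOAD 'hbase://track'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'open:browser open:ip open:os open:createdDate', '-caching 1000')
    as (browser:chararray, ipAddress:chararray,
        os:chararray, createdDate:chararray);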

And I'm receiving the following error, which, from what I've found searching around, seems to
indicate a compatibility issue between Pig and Hadoop:

ERROR 2999: Unexpected internal error. Failed to create DataStorage

java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:196)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:116)
at org.apache.pig.impl.PigContext.connect(PigContext.java:184)
at org.apache.pig.PigServer.<init>(PigServer.java:243)
at org.apache.pig.PigServer.<init>(PigServer.java:228)
at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:46)
at org.apache.pig.Main.run(Main.java:545)
at org.apache.pig.Main.main(Main.java:108)
Caused by: java.io.IOException: Call to hadoop001/10.0.0.51:8020 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 9 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

Am I actually running incompatible versions? Should I bug the Cloudera folks?
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.
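One way to approach the "incompatible versions?" question is to ask the JVM which jars supply the Hadoop RPC classes named in the stack trace. An EOFException during getProtocolVersion often means the Hadoop client jar and the NameNode disagree on RPC versions, and more than one hit for the same class suggests jars from an older install are still on the classpath. The following is a minimal, hypothetical diagnostic, not something from this thread: the class name DuplicateClassFinder is made up, it uses only the JDK's ClassLoader.getResources, and it assumes you run it with the same classpath Pig uses.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: reports every classpath entry that contains a given class file.
public class DuplicateClassFinder {

    /** Returns every classpath location that supplies the given class, in the loader's search order. */
    public static List<URL> locationsOf(ClassLoader loader, String className) throws IOException {
        // e.g. "org.apache.hadoop.ipc.RPC" -> "org/apache/hadoop/ipc/RPC.class"
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Default to a Hadoop class from the stack trace; pass another class name to check that instead.
        String className = args.length > 0 ? args[0] : "org.apache.hadoop.ipc.RPC";
        List<URL> hits = locationsOf(ClassLoader.getSystemClassLoader(), className);
        if (hits.isEmpty()) {
            System.out.println(className + " not found on the classpath");
        }
        for (URL url : hits) {
            System.out.println(url);
        }
        if (hits.size() > 1) {
            // With the usual parent-first delegation, the first entry is normally the one that gets loaded.
            System.out.println("WARNING: " + hits.size() + " copies of " + className + " found");
        }
    }
}
```

Run it as `java -cp <pig classpath>:. DuplicateClassFinder`, where `<pig classpath>` is a placeholder for whatever classpath your Pig launcher builds. A single jar per class is what a clean install should show.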

  • Dmitriy Ryaboy at May 25, 2011 at 9:59 pm
    Use Pig 0.8.1

    D
    Replying to Jameson Lopp's post of Wed, May 25, 2011 at 2:03 PM.
  • Jonathan Coveney at May 25, 2011 at 10:29 pm
    I wasn't trying to use HBase, but I have had the same problem. To get around
    it, I had to build a pig-nohadoop.jar, put the hadoop*.jar on the
    classpath, and register antlr in Pig. Since I got the same error without HBase,
    I think it is a Pig/Hadoop compatibility problem, but just to be sure, can
    you run ordinary Hadoop jobs that don't use HBase, to isolate variables?

    Replying to Dmitriy Ryaboy <dvryaboy@gmail.com>, 2011/5/25.
  • Jameson Lopp at May 26, 2011 at 2:50 pm
    Just following up: the root cause of the problem seems to have been remnants of old Hadoop/HBase
    versions on the machine. Once I got past the DataStorage error, Pig started throwing
    "java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (null),
    this version is 0.90.1-cdh3u0" because it was loading a leftover copy of that config file, which
    is no longer used.

    In summary, a machine with a clean install of:
    Hadoop 0.20.2-cdh3u0
    Apache Pig version 0.8.0-cdh3u0
    HBase 0.90.1-cdh3u0

    runs just fine with no crazy workarounds. I no longer have to manually register jars in
    the Pig script or turn off split combination to get HBase loading to work!
    Replying to Jonathan Coveney's post of 05/25/2011 06:29 PM.
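The hbase-default.xml error Jameson hit after the DataStorage one suggests checking which copies of that file the JVM can see. A minimal sketch follows, assuming (as the wording of the exception suggests for HBase 0.90) that HBase compares its own version against a hbase.defaults.for.version property in the file, so a leftover file from an older release without that property would report null. The class name HBaseDefaultsCheck is hypothetical, and only JDK classes are used.

```java
import java.io.InputStream;
import java.net.URL;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: prints the version stamp of every hbase-default.xml on the classpath.
public class HBaseDefaultsCheck {

    // Assumed property name: the version stamp HBase 0.90 appears to check at startup.
    public static final String VERSION_KEY = "hbase.defaults.for.version";

    /** Returns the value of one property in a Hadoop-style configuration XML, or null if it is absent. */
    public static String readProperty(InputStream in, String key) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList props = doc.getElementsByTagName("property");
        String found = null;
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && key.equals(names.item(0).getTextContent().trim())) {
                // Keep scanning: a later definition of the same name overrides an earlier one.
                found = values.getLength() > 0 ? values.item(0).getTextContent().trim() : null;
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // List every hbase-default.xml the JVM can see, with the version each one claims.
        for (URL url : Collections.list(ClassLoader.getSystemClassLoader().getResources("hbase-default.xml"))) {
            try (InputStream in = url.openStream()) {
                // "null" here matches the "(null)" in the error: the file carries no version stamp.
                System.out.println(url + " -> " + VERSION_KEY + " = " + readProperty(in, VERSION_KEY));
            }
        }
    }
}
```

Run it with the same classpath Pig uses. Any copy that reports null, or a version other than 0.90.1-cdh3u0, is a likely leftover from the old install.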

Discussion Overview
group: user @
categories: pig, hadoop
posted: May 25, '11 at 9:04p
active: May 26, '11 at 2:50p
posts: 4
users: 3
website: pig.apache.org