Pig user mailing list, March 2011
Running HBase 0.20 (0.20.3-1.cloudera) - I've tried running this with Pig 0.8 from August 2010 and
from trunk on March 25, 2011. Do I need to use an older version?

My Pig script is trying to load from HBase via this command:
data = LOAD 'hbase://track'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('open:browser open:ip open:os', '-caching 1000')
       AS (browser:chararray, ipAddress:chararray, os:chararray);

But the job fails trying to load the data:
Input(s):
Failed to read data from "hbase://track"

When I look at my MapReduce job, it fails every time with a ClassNotFoundException:
java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
... 5 more

Now, perhaps this issue is better suited for a Hadoop / MapReduce / Cloudera mailing list, but
every node in my Hadoop cluster has /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar, which includes
the TableSplit class... so it seems to me that it should have no problem loading it.

I've run out of ideas at this point - anyone have suggestions? Thanks!
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

  • Bill Graham at Mar 25, 2011 at 4:07 pm
    The Pig trunk and the Pig 0.8.0 branch both require HBase >= 0.89 (see
    PIG-1680). The Pig 0.8.0 release, though, requires HBase < 0.89, so you should
    focus on that version of Pig. Or better yet, upgrade HBase to 0.90.1
    if possible.
    On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp wrote:
    [snipped] ...
  • Jameson Lopp at Mar 25, 2011 at 8:02 pm
    Alright, I set up HBase 0.90.1 and Pig 0.8.0 and feel like everything is configured, but my Pig
    script hangs after connecting to ZooKeeper... my MapReduce job doesn't get scheduled and the
    process looks frozen. Some debug output:

    2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 285 into MR job 282
    2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 293 into MR job 282
    2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 313 into MR job 282
    2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Requested parallelism of splitter: -1
    2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 map-reduce splittees.
    2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 out of total 4 MR operators.
    2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 8
    2011-03-25 15:51:07,423 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
    2011-03-25 15:51:07,434 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
    2011-03-25 15:51:11,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
    2011-03-25 15:51:11,021 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
    2011-03-25 15:51:11,022 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
    2011-03-25 15:51:11,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2011-03-25 15:51:11,504 [Thread-3] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
    2011-03-25 15:51:11,611 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

    [snipped] ...

    2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Attempting connection to server 10.202.61.184:2181
    2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Priming connection to java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767 remote=10.202.61.184:2181]
    2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Server connection successful

    I found a few threads about people having problems connecting to HBase through ZooKeeper due to
    misconfiguration or network issues, but I don't see any where the client claims to connect
    successfully and then hangs... weird.
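
    Since the client reaches ZooKeeper and then stalls, one low-cost diagnostic is to pin the
    HBase/ZooKeeper settings explicitly in the script. This is a hedged sketch: the property names
    are standard HBase client settings and the address comes from the ClientCnxn log lines above,
    but whether this Pig/HBaseStorage combination honors job-conf overrides (rather than only an
    hbase-site.xml on the classpath) is an assumption.

    -- Hedged sketch: pass the ZooKeeper quorum through the job configuration.
    -- Quorum address and port are taken from the log above; HBaseStorage may
    -- only read hbase-site.xml from the classpath, so treat this as a
    -- diagnostic rather than a guaranteed fix.
    SET hbase.zookeeper.quorum '10.202.61.184';
    SET hbase.zookeeper.property.clientPort '2181';

    data = LOAD 'hbase://track'
           USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('open:browser open:ip open:os', '-caching 1000')
           AS (browser:chararray, ipAddress:chararray, os:chararray);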

    --
    Jameson Lopp
    Software Engineer
    Bronto Software, Inc.
    On 03/25/2011 12:06 PM, Bill Graham wrote:
    [snipped] ...
  • Dmitriy Ryaboy at Mar 25, 2011 at 8:54 pm
    Pig 0.8 distribution or Pig 0.8 from svn?
    You want the latter (the soon-to-be Pig 0.8.1).

    D
    On Fri, Mar 25, 2011 at 1:02 PM, Jameson Lopp wrote:

    [snipped] ...
  • Jameson Lopp at Mar 29, 2011 at 1:35 pm
    Just to follow up: I'm running Pig 0.8 from SVN. I finally got it working, though I'm not sure why
    this was required. I resolved the ClassNotFoundException errors by manually registering the jars in
    my Pig script:

    REGISTER /path/to/pig_0.8/piggybank.jar;
    REGISTER /path/to/pig_0.8/lib/google-collections-1.0.jar;
    REGISTER /path/to/pig_0.8/lib/hbase-0.20.3-1.cloudera.jar;
    REGISTER /path/to/pig_0.8/lib/zookeeper-hbase-1329.jar;

    We had these jars placed in the Hadoop lib directory on all of our Hadoop machines, and thus
    figured that they would get loaded for the MapReduce jobs. Apparently this is not the case...
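
    Putting the pieces together, a minimal end-to-end sketch of the working script, using the jar
    paths, table name, and columns quoted earlier in this thread (the /path/to/pig_0.8 prefix is a
    placeholder, as in the post above):

    -- Ship the HBase/ZooKeeper jars with the job so that map tasks can
    -- deserialize org.apache.hadoop.hbase.mapreduce.TableSplit (the
    -- ClassNotFoundException from the start of the thread).
    REGISTER /path/to/pig_0.8/piggybank.jar;
    REGISTER /path/to/pig_0.8/lib/google-collections-1.0.jar;
    REGISTER /path/to/pig_0.8/lib/hbase-0.20.3-1.cloudera.jar;
    REGISTER /path/to/pig_0.8/lib/zookeeper-hbase-1329.jar;

    -- Same load as the original post.
    data = LOAD 'hbase://track'
           USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('open:browser open:ip open:os', '-caching 1000')
           AS (browser:chararray, ipAddress:chararray, os:chararray);
    DUMP data;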

    --
    Jameson Lopp
    Software Engineer
    Bronto Software, Inc.
    On 03/25/2011 04:53 PM, Dmitriy Ryaboy wrote:
    [snipped] ...
  • Dmitriy Ryaboy at Mar 29, 2011 at 4:48 pm
    There's something odd about this jar list.
    You said you are running HBase 0.90.1, yet you register a Cloudera HBase 0.20.3
    jar. You are also registering an ancient ZooKeeper jar. It doesn't sound
    like you are actually running either HBase 0.90.1 or Pig 0.8 from the tip of the
    svn branch.

    D
    On Tue, Mar 29, 2011 at 6:34 AM, Jameson Lopp wrote:

    [snipped] ...

  • Jameson Lopp at Mar 29, 2011 at 7:09 pm
    You're correct - I didn't mention that we have several environments. We're running HBase 0.20 in
    production and upgraded to 0.90.1 in development, but that upgrade ended up being rolled back due
    to other issues. My point is that the ClassNotFoundException errors look unrelated to version
    incompatibilities - once I register the appropriate jars in my Pig script, the MR jobs run.
    On 03/29/2011 12:47 PM, Dmitriy Ryaboy wrote:
    [snipped] ...
  • Dmitriy Ryaboy at Mar 30, 2011 at 7:46 am
    Ah, ok. The reason I was surprised is that if you are using 0.90.1 and the latest
    0.8, the HBaseStorage code in Pig is supposed to auto-register the HBase,
    ZooKeeper, and google-collections jars, so you won't have to do that.
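
    In other words, on HBase 0.90.x with Pig 0.8.1 (or the tip of the 0.8 branch), the original
    script should need no REGISTER lines at all; a sketch, assuming the auto-registration works
    as described:

    -- Assumes Pig 0.8.1+ HBaseStorage auto-ships the hbase, zookeeper, and
    -- google-collections jars with the job, so no REGISTER statements are needed.
    data = LOAD 'hbase://track'
           USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('open:browser open:ip open:os', '-caching 1000')
           AS (browser:chararray, ipAddress:chararray, os:chararray);
    DUMP data;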

    FWIW, 0.90.1 has been MUCH more stable for us than any of the 0.20 releases. The
    upgrade is worth it.


    D
    On Tue, Mar 29, 2011 at 12:08 PM, Jameson Lopp wrote:

    [snipped] ...
