Grokbase Groups Pig user April 2011
Hello,

I have installed Pig 0.8.0 and Cassandra 0.7.4, and I'm not able to read data from Cassandra. I wrote a simple query just to test:

grunt> A = LOAD 'cassandra://msg_keyspace/messages' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
grunt> dump A;


And I'm getting the following error:
==========================================================================
2011-04-05 15:33:57,669 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-04-05 15:33:57,669 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-04-05 15:33:57,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: A: Store(hdfs://localhost/tmp/temp2037710644/tmp-29784200:org.apache.pig.impl.io.InterStorage) - scope-1 Operator Key: scope-1)
2011-04-05 15:33:57,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-04-05 15:33:57,877 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-04-05 15:33:57,877 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-04-05 15:33:57,969 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-04-05 15:33:57,990 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-04-05 15:34:03,376 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-04-05 15:34:03,416 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-04-05 15:34:03,929 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-04-05 15:34:04,597 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-04-05 15:34:05,942 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201104051459_0008
2011-04-05 15:34:05,943 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201104051459_0008
2011-04-05 15:34:35,912 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201104051459_0008 has failed! Stop running all dependent jobs
2011-04-05 15:34:35,918 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-04-05 15:34:35,931 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
2011-04-05 15:34:35,931 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-04-05 15:34:35,933 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2-CDH3B4 0.8.0-SNAPSHOT root 2011-04-05 15:33:57 2011-04-05 15:34:35 UNKNOWN

Failed!

Failed Jobs:
JobId Alias Feature Message Outputs
job_201104051459_0008 A MAP_ONLY Message: Job failed! Error - NA hdfs://localhost/tmp/temp2037710644/tmp-29784200,

Input(s):
Failed to read data from "cassandra://msg_keyspace/messages"

Output(s):
Failed to produce result in "hdfs://localhost/tmp/temp2037710644/tmp-29784200"
==========================================================================

Any idea how to fix this?
Cheers

  • Jeremy Hanna at Apr 5, 2011 at 2:20 pm
    Fabio,

    Could you post the full stack trace from the pig_<long number>.log file in the directory where you ran Pig?
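
A minimal sketch of one way to pull that section out of the newest log, assuming the logs are named pig_<number>.log and contain the "Backend error message" and "Pig Stack Trace" headers seen later in this thread. The function name backend_error is hypothetical and not part of Pig.

```shell
#!/bin/sh
# Print the "Backend error message" section of the newest Pig log in a directory.
# backend_error is a hypothetical helper name, not a Pig command.
backend_error() {
    dir="${1:-.}"
    # ls -t sorts by modification time, newest first.
    # Note: parsing ls output is fragile for unusual file names; fine for a quick check.
    log=$(ls -t "$dir"/pig_*.log 2>/dev/null | head -n 1)
    if [ -z "$log" ]; then
        echo "no pig_*.log found in $dir" >&2
        return 1
    fi
    echo "== $log"
    # Print from the backend header up to the Pig stack trace header,
    # then drop the trailing header line itself.
    sed -n '/Backend error message/,/Pig Stack Trace/p' "$log" | sed '/Pig Stack Trace/d'
}
```

Calling `backend_error` with no argument looks in the current directory; pass a path to look elsewhere.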

    Thanks,

    Jeremy
  • Fabio Souto at Apr 5, 2011 at 2:38 pm
    Hi Jeremy,

    Of course, here it is:

    Backend error message
    ---------------------
    java.lang.NumberFormatException: null
    at java.lang.Integer.parseInt(Integer.java:417)
    at java.lang.Integer.parseInt(Integer.java:499)
    at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown Source)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:234)

    Pig Stack Trace
    ---------------
    ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null

    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A. Backend error : Unable to recreate exception from backed error: java.lang.NumberFormatException: null
    at org.apache.pig.PigServer.openIterator(PigServer.java:742)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.run(Main.java:465)
    at org.apache.pig.Main.main(Main.java:107)
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
    at org.apache.pig.PigServer.storeEx(PigServer.java:874)
    at org.apache.pig.PigServer.store(PigServer.java:816)
    at org.apache.pig.PigServer.openIterator(PigServer.java:728)
    ... 7 more
    ================================================================================


    Thanks for everything,
    Fabio

  • Jeremy Hanna at Apr 5, 2011 at 3:04 pm
    Fabio,

    It looks like you need to set your environment variables to connect to Cassandra. Check out the README. Quoting here:
    Finally, set the following as environment variables (uppercase,
    underscored), or as Hadoop configuration variables (lowercase, dotted):
    * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
    * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
    * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner

    So you'll probably want to do:
    export PIG_INITIAL_ADDRESS=localhost
    export PIG_RPC_PORT=9160
    export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
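
A minimal sketch of a pre-flight check for these three variables, assuming a POSIX shell. The function name check_pig_cassandra_env is hypothetical and not part of Pig or Cassandra. It reads from `env` so that only exported variables count, and it rejects a non-numeric port, since the stack trace above shows the port being parsed as an integer.

```shell
#!/bin/sh
# Verify that the variables named in the README are exported before launching Pig.
# check_pig_cassandra_env is a hypothetical helper name.
check_pig_cassandra_env() {
    status=0
    for name in PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER; do
        # Read from env so that variables set but not exported are reported as missing.
        value=$(env | sed -n "s/^${name}=//p" | head -n 1)
        if [ -z "$value" ]; then
            echo "$name is not exported or is empty" >&2
            status=1
        elif [ "$name" = PIG_RPC_PORT ]; then
            # Any character outside 0-9 means the value cannot be a port number.
            case "$value" in
                *[!0-9]*) echo "PIG_RPC_PORT is not a number: $value" >&2; status=1 ;;
            esac
        fi
    done
    return "$status"
}
```

It could be used as `check_pig_cassandra_env && pig myscript.pig`. Passing the check only shows that the launching shell has the values; it says nothing about what the Hadoop task processes see.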

    All the best, and let me know if this doesn't work,

    Jeremy
  • Fabio Souto at Apr 5, 2011 at 3:30 pm
    Hi,

    I had a bad environment variable:
    PIG_PARTITIONER=RandomPartitioner
    instead of
    PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
    I corrected this, but it is still not working. I get the same error.

    Just in case, I have this in my ~/.bash_profile:

    export HADOOPDIR=/etc/hadoop-0.20/conf
    export HADOOP_CLASSPATH=/usr/cassandra/lib/*:$HADOOP_CLASSPATH
    export CLASSPATH=$HADOOPDIR:$CLASSPATH

    export PIG_CONF_DIR=$HADOOPDIR
    export PIG_CLASSPATH=/etc/hadoop/conf
    export PIG_CONF_DIR=$HADOOPDIR

    export PIG_INITIAL_ADDRESS=localhost
    export PIG_RPC_PORT=9160
    export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner


    BTW, I'm using the Pig version that comes with Cassandra, the one in cassandra/contrib/pig.

    Thanks for your time Jeremy! :)
    Fabio
  • Jeremy Hanna at Apr 5, 2011 at 3:56 pm
    Are you running with 'pig -x local myscript.pig' or just with 'pig myscript.pig'?
  • Jeremy Hanna at Apr 5, 2011 at 6:08 pm
    Hmmm, if it's the same error, then it's still not picking up your PIG_RPC_PORT variable.

    If you're running this in <cassandra_src>/contrib/pig:
    'bin/pig_cassandra -x local myscript.pig'
    then you should only need to set PIG_HOME and the environment variables for connecting to Cassandra.

    If you want to run it against a cluster, what I've done is keep a Hadoop configuration locally, point PIG_CONF at <hadoop_home>/conf, and put those three variables in mapred-site.xml like this:
    <property>
      <name>cassandra.thrift.address</name>
      <value>123.45.67.89</value>
    </property>
    <property>
      <name>cassandra.thrift.port</name>
      <value>9160</value>
    </property>
    <property>
      <name>cassandra.partitioner.class</name>
      <value>org.apache.cassandra.dht.RandomPartitioner</value>
    </property>

    I would make sure you can get it to run locally first though.
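
For the local run, a quick pre-flight check in the shell can confirm that the three connection variables are actually set before Pig is launched. The sketch below is only an illustration: the function name `check_pig_cassandra_env` is hypothetical and not part of Pig or Cassandra, and it only inspects the shell that launches Pig, so it says nothing about what the Hadoop task JVMs on a cluster will see.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Pig or Cassandra): report which of the
# three connection variables named in this thread are missing or malformed.
check_pig_cassandra_env() {
    missing=0
    for name in PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER; do
        # Indirect lookup: read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    # The stack trace shows the port being parsed as an integer,
    # so a non-numeric value is flagged as well.
    case "${PIG_RPC_PORT:-}" in
        '') ;;  # already reported as missing above
        *[!0-9]*)
            echo "not a number: PIG_RPC_PORT=$PIG_RPC_PORT" >&2
            missing=1
            ;;
    esac
    return "$missing"
}

# Possible usage, from <cassandra_src>/contrib/pig:
#   check_pig_cassandra_env && bin/pig_cassandra -x local myscript.pig
```

If the function prints nothing and returns 0, all three variables are visible to the launching shell. A failure in the job after that point suggests the variables are not reaching the process that loads the data.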
  • Fabio Souto at Apr 6, 2011 at 9:16 am
    It works. Thank you for your help Jeremy!!

    Cheers
    Fabio
  • Jeremy Hanna at Apr 6, 2011 at 2:29 pm
    Glad it's working for you! Also, I've started a GitHub project that might be helpful going forward. It's called Pygmalion and collects info, scripts, and UDFs to help with running Pig with Cassandra. It only has a few resources now, but I am planning on adding a couple more UDFs over the next couple of days. Feel free to add to it as well :).

    https://github.com/jeromatron/pygmalion

    Jeremy
    On Apr 6, 2011, at 4:15 AM, Fabio Souto wrote:

    It works. Thank you for your help Jeremy!!

    Cheers
    Fabio
    On 05/04/2011, at 20:08, Jeremy Hanna wrote:

    Hmmm, if it's the same error then it's not getting your PIG_RPC_PORT variable still.

    If you're running this in <cassandra_src>/contrib/pig:
    'bin/pig_cassandra -x local myscript.pig'
    then you should only need to set PIG_HOME, and the other environment variables for connecting to cassandra.

    If you want to run it against a cluster, what I've done is keep a Hadoop configuration locally, point PIG_CONF to <hadoop_home>/conf, and put those three variables in mapred-site.xml like this:
    <property>
      <name>cassandra.thrift.address</name>
      <value>123.45.67.89</value>
    </property>
    <property>
      <name>cassandra.thrift.port</name>
      <value>9160</value>
    </property>
    <property>
      <name>cassandra.partitioner.class</name>
      <value>org.apache.cassandra.dht.RandomPartitioner</value>
    </property>

    I would make sure you can get it to run locally first though.
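
    To confirm that a given mapred-site.xml actually carries the three properties listed above, a quick grep is enough. This is only a sketch: the function name check_mapred_site is made up for illustration, and it assumes each property name sits on one line exactly as <name>...</name>, as in the snippet above. It does not validate the values.

    ```shell
    # Hypothetical helper (not part of Pig, Hadoop, or Cassandra).
    # Usage: check_mapred_site /path/to/mapred-site.xml
    # Prints one line per missing property; returns 0 only if all three are present.
    check_mapred_site() {
        file=$1
        rc=0
        if [ ! -r "$file" ]; then
            echo "cannot read: $file"
            return 2
        fi
        for prop in cassandra.thrift.address cassandra.thrift.port cassandra.partitioner.class; do
            # -F treats the pattern as a fixed string, so the dots are literal.
            if ! grep -qF -- "<name>$prop</name>" "$file"; then
                echo "missing property: $prop"
                rc=1
            fi
        done
        return "$rc"
    }
    ```

    For example, check_mapred_site "<hadoop_home>/conf/mapred-site.xml" with the real path substituted should print nothing when all three properties are in place.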
    On Apr 5, 2011, at 10:29 AM, Fabio Souto wrote:

    Hi,

    I had a bad environment variable:
    PIG_PARTITIONER=RandomPartitioner
    instead of
    PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
    I corrected this, but it's still not working. I get the same error.

    Just in case, I have this in my ~/.bash_profile:

    export HADOOPDIR=/etc/hadoop-0.20/conf
    export HADOOP_CLASSPATH=/usr/cassandra/lib/*:$HADOOP_CLASSPATH
    export CLASSPATH=$HADOOPDIR:$CLASSPATH

    export PIG_CONF_DIR=$HADOOPDIR
    export PIG_CLASSPATH=/etc/hadoop/conf
    export PIG_CONF_DIR=$HADOOPDIR

    export PIG_INITIAL_ADDRESS=localhost
    export PIG_RPC_PORT=9160
    export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner


    BTW I'm using the pig version that comes with Cassandra, the one in cassandra/contrib/pig

    Thanks for your time Jeremy! :)
    Fabio
    On 05/04/2011, at 17:04, Jeremy Hanna wrote:

    Fabio,

    It looks like you need to set your environment variables to connect to cassandra. Check out the readme. Quoting here:
    Finally, set the following as environment variables (uppercase,
    underscored), or as Hadoop configuration variables (lowercase, dotted):
    * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
    * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
    * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner

    So you'll probably want to do:
    export PIG_INITIAL_ADDRESS=localhost
    export PIG_RPC_PORT=9160
    export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner

    All the best, and let me know if this doesn't work,

    Jeremy
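
    A small pre-flight check can catch the two mistakes that come up in this thread before Pig is launched: a port that is not set (the stack trace quoted in this thread shows ConfigHelper.getRpcPort calling Integer.parseInt, and "NumberFormatException: null" is what parseInt reports for a null string) and a partitioner given without its package name. The sketch below is only an illustration. The function name check_pig_cassandra_env is made up, and it inspects only the launching shell, so it cannot tell whether the values reach the map tasks on a cluster.

    ```shell
    # Hypothetical pre-flight check (not part of Pig or Cassandra).
    # Prints one line per problem; returns 0 only if all three variables look usable.
    check_pig_cassandra_env() {
        rc=0
        for name in PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER; do
            # Indirect lookup: copy the value of the variable named by $name.
            eval "value=\${$name:-}"
            if [ -z "$value" ]; then
                echo "missing: $name"
                rc=1
            fi
        done
        # The port is parsed as an integer on the Java side, so reject non-digits.
        case "${PIG_RPC_PORT:-}" in
            '') ;;  # already reported as missing
            *[!0-9]*)
                echo "not a number: PIG_RPC_PORT=$PIG_RPC_PORT"
                rc=1 ;;
        esac
        # A class name without any dot is probably missing its package.
        case "${PIG_PARTITIONER:-}" in
            ''|*.*) ;;  # empty was reported above; a dotted name looks fully qualified
            *)
                echo "not fully qualified: PIG_PARTITIONER=$PIG_PARTITIONER"
                rc=1 ;;
        esac
        return "$rc"
    }
    ```

    It could be chained in front of the launcher, for example: check_pig_cassandra_env && bin/pig_cassandra -x local myscript.pig
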
    On Apr 5, 2011, at 9:38 AM, Fabio Souto wrote:

    Hi Jeremy,

    Of course, here it is:

    Backend error message
    ---------------------
    java.lang.NumberFormatException: null
    at java.lang.Integer.parseInt(Integer.java:417)
    at java.lang.Integer.parseInt(Integer.java:499)
    at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown Source)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:234)

    Pig Stack Trace
    ---------------
    ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null

    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A. Backend error : Unable to recreate exception from backed error: java.lang.NumberFormatException: null
    at org.apache.pig.PigServer.openIterator(PigServer.java:742)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.run(Main.java:465)
    at org.apache.pig.Main.main(Main.java:107)
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
    at org.apache.pig.PigServer.storeEx(PigServer.java:874)
    at org.apache.pig.PigServer.store(PigServer.java:816)
    at org.apache.pig.PigServer.openIterator(PigServer.java:728)
    ... 7 more
    ================================================================================


    Thanks for all,
    Fabio

    On 05/04/2011, at 16:19, Jeremy Hanna wrote:

    Fabio,

    Could you post the full stack trace from the pig_<long number>.log file in the directory where you ran Pig?

    Thanks,

    Jeremy

Discussion Overview
group: user @
categories: pig, hadoop
posted: Apr 5, '11 at 1:42p
active: Apr 6, '11 at 2:29p
posts: 9
users: 2
website: pig.apache.org

2 users in discussion

Jeremy Hanna: 5 posts
Fabio Souto: 4 posts
