Grokbase Groups Pig user July 2011
Hey all, I've been trying to query Cassandra from my Pig script,
so I used the contrib jar from Cassandra, and I'm getting the following
error.
Some Thrift failure... :|

ERROR 2998: Unhandled internal error.
org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V

java.lang.NoSuchMethodError:
org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
at org.apache.cassandra.thrift.SliceRange.<clinit>(SliceRange.java:149)
at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown
Source)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
at
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
at org.apache.pig.PigServer.storeEx(PigServer.java:874)
at org.apache.pig.PigServer.store(PigServer.java:816)
at org.apache.pig.PigServer.openIterator(PigServer.java:728)
at
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
at org.apache.pig.Main.run(Main.java:465)
at org.apache.pig.Main.main(Main.java:107)


Has anyone managed to get this up and running?
I'm considering rewriting the CassandraStorage jar using Hector.
Any thoughts about that?


  • Shai Harel at Jul 31, 2011 at 3:13 pm
    I've migrated to Pig 0.9 and now I get:

    Pig Stack Trace
    ---------------
    ERROR 2017: Internal error creating job configuration.

    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
    open iterator for alias rows
    at org.apache.pig.PigServer.openIterator(PigServer.java:900)
    at
    org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
    at
    org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at
    org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
    at
    org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
    at org.apache.pig.Main.run(Main.java:487)
    at org.apache.pig.Main.main(Main.java:108)
    Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias
    rows
    at org.apache.pig.PigServer.storeEx(PigServer.java:999)
    at org.apache.pig.PigServer.store(PigServer.java:962)
    at org.apache.pig.PigServer.openIterator(PigServer.java:875)
    ... 7 more
    Caused by:
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
    ERROR 2017: Internal error creating job configuration.
    at
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:712)
    at
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
    at
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1313)
    at
    org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1298)
    at org.apache.pig.PigServer.storeEx(PigServer.java:995)
    ... 9 more
    Caused by: java.lang.NullPointerException
    at org.apache.thrift.TSerializer.serialize(TSerializer.java:79)
    at
    org.apache.cassandra.hadoop.pig.CassandraStorage.cfdefToString(Unknown
    Source)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.initSchema(Unknown
    Source)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown
    Source)
    at
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:382)
    ... 14 more


    Does anyone have a clue?

  • Jeremy Hanna at Jul 31, 2011 at 9:05 pm
    Try following this and see if it helps getting started:
    https://github.com/jeromatron/pygmalion/wiki/Getting-Started

    I haven't tried it with 0.9 yet but I plan to this week. We use the CassandraStorage jar in production. If you can, validate your data with Cassandra's schema validators. CassandraStorage gets the schema from Cassandra and tries to unmarshal the data into Pig data types with the schema information.

    See if that helps.
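
To picture what the unmarshalling step described above involves, here is a minimal Java sketch. It is not CassandraStorage's code: the class ValidatorSketch and its unmarshal helper are made-up names, only a handful of validator names (LongType, UTF8Type, AsciiType, BytesType) are handled, and the mapping to Pig types in the comments is an assumption for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ValidatorSketch {
    /** Decodes raw column bytes according to a validator name (simplified, hypothetical helper). */
    static Object unmarshal(String validator, ByteBuffer raw) {
        ByteBuffer buf = raw.duplicate(); // leave the caller's position untouched
        switch (validator) {
            case "LongType":
                return buf.getLong(); // 8-byte big-endian value, would become a Pig long
            case "UTF8Type":
                return StandardCharsets.UTF_8.decode(buf).toString(); // would become a chararray
            case "AsciiType":
                return StandardCharsets.US_ASCII.decode(buf).toString();
            default: // BytesType or anything unknown stays as raw bytes
                byte[] bytes = new byte[buf.remaining()];
                buf.get(bytes);
                return bytes;
        }
    }

    public static void main(String[] args) {
        ByteBuffer age = ByteBuffer.allocate(8).putLong(0, 42L);
        ByteBuffer name = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(unmarshal("LongType", age));
        System.out.println(unmarshal("UTF8Type", name));
    }
}
```

In this sketch a column declared as LongType that holds fewer than 8 bytes makes getLong() throw a BufferUnderflowException, which illustrates why data that does not match its declared validator is a problem.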
  • Shai Harel at Aug 1, 2011 at 9:42 am
    Thanks for the help. To be conservative I'm now using Pig 0.8 and
    Cassandra 0.8,
    and I'm still getting this error:

    Pig Stack Trace
    ---------------
    ERROR 2998: Unhandled internal error. Could not initialize class
    org.apache.cassandra.thrift.SliceRange

    java.lang.NoClassDefFoundError: Could not initialize class
    org.apache.cassandra.thrift.SliceRange
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown
    Source)
    at
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
    at
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
    at
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
    at
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
    at
    org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
    at org.apache.pig.PigServer.storeEx(PigServer.java:874)
    at org.apache.pig.PigServer.store(PigServer.java:816)
    at org.apache.pig.PigServer.openIterator(PigServer.java:728)
    at
    org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
    at
    org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at
    org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at
    org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.run(Main.java:465)
    at org.apache.pig.Main.main(Main.java:107)

    does anyone else have this problem?

  • Shai Harel at Aug 1, 2011 at 12:43 pm
    Hey all, I've fixed this problem.
    I was missing the Cassandra jars,
    so you actually need to build Cassandra (ant) and then jar it
    (ant jar),
    and only then will it work.

    BTW, if you have Hue installed, remove it first!
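
For anyone debugging the same kind of failure: NoSuchMethodError and NoClassDefFoundError typically point at a missing or mismatched jar, so one quick check is to ask the JVM where a class was loaded from and whether the expected constructor exists. The sketch below is a hypothetical helper, not part of Pig or Cassandra; ClasspathCheck, locate and hasConstructor are made-up names. The (BZ)V in the first trace is the JVM descriptor for a constructor taking (byte, boolean), which is what the check looks for.

```java
import java.security.CodeSource;

class ClasspathCheck {
    /** Returns the jar or directory a class was loaded from, or a short note if unavailable. */
    static String locate(String className) {
        try {
            // initialize=false: only load the class, do not run its static initializer
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown source)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND: " + e;
        }
    }

    /** True if the class declares a constructor with exactly the given parameter types. */
    static boolean hasConstructor(String className, Class<?>... params) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            c.getDeclaredConstructor(params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String meta = "org.apache.thrift.meta_data.FieldValueMetaData";
        System.out.println(meta + " -> " + locate(meta));
        System.out.println("has (byte, boolean) constructor: "
                + hasConstructor(meta, byte.class, boolean.class));
        System.out.println("SliceRange -> " + locate("org.apache.cassandra.thrift.SliceRange"));
    }
}
```

The output is only meaningful if this runs with the same classpath that Pig sees. If the Thrift class resolves to an unexpected jar, or the constructor check prints false, that would be consistent with the errors in this thread.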


  • Jeremy Hanna at Aug 1, 2011 at 2:55 pm
    Ah - just saw this, glad you got it working - cheers.
  • Shai Harel at Aug 2, 2011 at 11:41 am
    Jeremy, were you able to make it run on Amazon Elastic MapReduce
    machines?

    I've tried copying the jars (both Pig's and Cassandra's) to the new machine,
    setting the PIG_HOME environment variable,
    and even adding the Hadoop config files to the classpath,
    and I'm getting this error:

    Error before Pig is launched
    ----------------------------
    ERROR 2999: Unexpected internal error. Failed to create DataStorage

    java.lang.RuntimeException: Failed to create DataStorage
    at
    org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
    at
    org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
    at
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:213)
    at
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:133)
    at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
    at org.apache.pig.PigServer.<init>(PigServer.java:225)
    at org.apache.pig.PigServer.<init>(PigServer.java:214)
    at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
    at org.apache.pig.Main.run(Main.java:462)
    at org.apache.pig.Main.main(Main.java:107)
    Caused by: java.io.IOException: Call to
    ip-10-56-51-167.eu-west-1.compute.internal/10.56.51.167:9000 failed on local
    exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
    at
    org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
    at
    org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at
    org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
    at
    org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
    at
    org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
    ... 9 more
    Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at
    org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:812)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:720)
    ================================================================================

    Amazon claims to run Hadoop 0.20. What am I doing wrong?


  • Jeremy Hanna at Aug 2, 2011 at 2:01 pm
    AFAIK, Amazon still uses Pig 0.6 on EMR, though they've said in discussion threads that they are in the process of upgrading.
    http://aws.amazon.com/elasticmapreduce/faqs/#pig-7
    https://forums.aws.amazon.com/thread.jspa?messageID=233903&#249998

    Pig 0.6 doesn't have the concept of loadfunc/storefunc, which was added in 0.7. That's the extension point that Cassandra uses.

    I've heard that you can just deploy a newer version of Pig yourself in your EMR cluster, but I haven't tried doing that. We went with our own cluster in EC2 so that we could control versions, after we got some odd errors with EMR that we couldn't track down or reproduce.

    Sorry I can't be of more help there.
    On Aug 2, 2011, at 7:40 AM, Shai Harel wrote:

    Jeremy, where you able to make it run on AMAZON elastic map reduce
    machines?

    i'v tried to copy the jars (both pig's and cassandra) to the new machine
    set the PIG_HOME environment variable
    even added the hadoop config files to the class path
    and I'm getting this error

    Error before Pig is launched
    ----------------------------
    ERROR 2999: Unexpected internal error. Failed to create DataStorage

    java.lang.RuntimeException: Failed to create DataStorage
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:213)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:133)
    at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
    at org.apache.pig.PigServer.<init>(PigServer.java:225)
    at org.apache.pig.PigServer.<init>(PigServer.java:214)
    at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
    at org.apache.pig.Main.run(Main.java:462)
    at org.apache.pig.Main.main(Main.java:107)
    Caused by: java.io.IOException: Call to ip-10-56-51-167.eu-west-1.compute.internal/10.56.51.167:9000 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
    ... 9 more
    Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:812)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:720)
    ================================================================================

    Amazon claims to run Hadoop 0.20, so what am I doing wrong?


    On Mon, Aug 1, 2011 at 5:55 PM, Jeremy Hanna wrote:

    Ah - just saw this, glad you got it working - cheers.
    On Aug 1, 2011, at 5:43 AM, Shai Harel wrote:

    Hey all, I've successfully fixed this problem.
    I was missing the Cassandra jars,
    so you actually need to build Cassandra (ant) and then jar it
    (ant jar);
    only then will it work.

    BTW, if you have Hue installed, remove it first!



  • Jeremy Hanna at Aug 1, 2011 at 2:55 pm
    It looks like you don't have the Cassandra libraries in your classpath. Are you in the contrib/pig directory of the Cassandra source, and are you running bin/pig_cassandra? That script puts everything you need from Cassandra into the classpath, so it's the first thing to try if you aren't using it already. Try it with the -x local flag at first, too, to rule out a problem with distributing the job out to your Hadoop cluster.

    The other reason you might be getting that error is that, in mapreduce mode, Pig distributes the job out to your Hadoop cluster, and your task trackers need a couple of jars in their classpath as well. Take a look at http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig if you want to run it in mapreduce mode.
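
    If bin/pig_cassandra isn't an option, a Pig script can also ship jars itself with REGISTER. This is only a sketch of that alternative and not something confirmed in this thread. The jar paths and file names are placeholders (assumptions), and the keyspace/column family in the LOAD is just an example location. It also won't help if an incompatible Thrift jar is picked up earlier on the classpath.

    ```pig
    -- Placeholder paths (assumptions): point these at the jars from your own Cassandra build.
    REGISTER /path/to/cassandra/build/apache-cassandra.jar;
    REGISTER /path/to/cassandra/lib/libthrift.jar;

    -- Bind the short name to the class that appears in the stack traces.
    DEFINE CassandraStorage org.apache.cassandra.hadoop.pig.CassandraStorage();

    -- Example location only; substitute your own keyspace and column family.
    rows = LOAD 'cassandra://TestKeySpace/TestPig' USING CassandraStorage();
    ```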
    On Aug 1, 2011, at 2:41 AM, Shai Harel wrote:

    Thanks for the help. I've tried to be conservative and I'm using Pig 0.8 &
    Cassandra 0.8,
    and I'm still getting this error:

    Pig Stack Trace
    ---------------
    ERROR 2998: Unhandled internal error. Could not initialize class org.apache.cassandra.thrift.SliceRange

    java.lang.NoClassDefFoundError: Could not initialize class org.apache.cassandra.thrift.SliceRange
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
    at org.apache.pig.PigServer.storeEx(PigServer.java:874)
    at org.apache.pig.PigServer.store(PigServer.java:816)
    at org.apache.pig.PigServer.openIterator(PigServer.java:728)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.run(Main.java:465)
    at org.apache.pig.Main.main(Main.java:107)

    does anyone else have this problem?

    On Sun, Jul 31, 2011 at 2:04 PM, Jeremy Hanna wrote:

    Try following this and see if it helps getting started:
    https://github.com/jeromatron/pygmalion/wiki/Getting-Started

    I haven't tried it with 0.9 yet but I plan to this week. We use the
    CassandraStorage jar in production. If you can, validate your data with
    Cassandra's schema validators. CassandraStorage gets the schema from
    Cassandra and tries to unmarshal the data into Pig data types with the
    schema information.

    See if that helps.
  • Tamil Selvan at Sep 28, 2011 at 4:52 pm
    Hi,
    I'm trying to integrate pig with cassandra.
    My columnfamily in cassandra is
    name -> xxx
    Age -> yyy
    class -> zzz
    This is how I load data
    rows = LOAD 'cassandra://TestKeySpace/TestPig' USING CassandraStorage()
    as (key, columns: bag{column: tuple(name, value)});

    Now I want to group by the value of the class column. I tried:

    col_values = FOREACH rows GENERATE (columns.value) as list:bag{};

    This gave me a result with the schema :bag(:tuple(chararray)).
    For example, DUMP col_values gives {(xxx),(yyy),(zzz)}

    Now if I try to access

    list = FOREACH col_values GENERATE (list.$0, list.$1);

    I get an undefined index access error, something like:
    list.$1 doesn't exist, :bag[:tuple(chararray)] has only one column [but
    there are 3]

    How can I access the data tuple by tuple in such cases?
    Because of this, I can't group by a single column.

    I tried TOTUPLE, but it converts the entire bag into a tuple
    and applies the GROUP BY to that.

    Help me out

    Regards,
    Tamil
  • Jeremy Hanna at Sep 28, 2011 at 5:04 pm
    It's been mentioned in this thread, but if you're using tabular (static column names) data, you might consider using Pygmalion. It will extract the values from Cassandra to simplify grouping by values and other operations.
    https://github.com/jeromatron/pygmalion
    What you'll want to look at is the FromCassandraBag UDF, which has an example here:
    https://github.com/jeromatron/pygmalion/blob/master/scripts/from_to_cassandra_bag_example.pig

    Hope that helps - we use pygmalion 1.0.0 for all our scripts in production.
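
    For reference, the error in the question comes from how bag projection works: list.$1 asks for the second field of every tuple in the bag, and each tuple there holds a single field (the value), so there is no $1. A minimal plain-Pig sketch that avoids this, without Pygmalion, is to FLATTEN the columns bag and filter on the column name. It is untested against this data; the aliases (cols, class_cols, by_class, counts) are made up for illustration, and the explicit chararray casts assume the names and values are stored as text.

    ```pig
    rows = LOAD 'cassandra://TestKeySpace/TestPig' USING CassandraStorage()
           AS (key, columns: bag{column: tuple(name, value)});

    -- FLATTEN un-nests the bag: one output record per (key, column) pair.
    cols = FOREACH rows GENERATE key, FLATTEN(columns) AS (name, value);

    -- Keep only the 'class' column of each row.
    class_cols = FILTER cols BY (chararray) name == 'class';

    -- Group the row keys by the value stored in their 'class' column.
    by_class = GROUP class_cols BY (chararray) value;

    -- Count how many rows share each class value.
    counts = FOREACH by_class GENERATE group AS class_value, COUNT(class_cols) AS num_rows;
    DUMP counts;
    ```

    The FILTER step is what replaces positional access: columns are picked by name rather than by index into the bag, so it does not depend on the order in which Cassandra returns them.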

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jul 31, '11 at 2:49p
active: Sep 28, '11 at 5:04p
posts: 11
users: 3
website: pig.apache.org
