Hi all,

A little while back, I started a project called pygmalion for example scripts and UDFs for people using Pig with Cassandra. Currently there are a few handy UDFs in there like:

FromCassandraBag: a way to convert from what Cassandra returns (key:chararray, columns:bag {column:tuple (name, value)}) to something more tabular (key, value1, value2, value3). You specify the values you want to project - it's good for tabular data.
ToCassandraBag: a way to convert from (key, value1, value2, value3) to what Cassandra expects when writing - (key:chararray, columns:bag {column:tuple (name, value)}) - the column names are extracted from the variable names in the Pig script.
Both were contributed by Jacob Perkins, with slight revisions by Jeremy Hanna.
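
To make the two shapes concrete, here is a minimal Python sketch of the conversions described above. It is not pygmalion's code (the real UDFs run inside Pig); the function names `from_cassandra_bag` and `to_cassandra_bag` and the sample row are hypothetical, and projecting a missing column as `None` is an assumption.

```python
def from_cassandra_bag(row, wanted):
    """Flatten (key, [(name, value), ...]) into (key, value1, value2, ...)."""
    key, columns = row
    lookup = dict(columns)  # column name -> value
    # Project only the requested columns, in the requested order.
    # Assumption: a column that is absent from the row becomes None.
    return (key,) + tuple(lookup.get(name) for name in wanted)


def to_cassandra_bag(key, **fields):
    """Build (key, [(name, value), ...]) from named fields."""
    # The keyword names stand in for the variable names in a Pig script.
    return (key, [(name, value) for name, value in fields.items()])


# Hypothetical row in the (key, bag of (name, value)) shape.
row = ("user1", [("name", "Ann"), ("city", "Oslo"), ("age", "31")])

flat = from_cassandra_bag(row, ["name", "age"])
print(flat)

back = to_cassandra_bag(flat[0], name=flat[1], age=flat[2])
print(back)
```

The round trip keeps only the projected columns, which is why this shape suits tabular data where the column names are known up front.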

StringConcat: probably something everyone implements, but unlike CONCAT, which only joins two strings, it joins any number of strings.
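
As a rough illustration of the difference, the Python sketch below contrasts a two-argument concat, which has to be nested, with a variadic one. The names `concat2` and `string_concat` are hypothetical stand-ins, and null handling is deliberately left out because the post does not describe it.

```python
def concat2(a, b):
    """Two-argument concat, standing in for a CONCAT limited to two strings."""
    return a + b


def string_concat(*parts):
    """Concatenate any number of strings, left to right."""
    return "".join(parts)


# With only two arguments per call, joining three values requires nesting.
nested = concat2(concat2("2011", "-"), "04")

# The variadic form takes them all at once.
variadic = string_concat("2011", "-", "04")

print(nested == variadic)
```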

GenerateTimeUUID: a UDF that generates a time UUID, with or without a time to base it on.
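
For readers unfamiliar with time UUIDs, here is a small Python sketch of the idea, assuming "time UUID" means a version 1 UUID as defined in RFC 4122. It is not the pygmalion implementation: the function name `generate_time_uuid`, the choice of a timezone-aware `datetime` as input, and the random clock sequence and node are all assumptions made for illustration.

```python
import random
import uuid
from datetime import datetime, timezone

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns ticks.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def generate_time_uuid(when=None):
    """Return a version 1 UUID for now, or for the given timezone-aware datetime."""
    if when is None:
        return uuid.uuid1()
    # Whole microseconds since the Unix epoch, computed with integer arithmetic.
    delta = when - datetime(1970, 1, 1, tzinfo=timezone.utc)
    micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    ticks = micros * 10 + UUID_EPOCH_OFFSET
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi = (ticks >> 48) & 0x0FFF  # version bits are filled in by version=1
    # Assumption: random clock sequence and random node with the multicast bit set.
    clock_seq = random.getrandbits(14)
    node = random.getrandbits(48) | (1 << 40)
    return uuid.UUID(
        fields=(time_low, time_mid, time_hi, clock_seq >> 8, clock_seq & 0xFF, node),
        version=1,
    )


print(generate_time_uuid())
print(generate_time_uuid(datetime(2011, 4, 27, 18, 57, tzinfo=timezone.utc)))
```

Because the timestamp occupies the time fields of the UUID, two values generated from the same `when` share those fields and differ only in the random clock sequence and node.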

https://github.com/jeromatron/pygmalion/

It definitely needs more work and examples, but I've been using the UDFs in there for a while with Cassandra 0.7.5 (previously 0.7-branch). Now that 0.7.5 is released, I'd just like to let people know about it if they would like to contribute or even just use it.

  • Jonathan Ellis at Apr 27, 2011 at 7:40 pm
    Nice!

    (In reply to Jeremy Hanna's post of Wed, Apr 27, 2011 at 1:57 PM.)

    --
    Jonathan Ellis
    Project Chair, Apache Cassandra
    co-founder of DataStax, the source for professional Cassandra support
    http://www.datastax.com
  • Bill Graham at Apr 27, 2011 at 9:54 pm
    Very cool.

    FYI there's a StringConcat in pig like you describe that you can use like this:

    define concat org.apache.pig.builtin.StringConcat();

    Reference JIRA:
    https://issues.apache.org/jira/browse/PIG-1420

  • Jeremy Hanna at Apr 27, 2011 at 10:00 pm
    Oh cool, good to know. Thanks, Bill!

Discussion Overview
group: user @
categories: pig, hadoop
posted: Apr 27, '11 at 6:57p
active: Apr 27, '11 at 10:00p
posts: 4
users: 3
website: pig.apache.org