Grokbase Groups Pig user March 2010
Hi folks,
We (but mostly Kevin Weil) just open-sourced some of the code we use at
Twitter to make working with Hadoop and Pig easier. Most of what is
currently included in "Elephant Bird" deals with generating Input/Output
formats for LZO-compressed protocol buffers, Pig LoadFuncs and StoreFuncs
for the same; there are also some handy loaders for LZO-compressed stuff
that is not protobuf-based.

The project is on github: http://github.com/kevinweil/elephant-bird/

Kevin presented on some of this at HUG recently:
http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709

Feedback, bug reports, and patches are welcome! Hope you find this useful.

-Dmitriy


  • Rohan Rai at Mar 30, 2010 at 3:43 am
    Hey...

    I am so excited seeing this...
    I am at the edge of my seat...
    I can't even wait to see what it is...
    So just looking and hoping for a heads-up...
    Is this what people who need a compressed format
    that is compatible with Pig have been waiting for?

    Regards
    Rohan

  • Dmitriy Ryaboy at Mar 30, 2010 at 7:01 am
    Rohan,
    Yes. I think. Let us know if it is not.

    -Dmitriy
  • 김영우 at Mar 30, 2010 at 5:11 am
    Awesome!

    Thank you to all the contributors.

    -Youngwoo

  • Alan Gates at Mar 31, 2010 at 4:30 pm
    I added a link to this on http://wiki.apache.org/pig/PigTools

    Alan.
  • Jr at Apr 1, 2010 at 10:09 am
    Hi Dmitriy,
    Does this require protobuf 2.3? I'm trying to build it on Fedora and it
    fails. I think it's because only 2.2 is available on Fedora.
    Best regards,
    Johannes

  • Kevin Weil at Apr 1, 2010 at 2:38 pm
    Johannes, it does require protobuf 2.3. All of the InputFormats, Pig
    loaders, etc. will themselves work on earlier versions of the protobuf
    library (we began on 2.2), but the protobuf codegen uses 2.3's new compiler
    plugin API. If you don't need the codegen, you should be able to use 2.2
    with a little hand editing.

    HTH,
    Kevin
  • Jr at Apr 1, 2010 at 2:59 pm
    Hello Kevin,
    Thanks a lot. Since I really only need the Pig loaders, I'll go for hand
    editing :)
    Johannes

  • Kevin Weil at Apr 1, 2010 at 3:01 pm
    Johannes,

    If you want to commit a patch to the build file with a "no-protobuf" target,
    please do and send me a github pull request. I bet you aren't the only one
    who will want this.

    Thanks,
    Kevin
  • Jr at Apr 1, 2010 at 4:20 pm
    Hello Kevin,
    I hope it's alright if I reply about this off the list, since I don't
    think it'd be helpful there for now.
    I'm trying to figure out what has to be compiled and how. The first thing I
    found is the package com.twitter.data.proto.BlockStorage:
    I can only find the javadoc for it and a few .java files importing
    from that package.
    Is it generated by protobuf? Unfortunately I'm not familiar with
    protobuf at all, so I'm not even sure how to generate that package :)
    What do I need to generate or get it?
    Johannes

  • Jr at Apr 1, 2010 at 4:31 pm
    sorry for the post everyone!
    johannes
  • Dmitriy Ryaboy at Apr 7, 2010 at 7:32 pm
    Elephant Bird now also contains UDFs for dynamically invoking (a subset of)
    Java methods that operate on basic classes like Integers, Doubles,
    Strings, etc., without having to write a custom UDF every time. This will be
    native to Pig 0.8, but for now you can use the same functionality in 0.5+ by
    including the elephant-bird jar.
    From the javadoc:
    ----------------------------------------

    This UDF allows one to dynamically invoke Java methods that return a T

    Usage of the Invoker family of UDFs (adjust as appropriate):

    -- invoking a static method
    DEFINE StringToLong InvokeForLong('java.lang.Long.valueOf', 'String');
    longs = FOREACH strings GENERATE StringToLong(some_chararray);

    -- invoking a method on an object
    DEFINE StringConcat InvokeForString('java.lang.String.concat', 'String String', 'false');
    concatenations = FOREACH strings GENERATE StringConcat(str1, str2);

    The first argument to the constructor is the fully qualified name of the
    desired method.
    The second argument is a list of the classes of the method parameters
    (space-separated, as in 'String String' above).
    If the method is not static, the first element in this list is the class of
    the object to invoke the method on.
    The third argument is optional: the keyword "static" (or "true") signifies
    that the method is static, which is also the default; pass "false" for an
    instance method.

    ----------------------------------------

    -Dmitriy
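
To make the argument convention above concrete, here is a minimal Java sketch of how a reflection-based invoker of this kind could resolve and call a method. It is not Elephant Bird's implementation: the class name InvokerSketch, the helpers resolve and invoke, and the tiny type table are all hypothetical, chosen only to mirror the three constructor arguments described in the javadoc (method name, parameter class list, static flag).

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical illustration only; not the Elephant Bird Invoker code.
class InvokerSketch {

    // Assumed mapping from the short names used in the parameter list
    // to Java classes. Only a few basic types are covered here.
    static Class<?> resolve(String name) {
        switch (name) {
            case "String":  return String.class;
            case "Long":    return Long.class;
            case "Integer": return Integer.class;
            case "Double":  return Double.class;
            default:
                throw new IllegalArgumentException("Unsupported type: " + name);
        }
    }

    // fullName:  fully qualified method name, e.g. "java.lang.Long.valueOf"
    // paramSpec: space-separated class names, e.g. "String String"
    // isStatic:  when false, the first listed class/argument is the receiver
    static Object invoke(String fullName, String paramSpec, boolean isStatic,
                         Object... args) throws Exception {
        int dot = fullName.lastIndexOf('.');
        Class<?> owner = Class.forName(fullName.substring(0, dot));
        String methodName = fullName.substring(dot + 1);

        String spec = paramSpec.trim();
        String[] names = spec.isEmpty() ? new String[0] : spec.split("\\s+");
        Class<?>[] types = new Class<?>[names.length];
        for (int i = 0; i < names.length; i++) {
            types[i] = resolve(names[i]);
        }

        if (isStatic) {
            Method m = owner.getMethod(methodName, types);
            return m.invoke(null, args);
        }

        // Instance method: peel off the receiver, pass the rest as arguments.
        Class<?>[] methodTypes = Arrays.copyOfRange(types, 1, types.length);
        Object[] methodArgs = Arrays.copyOfRange(args, 1, args.length);
        Method m = owner.getMethod(methodName, methodTypes);
        return m.invoke(args[0], methodArgs);
    }

    public static void main(String[] argv) throws Exception {
        // Static call, analogous to the StringToLong definition.
        System.out.println(invoke("java.lang.Long.valueOf", "String", true, "42"));
        // Instance call, analogous to the StringConcat definition.
        System.out.println(invoke("java.lang.String.concat", "String String", false, "foo", "bar"));
    }
}
```

The two calls in main correspond to the StringToLong and StringConcat definitions: the first looks up a static method and passes every argument to it, while the second treats the first argument as the object the method is called on.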




Thread overview: posted Mar 29, 2010 at 9:51 pm; last active Apr 7, 2010 at 7:32 pm; 12 posts by 6 users; categories: pig, hadoop; website: pig.apache.org