Hi folks,
We (but mostly Kevin Weil) just open-sourced some of the code we use
at Twitter to make working with Hadoop and Pig easier. Most of what is
currently included in "Elephant Bird" deals with generating
Input/Output formats, as well as Pig LoadFuncs and StoreFuncs, for
LZO-compressed protocol buffers. There are also some handy loaders
for LZO-compressed data that is not protobuf-based.
The project is on GitHub: http://github.com/kevinweil/elephant-bird/
Kevin presented on some of this at the Bay Area HUG recently:
http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709
Feedback, bug reports, and patches are welcome! Hope you find this useful.
-Dmitriy