Grokbase Groups Pig user April 2011
Hi all,

I have a Pig script that produces a complex nested data structure:

result: {child: chararray,
         childTraces: {action: int, time: long},
         legacy: {parent: chararray, parentTraces: {action: int, time: long}}}

I would like to post-process the output of the Pig script with a MapReduce
job.
In the MapReduce job I would like to run some nested for loops that iterate
over the bags.
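
To make the goal concrete, here is a minimal sketch of the kind of nested iteration meant above. It assumes the braces in the schema denote Pig bags, so that childTraces is a bag of (action, time) tuples and legacy is a bag of (parent, parentTraces) tuples. The class names, the pairs() helper, and the sample values are all hypothetical; the sketch uses plain Java collections rather than any Pig or Hadoop API, and it leaves the storage question itself open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of one "result" record; not a Pig or Hadoop type.
class NestedTraces {
    // One (action, time) tuple inside a bag.
    static final class Trace {
        final int action;
        final long time;
        Trace(int action, long time) { this.action = action; this.time = time; }
    }

    // One element of the "legacy" bag: a parent plus its own bag of traces.
    static final class Legacy {
        final String parent;
        final List<Trace> parentTraces;
        Legacy(String parent, List<Trace> parentTraces) {
            this.parent = parent;
            this.parentTraces = parentTraces;
        }
    }

    // One top-level record: a child, its traces, and its legacy bag.
    static final class Result {
        final String child;
        final List<Trace> childTraces;
        final List<Legacy> legacy;
        Result(String child, List<Trace> childTraces, List<Legacy> legacy) {
            this.child = child;
            this.childTraces = childTraces;
            this.legacy = legacy;
        }
    }

    // Nested iteration: pair every child trace with every parent trace.
    static List<String> pairs(Result r) {
        List<String> out = new ArrayList<>();
        for (Trace c : r.childTraces) {
            for (Legacy l : r.legacy) {
                for (Trace p : l.parentTraces) {
                    out.add(r.child + "@" + c.time + " <- " + l.parent + "@" + p.time);
                }
            }
        }
        return out;
    }

    // Made-up sample data, only for illustration.
    static Result sample() {
        return new Result(
            "c1",
            Arrays.asList(new Trace(1, 100L), new Trace(2, 200L)),
            Arrays.asList(
                new Legacy("p1", Arrays.asList(new Trace(1, 50L))),
                new Legacy("p2", Arrays.asList(new Trace(3, 150L), new Trace(4, 250L)))));
    }

    public static void main(String[] args) {
        for (String line : pairs(sample())) {
            System.out.println(line);
        }
    }
}
```

Whatever storage format is chosen, the reader on the MapReduce side would need to hand back something shaped like Result so that loops of this kind can run without a hand-written parser.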

Do you have any advice on the simplest way to store Pig's output so that I
do not have to write my own parser in MapReduce?
I thought about using JSON, but it looks like there is no JSON store format
for tuples yet (I know elephant-bird can store maps, but I would need to
convert my result to a nested map, which is a bit unnatural).
Avro is not an easy option on the Hadoop side.

Any help would be highly appreciated.

Thanks,
--
Gianmarco De Francisci Morales


  • Harsh J at Apr 15, 2011 at 6:05 pm
    Hey Gianmarco,
    On Fri, Apr 15, 2011 at 5:00 PM, Gianmarco wrote:
    > Avro is not an easy option on the hadoop side.

    I'm just a little curious about this. Could you explain why you feel that
    way about Avro on M/R?

    --
    Harsh J
  • Gianmarco at Apr 18, 2011 at 2:00 pm
    Last time I checked Avro on MR, the integration was not yet ready.

    I see that there was a very recent release of code for it at
    http://www.tomslabs.com/
    but I don't know how stable and well tested this code is.

    Has anyone had good experiences with Avro on MR?

    Cheers,
    --
    Gianmarco De Francisci Morales

  • Andrew Hammond at May 25, 2011 at 5:29 pm
    Bump. I'm very interested in combining Avro and Pig and would greatly
    appreciate hearing about people's experiences with them.
  • Dmitriy Ryaboy at May 25, 2011 at 5:32 pm
    You can use Thrift or Protobufs via elephant-bird.

    D


Discussion Overview
group: user @
categories: pig, hadoop
posted: Apr 15, '11 at 4:19p
active: May 25, '11 at 5:32p
posts: 5
users: 4
website: pig.apache.org
