Grokbase Groups Pig user January 2011

It's a hack (sort of), but here's how I always do it, since parsing JSON
in Java will put you in an insane asylum:

Write a map-only wukong script that parses the JSON as you want it. See
the example here:

Then use the STREAM operator to stream your raw records (load them as
chararrays first) through your wukong script. It's not perfect, but it
gets the job done.
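For reference, the Pig side of that hack could look something like this. The script name, output schema, and field names here are made up for illustration; the parser itself would be your wukong (or any other map-only) script:

```pig
-- Load each raw line untouched, as a single chararray.
raw = LOAD 'input/records.json' USING TextLoader() AS (line:chararray);

-- 'parse_json.rb' is a placeholder name for your wukong script;
-- SHIP copies it out to the task nodes so STREAM can run it.
DEFINE json_parser `parse_json.rb --map` SHIP('parse_json.rb');

-- Stream every raw line through the parser and pick up its
-- tab-separated output under whatever schema your script emits.
parsed = STREAM raw THROUGH json_parser
         AS (field1:chararray, field2:chararray, field3:chararray);
```

From there `parsed` behaves like any other relation, so the JSON wrangling stays out of Java entirely.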


On Sat, 2011-01-29 at 12:12 +0000, Alex McLintock wrote:
I wonder if discussion of the Piggybank and other User Defined Functions is
best done here (since it is *using* Pig) or on the Development list (because
it is enhancing Pig).

I'm trying to load some JSON into Pig using the UDF which
Kim Vogt posted about back in September. (It isn't in Piggybank AFAICS.)

The class works for me - mostly....

This works when the JSON is just a single level:

{"field1": "value1", "field2": "value2", "field3": "value3"}

But it doesn't seem to work when the JSON is nested:

{"field1": "value1", "field2": "value2", "nested": {"field4": "value4",
"field5": "value5", "field6": "value6"}, "field3": "value3"}

Has anyone got this working? I can't see how the existing code deals with
nesting: parseStringToTuple only creates a single Map, and there is no
recursion that I can see.

Any suggestions?
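One way around the UDF's single-level limit is the streaming approach described above: handle the recursion in the external script rather than in Java. A minimal map-only sketch (in Python here for illustration; wukong itself is Ruby, and the dotted-key flattening scheme is just one assumed convention) might look like:

```python
import json
import sys


def flatten(obj, prefix=""):
    """Recursively flatten a nested JSON object into dotted keys,
    e.g. {"a": {"b": 1}} becomes {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def main():
    # Map-only: one JSON record per input line in,
    # one tab-separated line of key=value pairs out.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        record = flatten(json.loads(line))
        print("\t".join("%s=%s" % (k, v)
                        for k, v in sorted(record.items())))


if __name__ == "__main__":
    main()
```

Streamed through Pig's STREAM operator, each nested record comes back as a flat row that Pig can handle without any recursion in the loader.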

Posted: Jan 29, 2011 at 12:13 PM · Last active: Jan 30, 2011 at 10:24 PM