Grokbase Groups Pig user January 2011

On Sat, Jan 29, 2011 at 5:42 PM, Alex McLintock wrote:
I wonder if discussion of the Piggybank and other User Defined Functions is
best done here (since it is *using* Pig) or on the Development list (because
it is enhancing Pig).

I'm trying to load some JSON into Pig using the UDF which
Kim Vogt posted about back in September. (It isn't in Piggybank AFAICS.)

The class works for me - mostly....

This works when the JSON is just a single level:

{"field1": "value1", "field2": "value2", "field3": "value3"}

But it doesn't seem to work when the JSON is nested:

{"field1": "value1", "field2": "value2", {"field4": "value4", "field5":
"value5", "field6": "value6"}, "field3": "value3"}

The json-simple library for Java will build the entire JSON
representation as a JSONObject, which is _exactly_ what you need. This
is a Java Map-like class which would contain your structure properly.
What remains is to properly convert this to a Pig-acceptable Map.

But what's happening in Vogt's code (and also Elephant-Bird's
LzoJsonLoader from which it was sourced) is that the Map is
down-converted to a simple Key-Value mapping instead of a Map
containing another Map. This was done due to a limitation in Pig
0.6.0, where the Map type could not hold complex types in it -- as
noted in the latter class's javadoc [1].
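
To make the down-conversion concrete, here is a minimal sketch in plain Java. It assumes the parsed JSON arrives as a java.util.Map (json-simple's JSONObject is a HashMap subclass) and that the loader simply stringifies every top-level value. The class name, method name and the "nested" key are made up for illustration; this is not the actual Elephant-Bird code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not the real loader code.
final class FlattenSketch {

    // Down-convert a parsed JSON object to a String -> String mapping.
    // A nested Map ends up as its toString() form, so its structure is lost.
    static Map<String, String> flatten(Map<?, ?> parsed) {
        Map<String, String> flat = new LinkedHashMap<String, String>();
        for (Map.Entry<?, ?> e : parsed.entrySet()) {
            Object value = e.getValue();
            flat.put(String.valueOf(e.getKey()),
                     value == null ? null : value.toString());
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<String, Object>();
        inner.put("field4", "value4");

        Map<String, Object> outer = new LinkedHashMap<String, Object>();
        outer.put("field1", "value1");
        outer.put("nested", inner); // "nested" is a hypothetical key

        Map<String, String> flat = flatten(outer);
        // The nested object survives only as text, not as a Map.
        System.out.println(flat.get("nested"));
    }
}
```

With a mapping like this, a script can still read the top-level fields, but anything inside the nested object is no longer addressable as a map entry.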

I think this limitation has gone away in 0.7.0+ (the Pig Map spec
now supports <String, {Atom, Tuple, Bag, Map}>), so you can feel free
to change or get rid of the iteration inside parseStringToTuple(...) so
that it does not 'flatten' the Map.
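
As a rough sketch of the non-flattening variant, the conversion could recurse into nested objects so they stay Maps. This uses only java.util and assumes the parsed JSON is a java.util.Map (as a JSONObject is). The class name, method name and the "nested" key are hypothetical, and arrays are not handled here (they would presumably need to become a Bag).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; names are not taken from the loader.
final class NestedMapSketch {

    // Convert a parsed JSON object into a Map<String, Object>, recursing into
    // nested objects so they remain Maps instead of being stringified.
    static Map<String, Object> toNestedMap(Map<?, ?> parsed) {
        Map<String, Object> out = new LinkedHashMap<String, Object>();
        for (Map.Entry<?, ?> e : parsed.entrySet()) {
            Object value = e.getValue();
            if (value instanceof Map) {
                value = toNestedMap((Map<?, ?>) value); // keep the structure
            } else if (value != null) {
                value = value.toString(); // leaf values kept as strings here
            }
            out.put(String.valueOf(e.getKey()), value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<String, Object>();
        inner.put("field4", "value4");

        Map<String, Object> outer = new LinkedHashMap<String, Object>();
        outer.put("field1", "value1");
        outer.put("nested", inner); // "nested" is a hypothetical key

        Map<String, Object> result = toNestedMap(outer);
        Map<?, ?> nested = (Map<?, ?>) result.get("nested");
        System.out.println(nested.get("field4"));
    }
}
```

The resulting Map<String, Object> is the kind of value that could be placed in the Tuple in place of the flattened one, assuming your Pig version accepts nested maps as described above.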

Additionally, I think the json-simple dependency can perhaps be removed
in favor of the Jackson Core/Mapper libraries that are now shipped
by Hadoop itself (eliminating an extra JAR); Pig does not ship the
json-simple library. But you may want to be careful about the
version of Jackson Core/Mapper bundled with your Hadoop: much more
recent releases are available, with improvements over it.

Perhaps, if you feel like it, you can contribute your change back to
elephant-bird [2]. I think they're open to changes for newer Pig versions.

[1] -
[2] -

Harsh J

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jan 29, '11 at 12:13p
active: Jan 30, '11 at 10:24p