Hello,

On Sat, Jan 29, 2011 at 5:42 PM, Alex McLintock wrote:
I wonder if discussion of the Piggybank and other User Defined Functions is
best done here (since it is *using* Pig) or on the Development list (because
it is enhancing Pig).

I'm trying to load some Json into pig using the PigJsonLoader.java UDF which
Kim Vogt posted about back in September. (It isn't in Piggybank AFAICS)
https://gist.github.com/601331


The class works for me - mostly....


This works when the Json is just a single level

{"field1": "value1", "field2": "value2", "field3": "value3"}

But doesn't seem to work when the json is nested

{"field1": "value1", "field2": "value2", {"field4": "value4", "field5": "value5", "field6": "value6"}, "field3": "value3"}

The json-simple library for Java will build the entire JSON
representation as a JSONObject, which is _exactly_ what you need. This
is a Java Map-like class which would contain your structure properly.
What remains is to properly convert this to a Pig-acceptable Map
structure.

But what's happening in Vogt's code (and also Elephant-Bird's
LzoJsonLoader from which it was sourced) is that the Map is
down-converted to a simple Key-Value mapping instead of a Map
containing another Map. This was done due to a limitation in Pig
0.6.0, where the Map type could not hold complex types in it -- as
noted in the latter class's javadoc [1].
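
To illustrate the kind of down-conversion described above, here is a minimal sketch in plain Java. It is not the actual elephant-bird or gist code: the class and method names are hypothetical, the key "nested" is made up for the example, and it assumes the flattening simply stringifies each top-level value.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class FlattenSketch {
    // Hypothetical helper: down-converts a parsed JSON object to a flat
    // String -> String mapping by calling toString() on every value.
    static Map<String, String> flatten(Map<?, ?> parsed) {
        Map<String, String> flat = new HashMap<>();
        for (Map.Entry<?, ?> e : parsed.entrySet()) {
            Object v = e.getValue();
            flat.put(String.valueOf(e.getKey()), v == null ? null : v.toString());
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("field4", "value4");

        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put("field1", "value1");
        outer.put("nested", inner); // "nested" is an assumed key name

        // The inner object survives only as an opaque string, not as a map.
        System.out.println(flatten(outer).get("nested")); // prints {field4=value4}
    }
}
```

With this approach the nested fields can no longer be addressed individually from a Pig script, which would match the behaviour you are seeing.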

This limitation has gone away in 0.7.0+, I think (the Pig Map spec
now supports <String, {Atom, Tuple, Bag, Map}>), so feel free to
change or remove the iteration inside parseStringToTuple(...) so that
it no longer 'flattens' the Map.
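
A non-flattening conversion could look roughly like the sketch below. It uses only java.util types and assumes the parsed JSON object can be treated as a java.util.Map and JSON arrays as a java.util.List. The class name, the method name and the "nested" key are hypothetical, and a plain List stands in for what a real loader would turn into a Pig Bag or Tuple.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestedMapSketch {
    // Hypothetical helper: recursively copies a parsed JSON value,
    // keeping nested objects as nested Map<String, Object> instances.
    static Object convert(Object value) {
        if (value instanceof Map) {
            Map<String, Object> out = new HashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(String.valueOf(e.getKey()), convert(e.getValue()));
            }
            return out;
        }
        if (value instanceof List) {
            // Stand-in only: a real loader would build a Pig Bag/Tuple here.
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(convert(item));
            }
            return out;
        }
        return value; // scalars and null pass through unchanged
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<>();
        inner.put("field4", "value4");

        Map<String, Object> outer = new HashMap<>();
        outer.put("field1", "value1");
        outer.put("nested", inner); // "nested" is an assumed key name

        Map<?, ?> converted = (Map<?, ?>) convert(outer);
        Map<?, ?> nested = (Map<?, ?>) converted.get("nested");
        System.out.println(nested.get("field4")); // prints value4
    }
}
```

The resulting Map of Maps is what you would then place in the Tuple returned by the loader, instead of the stringified values.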

Additionally, I think the json-simple dependency can perhaps be removed
in favor of the Jackson Core/Mapper libraries that now ship with
Hadoop itself (eliminating an extra JAR); Pig does not ship the
json-simple library. But be careful about which version of Jackson
Core/Mapper your Hadoop installation bundles: much more recent
releases are available, with improvements.

Perhaps, if you feel like it, you can contribute your change back to
elephant-bird [2]. I think they're open to newer-Pig related changes.

[1] - https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/LzoJsonLoader.java
[2] - https://github.com/kevinweil/elephant-bird

--
Harsh J
www.harshj.com
