Grokbase Groups Pig user January 2011
Currently, we treat all map values as bytearray. However, if you project
a map value later in the script, you have a chance to cast that value.
E.g.:

a = load '1.json' using JSONLoader() as (m:map[]);
b = foreach a generate (map[])m#'key' as v;
c = foreach b generate (long)v#'subkey';  -- 'subkey' is a placeholder for a key inside v; a map itself cannot be cast to long

But you cannot cast the map as a whole. This will be addressed in 0.9 as
we are introducing a typed map.

Daniel

Lance Riedel wrote:
I have written a generalized load func for nested JSON, but hit a wall.

Not sure how to access the nested data once in pig for something like the
following:

Original JSON:

{"body":[{"token":"foo2","hash":"-33333333333"},{"token":"bar2","hash":"-22222222222"}],"pmessgid":"559830","subject":[{"token":"fooo","hash":"111111"},{"token":"bar","hash":"999999"}],"userid":"77274","messageid":"559837","threadid":"104997"}


Dump of tuple.toString() to System.out from my LoadFunc (after generating
the tuple with a custom load func: a recursive JSON-walking mechanism that
generates nested maps and tuples):

([body#([token#foo2,hash#-33333333333],[token#bar2,hash#-22222222222]),subject#([token#fooo,hash#111111],[token#bar,hash#999999]),userid#77274,messageid#559837,threadid#104997,pmessgid#559830])
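
For readers following along, a recursive walk of this kind might look roughly like the sketch below. It is not the loader from this thread: the class name JsonWalkSketch is hypothetical, the input is assumed to be already parsed into plain java.util Maps and Lists, and a List stands in for a Pig tuple so the example has no Pig dependency.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the LoadFunc discussed in this thread.
final class JsonWalkSketch {

    // Recursively convert a parsed JSON node into nested maps and lists.
    // Assumption: a List stands in for a Pig tuple; scalars become Strings.
    static Object walk(Object node) {
        if (node instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                out.put(String.valueOf(e.getKey()), walk(e.getValue()));
            }
            return out;
        }
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) node) {
                out.add(walk(item));
            }
            return out;
        }
        // Leaf value: numbers, booleans and strings are all kept as text.
        return String.valueOf(node);
    }

    public static void main(String[] args) {
        Map<String, Object> token = new LinkedHashMap<>();
        token.put("token", "fooo");
        token.put("hash", 111111);

        List<Object> subject = new ArrayList<>();
        subject.add(token);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("subject", subject);
        doc.put("userid", 77274);

        System.out.println(walk(doc));
    }
}
```

In a real loader the List would presumably be replaced by a Pig tuple built through Pig's own factory classes.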


So far so good, I can produce the right data structure in code, and when I
dump it via the toString() it looks good!

**** My problem ->

So here is the schema in the example above:
Map<String,Object>, where Object is either a list (tuple) of
Map<String,String>s or just a String.
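
On the Java side, reading from such a mixed-value map means checking the runtime type of each value before using it. A minimal sketch, assuming the structure described above; the class MixedValueAccess and the helper tokens() are hypothetical names, not part of the loader:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the mixed-value schema described above.
final class MixedValueAccess {

    // Collect the "token" field of each entry when the value under `key`
    // is a list of maps. Returns an empty list when the value is a plain
    // String or the key is absent.
    static List<String> tokens(Map<String, Object> json, String key) {
        List<String> result = new ArrayList<>();
        Object value = json.get(key);
        if (value instanceof List) {
            for (Object item : (List<?>) value) {
                if (item instanceof Map) {
                    Object token = ((Map<?, ?>) item).get("token");
                    if (token != null) {
                        result.add(token.toString());
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("token", "foo2");
        first.put("hash", "-33333333333");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("token", "bar2");
        second.put("hash", "-22222222222");

        List<Map<String, String>> body = new ArrayList<>();
        body.add(first);
        body.add(second);

        Map<String, Object> json = new LinkedHashMap<>();
        json.put("body", body);      // list of maps
        json.put("userid", "77274"); // plain String

        System.out.println(tokens(json, "body"));
        System.out.println(tokens(json, "userid"));
    }
}
```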


In my pig script, I can get this far:
A = LOAD '/jivepoc/jivecommunity/dbsqoop/usermessages-clean-features2' USING
com.proximal.pig.tools.JSONLoader() as (
json: map[]
);

If I don't qualify the map[] above, I can select an item from the map (say
'body'), and DESCRIBE reports:

certain_keys = FOREACH A GENERATE json#'body' AS b;
DESCRIBE certain_keys;
certain_keys: {b: bytearray}

Looks good: it is a bytearray if I don't further define what I have. But now
I'm stuck, because I need to load a much more detailed map[].
The problem is that the map[] (as pointed out above) can contain either a
String or a list of Map<String,String>s.

There is no typecasting, right? Am I missing something, or am I stuck?

Thanks
Lance



Additional info:

Code to do this:

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jan 15, '11 at 12:13a
active: Jan 18, '11 at 8:00p
posts: 2
users: 2
website: pig.apache.org

2 users in discussion

Daniel Dai: 1 post Lance Riedel: 1 post
