I have written a generalized LoadFunc for nested JSON, but I have hit a wall.
I'm not sure how to access the nested data once it is in Pig, for something
like the following:
Original JSON:
{"body":[{"token":"foo2","hash":"-33333333333"},{"token":"bar2","hash":"-22222222222"}],"pmessgid":"559830","subject":[{"token":"fooo","hash":"111111"},{"token":"bar","hash":"999999"}],"userid":"77274","messageid":"559837","threadid":"104997"}
Dump of tuple.toString() to System.out from my LoadFunc (the tuple is
generated by a custom load func: a recursive JSON-walking mechanism that
produces nested maps and tuples):
([body#([token#foo2,hash#-33333333333],[token#bar2,hash#-22222222222]),subject#([token#fooo,hash#111111],[token#bar,hash#999999]),userid#77274,messageid#559837,threadid#104997,pmessgid#559830])
So far so good, I can produce the right data structure in code, and when I
dump it via the toString() it looks good!
**** My problem ->
So here is the schema in the example above:
Map<String,Object>, where Object is either a tuple of Map<String,String>s
or just a String.
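To make that shape concrete, here is a minimal sketch in plain Java. It is
not my actual loader code: the class name NestedRecordSketch and its methods
are hypothetical, and java.util collections stand in for Pig's map and tuple
types. It only shows that each value has to be inspected at runtime to find
out whether it is a String or a list of Map<String,String>s.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the loader's output; plain java.util types
// are used here instead of Pig's tuple and map classes.
class NestedRecordSketch {

    // Builds one {token, hash} entry as a Map<String,String>.
    static Map<String, String> entry(String token, String hash) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("token", token);
        m.put("hash", hash);
        return m;
    }

    // Rebuilds the example record from the original JSON.
    static Map<String, Object> exampleRecord() {
        Map<String, Object> rec = new LinkedHashMap<>();

        List<Map<String, String>> body = new ArrayList<>();
        body.add(entry("foo2", "-33333333333"));
        body.add(entry("bar2", "-22222222222"));
        rec.put("body", body);

        List<Map<String, String>> subject = new ArrayList<>();
        subject.add(entry("fooo", "111111"));
        subject.add(entry("bar", "999999"));
        rec.put("subject", subject);

        rec.put("userid", "77274");
        rec.put("messageid", "559837");
        rec.put("threadid", "104997");
        rec.put("pmessgid", "559830");
        return rec;
    }

    // The value type is only known at runtime, so it has to be checked.
    static String describe(Object value) {
        if (value instanceof String) {
            return "string";
        }
        if (value instanceof List) {
            return "list of " + ((List<?>) value).size() + " maps";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Object> e : exampleRecord().entrySet()) {
            System.out.println(e.getKey() + " -> " + describe(e.getValue()));
        }
    }
}
```

In Java the instanceof check settles which case applies; what I'm looking for
is the equivalent of that check (or a cast) on the Pig side.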
In my pig script, I can get this far:
A = LOAD '/jivepoc/jivecommunity/dbsqoop/usermessages-clean-features2' USING
com.proximal.pig.tools.JSONLoader() as (
json: map[]
);
If I don't qualify the map[] above, I can select an item from the map (say
'body'), and DESCRIBE reports:
certain_keys = FOREACH A GENERATE json#'body' AS b;
DESCRIBE certain_keys;
certain_keys: {b: bytearray}
Looks good: it is a bytearray if I don't further define what I have. But now
I'm stuck, because I need to load a much more detailed map[].
The problem is that the map[] (as pointed out above) can contain either a
String or a tuple of Map<String,String>s.
There is no typecasting, right? Am I missing something, or am I stuck?
Thanks
Lance
Additional info:
Code to do this: