Grokbase Groups Pig user January 2011
I have written a generalized LoadFunc for nested JSON, but I've hit a wall.

I'm not sure how to access the nested data once it is in Pig, for something like the following:

Original JSON:


Dump of tuple.toString() to System.out from my LoadFunc (after generating
the tuple with my custom load func, a recursive JSON-walking mechanism that
generates nested maps and tuples):


So far so good: I can produce the right data structure in code, and when I
dump it via toString() it looks good!
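A minimal sketch of the kind of recursive JSON walker described above (not the original loader code). It assumes the JSON has already been parsed by some JSON library into java.util.Map, java.util.List and scalar values, and it assumes each array element becomes a one-field tuple. Objects become maps, arrays become lists of tuples, and scalars become Strings. Plain java.util.List stands in for Pig's Tuple and DataBag so the sketch runs without Pig on the classpath; inside a real LoadFunc you would build those with TupleFactory.getInstance().newTuple(...) and BagFactory.getInstance().newDefaultBag(). JsonWalker and walk are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JsonWalker {

    // Recursively converts an already-parsed JSON value into Pig-shaped data:
    //   JSON object -> Map<String, Object>  (a Pig map; keys forced to String)
    //   JSON array  -> List<List<Object>>   (stand-in for a bag of one-field tuples)
    //   scalar      -> String               (JSON null stays null)
    static Object walk(Object json) {
        if (json instanceof Map) {
            Map<String, Object> map = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) json).entrySet()) {
                map.put(String.valueOf(e.getKey()), walk(e.getValue()));
            }
            return map;
        }
        if (json instanceof List) {
            List<List<Object>> bag = new ArrayList<>();
            for (Object element : (List<?>) json) {
                List<Object> tuple = new ArrayList<>();
                tuple.add(walk(element));   // assumption: one array element per tuple
                bag.add(tuple);
            }
            return bag;
        }
        return json == null ? null : json.toString();
    }

    public static void main(String[] args) {
        // Hypothetical parsed record: a string field plus an array of objects.
        Map<String, Object> tag = new LinkedHashMap<>();
        tag.put("name", "pig");
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("body", "hello");
        record.put("tags", Arrays.asList(tag));
        System.out.println(walk(record));   // prints {body=hello, tags=[[{name=pig}]]}
    }
}
```

Stringifying every scalar keeps the leaves uniform (matching the String-valued leaves described in the post), at the cost of losing JSON number and boolean types.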

**** My problem ->

So here is the schema in the example above:
Map<String,Object>, where Object is either a list of tuples of
Map<String,String>s OR just a String.
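The schema above is effectively a union: each map value is either a String or a list of tuples of Map<String,String>. A minimal sketch of what checking that union looks like at runtime in plain Java, where the code has to branch on each value's actual type. Assumptions: java.util.List stands in for both the list and the tuples (so it runs without Pig), and a tuple may hold any number of fields as long as each is a Map<String,String>. SchemaCheck, conforms and isStringMap are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaCheck {

    // True if every value in the record matches one side of the union:
    // a plain String, or a list of tuples whose fields are all Map<String,String>.
    static boolean conforms(Map<String, Object> record) {
        for (Object value : record.values()) {
            if (value instanceof String) {
                continue;                            // String variant
            }
            if (!(value instanceof List)) {
                return false;                        // neither variant
            }
            for (Object tuple : (List<?>) value) {   // list-of-tuples variant
                if (!(tuple instanceof List)) {
                    return false;
                }
                for (Object field : (List<?>) tuple) {
                    if (!isStringMap(field)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    // True if o is a Map whose keys and values are all Strings.
    static boolean isStringMap(Object o) {
        if (!(o instanceof Map)) {
            return false;
        }
        for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
            if (!(e.getKey() instanceof String) || !(e.getValue() instanceof String)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("body", "hello");
        record.put("tags", Arrays.asList(Arrays.asList(Collections.singletonMap("name", "pig"))));
        System.out.println(conforms(record));   // prints true
        record.put("count", 3);                 // an Integer fits neither variant
        System.out.println(conforms(record));   // prints false
    }
}
```

The explicit instanceof branching is the point: nothing in an untyped map[] declaration tells the consumer which variant a given key holds.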

In my Pig script, I can get this far:
A = LOAD '/jivepoc/jivecommunity/dbsqoop/usermessages-clean-features2' USING ... AS (
    json: map[]
);

If I don't qualify the map[] above, I can select an item from the map (say
'body'), and it says:

certain_keys = FOREACH A GENERATE json#'body' AS b;
DESCRIBE certain_keys;
certain_keys: {b: bytearray}

That looks right: it is a bytearray if I don't further define what I have. But now
I'm stuck: I need to load a much more detailed map[].
The problem is that the map[] (as pointed out above) can contain either a String or
a Map<String,String>.

There is no typecasting, right? Am I missing something, or am I stuck?


Additional info:

Code to do this:

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jan 15, '11 at 12:13a
active: Jan 18, '11 at 8:00p

2 users in discussion

Daniel Dai: 1 post, Lance Riedel: 1 post