Apr 11, 2012 at 10:59 pm
I don't seem to be getting what I'm after. If my data looks like
I want to produce
I changed the LOAD statement to
mt = LOAD '/hrly_sub_smry/year_month_day=20120329/hour=04/*' USING
opt = foreach mt generate C_SUB_ID, FLATTEN(STRSPLIT(seg_ids,':')) as
I don't seem to be getting the cross product, just something like the
Any ideas?
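
Editor's note: the likely cause is that STRSPLIT returns a tuple, and FLATTEN on a tuple spreads its fields across columns of the same row. Only FLATTEN on a bag produces one row per element (the cross product with the other fields). The sketch below is a toy Python model of that difference, not Pig itself. The sample record and the helper names `flatten_tuple` and `flatten_bag` are made up for illustration, since the thread's real data is not shown.

```python
# Toy model of Pig's FLATTEN semantics (this is not Pig code).
# A Python tuple stands in for a Pig tuple, a list for a Pig bag.

def flatten_tuple(row_key, fields):
    """FLATTEN on a tuple: the fields become extra columns of ONE row."""
    return [(row_key, *fields)]

def flatten_bag(row_key, items):
    """FLATTEN on a bag: one output row per item, each paired with the key."""
    return [(row_key, item) for item in items]

# Hypothetical sample record; values are assumptions, not from the thread.
sub_id, seg_ids = "sub1", "a:b:c"

as_tuple = tuple(seg_ids.split(":"))  # models a STRSPLIT-style result
as_bag = list(seg_ids.split(":"))     # models a bag-producing tokenizer

wide = flatten_tuple(sub_id, as_tuple)  # one row, several columns
tall = flatten_bag(sub_id, as_bag)      # several rows, two columns

print(wide)
print(tall)
```

In Pig terms, getting the "tall" shape means handing FLATTEN a bag rather than a tuple. The built-in TOKENIZE returns a bag, so it may be worth trying in place of STRSPLIT, but check whether your Pig version's TOKENIZE accepts a delimiter argument and whether seg_ids arrives from the loader as a chararray. Both are assumptions here.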
From: Norbert Burger
Sent: 06 April 2012 16:01
Subject: Re: "Exploding" a Hive array<string> in Pig from an RCFile
Malcolm -- typically, you'd use STRSPLIT and an optional FLATTEN to tokenize
a chararray on some delimiter. So the following should work:
opt = foreach mt generate C_SUB_ID, flatten(STRSPLIT(seg_ids,':')) as
On Thu, Apr 5, 2012 at 8:58 AM, Malcolm Tye
I'm storing data into a partitioned table using Hive in RCFile
format, but I want to use Pig to do the aggregation of that data.
In my array<string> in Hive, I have colon-delimited data, e.g.
With the lateral view and explode functions in Hive, I can output each
value as a separate row.
In Pig, I think I need to use flatten, but it just outputs the array
as a single field, and I can't see where to specify that the colon
is the delimiter/value separator.
register /opt/pig/trunk/bin/piggybank.jar
mt = LOAD
opt = foreach mt generate C_SUB_ID, flatten(seg_ids) as s_seg_id;
dump