Grokbase Groups Pig user January 2010
I have a question on how to handle data that I would usually store in an
array, or in a normalized child table in a database. The input data
is a set of key/value pairs where one key can be associated with
multiple values (0 to n).

Here is a sample dataset in which bucket is the multi-valued key:

family=sports,channel=baseball,timeframe=today,gender=M,bucket=12,bucket=27,bucket=32
family=sports,channel=baseball,timeframe=today,gender=M,bucket=12,bucket=27,bucket=32,bucket=54
family=events,channel=outdoor,timeframe=weekend,gender=F,bucket=13,bucket=27,bucket=32
family=events,channel=outdoor,timeframe=weekend,gender=F,bucket=13,bucket=27,bucket=32

What I am trying to calculate is a group count on
family,channel,timeframe and bucket, where the results would be:

(sports,baseball,today,12),2
(sports,baseball,today,27),2
(sports,baseball,today,32),2
(sports,baseball,today,54),1
(events,outdoor,weekend,13),2
(events,outdoor,weekend,27),2
(events,outdoor,weekend,32),2
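
To pin down exactly what that count means, here is a minimal sketch in plain Python (not Pig) that produces the result above from the sample lines. The function name count_buckets and the inline parsing are assumptions made for illustration only. It emits one (family, channel, timeframe, bucket) tuple per bucket value and counts the tuples, so a record with no bucket contributes nothing.

```python
from collections import Counter


def count_buckets(lines):
    """Count records per (family, channel, timeframe, bucket)."""
    counts = Counter()
    for line in lines:
        fields = {}   # single-valued keys such as family or channel
        buckets = []  # every value seen for the multi-valued key
        for pair in line.strip().split(","):
            key, _, value = pair.partition("=")
            if key == "bucket":
                buckets.append(value)
            else:
                fields[key] = value
        # One output tuple per bucket value in the record.
        for bucket in buckets:
            group = (fields["family"], fields["channel"],
                     fields["timeframe"], bucket)
            counts[group] += 1
    return counts


sample = [
    "family=sports,channel=baseball,timeframe=today,gender=M,bucket=12,bucket=27,bucket=32",
    "family=sports,channel=baseball,timeframe=today,gender=M,bucket=12,bucket=27,bucket=32,bucket=54",
    "family=events,channel=outdoor,timeframe=weekend,gender=F,bucket=13,bucket=27,bucket=32",
    "family=events,channel=outdoor,timeframe=weekend,gender=F,bucket=13,bucket=27,bucket=32",
]

for group, n in count_buckets(sample).items():
    print(group, n)
```

Note that this sketch needs no join key at all: each record's buckets are expanded while the record is still in hand. It also counts a bucket value twice if the same value is repeated within one record.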

One approach would seem to be to store the bucket values in a separate
relation and join using a surrogate key created when reading the data
in. Something like:

A = (12345,sports,baseball,today,M)
B = (32,12345)(27,12345)(12,12345)

C = JOIN A by $0, B by $1;

D = GROUP C by (family,channel,timeframe,bucket);

I am sure this method would work, but it requires generating a
map/reduce-friendly surrogate key on which to join the data. Is there a
more direct way to do this in Pig? Also, is it possible to load more
than one relation at a time (split the data between two relations) with
the LOAD statement?

Thanks,
Scott

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jan 21, '10 at 1:38p
active: Jan 25, '10 at 9:22p
posts: 3
users: 3
website: pig.apache.org

3 users in discussion

Jeff Zhang: 1 post, Scott Kester: 1 post, Scott: 1 post
