Grokbase Groups Pig user April 2011

Is it possible to create an aggregating function with two parameters,
one of which is a bag and the other is not?

In particular, I want to use that to work around the lack of
per-invocation instance configuration for functions.

Say I have a function that can aggregate over some period of history,
say, aggregate(days, sampleBag).

sampleBag is a bag of tuples of the form (value, time). I want to use
the function multiple times in the same script: one invocation
aggregates the bag exponentially over 30 days, and another aggregates
the same bag over 7 days. The exponential scale depends on this time
parameter. So I want to use it in something like

B = foreach A generate aggregate(30, sampleBag) as agg30days,
aggregate(7, sampleBag) as agg7days;
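
To make the intended semantics concrete, here is a minimal plain-Java sketch of such a two-parameter aggregate. It does not use the Pig API. The Sample class, the nowDays reference time, and the exp(-age / days) weighting are all assumptions made for illustration; the post itself only says that the exponential scale depends on the days parameter.

```java
import java.util.Arrays;
import java.util.List;

public class ExpAggregateSketch {
    // Hypothetical stand-in for one (value, time) tuple of sampleBag; time is in days.
    public static final class Sample {
        public final double value;
        public final double timeDays;

        public Sample(double value, double timeDays) {
            this.value = value;
            this.timeDays = timeDays;
        }
    }

    // Assumed weighting: a sample of age (nowDays - timeDays) contributes
    // value * exp(-age / days), so a larger 'days' makes history decay more slowly.
    public static double aggregate(double days, List<Sample> samples, double nowDays) {
        double sum = 0.0;
        for (Sample s : samples) {
            double age = nowDays - s.timeDays;
            sum += s.value * Math.exp(-age / days);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Sample> bag = Arrays.asList(
                new Sample(10.0, 100.0),
                new Sample(20.0, 93.0),
                new Sample(30.0, 70.0));
        double now = 100.0;
        // Same bag, two different scalar parameters, as in the script line above.
        System.out.println("30-day scale: " + aggregate(30, bag, now));
        System.out.println("7-day scale:  " + aggregate(7, bag, now));
    }
}
```

The point of the sketch is only that the scalar and the bag are independent inputs: the same bag is passed twice and only the days argument changes.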

Question 1: is this even a valid format for a function implementing Algebraic?
Question 2: would I also be able to use the Accumulator interface?
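
As background for Question 1, the following plain-Java sketch shows why an exponentially weighted sum is at least mathematically decomposable into stages: partial sums computed over chunks of the bag can be merged into the same total. It does not use Pig's Algebraic interface, and it does not show whether Pig hands the scalar argument to every stage. The stage names, the nowDays reference time, and the exp(-age / days) weighting are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class AlgebraicStagesSketch {
    // Initial stage (hypothetical): weight one (value, time) sample.
    // The decay exp(-age / days) is an assumed form of the exponential scale.
    public static double initial(double days, double value, double timeDays, double nowDays) {
        return value * Math.exp(-(nowDays - timeDays) / days);
    }

    // Intermediate stage: merge partial sums from one chunk of the bag.
    public static double intermediate(List<Double> partials) {
        double sum = 0.0;
        for (double p : partials) {
            sum += p;
        }
        return sum;
    }

    // Final stage: for a plain weighted sum this is the same merge as above.
    public static double finalStage(List<Double> partials) {
        return intermediate(partials);
    }

    public static void main(String[] args) {
        double days = 7.0;
        double now = 100.0;
        double a = initial(days, 10.0, 100.0, now);
        double b = initial(days, 20.0, 93.0, now);
        double c = initial(days, 30.0, 70.0, now);

        // Two chunks merged separately, then combined.
        double chunk1 = intermediate(Arrays.asList(a, b));
        double chunk2 = intermediate(Arrays.asList(c));
        double viaChunks = finalStage(Arrays.asList(chunk1, chunk2));

        // The whole bag in one pass, for comparison.
        double direct = intermediate(Arrays.asList(a, b, c));
        System.out.println(viaChunks + " vs " + direct);
    }
}
```

Note that only the initial stage needs the days value here; the later stages just add numbers. If an aggregate needed the scalar in later stages too, it would have to be carried along inside the partial results.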

If not, how can I parameterize invocations? I know the UDF manual says
I really can't, so if the above is the way it is, that would be very,
very sad. I would really hate to create versions such as


group: user @
categories: pig, hadoop
posted: Apr 16, '11 at 1:27a
active: Apr 17, '11 at 7:07p


