Hi,

Is it possible to create an aggregating function with two parameters,
one of which is a bag and the other is not?

In particular, I want to use that to work around the lack of
per-invocation configuration for function instances.

Say I have a function that can aggregate over some period of history,
say, aggregate(days, sampleBag).

sampleBag is a bag of tuples of the form (value, time). I want to use
the function multiple times in the same script: one invocation to
aggregate exponentially over 30 days and another to aggregate the same
bag over 7 days. The exponential scale depends on this time parameter.
So I want to use it in something like

B = foreach A generate aggregate(30, sampleBag) as agg30days,
aggregate(7, sampleBag) as agg7days;

Question 1 -- is this even a valid format for a function implementing Algebraic?
Question 2 -- would I also be able to use the Accumulator interface?

If not, how can I parameterize invocations? I know the UDF manual says I
really can't, and if that is the way it is, it would be very, very sad.
I would really hate to create versions such as
aggregate7days(bag), aggregate30days(bag)...

Thanks.
-d
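
For readers following along, here is a minimal sketch of what "aggregate exponentially over N days" could mean. The post does not give the actual formula, so the weighting exp(-age / days), the weighted-average form, the `Sample` class, and the `nowDays` argument are all assumptions made for illustration. It is plain Java and does not use any Pig classes.

```java
import java.util.Arrays;
import java.util.List;

class ExpDecaySketch {
    /** One (value, time) sample; time is assumed to be measured in days. */
    static final class Sample {
        final double value;
        final double timeDays;

        Sample(double value, double timeDays) {
            this.value = value;
            this.timeDays = timeDays;
        }
    }

    /**
     * Assumed aggregate: average of the values, each weighted by
     * exp(-age / days), where age = nowDays - sample time.
     * A smaller 'days' makes old samples fade faster.
     */
    static double aggregate(double days, List<Sample> samples, double nowDays) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (Sample s : samples) {
            double age = nowDays - s.timeDays;
            double w = Math.exp(-age / days);
            weightedSum += w * s.value;
            weightTotal += w;
        }
        // An empty bag yields 0.0 here; that is an arbitrary choice for the sketch.
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        List<Sample> bag = Arrays.asList(new Sample(10.0, 0.0), new Sample(20.0, 30.0));
        System.out.println("30-day scale: " + aggregate(30, bag, 30.0));
        System.out.println("7-day scale:  " + aggregate(7, bag, 30.0));
    }
}
```

With the same bag, the 7-day scale leans harder toward the newest sample than the 30-day scale does, which is why the period has to be a parameter of each invocation.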

  • Jonathan Coveney at Apr 16, 2011 at 1:31 am
    1) You absolutely can do what you want to do. Literally just make it the
    second input, and in your UDF's exec method you'll have something like:
    DataBag inbag = (DataBag) input.get(0); // the bag
    Whatever thing = (Whatever) input.get(1); // the extra parameter, and so on

    But beyond that, you can pass parameters to the constructor like so:

    DEFINE func mypackage.myfunc(parameter);

    So you could also instantiate 2 versions.
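
The two options above can be sketched side by side. This is a plain-Java stand-in, not real Pig code: `DecayUdfSketch`, `execWithArg`, and the weighted-sum formula are hypothetical, and a `List<Object>` plays the role of the input tuple so the example runs without the Pig jar. Two things it assumes about Pig: input fields follow the order of the arguments in the script (so with aggregate(30, sampleBag) the 30 would be at index 0, not 1), and DEFINE constructor arguments are quoted strings in the script, e.g. DEFINE agg30 mypackage.myfunc('30'), so the constructor takes a String.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java stand-in for a UDF; it does not extend Pig's EvalFunc. */
class DecayUdfSketch {
    private final double defaultDays;

    // Option 2: the period arrives once, through the constructor.
    // Assumption: constructor arguments are handed over as strings, so parse here.
    DecayUdfSketch(String days) {
        this.defaultDays = Double.parseDouble(days);
    }

    // Option 1: the period arrives with every call as a second input field.
    // 'input' mimics the argument tuple: field 0 is the bag, field 1 the period.
    static double execWithArg(List<Object> input) {
        @SuppressWarnings("unchecked")
        List<double[]> bag = (List<double[]>) input.get(0); // each entry is {value, time}
        double days = ((Number) input.get(1)).doubleValue();
        return weightedSum(bag, days);
    }

    // Option 2 call path: only the bag is passed; the period is instance state.
    double exec(List<double[]> bag) {
        return weightedSum(bag, defaultDays);
    }

    // Hypothetical aggregate: sum of value * exp(-age / days),
    // with age measured back from the newest sample in the bag.
    private static double weightedSum(List<double[]> bag, double days) {
        double newest = Double.NEGATIVE_INFINITY;
        for (double[] s : bag) {
            newest = Math.max(newest, s[1]);
        }
        double sum = 0.0;
        for (double[] s : bag) {
            sum += s[0] * Math.exp(-(newest - s[1]) / days);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<double[]> bag = Arrays.asList(new double[] {10.0, 0.0}, new double[] {20.0, 30.0});
        DecayUdfSketch agg30 = new DecayUdfSketch("30"); // like DEFINE agg30 ...('30')
        DecayUdfSketch agg7 = new DecayUdfSketch("7");   // like DEFINE agg7 ...('7')
        System.out.println(agg30.exec(bag));
        System.out.println(agg7.exec(bag));
        System.out.println(execWithArg(Arrays.<Object>asList(bag, 30)));
    }
}
```

The constructor route keeps the script calls down to a single bag argument per alias, while the second-input route lets the period vary per row. Which one fits depends on whether the period is fixed for the whole script.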

  • Dmitriy Lyubimov at Apr 16, 2011 at 1:50 am
    DEFINE func mypackage.myfunc(parameter);
    Thanks! This is so cool. Holy grail, literally. I thought this was not
    available, at least in 0.6? Since when has this been available for eval
    funcs?

  • Dmitriy Ryaboy at Apr 17, 2011 at 7:07 pm
    It's in 0.6, just not well documented :)

    -----Original Message-----
    From: "Dmitriy Lyubimov" <dlieu.7@gmail.com>
    To: user@pig.apache.org
    Cc: "Jonathan Coveney" <jcoveney@gmail.com>
    Sent: 4/15/2011 6:50 PM
    Subject: Re: Algebraic UDF with one bag and one non-bag parameter
    DEFINE func mypackage.myfunc(parameter);
    Thanks! This is so cool. Holy grail, literally. I thought this was not
    available, at least in 0.6? Since when has this been available for eval
    funcs?


Discussion Overview
group: user @
categories: pig, hadoop
posted: Apr 16, '11 at 1:27a
active: Apr 17, '11 at 7:07p
posts: 4
users: 4
website: pig.apache.org
