Hi,

This might be a dumb question. Is it possible to pass anything other than
the input tuple to a UDF Eval function?

Basically in my UDF, I need to do some user info lookup. So the input will
be:

(userid,f1,f2)

With this UDF, I want to convert it to something like

(userid,age,gender,location,f1,f2)

where the UDF does a DB lookup on the userid and returns the user's info
(age, gender, etc.). But I don't always want the same user info fields
back, e.g. sometimes I only want age.

I hope there is a way for me to tell the UDF that I only want "age", and
sometimes "age, location", etc.

What's the best way to achieve this without having to write a separate UDF
for every case?

Thanks.
Dexin


  • Zach Bailey at Dec 7, 2010 at 7:48 pm
    You can pass parameters via the UDF constructor. Pig hands constructor
    arguments to the UDF as strings, so declare them as String and parse
    them inside the constructor. For example:


    public MyUDF(String includeAge, String includeGender)


    Then you would initialize it like so in your Pig script:


    define MY_UDF_ONLY_AGE com.package.MyUDF('true', 'false');


    and use it like:


    data_with_age = FOREACH data GENERATE userid, MY_UDF_ONLY_AGE(userid);


    HTH,
    Zach
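
    The constructor approach above can also carry the field names themselves,
    which covers the "age" versus "age, location" cases with one class. What
    follows is only a minimal sketch in plain Java, not a real Pig UDF: the
    class name UserInfoSelector, the in-memory FAKE_DB map and its sample row
    are all hypothetical stand-ins. A real UDF would extend
    org.apache.pig.EvalFunc and do this work in exec(Tuple), with an actual DB
    lookup in place of the map. The single comma-separated String argument is
    an assumption chosen because constructor arguments arrive as strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the body of a Pig UDF. A real one would extend
// org.apache.pig.EvalFunc<Tuple> and perform this logic in exec(Tuple).
class UserInfoSelector {
    // Assumed replacement for the DB: userid -> (field name -> value).
    private static final Map<String, Map<String, Object>> FAKE_DB = new HashMap<>();
    static {
        Map<String, Object> u1 = new HashMap<>();
        u1.put("age", 34);
        u1.put("gender", "F");
        u1.put("location", "Austin");
        FAKE_DB.put("u1", u1);
    }

    private final List<String> wantedFields = new ArrayList<>();

    // Takes one String such as "age" or "age, location", mirroring how a
    // string constructor argument would be supplied from a DEFINE statement.
    UserInfoSelector(String commaSeparatedFields) {
        for (String name : commaSeparatedFields.split(",")) {
            String trimmed = name.trim().toLowerCase();
            if (!trimmed.isEmpty()) {
                wantedFields.add(trimmed);
            }
        }
    }

    // Returns the userid followed by the requested fields, in request order.
    // An unknown user or unknown field produces null in that position.
    List<Object> select(String userId) {
        Map<String, Object> row = FAKE_DB.get(userId);
        List<Object> out = new ArrayList<>();
        out.add(userId);
        for (String field : wantedFields) {
            out.add(row == null ? null : row.get(field));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new UserInfoSelector("age").select("u1"));           // [u1, 34]
        System.out.println(new UserInfoSelector("age, location").select("u1")); // [u1, 34, Austin]
    }
}
```

    With a class shaped like this, each variant would be one extra define line
    in the Pig script (for example one alias constructed with 'age' and another
    with 'age,location') rather than a separate UDF per case.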


  • Dexin Wang at Dec 7, 2010 at 8:09 pm
    Ah, nice. Thank you so much, Zach!

Discussion Overview
group: user @
categories: pig, hadoop
posted: Dec 7, '10 at 7:44p
active: Dec 7, '10 at 8:09p
posts: 3
users: 2
website: pig.apache.org