FAQ
Hi,

I have the following questions after going through the types functional spec.


a) What does MATCHES on two bytearrays mean? The spec says it is supported,
without any further comment.

b) Multiplication/division between a bag/tuple and primitives: the spec says
it is not implemented, but what is the expected behaviour once it is?
Is it applied to individual fields recursively?

c) What does CONCAT of two bytearrays mean? Does it just combine both arrays
into a new, larger array through array copies? (I am assuming this is
what CONCAT of chararrays does.)

d) For the aggregate functions MIN and MAX, can we provide our own
comparator (UDF or otherwise) for chararrays, to define the
relative ordering (for example using Collators) instead of always assuming
lexicographical ordering (which I assume is the default)?


e) In the argument construction part of the function section: is the semantic
change applicable only to arithmetic operations? Only to aggregate UDFs?
Or to all UDFs?

What happens in this case:

employee = LOAD 'employee' AS (name, salary, bonus_multiplier);
grouped = GROUP employee BY name;
total_compensation = FOREACH grouped {
    -- T1 and T2 are bags projected from the grouped relation
    T1 = employee.salary;
    T2 = employee.bonus_multiplier;
    GENERATE group, myUDF(T1 * T2); -- error?
}
Similarly, what happens for GENERATE group, myUDF(T1, T2) above?




Thanks,
Mridul

  • Mridul Muralidharan at Feb 9, 2009 at 8:24 pm
    Hi all,

    To answer some of my own questions below for a general audience, based on
    the doc Olga mentioned:
    http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm
    (someone should update the spec with this; it is far more informative).
    I could not find anything that explained the others, though.


    Regards,
    Mridul


    Mridul Muralidharan wrote:
    Hi,

    I have the following questions after going through the types functional spec.


    a) What does MATCHES on two bytearrays mean? The spec says it is supported,
    without any further comment.

    Though it is not explicitly specified, my feeling is that the operands are
    being cast to chararray.
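
To make the guess above concrete, here is a minimal Java sketch of what "cast to chararray, then match" could look like. It is not Pig's implementation: the class name, the helper `matches`, and the choice of UTF-8 for the cast are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

public class BytearrayMatches {
    // Hypothetical helper, not a Pig API: decode both operands as UTF-8 text
    // (an assumed encoding), then require the regex to match the whole value,
    // which is what String.matches does.
    static boolean matches(byte[] value, byte[] pattern) {
        String text = new String(value, StandardCharsets.UTF_8);
        String regex = new String(pattern, StandardCharsets.UTF_8);
        return text.matches(regex);
    }

    public static void main(String[] args) {
        byte[] value = "apache pig".getBytes(StandardCharsets.UTF_8);
        byte[] pattern = "apache.*".getBytes(StandardCharsets.UTF_8);
        System.out.println(matches(value, pattern));
    }
}
```

If the bytes are not valid text in the assumed encoding, the decoded string contains replacement characters and the match result is unlikely to be meaningful.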

    b) Multiplication/division between a bag/tuple and primitives: the spec says
    it is not implemented, but what is the expected behaviour once it is?
    Is it applied to individual fields recursively?

    c) What does CONCAT of two bytearrays mean? Does it just combine both arrays
    into a new, larger array through array copies? (I am assuming this is
    what CONCAT of chararrays does.)

    It gives a new array with the concatenated contents of the two bytearrays.
    In my opinion, use it with caution, since it is a crude concatenation of
    binary blobs.
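
The behaviour described above (a new array holding the bytes of the first operand followed by the bytes of the second) can be sketched in plain Java. The class and method names are hypothetical and this is not taken from the Pig source; it only shows the array-copy semantics.

```java
import java.util.Arrays;

public class BytearrayConcat {
    // Hypothetical helper: allocate a new array and copy both inputs into it.
    // Neither input is modified.
    static byte[] concat(byte[] left, byte[] right) {
        byte[] result = Arrays.copyOf(left, left.length + right.length);
        System.arraycopy(right, 0, result, left.length, right.length);
        return result;
    }

    public static void main(String[] args) {
        byte[] joined = concat(new byte[] {1, 2}, new byte[] {3});
        System.out.println(Arrays.toString(joined));
    }
}
```

The copy knows nothing about what the bytes mean, which is the reason for the caution: two blobs that each carry their own header or length prefix do not become one valid blob when joined this way.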

    d) For the aggregate functions MIN and MAX, can we provide our own
    comparator (UDF or otherwise) for chararrays, to define the
    relative ordering (for example using Collators) instead of always assuming
    lexicographical ordering (which I assume is the default)?
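
To show why the choice of comparator matters for question d), here is a small Java sketch comparing the minimum of a list under plain lexicographical (code unit) ordering with the minimum under a `java.text.Collator`. The class and method names are hypothetical; nothing here is a Pig API, and whether Pig's MIN/MAX can accept such a comparator is exactly the open question.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class CollatedMin {
    // Hypothetical helper: return the smallest value under the given ordering.
    static String min(List<String> values, Comparator<? super String> order) {
        return Collections.min(values, order);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("apple", "Banana", "cherry");

        // Lexicographical ordering compares code units, so uppercase 'B'
        // sorts before lowercase 'a'.
        System.out.println(min(names, Comparator.<String>naturalOrder()));

        // A Collator applies locale-aware rules, where letter identity
        // outranks case.
        Collator english = Collator.getInstance(Locale.ENGLISH);
        System.out.println(min(names, english));
    }
}
```

With this input the two orderings disagree on the minimum ("Banana" versus "apple"), which is the kind of difference a user-supplied comparator would let a script control.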


    e) In the argument construction part of the function section: is the semantic
    change applicable only to arithmetic operations? Only to aggregate UDFs?
    Or to all UDFs?

    What happens in this case:

    employee = LOAD 'employee' AS (name, salary, bonus_multiplier);
    grouped = GROUP employee BY name;
    total_compensation = FOREACH grouped {
        -- T1 and T2 are bags projected from the grouped relation
        T1 = employee.salary;
        T2 = employee.bonus_multiplier;
        GENERATE group, myUDF(T1 * T2); -- error?
    }
    Similarly, what happens for GENERATE group, myUDF(T1, T2) above?




    Thanks,
    Mridul
  • Olga Natkovich at Feb 9, 2009 at 9:45 pm
    Could you please summarize the list of questions that you feel are not
    adequately covered in the document so we can address them?

    Thanks,

    Olga
  • Mridul Muralidharan at Feb 9, 2009 at 9:49 pm
    All of the questions in this thread, and in my other mails, that have had
    no response (from me or others).

    Thanks,
    Mridul

  • Olga Natkovich at Feb 9, 2009 at 10:16 pm
    It would be good to have one list with all the questions that the
    documentation did not clarify for you. I am hoping it addressed more
    than just the NULL issues.

    Olga
  • Mridul Muralidharan at Feb 9, 2009 at 10:27 pm
    Sure.
    I am still going through the 50-odd UDFs and the Pig scripts we have, to
    see what is involved in porting them.
    If there are no immediate suggestions or comments on the questions I
    raised, I will send out a more comprehensive list later on, with those
    included.


    Regards,
    Mridul


Discussion Overview
group: user @
categories: pig, hadoop
posted: Feb 9, '09 at 11:11a
active: Feb 9, '09 at 10:27p
posts: 6
users: 2
website: pig.apache.org
