Hi all,

I have the following data file:

(1L,2L,3L)
(4L,2L,1L)
(8L,3L,4L)

I am trying to write a UDF (like sum) that adds the fields in a tuple. This works:

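package UDF;

import java.util.Iterator;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;

public class SumAll extends EvalFunc<Long> {
    @Override
    public Long exec(Tuple input) {
        try {
            return sum(input);
        } catch (NumberFormatException e) {
            // print the error and fall through to the default result
            e.printStackTrace();
        } catch (ExecException e) {
            // Tuple.get() throws ExecException, e.g. for an out-of-range field index
            e.printStackTrace();
        }
        return 0L;
    }

    protected static Long sum(Tuple input) throws ExecException, NumberFormatException {
        long sum = 0;

        List<Object> values = input.getAll();
        for (Iterator<Object> it = values.iterator(); it.hasNext();) {
            // each element is cast to a Tuple and its three long fields are added
            Tuple t = (Tuple) it.next();
            sum += (Long) t.get(0);
            sum += (Long) t.get(1);
            sum += (Long) t.get(2);
        }
        return sum;
    }
}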

grunt> A = LOAD 'data2' as aa:bytearray;
grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);
grunt> dump C;
2009-11-05 10:07:09,266 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: "file:/tmp/temp1206478472/tmp-577036369"
2009-11-05 10:07:09,267 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 3
2009-11-05 10:07:09,267 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
2009-11-05 10:07:09,267 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-11-05 10:07:09,267 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(6L)
(7L)
(15L)
grunt>

Initially I thought a loop like this would work:

protected static Long sum(Tuple input) throws ExecException, NumberFormatException {
    long sum = 0;

    List<Object> values = input.getAll(); // Would give all fields in Tuple??
    for (Iterator<Object> it = values.iterator(); it.hasNext();) {
        sum += (Long) it.next();
    }
    return sum;
}

But I get an error saying that a Tuple can't be cast to Long. So my question is: what does input.getAll() return? What is the structure of the data passed to the exec function?

Thanks!

  • Jeff Zhang at Nov 5, 2009 at 2:14 pm
    The input is the set of arguments you pass to your UDF, and it is of type
    Tuple. A tuple can have more than one element, which is how a UDF can take
    more than one argument. Here you pass a single argument, and that argument
    is itself a tuple, so the first element of input is a tuple.
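A minimal sketch of that structure in plain Java, assuming a Pig Tuple can be modeled as a java.util.List<Object> so the example runs without the Pig jar (WrappedArgs and its methods are hypothetical names, not Pig's API). For SumAll((tuple(long,long,long))aa) the argument tuple holds a single element, the nested tuple, so casting each element straight to Long throws ClassCastException, while unwrapping element 0 first gives the expected sum:

```java
import java.util.List;

// Stand-in model, not Pig's API: a Pig Tuple is represented as List<Object>.
class WrappedArgs {
    // The loop from the question: treat each argument as a Long.
    // With SumAll((tuple(long,long,long))aa) the only argument is a tuple,
    // so this cast throws ClassCastException.
    static long sumArgumentsAsLongs(List<Object> args) {
        long sum = 0;
        for (Object arg : args) {
            sum += (Long) arg;
        }
        return sum;
    }

    // Unwraps the first argument (the nested tuple), then sums its fields.
    static long sumFirstArgumentFields(List<Object> args) {
        @SuppressWarnings("unchecked")
        List<Object> tuple = (List<Object>) args.get(0);
        long sum = 0;
        for (Object field : tuple) {
            sum += (Long) field;
        }
        return sum;
    }

    public static void main(String[] unused) {
        // exec()'s input for the first data row: one argument, itself a tuple
        List<Object> input = List.of(List.of(1L, 2L, 3L));
        System.out.println(sumFirstArgumentFields(input)); // prints 6
        try {
            sumArgumentsAsLongs(input);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: argument 0 is a tuple, not a Long");
        }
    }
}
```

In a real EvalFunc the same unwrapping is what the working SumAll does when it casts each element of input to Tuple before calling get(0), get(1) and get(2).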


    Jeff Zhang

    On Thu, Nov 5, 2009 at 2:23 AM, Kelvin Moss wrote:

  • Kelvin Moss at Nov 6, 2009 at 4:16 am
    Thanks for the reply. I understand that a Tuple can have more than one field. That is why I was expecting Tuple.getAll to return all the fields in the Tuple. But as it turns out, it returns a Tuple. That made me think that maybe Tuple.getAll returns all the Tuples in the Tuple, but a Tuple like this is not valid, right?

    ((1,2,3),(4,5,6))

    It should be enclosed in a bag, like {(1,2,3),(4,5,6)}. Or maybe I am confusing things?

    Thanks!

    --- On Thu, 11/5/09, Jeff Zhang wrote:


    From: Jeff Zhang <zjffdu@gmail.com>
    Subject: Re: Accessing fields in Tuple
    To: pig-user@hadoop.apache.org
    Date: Thursday, November 5, 2009, 7:44 PM


  • Thejas Nair at Nov 7, 2009 at 11:54 pm
    Hi Kelvin,

    The input parameters to the UDF are wrapped inside a tuple, and that tuple
    is passed as the input to the UDF's exec function.
    In the case of
    grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);
    the exec function gets a Tuple with one column, which is a
    tuple(long,long,long); i.e., in exec(Tuple input), input.get(0) returns
    the tuple(long,long,long).

    On the other hand, if you called the UDF this way:
    grunt> C = FOREACH A GENERATE UDF.SumAll((long)a1,(chararray)a2);
    then in exec(Tuple input), input.get(0) returns a long (a Java Long) and
    input.get(1) returns a chararray (a Java String).
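A minimal sketch contrasting the two call shapes, again modeling the argument tuple as a java.util.List<Object> (an assumption for illustration; ArgShapes and argTypes are hypothetical names). It reports, for each argument position, which Pig type it carries, using the Java classes Pig uses for long (Long) and chararray (String); the values 5L and "abc" are made-up sample arguments:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in model, not Pig's API: the argument tuple handed to exec() is a
// List<Object>; a nested tuple is a List, a long is a java.lang.Long and a
// chararray is a java.lang.String.
class ArgShapes {
    // Reports which Pig type each argument position holds.
    static List<String> argTypes(List<Object> args) {
        List<String> types = new ArrayList<>();
        for (Object arg : args) {
            if (arg instanceof List) {
                types.add("tuple");
            } else if (arg instanceof Long) {
                types.add("long");
            } else if (arg instanceof String) {
                types.add("chararray");
            } else {
                types.add("other");
            }
        }
        return types;
    }

    public static void main(String[] unused) {
        // SumAll((tuple(long,long,long))aa): one argument, which is a tuple
        List<Object> wrapped = List.of(List.of(1L, 2L, 3L));
        // SumAll((long)a1, (chararray)a2): two scalar arguments
        List<Object> flat = List.of(5L, "abc");
        System.out.println(argTypes(wrapped)); // prints [tuple]
        System.out.println(argTypes(flat));    // prints [long, chararray]
    }
}
```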

    I hope this answers your question.

    Thanks,
    Thejas



    On 11/5/09 9:15 PM, "Kelvin Moss" wrote:


  • Mridul Muralidharan at Nov 8, 2009 at 2:26 pm
    Hi Kelvin,

    With tuples and bags, you can have arbitrary levels of
    nesting/composition.
    That is, a tuple can contain other tuples/bags, and the tuples within a
    bag can contain other tuples/bags.
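A minimal sketch of a sum that follows arbitrary nesting, assuming both tuples and bags are modeled as java.util.List (in Pig they are distinct types, Tuple and DataBag, which this model does not distinguish); DeepSum and sumDeep are hypothetical names. It treats ((1,2,3),(4,5,6)), a tuple of two tuples, as an ordinary value:

```java
import java.util.List;

// Stand-in model, not Pig's API: tuples and bags are both represented as
// java.util.List, and numeric fields as java.lang.Long.
class DeepSum {
    // Adds every Long found at any depth of tuple/bag nesting.
    static long sumDeep(Object value) {
        if (value instanceof Long) {
            return (Long) value;
        }
        long sum = 0;
        if (value instanceof List) {
            for (Object child : (List<?>) value) {
                sum += sumDeep(child);
            }
        }
        return sum; // any other field type contributes 0
    }

    public static void main(String[] unused) {
        // ((1,2,3),(4,5,6)): a tuple containing two tuples
        Object tupleOfTuples = List.of(List.of(1L, 2L, 3L), List.of(4L, 5L, 6L));
        // (1,(2,3),{(4),(5)}): a long, a nested tuple and a "bag" of two tuples
        Object mixed = List.of(1L, List.of(2L, 3L), List.of(List.of(4L), List.of(5L)));
        System.out.println(sumDeep(tupleOfTuples)); // prints 21
        System.out.println(sumDeep(mixed));         // prints 15
    }
}
```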


    As Thejas explained, the input to a UDF is always a tuple, so whatever
    parameters you pass in are wrapped in a tuple and sent across.


    You probably just want to use:

    myUdf($0, $1, $2) and so on, instead of forcing the input to be inside
    another tuple.
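A minimal sketch of why this helps, modeling exec()'s input as a java.util.List<Object> (an assumption; FlatArgs and sumAll are hypothetical names) and assuming the relation has been loaded with three long fields so that $0, $1 and $2 exist. When the fields are passed as separate arguments, each element of input is already a Long, so the loop from the original question works unchanged; the rows below are the three lines of the sample data file, and the sums match the (6L), (7L), (15L) output of the dump:

```java
import java.util.List;

// Stand-in model, not Pig's API: exec()'s input tuple is a List<Object>.
// With myUdf($0, $1, $2), each field arrives as its own top-level argument.
class FlatArgs {
    static long sumAll(List<Object> input) {
        long sum = 0;
        for (Object field : input) { // the loop from the original question
            sum += (Long) field;     // each argument is already a Long here
        }
        return sum;
    }

    public static void main(String[] unused) {
        // the three rows of the sample data file, each passed as three arguments
        List<Object> row1 = List.of(1L, 2L, 3L);
        List<Object> row2 = List.of(4L, 2L, 1L);
        List<Object> row3 = List.of(8L, 3L, 4L);
        System.out.println(sumAll(row1)); // prints 6
        System.out.println(sumAll(row2)); // prints 7
        System.out.println(sumAll(row3)); // prints 15
    }
}
```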

    Hope this helps.
    Regards,
    Mridul


Discussion Overview
group: user @
categories: pig, hadoop
posted: Nov 5, '09 at 10:24a
active: Nov 8, '09 at 2:26p
posts: 5
users: 4
website: pig.apache.org
