Grokbase Groups Pig user January 2011
I've written a regular expression EvalFunc similar to ExtractAll except
this is called FindAll. It returns a tuple of all strings found that
match the given pattern. The syntax looks like this:

A = FOREACH raw_data GENERATE FindAll(field_str, '[^/]+') AS a_tuple;

I dumped some return tuples which look something like this:

((a,b,c,d,e))
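
The FindAll source is not shown in this thread, so the following is only a minimal sketch of the matching step, written in plain Java with java.util.regex and without Pig's EvalFunc or Tuple classes. The class name FindAllSketch and the method findAll are hypothetical names used for illustration; a real UDF would copy the matches into a Pig Tuple instead of a List.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the poster's FindAll UDF; this is not Pig code.
class FindAllSketch {
    // Collect every non-overlapping match of `regex` in `input`, in order.
    static List<String> findAll(String input, String regex) {
        List<String> matches = new ArrayList<>();
        if (input == null) {
            return matches; // treat a null field as "no matches"
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Same pattern as in the question: runs of characters other than '/'.
        System.out.println(findAll("a/b/c/d/e", "[^/]+")); // prints [a, b, c, d, e]
    }
}
```

With the pattern from the question, the input a/b/c/d/e yields five matches, which is the count SIZE would be expected to report for the resulting tuple.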

I'm trying to get the size of the tuple so I can filter out certain
entries. If I simply do:

B = FOREACH A GENERATE SIZE(a_tuple);
DUMP B;

I always get a size of 1. I thought maybe this was due to the
surrounding bag so I tried to FLATTEN(FindAll(...)). Now I'm getting an
error from SIZE saying it can't convert a string to a DataBag.

Any idea what's going on here?

Thanks,


-Xavier


  • Daniel Dai at Feb 1, 2011 at 8:11 pm
    You cannot get the size of a tuple using SIZE. Use ARITY instead.

    Daniel

  • Dmitriy Ryaboy at Feb 1, 2011 at 8:19 pm
    Daniel, if that's actually the case we need to fix the javadocs. Cause
    they are pretty explicit...

    /**
     * Find the number of fields in a tuple. Expected input is a tuple,
     * output is an integer.
     * @deprecated Use {@link SIZE} instead.
     */
    public class ARITY extends EvalFunc<Integer> {
  • Daniel Dai at Feb 1, 2011 at 11:00 pm
    Oh, I was wrong. SIZE is the right UDF to use. The issue is caused by
    TupleSize, as Eric pointed out a moment ago.

    Daniel
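
The thread does not include the TupleSize source, so the sketch below is an assumption about the kind of mistake that would produce a constant 1, not a description of the actual bug. Pig passes a UDF's arguments to exec wrapped in an input tuple; a function that measures that wrapper instead of the argument inside it reports 1 for any single-argument call. The model uses plain Java lists in place of Pig tuples, and the names TupleSizeSketch, sizeOfWrapper and sizeOfFirstArgument are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java model of how a UDF sees its arguments; these are not Pig's classes.
class TupleSizeSketch {
    // Suspected mistake: counts the wrapper that carries the arguments,
    // so a call with one argument always yields 1.
    static long sizeOfWrapper(List<Object> input) {
        return input.size();
    }

    // Intended behaviour: unwrap the first argument and count its fields.
    static long sizeOfFirstArgument(List<Object> input) {
        List<?> tuple = (List<?>) input.get(0);
        return tuple.size();
    }

    public static void main(String[] args) {
        // Models a_tuple = (a,b,c,d,e) passed as the only argument to SIZE.
        List<Object> aTuple = Arrays.<Object>asList("a", "b", "c", "d", "e");
        List<Object> input = Collections.<Object>singletonList(aTuple);

        System.out.println(sizeOfWrapper(input));       // prints 1
        System.out.println(sizeOfFirstArgument(input)); // prints 5
    }
}
```

Under this assumption the symptom matches the question: every row reports 1 regardless of how many matches the tuple holds.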

  • Charles Gonçalves at Feb 4, 2011 at 12:00 am
    Jira Issue :
    https://issues.apache.org/jira/browse/PIG-1841
    <https://issues.apache.org/jira/browse/PIG-200>


    --
    *Charles Ferreira Gonçalves *
    http://homepages.dcc.ufmg.br/~charles/
    UFMG - ICEx - Dcc
    Cel.: 55 31 87741485
    Tel.: 55 31 34741485
    Lab.: 55 31 34095840

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jan 27, '11 at 8:54p
active: Feb 4, '11 at 12:00a
posts: 5
users: 4
website: pig.apache.org