For some reason, I am unable to filter inside my nested foreach. The basic
outline of my script is as follows:

1. Load input 1.
2. Load input 2.
3. JOIN input1 BY key1, input2 BY key2.
4. FOREACH joined GENERATE the fields plus an additional field named udf-field
   (apply an EvalFunc UDF to generate the additional field).
5. GROUP on (key2, key3, key4, key5).
6. FOREACH grouped {
       FILTER relation BY udf-field == value
       FILTER relation BY udf-field == value2
       FILTER relation BY udf-field == value3
       FILTER relation BY udf-field == value4

       GENERATE counts of each filtered relation;
   }

When I try to use the alias relation to reference my original relation
(before the grouping in #5), I get a parsing error: Invalid alias.
What is the correct alias to use? Or is it not possible to filter inside a
FOREACH?

--
Zaki Rahaman

  • Dmitriy Ryaboy at Feb 25, 2010 at 7:35 pm
    Zaki, throw in a "describe" on the relation you are trying to run the last
    step on, and just check out what the disambiguated aliases are.
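
    As a minimal sketch of that suggestion (the alias name grpd is an
    assumption borrowed from later in this thread; substitute whatever alias
    your own GROUP step produces):

    ```pig
    -- Print the schema of the grouped relation so the field names
    -- (including the bag that holds the grouped rows) are visible.
    DESCRIBE grpd;
    ```

    The bag field shown in the printed schema is the name to reference inside
    a nested FOREACH.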

    -D
  • Zaki Rahaman at Feb 25, 2010 at 7:42 pm
    Are you talking about the relation just before the FOREACH? I get something
    like this:

    grpd: {group: (date: chararray, dist_partner: chararray, country: chararray, client_partner: chararray),
           rawviews: {date: chararray, dist_partner: chararray, country: chararray, client_partner: chararray, feature: chararray, userid: chararray}}

    I see that the relation name I want to reference (rawviews) is a field in
    the tuple, so I'm not sure what's happening here. Basically, I want to take
    the rawviews bag, split it based on a UDF, and then count the resulting
    splits.
    --
    Zaki Rahaman
  • Dmitriy Ryaboy at Feb 25, 2010 at 8:00 pm
    Given that schema, you should be able to do this:

    counts = FOREACH grpd {
        x = FILTER rawviews BY feature == 'some_feature';
        y = FILTER rawviews BY feature == 'some_other_feature';
        GENERATE FLATTEN(group) AS (date, dist_partner, country, client_partner),
                 COUNT(x) AS cnt_x, COUNT(y) AS cnt_y;
    };

    Are you saying this isn't working? Can you send the actual script and sample
    data?

    -D
  • Zaki Rahaman at Feb 25, 2010 at 9:47 pm
    That seems to work. Previously I had my UDF embedded inside the FOREACH
    and applied it in each of the four filters. Instead, I moved it outside
    the FOREACH so that it produces an additional column, and now it's working.
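
    A rough sketch of that arrangement, for readers following along. The jar,
    the UDF class FeatureOf, the input paths, the raw url column, and the load
    schemas are all hypothetical; only the grouped schema and the nested
    FOREACH follow what was posted earlier in this thread. It is an
    illustration under those assumptions, not the actual script.

    ```pig
    -- Hypothetical jar and EvalFunc; assumed to return a chararray.
    REGISTER myudfs.jar;
    DEFINE FeatureOf com.example.pig.FeatureOf();

    -- Hypothetical inputs and schemas (field names are unique across both,
    -- so no input1:: / input2:: prefixes are needed after the join).
    input1 = LOAD 'input1' AS (key1:chararray, date:chararray,
                               dist_partner:chararray, url:chararray,
                               userid:chararray);
    input2 = LOAD 'input2' AS (key2:chararray, country:chararray,
                               client_partner:chararray);
    joined = JOIN input1 BY key1, input2 BY key2;

    -- Apply the UDF once per row, before grouping, as an extra column.
    rawviews = FOREACH joined GENERATE
                   date, dist_partner, country, client_partner,
                   FeatureOf(url) AS feature,
                   userid;

    grpd = GROUP rawviews BY (date, dist_partner, country, client_partner);

    -- Inside the nested FOREACH, filter on the precomputed column only.
    counts = FOREACH grpd {
        x = FILTER rawviews BY feature == 'some_feature';
        y = FILTER rawviews BY feature == 'some_other_feature';
        GENERATE FLATTEN(group) AS (date, dist_partner, country, client_partner),
                 COUNT(x) AS cnt_x, COUNT(y) AS cnt_y;
    };
    ```

    Computing the column before the GROUP also means the UDF runs once per
    row rather than once per filter.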
  • Mridul Muralidharan at Mar 1, 2010 at 8:17 am
    Just curious, what was the actual error when using filters within a
    nested foreach?
    Would it be possible to show the snippet (and the schema of the input)?

    We are using this without issue right now, so I'm curious what the
    problem is here.

    Thanks,
    Mridul
  • Zaki Rahaman at Mar 1, 2010 at 8:50 am
    There was an invalid alias error. Initially, I had each of the four nested
    relations applying a UDF to do the filtering. When I moved the UDF outside
    the foreach so that it was applied once globally and then filtered on the
    resulting values inside the foreach, it worked.

    --
    Zaki Rahaman
