Projections in nested filter and inside foreach do not work
-----------------------------------------------------------

Key: PIG-430
URL: https://issues.apache.org/jira/browse/PIG-430
Project: Pig
Issue Type: Bug
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
Fix For: types_branch


The following queries do not work:

Nested filter:

a = load 'studenttab10k' as (name, age, gpa);
b = filter a by age < 20;
c = group b by age;
d = foreach c {
    cf = filter b by gpa < 3.0;
    cp = cf.gpa;
    cd = distinct cp;
    co = order cd by $0;
    generate group, flatten(co);
}
store d into 'output';
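
To make the expected result of the nested filter query concrete, here is a rough Python model of what each statement should compute per group. It is only a sketch of the intended semantics under stated assumptions, not Pig's implementation: bags are modelled as Python lists of tuples, the rows in `students` are made-up sample data standing in for studenttab10k, and groups are emitted in sorted key order purely so the output is deterministic (Pig makes no such ordering guarantee).

```python
from collections import defaultdict

# Hypothetical sample rows standing in for studenttab10k: (name, age, gpa).
students = [
    ("alice", 18, 2.5),
    ("bob", 18, 2.5),
    ("carol", 18, 3.5),
    ("dave", 19, 1.9),
    ("erin", 19, 2.8),
    ("frank", 22, 2.0),
    ("gina", 17, 3.9),
]


def nested_filter_query(rows):
    """Model of the nested-filter script; bags are lists of tuples."""
    b = [r for r in rows if r[1] < 20]            # b = filter a by age < 20;
    c = defaultdict(list)                         # c = group b by age;
    for r in b:
        c[r[1]].append(r)
    d = []
    for group, bag in sorted(c.items()):          # sorted only for stable output
        cf = [t for t in bag if t[2] < 3.0]       # cf = filter b by gpa < 3.0;
        cp = [(t[2],) for t in cf]                # cp = cf.gpa; (one-field tuples)
        cd = set(cp)                              # cd = distinct cp;
        co = sorted(cd, key=lambda t: t[0])       # co = order cd by $0;
        # generate group, flatten(co): one row per tuple in co, so a group
        # whose filtered bag is empty (age 17 here) produces no rows.
        for t in co:
            d.append((group,) + t)
    return d


print(nested_filter_query(students))
```

With this sample data the model yields one row for age 18 (the duplicate 2.5 collapses under distinct) and two ordered rows for age 19.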

Nested Distinct:

a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
b = group a by name;
c = foreach b { aa = distinct a.age; generate group, COUNT(aa); }
store c into 'output';
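
The nested distinct query can be modelled the same way. This is again a hedged sketch of the expected semantics, not Pig's code: the `students` rows are hypothetical, the sketch assumes no null ages (so the question of how COUNT treats nulls does not arise), and groups are sorted only to make the output deterministic.

```python
from collections import defaultdict

# Hypothetical sample rows: (name, age, gpa).
students = [
    ("alice", 18, 2.5),
    ("alice", 18, 3.1),
    ("alice", 19, 3.0),
    ("bob", 20, 2.0),
]


def nested_distinct_query(rows):
    """Model of the nested-distinct script; assumes no null ages."""
    b = defaultdict(list)                       # b = group a by name;
    for r in rows:
        b[r[0]].append(r)
    c = []
    for group, bag in sorted(b.items()):        # sorted only for stable output
        aa = {(t[1],) for t in bag}             # aa = distinct a.age;
        c.append((group, len(aa)))              # generate group, COUNT(aa);
    return c


print(nested_distinct_query(students))
```

Here alice has two distinct ages (18 and 19) and bob has one.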

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Olga Natkovich (JIRA) at Sep 13, 2008 at 4:17 pm
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich reassigned PIG-430:
    ----------------------------------

    Assignee: Shravan Matthur Narayanamurthy (was: Santhosh Srinivasan)
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 17, 2008 at 8:26 pm
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-430:
    -----------------------------------------------

    Status: Patch Available (was: Open)

    I have fixed the part of the problem that concerns projections. The issue with distinct still remains: projects are being introduced into the input of distinct, which creates a special case where projection chaining does not work. The problem is similar to the one that arises when a nested project is assigned to a variable inside a nested block; that case has been solved by replacing the nested project with a foreach statement. The solution to the distinct problem should be similar, so that the input to distinct can also be a nested project. I made a local change in my copy, replacing BaseEvalSpec with NestedProject, and it works. However, I don't want to break anything, because I am not completely aware of the side effects of changing this in the parser. It would be better if someone more comfortable with the parser took a look at this one.

    Also, I think there are some issues with how nested blocks are parsed. I tried the following, and the parser never terminates the nested block; it just keeps waiting for more input:

    A = load 'file';
    B = group A by $0;
    C = foreach B { C1=distinct "const"; generate C1;}

    I have no idea why this happens. I tried it because I thought the input to a nested distinct shouldn't be a BaseEvalSpec, which can also be a FuncEvalSpec or a Constant. I think we need to change things a bit here.
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 17, 2008 at 8:26 pm
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-430:
    -----------------------------------------------

    Attachment: 430-1.patch
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 17, 2008 at 8:28 pm
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631924#action_12631924 ]

    shravanmn edited comment on PIG-430 at 9/17/08 1:26 PM:
    -----------------------------------------------------------------------------

    I have fixed the part of the problem that concerns projections. The issue with distinct still remains: projects are being introduced into the input of distinct, which creates a special case where projection chaining does not work. The problem is similar to the one that arises when a nested project is assigned to a variable inside a nested block; that case has been solved by replacing the nested project with a foreach statement. The solution to the distinct problem should be similar, so that the input to distinct can also be a nested project. I made a local change in my copy, replacing BaseEvalSpec with NestedProject, and it works. However, I don't want to break anything, because I am not completely aware of the side effects of changing this in the parser. It would be better if someone more comfortable with the parser took a look at this one.

    Also, I think there are some issues with how nested blocks are parsed. I tried the following, and the parser never terminates the nested block; it just keeps waiting for more input:

    A = load 'file';
    B = group A by $0;
    C = foreach B { C1=distinct "const"; generate C1;};

    I have no idea why this happens. I tried it because I thought the input to a nested distinct shouldn't be a BaseEvalSpec, which can also be a FuncEvalSpec or a Constant. I think we need to change things a bit here.

  • Santhosh Srinivasan (JIRA) at Sep 17, 2008 at 8:36 pm
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631933#action_12631933 ]

    Santhosh Srinivasan commented on PIG-430:
    -----------------------------------------

    Replace the double quotes with single quotes and it should work (Pig Latin string constants are enclosed in single quotes).

    {code}
    A = load 'file';
    B = group A by $0;
    C = foreach B { C1=distinct 'const'; generate C1;};
    {code}
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 18, 2008 at 8:17 pm
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632370#action_12632370 ]

    Shravan Matthur Narayanamurthy commented on PIG-430:
    ----------------------------------------------------

    Hmm, so I hit a bug by accident! Shouldn't the parser have terminated the nested block instead of waiting for more input when there is no more input to come?
  • Santhosh Srinivasan (JIRA) at Sep 21, 2008 at 12:02 am
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Santhosh Srinivasan updated PIG-430:
    ------------------------------------

    Assignee: Santhosh Srinivasan (was: Shravan Matthur Narayanamurthy)
  • Santhosh Srinivasan (JIRA) at Sep 21, 2008 at 12:06 am
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Santhosh Srinivasan updated PIG-430:
    ------------------------------------

    Attachment: PIG-430_2.patch

    New patch (PIG-430_2.patch) includes the rewrite of nested projects as foreach generates for the inputs of nested sort, nested distinct and nested filter. Shravan's patch is included as part of this patch; his fix for the BracketedProject projection was fine. Thanks, Shravan!

    Unit tests that still fail are:

    [junit] Running org.apache.pig.test.TestEvalPipeline
    [junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 175.711 sec
    [junit] Test org.apache.pig.test.TestEvalPipeline FAILED

  • Santhosh Srinivasan (JIRA) at Sep 21, 2008 at 12:06 am
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Santhosh Srinivasan updated PIG-430:
    ------------------------------------

    Patch Info: [Patch Available]
  • Santhosh Srinivasan (JIRA) at Sep 21, 2008 at 12:08 am
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633024#action_12633024 ]

    sms edited comment on PIG-430 at 9/20/08 5:07 PM:
    ------------------------------------------------------------------

    New patch (PIG-430_2.patch) includes the rewrite of nested projects as foreach generates for the inputs of nested sort, nested distinct and nested filter. Shravan's patch is included as part of this patch; his fix for the BracketedProject projection was fine. Thanks, Shravan!

    Unit tests that still fail are:

    [junit] Running org.apache.pig.test.TestEvalPipeline
    [junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 175.711 sec
    [junit] Test org.apache.pig.test.TestEvalPipeline FAILED

  • Olga Natkovich (JIRA) at Sep 22, 2008 at 8:22 pm
    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-430:
    -------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    Patch committed. Thanks, Santhosh.
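
The committed fix (PIG-430_2.patch) rewrites nested projects such as `cf.gpa` or `a.age` as foreach generates when they feed a nested sort, distinct or filter. As a rough illustration of why that rewrite preserves meaning, the Python sketch below models bags as lists of tuples and checks that projecting a field directly and generating it once per tuple give distinct the same input. The helpers `project`, `foreach_generate` and `distinct` are hypothetical names for this model only; it describes the semantics, not Pig's parser or logical-plan code.

```python
def project(bag, *fields):
    """Nested project such as cf.gpa: keep only the given field positions."""
    return [tuple(t[i] for i in fields) for t in bag]


def foreach_generate(bag, *exprs):
    """Model of 'foreach <bag> generate <exprs>': evaluate exprs per tuple."""
    return [tuple(e(t) for e in exprs) for t in bag]


def distinct(bag):
    """Nested distinct: drop duplicate tuples (sorted here only for display)."""
    return sorted(set(bag))


# Inner bag for one group of the nested-distinct query: (name, age, gpa).
a = [("alice", 18, 2.5), ("alice", 18, 3.1), ("alice", 19, 3.0)]

# 'distinct a.age' versus its conceptual rewrite, where a foreach generate
# of the age field becomes the input to distinct.
direct = distinct(project(a, 1))
rewritten = distinct(foreach_generate(a, lambda t: t[1]))
print(direct, rewritten)
```

Both forms produce the same bag of one-field tuples, which is the property the rewrite relies on.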

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Sep 12, '08 at 11:10p
active: Sep 22, '08 at 8:22p
posts: 12
users: 1
website: pig.apache.org