Optimization rule FilterAboveForeach is too restrictive and doesn't handle project * correctly
----------------------------------------------------------------------------------------------

Key: PIG-1568
URL: https://issues.apache.org/jira/browse/PIG-1568
Project: Pig
Issue Type: Bug
Reporter: Xuefu Zhang
Fix For: 0.8.0


The FilterAboveForeach rule optimizes the plan by pushing a filter above the preceding foreach operator. However, during code review, two major problems were found:

1. The current implementation assumes that if no projection is found in the filter condition, then all columns from the foreach are projected. This assumption prevents the following script from being optimized:
A = LOAD 'file.txt' AS (a(u,v), b, c);
B = FOREACH A GENERATE $0, b;
C = FILTER B BY 8 > 5;
STORE C INTO 'empty';

2. The current implementation doesn't handle the * projection, which means project all columns. As a result, it cannot optimize the following script:
A = LOAD 'file.txt' AS (a(u,v), b, c);
B = FOREACH A GENERATE $0, b;
C = FILTER B BY Identity.class.getName(*) > 5;
STORE C INTO 'empty';
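
The two cases above can be illustrated with a minimal sketch of the push-up decision. This is not Pig's actual implementation: `ForeachOutput`, `required_outputs`, `can_push_above`, and the `STAR` marker are hypothetical names chosen for illustration, and the model assumes a filter can move above a foreach only when every foreach output it reads is a plain copy of an input column.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

STAR = "*"  # hypothetical marker for a project-all reference in a filter condition


@dataclass(frozen=True)
class ForeachOutput:
    """One GENERATE expression: the input column it copies, or None if it is computed."""
    source_column: Optional[int]


def required_outputs(filter_refs: List[object], n_outputs: int) -> FrozenSet[int]:
    """Return the foreach output indexes that the filter condition reads."""
    if STAR in filter_refs:
        # Project-all: the filter reads every column the foreach produces.
        return frozenset(range(n_outputs))
    # No projection at all yields the empty set, not "all columns".
    return frozenset(ref for ref in filter_refs if isinstance(ref, int))


def can_push_above(filter_refs: List[object], outputs: List[ForeachOutput]) -> bool:
    """The filter may move up if every output it reads is a simple column copy."""
    needed = required_outputs(filter_refs, len(outputs))
    return all(outputs[i].source_column is not None for i in needed)
```

Under these assumptions, `FILTER B BY 8 > 5` has an empty reference list and needs no columns, so `can_push_above([], outputs)` holds for any foreach. A filter on `*` needs every output of `GENERATE $0, b`, and both are plain copies of input columns, so it qualifies as well. A real rule would also have to rewrite the column references after moving the filter, which this sketch leaves out.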

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Xuefu Zhang (JIRA) at Aug 25, 2010 at 9:38 pm
    [ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Xuefu Zhang reassigned PIG-1568:
    --------------------------------

    Assignee: Xuefu Zhang
  • Xuefu Zhang (JIRA) at Aug 26, 2010 at 12:27 am
    [ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Xuefu Zhang updated PIG-1568:
    -----------------------------

    Attachment: jira-1568-1.patch
  • Xuefu Zhang (JIRA) at Aug 26, 2010 at 12:27 am
    [ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Xuefu Zhang updated PIG-1568:
    -----------------------------

    Status: Patch Available (was: Open)
  • Xuefu Zhang (JIRA) at Aug 26, 2010 at 5:40 pm
    [ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Xuefu Zhang updated PIG-1568:
    -----------------------------

    Status: Open (was: Patch Available)
  • Xuefu Zhang (JIRA) at Aug 26, 2010 at 5:40 pm
    [ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Xuefu Zhang updated PIG-1568:
    -----------------------------

    Attachment: jira-1568-1.patch
  • Xuefu Zhang (JIRA) at Aug 26, 2010 at 5:46 pm
    [ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Xuefu Zhang updated PIG-1568:
    -----------------------------

    Status: Patch Available (was: Open)

    Regenerated the patch after fixing the failed test case. The test case itself was changed because it relied on an internal bug: when a UDF takes no argument, the Pig backend passes the whole input to the UDF. This needs to be corrected. In other words, if a UDF doesn't specify any argument, we assume that it doesn't need any input. If a UDF needs all input, it can either specify a star (*) or list whatever it requires in the argument list.

    A Jira tracking Pig backend changes will be created.
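
The argument semantics proposed in this comment can be sketched as follows. This is an illustrative model only, not Pig backend code: `udf_input` and `STAR_ARG` are hypothetical names, and rows are modeled as plain tuples with arguments given as positional indexes.

```python
from typing import Sequence, Tuple

STAR_ARG = "*"  # hypothetical marker for a project-all argument


def udf_input(args: Sequence[object], row: Tuple) -> Tuple:
    """Return the fields of `row` that a UDF call would receive."""
    if not args:
        # No declared arguments: the UDF is assumed to need no input.
        return ()
    if list(args) == [STAR_ARG]:
        # A star argument: the UDF receives the whole input row.
        return tuple(row)
    # Otherwise the UDF receives exactly the listed columns, in order.
    return tuple(row[i] for i in args)
```

For a row `(1, 2, 3)`, an empty argument list selects nothing, a star selects the whole row, and `[0, 2]` selects the first and third fields.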

  • Daniel Dai (JIRA) at Aug 30, 2010 at 8:00 am
    [ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Daniel Dai updated PIG-1568:
    ----------------------------

    Status: Resolved (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

    test-patch result:

    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 6 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

    Patch committed. Thanks Xuefu!

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Aug 25, '10 at 9:38p
active: Aug 30, '10 at 8:00a
posts: 8
users: 1
website: pig.apache.org
