Optimization rule PushUpFilter causes filter to be pushed up out joins
----------------------------------------------------------------------

Key: PIG-1574
URL: https://issues.apache.org/jira/browse/PIG-1574
Project: Pig
Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Fix For: 0.8.0


The PushUpFilter optimization rule in the new logical plan moves a filter up into one of the join branches. It does this aggressively, by finding an operator that carries all of the UIDs the filter projects. However, it does not consider that the operator it finds might itself be another join. If that join is an outer join, the filter cannot simply be moved to one of its branches.

As an example, the following script will be erroneously optimized:

A = load 'myfile' as (d1:int);
B = load 'anotherfile' as (d2:int);
C = join A by d1 full outer, B by d2;
D = load 'xxx' as (d3:int);
E = join C by d1, D by d3;
F = filter E by d1 > 5;
G = store F into 'dummy';
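
To see why the branch matters, here is a minimal sketch in plain Python (not Pig, and not the optimizer's code) that models a full outer join over single-column relations. The sample values for A and B, the helper names, and the use of None for null are all assumptions made for illustration. It compares filtering on d1 after the join with filtering branch A before the join.

```python
# Plain-Python model of a full outer join on single-column relations.
# None stands in for Pig's null; this is an illustration, not Pig's implementation.

def full_outer_join(left, right):
    """Return (d1, d2) pairs; unmatched values are padded with None."""
    rows = []
    for a in left:
        matches = [b for b in right if a is not None and a == b]
        rows.extend((a, b) for b in matches)
        if not matches:
            rows.append((a, None))
    for b in right:
        if b is None or b not in left:
            rows.append((None, b))
    return rows


def d1_greater_than_5(row):
    # A comparison against null is not true, so rows with a null d1 are dropped.
    return row[0] is not None and row[0] > 5


# Hypothetical sample data.
A = [3, 7]
B = [3, 9]

# Filter applied after the join, as the script is written.
filter_after = [r for r in full_outer_join(A, B) if d1_greater_than_5(r)]

# Filter pushed onto branch A before the join (the unsafe rewrite).
filtered_A = [a for a in A if a > 5]
filter_before = full_outer_join(filtered_A, B)

print(filter_after)   # [(7, None)]
print(filter_before)  # [(7, None), (None, 3), (None, 9)]
```

With the filter pushed below the full outer join, rows from B that no longer find a partner come back padded with a null d1, so the two plans are not equivalent at that point in the plan.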


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Xuefu Zhang (JIRA) at Aug 27, 2010 at 6:54 pm
    [ https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Xuefu Zhang updated PIG-1574:
    -----------------------------

    Attachment: jira-1574-1.patch
  • Xuefu Zhang (JIRA) at Aug 27, 2010 at 7:05 pm
    [ https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Xuefu Zhang updated PIG-1574:
    -----------------------------

    Attachment: (was: jira-1574-1.patch)
  • Xuefu Zhang (JIRA) at Aug 27, 2010 at 7:10 pm
    [ https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Xuefu Zhang updated PIG-1574:
    -----------------------------

    Status: Patch Available (was: Open)
  • Xuefu Zhang (JIRA) at Aug 27, 2010 at 7:10 pm
    [ https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Xuefu Zhang updated PIG-1574:
    -----------------------------

    Attachment: jira-1574-1.patch
  • Daniel Dai (JIRA) at Aug 30, 2010 at 7:48 am
    [ https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Daniel Dai updated PIG-1574:
    ----------------------------

    Status: Resolved (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

    test-patch result:
    jira-1574-1.patch

    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 3 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

    This patch does not push the filter before a join when that join is an outer join. In fact, the filter can still be pushed to the outer side of the join. I assume that will be addressed in PIG-1575.

    Patch jira-1574-1.patch committed. Thanks Xuefu!
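
The remark about the outer side can be checked with a small sketch, again in plain Python rather than Pig. It reads "outer side" as the row-preserving input of a left outer join, which is an assumption about the intended meaning. The sample data and helper name are hypothetical.

```python
# Plain-Python model of a left outer join on single-column relations.
# None stands in for null; this is an illustration, not Pig's implementation.

def left_outer_join(left, right):
    """Return (d1, d2) pairs; left values without a match get d2 = None."""
    rows = []
    for a in left:
        matches = [b for b in right if a is not None and a == b]
        rows.extend((a, b) for b in matches)
        if not matches:
            rows.append((a, None))
    return rows


# Hypothetical sample data.
A = [3, 7, 9]
B = [3, 9, 11]

joined = left_outer_join(A, B)

# Predicate on d1, a column of the row-preserving (outer) side.
after_outer = [r for r in joined if r[0] is not None and r[0] > 5]
before_outer = left_outer_join([a for a in A if a > 5], B)

# Predicate on d2, a column of the null-supplying side.
after_inner = [r for r in joined if r[1] is not None and r[1] > 5]
before_inner = left_outer_join(A, [b for b in B if b > 5])

print(after_outer == before_outer)  # True: pushing to the outer side is safe here
print(after_inner == before_inner)  # False: pushing to the other side changes the result
```

In this model, filtering the preserved input early gives the same rows as filtering after the join, while filtering the other input early leaves extra null-padded rows.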

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Aug 27, 2010 at 6:45 pm
active: Aug 30, 2010 at 7:48 am
posts: 6
users: 1
website: pig.apache.org
