Grokbase Groups Pig user March 2009
Hello,
I think the implicit split is added in the wrong place in the logical plan for the Pig script from this JIRA issue: https://issues.apache.org/jira/browse/PIG-627

A = load 'data' as (a, b, c);
B = filter A by a > 5;
store B into 'output1';
C = group B by b;
store C into 'output2';

The logical plan is below. I think the split operator
should be placed before the filter, so that the filter is
performed on only one branch rather than on both.

Store 1-14 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: Unknown
---SplitOutput[B] 1-21 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
Const 1-20 FieldSchema: boolean Type: boolean

---Split 1-19 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag

---Filter 1-13 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag

GreaterThan 1-12 FieldSchema: boolean Type: boolean
---Const 1-11 FieldSchema: int Type: int

---Cast 1-18 FieldSchema: int Type: int

---Project 1-10 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray
Input: Load 1-9

---Load 1-9 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
Store 1-17 Schema: {group: bytearray,B: {a: bytearray,b: bytearray,c: bytearray}} Type: Unknown
---CoGroup 1-16 Schema: {group: bytearray,B: {a: bytearray,b: bytearray,c: bytearray}} Type: bag
Project 1-15 Projections: [1] Overloaded: false FieldSchema: b: bytearray Type: bytearray
Input: SplitOutput[B] 1-23

---SplitOutput[B] 1-23 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
Const 1-22 FieldSchema: boolean Type: boolean

---Split 1-19 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag

---Filter 1-13 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag

GreaterThan 1-12 FieldSchema: boolean Type: boolean
---Const 1-11 FieldSchema: int Type: int

---Cast 1-18 FieldSchema: int Type: int

---Project 1-10 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray
Input: Load 1-9

---Load 1-9 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
Thanks,
Iman.





  • Gunther Hagleitner at Mar 23, 2009 at 6:25 pm
    Hi,

    I believe the split is in the right place. Both B and C need to have the
    filter performed before they are stored. Also, the filter is only going to
    be run once: load (1-9), filter (1-13) and split (1-19) are the same
    operators in both paths, as their identical ids in the dump show. I've
    attached a graphical representation of the same logical plan (which I
    think is easier to read).
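
The point about shared operators can be sketched outside Pig. The following is a toy Python model, not Pig's actual plan classes: `Op` and `walk` are hypothetical names, and only the operator ids are taken from the dump above. It suggests why a per-store dump shows the filter twice even though the plan holds a single filter operator.

```python
# Toy model of the logical plan as a DAG. Op and walk are hypothetical
# helpers for illustration; they are not part of Pig.
class Op:
    def __init__(self, name, op_id, inputs=()):
        self.name = name
        self.op_id = op_id
        self.inputs = list(inputs)


def walk(root):
    """Yield every operator reachable from root, as a per-store dump would print it."""
    yield root
    for child in root.inputs:
        yield from walk(child)


# Operator ids mirror the plan dump; the shared prefix is built only once.
load = Op("Load", "1-9")
filt = Op("Filter", "1-13", [load])
split = Op("Split", "1-19", [filt])
out_store = Op("SplitOutput[B]", "1-21", [split])
out_group = Op("SplitOutput[B]", "1-23", [split])
store1 = Op("Store", "1-14", [out_store])
cogroup = Op("CoGroup", "1-16", [out_group])
store2 = Op("Store", "1-17", [cogroup])

# Printing from each store visits the shared prefix twice.
printed = [op for root in (store1, store2) for op in walk(root)]
filters_printed = sum(1 for op in printed if op.name == "Filter")
distinct_filters = {op.op_id for op in printed if op.name == "Filter"}

print(filters_printed)           # 2: once under each Store
print(sorted(distinct_filters))  # ['1-13']: a single Filter operator
```

Both store paths reach the same `Filter 1-13` object, so the duplication is in the printout, not in the plan.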

    If you wanted the filter to be performed only on the non-cogroup path, for
    instance, the script would have to read:

    A = load 'data' as (a, b, c);
    B = filter A by a > 5;
    store B into 'output1';
    C = group A by b; -- use the pre-filter alias A instead of B
    store C into 'output2';
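
To make the difference between the two scripts concrete, here is a small Python simulation. It is a sketch only: the sample rows and the `group_by_b` helper are made up for illustration, and it mimics the semantics of `filter` and `group` rather than running Pig.

```python
from collections import defaultdict

# Made-up (a, b, c) tuples standing in for the contents of 'data'.
rows = [(3, "x", 1), (7, "x", 2), (9, "y", 3)]


def group_by_b(relation):
    """Mimic 'group ... by b': map each value of b to the rows that carry it."""
    groups = defaultdict(list)
    for row in relation:
        groups[row[1]].append(row)
    return dict(groups)


A = rows
B = [row for row in A if row[0] > 5]   # B = filter A by a > 5;

original_output2 = group_by_b(B)       # C = group B by b;  (original script)
variant_output2 = group_by_b(A)        # C = group A by b;  (variant script)

print(original_output2)
print(variant_output2)
```

With `group B by b` the row with `a = 3` never reaches output2; with `group A by b` it does, because that branch bypasses the filter.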

    Thanks,
    Gunther.
  • Iman Elghandour at Mar 24, 2009 at 2:16 pm
    Thank you so much, Gunther, for the clarification.
    Iman.

    --- On Mon, 3/23/09, Gunther Hagleitner wrote:

    From: Gunther Hagleitner <hagleitn@yahoo-inc.com>
    Subject: Re: implicit splits in multiquery plans
    To: pig-user@hadoop.apache.org
    Received: Monday, March 23, 2009, 2:24 PM


Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 23, '09 at 2:41a
active: Mar 24, '09 at 2:16p
posts: 3
users: 2
website: pig.apache.org
