Grokbase Groups Pig user August 2011
Hello,
I have a comment on how Pig processes commands.
For example, take these Pig commands (from TestNewPlanLogToPhyTranslationVisitor.java):
a = load 'd1.txt' as (id, c);
b = load 'd2.txt' as (id, c);
c = load 'd3.txt' as (id, c);
d = join a by id, b by c;
e = filter d by a::id==NULL AND b::c==NULL;
f = join e by b::c, c by id;
g = filter f by b::id==NULL AND c::c==NULL;
store g into 'empty2';
Pig uses the buildPlan method to produce a LogicalPlan like this:
---g: Filter scope-24 Schema: {e::a::id: bytearray,e::a::c: bytearray,e::b::id: bytearray,e::b::c: bytearray,c::id: bytearray,c::c: bytearray} Type: bag

And scope-23 FieldSchema: boolean Type: boolean
---Equal scope-19 FieldSchema: boolean Type: boolean
---Project scope-17 Projections: [2] Overloaded: false FieldSchema: e::b::id: bytearray Type: bytearray
Input: f: LOJoin scope-16

---Const scope-18( null ) FieldSchema: bytearray Type: bytearray
---Equal scope-22 FieldSchema: boolean Type: boolean

---Project scope-20 Projections: [5] Overloaded: false FieldSchema: c::c: bytearray Type: bytearray
Input: f: LOJoin scope-16

---Const scope-21( null ) FieldSchema: bytearray Type: bytearray
---f: LOJoin scope-16 Schema: {e::a::id: bytearray,e::a::c: bytearray,e::b::id: bytearray,e::b::c: bytearray,c::id: bytearray,c::c: bytearray} Type: bag
Project scope-14 Projections: [3] Overloaded: false FieldSchema: b::c: bytearray Type: bytearray
Input: e: Filter scope-13
Project scope-15 Projections: [0] Overloaded: false FieldSchema: id: bytearray Type: bytearray
Input: c: Load scope-2

---c: Load scope-2 Schema: {id: bytearray,c: bytearray} Type: bag

---e: Filter scope-13 Schema: {a::id: bytearray,a::c: bytearray,b::id: bytearray,b::c: bytearray} Type: bag

And scope-12 FieldSchema: boolean Type: boolean
---Equal scope-8 FieldSchema: boolean Type: boolean
---Project scope-6 Projections: [0] Overloaded: false FieldSchema: a::id: bytearray Type: bytearray
Input: d: LOJoin scope-5

---Const scope-7( null ) FieldSchema: bytearray Type: bytearray
---Equal scope-11 FieldSchema: boolean Type: boolean

---Project scope-9 Projections: [3] Overloaded: false FieldSchema: b::c: bytearray Type: bytearray
Input: d: LOJoin scope-5

---Const scope-10( null ) FieldSchema: bytearray Type: bytearray
---d: LOJoin scope-5 Schema: {a::id: bytearray,a::c: bytearray,b::id: bytearray,b::c: bytearray} Type: bag
Project scope-3 Projections: [0] Overloaded: false FieldSchema: id: bytearray Type: bytearray
Input: a: Load scope-0
Project scope-4 Projections: [1] Overloaded: false FieldSchema: c: bytearray Type: bytearray
Input: b: Load scope-1

---a: Load scope-0 Schema: {id: bytearray,c: bytearray} Type: bag

---b: Load scope-1 Schema: {id: bytearray,c: bytearray} Type: bag
I assume that command analysis and intermediate data storage are both based on HashMap structures. Is this correct?
I found that the expected results of some test cases depend on output derived from a HashMap. In my opinion, such a test case should not expect a single fixed output, because the order in which a HashMap returns its entries is not guaranteed to be stable. Please let me know what you think. Thank you.
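
One way a test could avoid depending on a single fixed ordering is to compare the expected and actual plan dumps as sorted lists of lines. The sketch below is only an illustration in plain Java: the class `PlanDumpCompare` and its methods are hypothetical names, not part of Pig or its test suite. It also ignores the tree structure of the plan, so it is a weaker check than an exact string comparison.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Pig: compares two textual plan dumps
// while ignoring the order in which their lines appear.
class PlanDumpCompare {

    // Split a dump into trimmed, non-empty lines and sort them,
    // so that two dumps differing only in line order become equal.
    static List<String> sortedLines(String dump) {
        List<String> lines = new ArrayList<>();
        for (String line : dump.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        Collections.sort(lines);
        return lines;
    }

    // True when both dumps contain the same lines, in any order.
    static boolean sameLinesIgnoringOrder(String expected, String actual) {
        return sortedLines(expected).equals(sortedLines(actual));
    }

    public static void main(String[] args) {
        String expected = "---a: Load scope-0\n---b: Load scope-1";
        String actual = "---b: Load scope-1\n---a: Load scope-0";

        // Exact comparison fails because the two lines are swapped.
        System.out.println(expected.equals(actual));                  // false
        // Order-insensitive comparison accepts the swapped dump.
        System.out.println(sameLinesIgnoringOrder(expected, actual)); // true
    }
}
```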


  • Daniel Dai at Aug 23, 2011 at 8:04 pm
    Yes, we use HashMap in 0.8.1. In 0.9, we are using ArrayList, so you
    might see fewer issues like this.

    Daniel
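
The difference between the two collection types mentioned in this reply can be sketched with plain `java.util` classes. This is not Pig code: the class `IterationOrderDemo` and the alias list are assumptions made for illustration. An `ArrayList` iterates in insertion order by contract, a `HashMap` makes no promise about the order of its keys, and a `LinkedHashMap` is shown as one map type that does keep insertion order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo, not taken from Pig: contrasts iteration-order guarantees.
class IterationOrderDemo {

    // Aliases from the script in the question, in definition order.
    static final String[] ALIASES = {"a", "b", "c", "d", "e", "f", "g"};

    // ArrayList: iteration order is always the insertion order.
    static List<String> viaArrayList() {
        List<String> ops = new ArrayList<>();
        for (String alias : ALIASES) {
            ops.add(alias);
        }
        return ops;
    }

    // HashMap: same keys come back, but the order is unspecified and
    // may differ between JVM versions or implementations.
    static List<String> viaHashMap() {
        Map<String, Integer> ops = new HashMap<>();
        for (int i = 0; i < ALIASES.length; i++) {
            ops.put(ALIASES[i], i);
        }
        return new ArrayList<>(ops.keySet());
    }

    // LinkedHashMap: a map whose key iteration follows insertion order.
    static List<String> viaLinkedHashMap() {
        Map<String, Integer> ops = new LinkedHashMap<>();
        for (int i = 0; i < ALIASES.length; i++) {
            ops.put(ALIASES[i], i);
        }
        return new ArrayList<>(ops.keySet());
    }

    public static void main(String[] args) {
        System.out.println("ArrayList:     " + viaArrayList());
        System.out.println("HashMap:       " + viaHashMap());
        System.out.println("LinkedHashMap: " + viaLinkedHashMap());
    }
}
```

A test that prints or compares the `HashMap` variant directly is the kind that can break when the iteration order changes, while the list-based variant stays stable.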

Discussion Overview
group: user @
categories: pig, hadoop
posted: Aug 23, '11 at 10:20a
active: Aug 23, '11 at 8:04p
posts: 2
users: 2
website: pig.apache.org

2 users in discussion

Daniel Dai: 1 post Lulynn_2008: 1 post
