Grokbase Groups Pig user August 2011
Hello,
I have a question about how Pig processes commands.
For example, take these Pig commands (from TestNewPlanLogToPhyTranslationVisitor.java):
a = load 'd1.txt' as (id, c);
b = load 'd2.txt' as (id, c);
c = load 'd3.txt' as (id, c);
d = join a by id, b by c;
e = filter d by a::id==NULL AND b::c==NULL;
f = join e by b::c, c by id;
g = filter f by b::id==NULL AND c::c==NULL;
store g into 'empty2';
Pig uses the buildPlan method to produce a LogicalPlan like this:
---g: Filter scope-24 Schema: {e::a::id: bytearray,e::a::c: bytearray,e::b::id: bytearray,e::b::c: bytearray,c::id: bytearray,c::c: bytearray} Type: bag

And scope-23 FieldSchema: boolean Type: boolean
---Equal scope-19 FieldSchema: boolean Type: boolean
---Project scope-17 Projections: [2] Overloaded: false FieldSchema: e::b::id: bytearray Type: bytearray
Input: f: LOJoin scope-16

---Const scope-18( null ) FieldSchema: bytearray Type: bytearray
---Equal scope-22 FieldSchema: boolean Type: boolean

---Project scope-20 Projections: [5] Overloaded: false FieldSchema: c::c: bytearray Type: bytearray
Input: f: LOJoin scope-16

---Const scope-21( null ) FieldSchema: bytearray Type: bytearray
---f: LOJoin scope-16 Schema: {e::a::id: bytearray,e::a::c: bytearray,e::b::id: bytearray,e::b::c: bytearray,c::id: bytearray,c::c: bytearray} Type: bag
Project scope-14 Projections: [3] Overloaded: false FieldSchema: b::c: bytearray Type: bytearray
Input: e: Filter scope-13
Project scope-15 Projections: [0] Overloaded: false FieldSchema: id: bytearray Type: bytearray
Input: c: Load scope-2

---c: Load scope-2 Schema: {id: bytearray,c: bytearray} Type: bag

---e: Filter scope-13 Schema: {a::id: bytearray,a::c: bytearray,b::id: bytearray,b::c: bytearray} Type: bag

And scope-12 FieldSchema: boolean Type: boolean
---Equal scope-8 FieldSchema: boolean Type: boolean
---Project scope-6 Projections: [0] Overloaded: false FieldSchema: a::id: bytearray Type: bytearray
Input: d: LOJoin scope-5

---Const scope-7( null ) FieldSchema: bytearray Type: bytearray
---Equal scope-11 FieldSchema: boolean Type: boolean

---Project scope-9 Projections: [3] Overloaded: false FieldSchema: b::c: bytearray Type: bytearray
Input: d: LOJoin scope-5

---Const scope-10( null ) FieldSchema: bytearray Type: bytearray
---d: LOJoin scope-5 Schema: {a::id: bytearray,a::c: bytearray,b::id: bytearray,b::c: bytearray} Type: bag
Project scope-3 Projections: [0] Overloaded: false FieldSchema: id: bytearray Type: bytearray
Input: a: Load scope-0
Project scope-4 Projections: [1] Overloaded: false FieldSchema: c: bytearray Type: bytearray
Input: b: Load scope-1

---a: Load scope-0 Schema: {id: bytearray,c: bytearray} Type: bag

---b: Load scope-1 Schema: {id: bytearray,c: bytearray} Type: bag
I assume that command analysis and intermediate data storage are both based on HashMap structures. Is this correct?
I found that some test cases compare against output that depends on HashMap iteration. If so, I think such a test should not have a single expected output, because HashMap iteration order is not guaranteed to be stable. Please let me know what you think. Thank you.
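
To make the concern concrete, here is a minimal Java sketch, not taken from the Pig code base. It assumes, as I do above, that some printed output is produced by iterating a HashMap. The class PlanOrderDemo and its helpers dump and sameIgnoringOrder are hypothetical names for illustration only. It shows two possible ways for a test to avoid depending on iteration order: compare the lines without regard to order, or copy the entries into a TreeMap before printing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PlanOrderDemo {
    // Hypothetical helper: render each map entry as one line, in iteration order.
    static List<String> dump(Map<String, String> operators) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : operators.entrySet()) {
            lines.add(entry.getKey() + " -> " + entry.getValue());
        }
        return lines;
    }

    // Order-insensitive comparison: sort copies of both lists, then compare.
    static boolean sameIgnoringOrder(List<String> expected, List<String> actual) {
        List<String> a = new ArrayList<>(expected);
        List<String> b = new ArrayList<>(actual);
        Collections.sort(a);
        Collections.sort(b);
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Assumption: operators are kept in a HashMap keyed by scope id.
        Map<String, String> operators = new HashMap<>();
        operators.put("scope-5", "d: LOJoin");
        operators.put("scope-13", "e: Filter");
        operators.put("scope-16", "f: LOJoin");
        operators.put("scope-24", "g: Filter");

        List<String> expected = Arrays.asList(
                "scope-24 -> g: Filter",
                "scope-16 -> f: LOJoin",
                "scope-13 -> e: Filter",
                "scope-5 -> d: LOJoin");

        // HashMap does not specify its iteration order, so the comparison
        // below does not rely on the order in which dump() emits the lines.
        System.out.println(sameIgnoringOrder(expected, dump(operators)));

        // Alternative: a TreeMap iterates in sorted key order, so its dump
        // is the same on every run and every JVM.
        System.out.println(dump(new TreeMap<>(operators)));
    }
}
```

With either approach a test can keep a single expected result without depending on how a particular JVM orders HashMap entries.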

Discussion Overview
group: user @
categories: pig, hadoop
posted: Aug 23, '11 at 10:20a
active: Aug 23, '11 at 8:04p
posts: 2
users: 2
website: pig.apache.org

2 users in discussion

Daniel Dai: 1 post
Lulynn_2008: 1 post
