I have some thoughts about how Pig commands are processed.
For example:
Pig commands (from TestNewPlanLogToPhyTranslationVisitor.java):
a = load 'd1.txt' as (id, c);
b = load 'd2.txt' as (id, c);
c = load 'd3.txt' as (id, c);
d = join a by id, b by c;
e = filter d by a::id==NULL AND b::c==NULL;
f = join e by b::c, c by id;
g = filter f by b::id==NULL AND c::c==NULL;
store g into 'empty2';
Pig uses the buildPlan method to produce a LogicalPlan like this:
|---g: Filter scope-24 Schema: {e::a::id: bytearray,e::a::c: bytearray,e::b::id: bytearray,e::b::c: bytearray,c::id: bytearray,c::c: bytearray} Type: bag
    |   And scope-23 FieldSchema: boolean Type: boolean
    |   |---Equal scope-19 FieldSchema: boolean Type: boolean
    |   |   |---Project scope-17 Projections: [2] Overloaded: false FieldSchema: e::b::id: bytearray Type: bytearray
    |   |   |       Input: f: LOJoin scope-16
    |   |   |---Const scope-18( null ) FieldSchema: bytearray Type: bytearray
    |   |---Equal scope-22 FieldSchema: boolean Type: boolean
    |       |---Project scope-20 Projections: [5] Overloaded: false FieldSchema: c::c: bytearray Type: bytearray
    |       |       Input: f: LOJoin scope-16
    |       |---Const scope-21( null ) FieldSchema: bytearray Type: bytearray
    |---f: LOJoin scope-16 Schema: {e::a::id: bytearray,e::a::c: bytearray,e::b::id: bytearray,e::b::c: bytearray,c::id: bytearray,c::c: bytearray} Type: bag
        |   Project scope-14 Projections: [3] Overloaded: false FieldSchema: b::c: bytearray Type: bytearray
        |       Input: e: Filter scope-13
        |   Project scope-15 Projections: [0] Overloaded: false FieldSchema: id: bytearray Type: bytearray
        |       Input: c: Load scope-2
        |---c: Load scope-2 Schema: {id: bytearray,c: bytearray} Type: bag
        |---e: Filter scope-13 Schema: {a::id: bytearray,a::c: bytearray,b::id: bytearray,b::c: bytearray} Type: bag
            |   And scope-12 FieldSchema: boolean Type: boolean
            |   |---Equal scope-8 FieldSchema: boolean Type: boolean
            |   |   |---Project scope-6 Projections: [0] Overloaded: false FieldSchema: a::id: bytearray Type: bytearray
            |   |   |       Input: d: LOJoin scope-5
            |   |   |---Const scope-7( null ) FieldSchema: bytearray Type: bytearray
            |   |---Equal scope-11 FieldSchema: boolean Type: boolean
            |       |---Project scope-9 Projections: [3] Overloaded: false FieldSchema: b::c: bytearray Type: bytearray
            |       |       Input: d: LOJoin scope-5
            |       |---Const scope-10( null ) FieldSchema: bytearray Type: bytearray
            |---d: LOJoin scope-5 Schema: {a::id: bytearray,a::c: bytearray,b::id: bytearray,b::c: bytearray} Type: bag
                |   Project scope-3 Projections: [0] Overloaded: false FieldSchema: id: bytearray Type: bytearray
                |       Input: a: Load scope-0
                |   Project scope-4 Projections: [1] Overloaded: false FieldSchema: c: bytearray Type: bytearray
                |       Input: b: Load scope-1
                |---a: Load scope-0 Schema: {id: bytearray,c: bytearray} Type: bag
                |---b: Load scope-1 Schema: {id: bytearray,c: bytearray} Type: bag
I assume the command analysis and the intermediate data storage are all based on a HashMap structure. Is this correct?
I found that the expected results of some test cases depend on the output of the HashMap-based analysis. In my opinion, such a test case should not have a single fixed expected output, because the iteration order of a HashMap is not guaranteed to be stable. Please let me know what you think. Thank you.
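To illustrate what I mean, here is a minimal sketch of a comparison that does not depend on the order in which plan lines are printed. The class PlanDumpComparison and its methods are hypothetical names of my own, not existing Pig code, and the sketch ignores the tree structure of the plan: it only checks that both dumps contain the same lines.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Pig: compares two plan dumps
// without depending on the order in which lines were printed.
class PlanDumpComparison {

    // Splits a dump into trimmed, non-empty lines and sorts them, so two dumps
    // that differ only in line order normalize to the same list.
    static List<String> normalize(String dump) {
        List<String> lines = new ArrayList<>();
        for (String line : dump.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        Collections.sort(lines);
        return lines;
    }

    // True when both dumps contain the same lines, in any order.
    static boolean sameIgnoringOrder(String expected, String actual) {
        return normalize(expected).equals(normalize(actual));
    }

    public static void main(String[] args) {
        String expected = "a: Load scope-0\nb: Load scope-1\n";
        String actual = "b: Load scope-1\na: Load scope-0\n";
        System.out.println(expected.equals(actual));             // prints false
        System.out.println(sameIgnoringOrder(expected, actual)); // prints true
    }
}
```

A test written this way would accept either ordering of the two Load operators, whereas an exact string comparison accepts only one of them.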