Looks like FLATTEN(tuple) results in a single null when the tuple is null,
irrespective of the schema.
As a result, that particular row ends up with fewer columns than expected. This
can lead to various kinds of problems, such as runtime exceptions and incorrect values.
A = load 'x.txt' as (a, t:(b,c), d);
*(5,,8)* -- note NULL for 't'.
B = foreach A generate a, FLATTEN(t), d;
*(5,,8)* -- only three fields. Results are unpredictable and never correct.
I think the correct output should have had four fields, with NULLs for 'b' and 'c':
(5,,,8)
It is quite hard for a user to figure this out. Pig knows the schema, so it knows what is expected.
Is there a workaround for this?
We are thinking of writing a UDF that returns a tuple of NULLs when the
input is null. But it looks like UDFContext does not have context for a pure
UDF (store and load UDFs have it). I will start another thread about that.
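One candidate that avoids a UDF, sketched here under assumptions and not verified on 0.8.1: project the tuple's fields by name instead of flattening the tuple. Pig's documented null semantics say that dereferencing a null tuple returns null, so each projected field should come out as its own NULL column. The relation and field names are the ones from the example above; the 'as' aliases are only there for readability.

```pig
-- Same input and schema as in the example above.
A = load 'x.txt' as (a, t:(b,c), d);

-- Dereference each field of 't' explicitly rather than using FLATTEN(t).
-- Assumption: t.b and t.c each evaluate to null when 't' itself is null,
-- so the row keeps four fields.
B = foreach A generate a, t.b as b, t.c as c, d;

dump B;
```

This only works when the tuple's schema is known and small enough to list by hand, which is the case here.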
Tested with Pig 0.8.1.