Pig user mailing list, June 2011
It looks like FLATTEN(tuple) results in a single null when the tuple is null,
irrespective of the schema.

As a result, the affected row ends up with fewer columns than expected. This
can lead to various kinds of problems: runtime exceptions, incorrect values,
etc.

E.g.
A = load 'x.txt' as (a, t:(b,c), d);
dump A;
(1,(2,3),4)
(5,,8)       -- note the null for 't'.
B = foreach A generate a, FLATTEN(t), d;
dump B;
(1,2,3,4)
(5,,8)       -- only three fields; every column after t is shifted, so results are never correct.

I think the correct output should have been:
(1,2,3,4)
(5,,,8)

It is quite hard for a user to figure this out, while Pig knows from the
schema what is expected. Is there a workaround for this?

We are thinking of writing a UDF that returns a tuple of NULLs when the
input is null. But it looks like UDFContext does not provide context for a
pure eval UDF (load and store UDFs do have it). I will start another thread
about that.
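A rough sketch of the UDF idea above, written as plain Java with no Pig dependency so the null-padding logic can be run on its own. The class name `NullSafeTuple` and its `width` argument are hypothetical. Because the UDF cannot easily see the input schema, this sketch assumes the expected tuple width is passed in as a constructor argument (Pig supports string constructor arguments via `DEFINE`). A real Pig UDF would instead extend `org.apache.pig.EvalFunc<Tuple>`, override `exec(Tuple)`, and could build the padded result with `TupleFactory.getInstance().newTuple(width)`, which creates a tuple of `width` null fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java model of the proposed UDF; no Pig dependency.
// NullSafeTuple and its width argument are hypothetical names.
public class NullSafeTuple {
    private final int width;

    // Pig passes DEFINE constructor arguments as strings, hence String here.
    public NullSafeTuple(String width) {
        this.width = Integer.parseInt(width);
    }

    // Returns the input tuple unchanged, or `width` nulls when it is null.
    public List<?> exec(List<?> input) {
        if (input != null) {
            return input;
        }
        List<Object> padded = new ArrayList<>();
        for (int i = 0; i < width; i++) {
            padded.add(null);
        }
        return padded;
    }

    // Models "generate a, FLATTEN(inner), d" by splicing inner's fields into the row.
    public static List<Object> flattenRow(Object a, List<?> inner, Object d) {
        List<Object> row = new ArrayList<>();
        row.add(a);
        row.addAll(inner);
        row.add(d);
        return row;
    }

    public static void main(String[] args) {
        NullSafeTuple udf = new NullSafeTuple("2");
        // Reported behavior: a null tuple flattens to a single null (3 columns).
        System.out.println(flattenRow(5, Collections.singletonList(null), 8));
        // With padding, both rows keep four columns.
        System.out.println(flattenRow(1, udf.exec(Arrays.asList(2, 3)), 4));
        System.out.println(flattenRow(5, udf.exec(null), 8));
    }
}
```

Running `main` prints `[5, null, 8]`, then `[1, 2, 3, 4]` and `[5, null, null, 8]`, matching the expected output above. In Pig Latin the call would then be something like `FLATTEN(NullSafeTuple(t))` after a `DEFINE` that passes the width.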

Tested with Pig 0.8.1.

Thanks,
Raghu.

Posted Jun 27, 2011 at 8:49 PM by Raghu Angadi to the Pig user list (pig.apache.org); categories: pig, hadoop.