Grokbase Groups Pig user June 2011
Hello,

I am seeing some odd behavior when UNION'ing the input of two avro files.
I've applied the patch from PIG-1890, which solves the bug where tuples are
output twice due to the additional call to setLocation (see PIG-1680).

Unfortunately, when I UNION two avro inputs, each tuple still seems to be
output twice. The duplication only occurs when using AvroStorage with
Pig-0.8 or later.

Any insight into what might be causing this behavior?


Sample pig script:
---
REGISTER avro-1.4.1.jar;
REGISTER json-simple-1.1.jar;
REGISTER piggybank.jar;

-- 1,2,3
A = LOAD 'input_123.avro' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();

-- 7,8,9
B = LOAD 'input_789.avro' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();

C = UNION A, B;
DUMP C;

(1,2,3)
(7,8,9)
(1,2,3)
(7,8,9)
---
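
One way to narrow this down is to count each relation separately and compare the totals. The following is only a sketch under a few assumptions: it reuses the jars, input files and aliases from the script above, the *_grp and *_cnt aliases are made-up names for illustration, and since each DUMP runs as its own job the counts may not reproduce the problem exactly as the single DUMP C does.

```pig
REGISTER avro-1.4.1.jar;
REGISTER json-simple-1.1.jar;
REGISTER piggybank.jar;

A = LOAD 'input_123.avro' USING
    org.apache.pig.piggybank.storage.avro.AvroStorage();
B = LOAD 'input_789.avro' USING
    org.apache.pig.piggybank.storage.avro.AvroStorage();
C = UNION A, B;

-- Count the tuples in each input on its own.
-- (A_grp, A_cnt, etc. are hypothetical alias names.)
A_grp = GROUP A ALL;
A_cnt = FOREACH A_grp GENERATE COUNT(A);
DUMP A_cnt;

B_grp = GROUP B ALL;
B_cnt = FOREACH B_grp GENERATE COUNT(B);
DUMP B_cnt;

-- Count the tuples after the UNION.
-- Note: COUNT skips tuples whose first field is null.
C_grp = GROUP C ALL;
C_cnt = FOREACH C_grp GENERATE COUNT(C);
DUMP C_cnt;
```

If A and B each report the expected count while C reports more than their sum, the extra tuples are being introduced when the two loads are combined rather than by either LOAD alone.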


Thanks,
Mads

Posted: Jun 18, '11 at 10:08p