Yep, just keep reading until you can't or you get a valid record.
I like to increment a "bad record" counter when that happens, for sanity checks.
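
A rough sketch of that loop, in plain Java so it runs without Hadoop on the classpath. The names here are all hypothetical: SkippingLoader is not a Pig class, the Iterator<String> stands in for the RecordReader, the tab-separated field-count check stands in for whatever "well formed" means for your data, and the long field stands in for a real Hadoop counter. The point is only that getNext() returns null when the input is exhausted and never for a bad row.

```java
import java.util.Iterator;

/** Stand-in for the read loop of a load UDF; not the actual Pig LoadFunc API. */
class SkippingLoader {
    private final Iterator<String> lines;  // assumption: stands in for the RecordReader
    private final int expectedFields;      // assumption: hypothetical well-formedness rule
    private long badRecords = 0;           // assumption: stands in for a Hadoop counter

    SkippingLoader(Iterator<String> lines, int expectedFields) {
        this.lines = lines;
        this.expectedFields = expectedFields;
    }

    /** Returns the next well-formed row, or null only when the input is exhausted. */
    String[] getNext() {
        while (lines.hasNext()) {
            // Limit -1 keeps trailing empty fields so they count toward the total.
            String[] fields = lines.next().split("\t", -1);
            if (fields.length == expectedFields) {
                return fields;             // good row: hand it back to the caller
            }
            badRecords++;                  // bad row: count it and keep reading
        }
        return null;                       // out of data: signal end of input
    }

    long getBadRecords() {
        return badRecords;
    }
}
```

In a real loader the same loop would presumably sit inside getNext() around the RecordReader calls, with the bad-row count reported through the job's counters so it shows up next to the other job statistics.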

-----Original Message-----
From: "Mark Tozzi" <mark.tozzi@gmail.com>
To: user@pig.apache.org
Sent: 12/31/2010 4:59 AM
Subject: discarding bad rows in load UDF

Hi all,

I'm working on a custom load UDF. Part of the motivation is to be
able to filter out lines in my input data which are not well formed,
as this is easy to detect during the load. What should the UDF do
when it encounters such a line though? I have tried returning null,
and that seems to terminate reading from that split. Should I just
loop through the RecordReader until I find a good row or run out of
data?

Thanks,

--Mark Tozzi

2 users in discussion

Dmitriy Ryaboy: 1 post Mark Tozzi: 1 post

People

Translate

site design / logo © 2021 Grokbase