Grokbase Groups Pig user March 2009
Thanks Alan.

In your original reply you used: filter grpd by COUNT(fltr) == 0. I saw that
there's another way of doing this: filter grpd by IsEmpty(fltr).
Would IsEmpty perform better, since I don't really need the exact count?

Tamir
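
For reference, a minimal sketch of the IsEmpty variant asked about above. It reuses the file names and the $0 key from the original script further down the thread; IsEmpty is a Pig built-in that tests whether a bag has no tuples. Whether it actually runs faster than COUNT is not settled in this thread, so treat this only as the more direct way to express the condition.

```pig
main = load 'main_file';
fltr = load 'filter_file';
-- cogroup is outer by default, so keys missing from fltr get an empty bag
grpd = cogroup main by $0, fltr by $0;
-- keep only groups whose fltr bag is empty (key not in filter_file)
fltrd = filter grpd by IsEmpty(fltr);
-- groups that exist only in fltr have an empty main bag and flatten to nothing
rslt = foreach fltrd generate flatten(main);
```
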
On Thu, Mar 12, 2009 at 5:47 PM, Alan Gates wrote:

COGROUP is outer by default, so you need not add it.

Alan.
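
To illustrate the point about the default, here is a small sketch contrasting the two forms. The alias names grpd_outer and grpd_inner are made up for this example; the file names and $0 key are taken from the original script in the thread.

```pig
main = load 'main_file';
fltr = load 'filter_file';
-- default (outer): every key appears; a key absent from fltr gets an empty fltr bag
grpd_outer = cogroup main by $0, fltr by $0;
-- inner on fltr: groups whose fltr bag is empty are dropped,
-- which removes exactly the rows a "NOT IN" style filter needs to keep
grpd_inner = cogroup main by $0, fltr by $0 inner;
```

This is why INNER does not help here: the empty-bag groups it discards are the ones the filter is looking for.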


On Mar 12, 2009, at 12:21 AM, Tamir Kamara wrote:

Thanks,
I read in the manual that INNER ensures that only bags with at least one
tuple are returned, but I need to return bags with zero tuples (key not in
the fltr file). So it seems that INNER won't help me.
There's an OUTER keyword, but with no explanation of what it does. Will
that work in this case?

On Wed, Mar 11, 2009 at 5:59 PM, Mridul Muralidharan wrote:

Shouldn't the use of INNER remove the need for the filter?

- Mridul


Alan Gates wrote:

I think this will do what you want. It cogroups the two files, filters
out groups where the bag for filter_file is not empty (keeping only keys
that are absent from filter_file), and then returns just the entries from
the main file.

Alan.

main = load 'main_file';
fltr = load 'filter_file';
-- replace $0 with whatever your key is
grpd = cogroup main by $0, fltr by $0;
-- keep groups with no matching tuples in fltr (Pig compares with ==)
fltrd = filter grpd by COUNT(fltr) == 0;
rslt = foreach fltrd generate flatten(main);


On Mar 11, 2009, at 6:49 AM, Tamir Kamara wrote:

Hi,
Is it possible to filter one file by a key not present in the other
(similar to NOT IN, or LEFT JOIN with IS NULL, as can be done in a DB)?

Thanks,
Tamir

