Grokbase Groups Pig user March 2009
I misread the problem and the script. You are right: INNER won't help, and
you would need what Alan has proposed.

Regards,
Mridul

Tamir Kamara wrote:
Thanks,

I read in the manual that INNER ensures that only bags with at least one
tuple are returned, but I need to return bags with zero tuples (key not in
the fltr file). So it seems that INNER won't help me.
There's an OUTER keyword, but with no explanation of what it does. Would that
work in this case?

On Wed, Mar 11, 2009 at 5:59 PM, Mridul Muralidharan wrote:
Wouldn't using INNER remove the need for the filter?

- Mridul


Alan Gates wrote:
I think this will do what you want. It cogroups the two files, filters
out entries where the bag for the filter_file is not empty, and then returns
just entries from the main file.

Alan.

main = load 'main_file';
fltr = load 'filter_file';
-- replace $0 with whatever your key is
grpd = cogroup main by $0, fltr by $0;
-- keep only groups whose key does not appear in filter_file
fltrd = filter grpd by COUNT(fltr) == 0;
rslt = foreach fltrd generate flatten(main);
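
For readers unfamiliar with cogroup, the script above behaves like an anti-join: every key gets one bag per input, a key missing from one input gets an empty bag there, and only groups whose fltr bag is empty survive. Below is a minimal Python sketch of that behaviour, not Pig's implementation. The function names `cogroup` and `anti_join` and the sample rows are hypothetical, and the key is assumed to be the first field, as with $0 in the script.

```python
from collections import defaultdict


def cogroup(main, fltr, key=lambda row: row[0]):
    """Group both inputs by key; a key absent from one input gets an empty bag."""
    groups = defaultdict(lambda: ([], []))  # key -> (main bag, fltr bag)
    for row in main:
        groups[key(row)][0].append(row)
    for row in fltr:
        groups[key(row)][1].append(row)
    return groups


def anti_join(main, fltr):
    """Return rows of main whose key never appears in fltr."""
    result = []
    for main_bag, fltr_bag in cogroup(main, fltr).values():
        if len(fltr_bag) == 0:       # mirrors the COUNT(fltr) filter
            result.extend(main_bag)  # mirrors flatten(main)
    return result


# Hypothetical sample data: keys "a" and "d" are listed in the filter file.
main_rows = [("a", 1), ("b", 2), ("c", 3), ("a", 4)]
fltr_rows = [("a",), ("d",)]
rows = anti_join(main_rows, fltr_rows)
print(sorted(rows))
```

The sketch also shows why INNER on fltr would not help here: it would drop exactly the groups with an empty fltr bag, which are the ones the filter needs to keep.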


On Mar 11, 2009, at 6:49 AM, Tamir Kamara wrote:

Hi,
Is it possible to filter one file by a key not present in the other (similar
to NOT IN or LEFT JOIN & IS NULL in a database)?

Thanks,
Tamir

Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 11, '09 at 1:50p
active: Mar 16, '09 at 3:02p
posts: 11
users: 4
website: pig.apache.org
