Grokbase Groups Pig user March 2009
Doesn't matter. Underneath they both call bag.size() to find the size
of the bag. And this call causes very little overhead because the bag
counts as elements are inserted.

Alan.
On Mar 12, 2009, at 9:13 AM, Tamir Kamara wrote:

Thanks Alan.

In your original reply you did: filter grpd by COUNT(fltr) == 0. I saw that
there's another way of doing this with: filter grpd by IsEmpty(fltr).
Would IsEmpty perform better, since I don't really need the exact count?

Tamir
On Thu, Mar 12, 2009 at 5:47 PM, Alan Gates wrote:

COGROUP is outer by default, so you need not add it.

Alan.
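
To make the difference concrete, here is a minimal sketch that reuses the file names and the $0 key from the script quoted further down in this thread. The aliases outer_grpd and inner_grpd are illustrative names only. It contrasts the default (outer) COGROUP with one that marks fltr as INNER:

```pig
main = load 'main_file';
fltr = load 'filter_file';

-- Default behaviour (outer): every key found in either input gets a group.
-- A key that appears only in main_file has an empty fltr bag.
outer_grpd = cogroup main by $0, fltr by $0;

-- INNER on fltr: groups whose fltr bag is empty are dropped,
-- so keys missing from filter_file never reach a later filter.
inner_grpd = cogroup main by $0, fltr by $0 inner;
```

Because the goal here is to keep exactly the keys that are missing from filter_file, the default form is the one to use; INNER on fltr would discard those groups.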


On Mar 12, 2009, at 12:21 AM, Tamir Kamara wrote:

Thanks,
I read in the manual that INNER ensures that only bags with at least one
tuple are returned, but I need to return bags with zero tuples (key not in
the fltr file). So it seems that INNER won't help me.
There's an OUTER keyword but with no explanation of what it does. Will that
be good in this case?

On Wed, Mar 11, 2009 at 5:59 PM, Mridul Muralidharan wrote:

Use of INNER should remove the need for filter ?

- Mridul


Alan Gates wrote:

I think this will do what you want. It cogroups the two files, filters
out entries where the bag for the filter_file is not empty, and then returns
just entries from the main file.

Alan.

main = load 'main_file';
fltr = load 'filter_file';
grpd = cogroup main by $0, fltr by $0; -- or replace $0 with whatever your key is
fltrd = filter grpd by COUNT(fltr) == 0; -- keep only keys absent from filter_file
rslt = foreach fltrd generate flatten(main);
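
For reference, the IsEmpty variant discussed in the later replies can be sketched as below. It is the same script with only the filter condition changed. It assumes, as above, that the key is the first field ($0) of both files:

```pig
main = load 'main_file';
fltr = load 'filter_file';

-- Group both inputs by key; keys missing from filter_file get an empty fltr bag.
grpd = cogroup main by $0, fltr by $0;

-- Keep only the groups with no matching row in filter_file.
fltrd = filter grpd by IsEmpty(fltr);

-- Unnest the surviving main rows back into plain tuples.
rslt = foreach fltrd generate flatten(main);
```

As an illustration with made-up data: if main_file holds keys a, b and c and filter_file holds only b, rslt would contain the main rows for a and c. Keys that occur only in filter_file are dropped by the filter because their fltr bag is not empty.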


On Mar 11, 2009, at 6:49 AM, Tamir Kamara wrote:

Hi,
Is it possible to filter one file by a key not present in the other
(similar to NOT IN or LEFT JOIN & IS NULL that can be done in a DB)?

Thanks,
Tamir

Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 11, '09 at 1:50p
active: Mar 16, '09 at 3:02p
posts: 11
users: 4
website: pig.apache.org
