Grokbase Groups Pig user August 2010
The way I would go about it is to stream the data through a department separator, then do your ORDER and LIMIT. That should cut the input data size down and give you some normalization. There is also a way to do it with a GROUP first, but that seems more difficult to me.

Matt

Neil Kodner wrote:

I'm trying to perform a top-n query in Pig. For example's sake, let's say my
input data is
(employeeid, departmentid, salary).

I'm trying to get the top n-highest-salaried employees of each department.

I would start off by grouping the data by department but am not sure how to
sort and limit the grouped data before flattening it into my output rows.
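
For reference, the group-first approach asked about here can be sketched with a nested FOREACH. This is only a minimal sketch, not necessarily what the thread settled on: it assumes a Pig release that allows ORDER and LIMIT inside a nested FOREACH block, and the input path, output path, field types, and n = 3 are all made up for illustration.

```pig
-- Hypothetical input: a tab-separated file with the three fields from the question.
employees = LOAD 'employees.tsv' USING PigStorage('\t')
            AS (employeeid:int, departmentid:int, salary:double);

-- One bag of employee tuples per department.
by_dept = GROUP employees BY departmentid;

-- Inside each group: sort the bag by salary, keep the first n tuples (n = 3 here),
-- then flatten the bag back into ordinary output rows.
top_n = FOREACH by_dept {
    sorted = ORDER employees BY salary DESC;
    top    = LIMIT sorted 3;
    GENERATE FLATTEN(top);
};

-- Hypothetical output location.
STORE top_n INTO 'top_salaries_by_dept';
```

Each output row has the same three fields as the input, with at most n rows per department. Ties at the cutoff are broken arbitrarily, so add a second sort key to the ORDER if that matters.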

Discussion Overview
group: user @
categories: pig, hadoop
posted: Aug 22, '10 at 4:26p
active: Aug 23, '10 at 2:28p
posts: 9
users: 4
website: pig.apache.org
