Grokbase Groups Pig user August 2010
The way I would go about it is to stream the data through a department separator first, then do your ORDER and LIMIT on each department's data. That should cut the input data size down and give you some normalization. There is also a way to do it with a GROUP first, but that seems more difficult to me.


Neil Kodner wrote:

I'm trying to perform a top-n query in Pig. For example's sake, let's say my
input data is
(employeeid, departmentid, salary).

I'm trying to get the top n-highest-salaried employees of each department.

I would start off by grouping the data by department but am not sure how to
sort and limit the grouped data before flattening it into my output rows.
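
For reference, the group, sort, limit, and flatten steps described in the question can be sketched outside of Pig. The following is a minimal plain-Python illustration of the logic only, not Pig Latin and not the method proposed in the reply; the function name `top_n_per_department` and the sample rows are hypothetical.

```python
from collections import defaultdict
from heapq import nlargest


def top_n_per_department(rows, n):
    """Return the n highest-salaried rows of each department.

    rows is an iterable of (employeeid, departmentid, salary) tuples,
    matching the layout given in the question.
    """
    # Group: collect every row under its department id.
    by_dept = defaultdict(list)
    for row in rows:
        by_dept[row[1]].append(row)

    # Sort and limit within each group, then flatten into one list.
    result = []
    for dept in sorted(by_dept):
        result.extend(nlargest(n, by_dept[dept], key=lambda r: r[2]))
    return result


# Hypothetical sample data for illustration.
rows = [
    (1, "eng", 100),
    (2, "eng", 120),
    (3, "eng", 90),
    (4, "ops", 70),
    (5, "ops", 80),
]

for row in top_n_per_department(rows, 2):
    print(row)
```

The sort and limit happen per group before the results are flattened, which is the ordering of steps the question asks about.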

group: user @
categories: pig, hadoop
posted: Aug 22, '10 at 4:26p
active: Aug 23, '10 at 2:28p


