I'm still fairly new at MapReduce, but here are my thoughts on the solution.
Use the Item as the Key and the Count as the Value. In the Reducer, sum all of the Counts for each Item and output (Item, sum(Count)). To make it more efficient, use the same Reducer as the Combiner.
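As a rough illustration of this first job, here is a minimal plain-Java sketch. It deliberately uses no Hadoop classes, so it only mimics the data flow; the class name SumCountsSketch and the methods sumByItem and runJob are made up for this example. It is meant to show why the same summing logic can act as both Combiner and Reducer: addition is associative and commutative, so summing the partial sums gives the same totals as summing the raw counts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only: no Hadoop classes are used here.
class SumCountsSketch {

    // Plays the role of both the Reducer and the Combiner:
    // sums the counts seen for each item.
    static Map<String, Long> sumByItem(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return totals;
    }

    // Simulates the job: each split (one per hypothetical map task) is
    // pre-summed as a combiner would, then the partial sums are summed
    // again as the reducer would.
    static Map<String, Long> runJob(List<List<Map.Entry<String, Long>>> splits) {
        List<Map.Entry<String, Long>> partials = new ArrayList<>();
        for (List<Map.Entry<String, Long>> split : splits) {
            partials.addAll(sumByItem(split).entrySet());
        }
        return sumByItem(partials);
    }
}
```

Each inner list passed to runJob stands for the (Item, Count) pairs seen by one map task. The result is the same whether or not the per-split pre-summing step runs, which is the property a Combiner needs.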
Then run a second Job that maps the Count as the Key and the Item as the Value, uses 1 Reducer, and does an identity reduce (i.e. no reducing at all, just output the Count,Item). With a single Reducer, the shuffle's sort on the Key leaves the output ordered by Count.
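The second pass could be pictured roughly like this, again in plain Java with no Hadoop classes. The names SortByCountSketch, swap, sortByCountDescending and topN are hypothetical, and two things here are assumptions of the sketch rather than part of the description above: the sort is descending, and the first N records are cut off at the end. In a real job the sort order comes from the key's comparator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only: no Hadoop classes are used here.
class SortByCountSketch {

    // "Map" step of the second job: emit (count, item) instead of (item, count).
    static List<Map.Entry<Long, String>> swap(Map<String, Long> totals) {
        List<Map.Entry<Long, String>> swapped = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            swapped.add(Map.entry(e.getValue(), e.getKey()));
        }
        return swapped;
    }

    // Stands in for the shuffle feeding a single identity reducer:
    // records are sorted by key (the count). Descending order is an
    // assumption of this sketch.
    static List<Map.Entry<Long, String>> sortByCountDescending(
            List<Map.Entry<Long, String>> pairs) {
        List<Map.Entry<Long, String>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.<Long, String>comparingByKey().reversed());
        return sorted;
    }

    // Keeps the first n records of the sorted output (n is assumed >= 0).
    static List<Map.Entry<Long, String>> topN(Map<String, Long> totals, int n) {
        List<Map.Entry<Long, String>> sorted = sortByCountDescending(swap(totals));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```

The input to topN is the per-item totals produced by the first job. Items with equal counts keep their input order here because List.sort is stable; a real job would give no such guarantee.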
Aaron Baff | Developer | Telescope, Inc.
-----Original Message-----
From: Neil Ghosh
Sent: Friday, September 10, 2010 3:51 PM
To: James Seigel
Cc: [email protected]
Subject: Re: TOP N items
Thanks James,
This gives me only N results for sure, but not necessarily the top N.
I have used the Item as the Key and the Count as the Value as input to the reducer,
and my reducing logic is to sum the counts for a particular item.
Now my output comes out grouped but not in order.
Do I need to use a custom comparator?
Thanks
Neil
On Sat, Sep 11, 2010 at 2:41 AM, James Seigel wrote:
Welcome to the land of the fuzzy elephant!
Of course there are many ways to do it. Here is one; it might not be
brilliant or the right way, but I am sure you will get more :)
Use the identity mapper...
job.setMapperClass(Mapper.class);
then have one reducer....
job.setNumReduceTasks(1);
then have a reducer that has something like this around your reducing
code...
Counter counter = context.getCounter("ME", "total output records");
if (counter.getValue() < LIMIT) {
    // <do your reducey stuff here>
    context.write(key, value);
    counter.increment(1);
}
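To see what this limit does in isolation, here is a small plain-Java sketch that mimics a single reducer visiting keys in sorted order and writing only the first `limit` records. It uses no Hadoop classes. The name LimitedReducerSketch, the method reduceWithLimit, and the local variable standing in for the counter are all made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop classes are used here.
class LimitedReducerSketch {

    // Sums counts per item, visiting items in sorted key order (as a single
    // reducer would), and stops emitting once `limit` records are written.
    static List<Map.Entry<String, Long>> reduceWithLimit(
            List<Map.Entry<String, Long>> pairs, int limit) {
        // TreeMap stands in for the shuffle: it groups by key and sorts keys.
        TreeMap<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        List<Map.Entry<String, Long>> output = new ArrayList<>();
        long written = 0; // plays the role of the "total output records" counter
        for (Map.Entry<String, List<Long>> group : grouped.entrySet()) {
            if (written < limit) {
                long sum = 0;
                for (long count : group.getValue()) {
                    sum += count;
                }
                output.add(Map.entry(group.getKey(), sum));
                written++;
            }
        }
        return output;
    }
}
```

Because the keys here are item names, the records that get through are the first `limit` items in key order, regardless of their counts. Getting the items with the largest totals depends on how the keys are sorted before they reach the reducer.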
Cheers
James.
On 2010-09-10, at 3:04 PM, Neil Ghosh wrote:
Hello,
I am new to Hadoop. Can anybody suggest an example or procedure for
outputting the top N items with the maximum total count, where the input file has
one (Item, count) pair on each line?
Items can repeat.
Thanks
Neil
http://neilghosh.com