Hey Kun,

Keys are passed to each reducer instance in sorted order. That is, for a
given reducer JVM instance, the reduce function is called several times,
once for each key, and the keys arrive in sorted order. The sorting happens
in the shuffle phase, which is basically partitioning and sorting. Note that
this ordering holds per reducer: if you have one reducer (which isn't
practical for large jobs), all keys will be given to you in a single sorted
order.
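
To make the partition-then-sort idea concrete, here is a rough sketch in plain Python (not Hadoop code). The function name `shuffle`, the modulo-based partitioning, and the sample data are assumptions made up for illustration; a real job's partitioner may behave differently.

```python
from collections import defaultdict

def shuffle(map_output, num_reducers):
    """Simulate a shuffle: partition (key, value) pairs, then sort each partition by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_output:
        # Stand-in for a hash partitioner: hash of the key modulo the reducer count.
        partitions[hash(key) % num_reducers][key].append(value)
    # Each simulated reducer gets its keys in sorted order, with values grouped per key.
    return [sorted(p.items()) for p in partitions]

# Hypothetical map output with numeric keys, in arbitrary order.
map_output = [(42, "a"), (7, "b"), (19, "c"), (7, "d"), (30, "e"), (3, "f")]

two = shuffle(map_output, 2)  # sorted within each reducer, not across reducers
one = shuffle(map_output, 1)  # a single reducer sees one globally sorted sequence

for reducer_id, partition in enumerate(two):
    print(reducer_id, [key for key, _ in partition])
print("single reducer:", [key for key, _ in one[0]])
```

With two simulated reducers, each one sees its own keys in ascending order, but the two outputs are not sorted relative to each other. With one reducer, the whole key set comes out in one sorted sequence.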

You may be interested in the combiner phase, which is essentially a mini
reduce that happens before data is transferred between mapper and reducer:

<http://wiki.apache.org/hadoop/HadoopMapReduce> (grep for "combine")
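
As a rough idea of what a combiner buys you, here is a plain-Python sketch (again, not Hadoop code). The `combine` function and the sample mapper outputs are hypothetical, and the sketch assumes a sum-style reduce, where applying the reduce logic early on partial data does not change the final result.

```python
from collections import Counter

def combine(pairs):
    """Sum the counts per key, the way a combiner would for a sum-style reduce."""
    combined = Counter()
    for key, count in pairs:
        combined[key] += count
    return sorted(combined.items())

# Hypothetical raw output of two mappers.
mapper_a = [("x", 1), ("y", 1), ("x", 1), ("x", 1)]
mapper_b = [("y", 1), ("x", 1), ("y", 1)]

# Combining on each mapper shrinks what has to be sent to the reducer.
combined_a = combine(mapper_a)
combined_b = combine(mapper_b)

# The reducer applies the same summing logic to the combined records.
total = combine(combined_a + combined_b)

print(len(mapper_a) + len(mapper_b), "records without combining")
print(len(combined_a) + len(combined_b), "records with combining")
print(total)
```

The final totals are the same either way; only the number of records transferred between mapper and reducer changes.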

You may also find these videos useful:
<http://www.cloudera.com/hadoop-training-mapreduce-hdfs>
<http://www.cloudera.com/hadoop-training-programming-with-hadoop>

Hope this helps. Let me know if I misunderstood your question.

Alex
On Mon, Jun 15, 2009 at 4:22 PM, Kunsheng Chen wrote:


Hi everyone,

Is there any way to sort the "keys" before Reduce but after Map?


I also thought of sorting the keys myself in the Reduce function, but it
might take too much memory once the number of results gets large.

I am thinking of using some numeric value as "keys" in Reduce (which was
calculated by Map). If that is possible, I could easily output my results
in some order.


Thanks in advance,

-Kun


Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 15, '09 at 11:23p
active: Jun 17, '09 at 11:13p
posts: 6
users: 5
website: hadoop.apache.org...
irc: #hadoop
