FAQ
Hi,
I was going through FAQs on Hadoop to optimize the performance of
map/reduce. There is a suggestion to set the number of reducers to the
prime number closest to the number of nodes, and the number of mappers
to the prime number closest to several times the number of nodes in the
cluster.
What performance advantages do these numbers give? Doing so improved
the performance of my map/reduce jobs considerably, and I am interested
to know the principles behind it.

Thanks,
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


  • Owen O'Malley at Mar 17, 2009 at 4:57 pm

    On Mar 17, 2009, at 9:18 AM, Richa Khandelwal wrote:

    I was going through FAQs on Hadoop to optimize the performance of
    map/reduce. There is a suggestion to set the number of reducers to
    the prime number closest to the number of nodes, and the number of
    mappers to the prime number closest to several times the number of
    nodes in the cluster.

    There is no need for the number of reduces to be prime. The only
    case where it helps is if you are using the HashPartitioner and your
    key's hash function is too linear. In practice, you usually want to
    use 99% of the cluster's reduce capacity.

    -- Owen
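Owen's point can be illustrated with a small sketch. The `partition` method below mirrors the formula used by Hadoop's default `HashPartitioner` (`(hash & Integer.MAX_VALUE) % numReduceTasks`); the "too linear" hash is simulated by giving every key a hash that is a multiple of a fixed stride. The class name, the stride, and the histogram helper are illustrative, not from the thread:

```java
// Demo: why a prime reducer count only matters when key hashes are "too
// linear". A reducer is chosen by hash mod numReduceTasks (same formula as
// Hadoop's default HashPartitioner). If all hashes share a common factor
// with the reducer count, some reducers never receive any keys; a prime
// reducer count avoids sharing factors with any stride.
public class PartitionSkewDemo {
    // Same formula as HashPartitioner: clear the sign bit, then mod.
    static int partition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many keys land on each reducer when every key's hash is a
    // multiple of `stride` (a deliberately linear, worst-case hash).
    static int[] histogram(int stride, int numKeys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (int k = 0; k < numKeys; k++) {
            counts[partition(k * stride, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 10 reducers, hashes all multiples of 5: only reducers 0 and 5
        // ever receive keys, so 8 of the 10 reducers sit idle.
        System.out.println(java.util.Arrays.toString(histogram(5, 1000, 10)));
        // 11 (prime) reducers with the same keys: every reducer gets work,
        // because 11 shares no factor with the stride.
        System.out.println(java.util.Arrays.toString(histogram(5, 1000, 11)));
    }
}
```

With a well-mixed hash function this skew does not arise, which is why Owen says the prime count is unnecessary in general and the real goal is simply to use nearly all of the cluster's reduce slots.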

Discussion Overview
group: common-user
categories: hadoop
posted: Mar 17, '09 at 4:19p
active: Mar 17, '09 at 4:57p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion
Owen O'Malley: 1 post
Richa Khandelwal: 1 post
