FAQ
There is a suggestion to set the number of reducers to a prime number
closest to the number of nodes and number of mappers a prime number
closest to several times the number of nodes in the cluster. But there
is also saying that "There is no need for the number of reduces to be
prime. The only thing it helps is if you are using the HashPartitioner
and your key's hash function is too linear. In practice, you usually
want to use 99% of your reduce capacity of the cluster."

Could anyone explain what is the theory behind the prime number and the
hash function here?

Shi

Search Discussions

  • Aniket ray at Oct 25, 2010 at 4:12 am
    http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/

    <http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/>discusses
    the theory in detail.
    On Sun, Oct 24, 2010 at 7:30 AM, Shi Yu wrote:

    There is a suggestion to set the number of reducers to a prime number
    closest to the number of nodes and number of mappers a prime number closest
    to several times the number of nodes in the cluster. But there is also
    saying that "There is no need for the number of reduces to be prime. The
    only thing it helps is if you are using the HashPartitioner and your key's
    hash function is too linear. In practice, you usually want to use 99% of
    your reduce capacity of the cluster."

    Could anyone explain what is the theory behind the prime number and the
    hash function here?

    Shi
  • Tsz Wo \(Nicholas\), Sze at Oct 25, 2010 at 5:08 am
    You may also see Knuth's "The Art of Computer Programming". I remember that
    there is a discussion about prime number and hash function. (It should be in
    Volume 3 Chapter 6. There is a section about hashing. Sorry that I don't have
    the book with me and can't give you the page numbers.)


    Nicholas



    ________________________________
    From: aniket ray <aniket.ray@gmail.com>
    To: common-user@hadoop.apache.org
    Sent: Mon, October 25, 2010 12:12:16 PM
    Subject: Re: Prime number of reduces vs. linear hash function

    http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/


    <http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/>discusses

    the theory in detail.
    On Sun, Oct 24, 2010 at 7:30 AM, Shi Yu wrote:

    There is a suggestion to set the number of reducers to a prime number
    closest to the number of nodes and number of mappers a prime number closest
    to several times the number of nodes in the cluster. But there is also
    saying that "There is no need for the number of reduces to be prime. The
    only thing it helps is if you are using the HashPartitioner and your key's
    hash function is too linear. In practice, you usually want to use 99% of
    your reduce capacity of the cluster."

    Could anyone explain what is the theory behind the prime number and the
    hash function here?

    Shi
  • Owen O'Malley at Oct 27, 2010 at 6:07 pm
    Prime numbers only matter if the hash function is bad and you are using a
    hash partitioner. In most cases, the hashes are fine and thus the number of
    reduces can be dictated by the desired degree of parallelism.

    -- Owen

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupcommon-user @
categorieshadoop
postedOct 24, '10 at 2:00a
activeOct 27, '10 at 6:07p
posts4
users4
websitehadoop.apache.org...
irc#hadoop

People

Translate

site design / logo © 2022 Grokbase