FAQ
Hi,

Not sure whether this is the right place to ask, but I'll give it a go:

Is Hadoop able to distribute tasks to individual cores in multicore
nodes? I realise the framework is designed for running on a large number
of unreliable networked nodes, which I suppose is not so much of an
issue when it comes to using multiple CPU cores within one system. But
Hadoop nodes may already be dual or quad core machines. It would be
great if developers could harness this power transparently, I mean
without having to worry about the details of distribution.

Ger-Jan


  • Toby DiPasquale at Sep 30, 2007 at 11:39 am

    On 9/30/07, Ger-Jan te Dorsthorst wrote:
    > Not sure whether this is the right place to ask, but I'll give it a go:
    >
    > Is Hadoop able to distribute tasks to individual cores in multicore
    > nodes? I realise the framework is designed for running on a large number
    > of unreliable networked nodes, which I suppose is not so much of an
    > issue when it comes to using multiple CPU cores within one system. But
    > Hadoop nodes may already be dual or quad core machines. It would be
    > great if developers could harness this power transparently, I mean
    > without having to worry about the details of distribution.
    In short, yes. Hadoop's code takes advantage of multiple native
    threads and you can tune the level of concurrency in the system by
    setting mapred.map.tasks and mapred.reduce.tasks to take advantage of
    multiple cores on the nodes which have them.
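    For illustration, the corresponding hadoop-site.xml fragment might look like this. The property names are from the Hadoop 0.x configuration; the values are examples only, not recommendations, and the right numbers depend on your cluster:

    ```xml
    <configuration>
      <!-- Hint for how many map tasks to run across the cluster;
           often set to a small multiple of the total core count. -->
      <property>
        <name>mapred.map.tasks</name>
        <value>8</value>
      </property>
      <!-- Number of reduce tasks for the job. -->
      <property>
        <name>mapred.reduce.tasks</name>
        <value>4</value>
      </property>
    </configuration>
    ```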

    --
    Toby DiPasquale
  • Doug Cutting at Oct 1, 2007 at 4:38 pm

    Toby DiPasquale wrote:
    > In short, yes. Hadoop's code takes advantage of multiple native
    > threads and you can tune the level of concurrency in the system by
    > setting mapred.map.tasks and mapred.reduce.tasks to take advantage of
    > multiple cores on the nodes which have them.
    More importantly, you should set mapred.tasktracker.tasks.maximum
    according to the number of cores per node. That parameter determines
    how many tasks will be run simultaneously per node. Note that, at this
    point, this parameter is global for the cluster, and not independently
    configurable per node. Someone with a heterogeneous cluster might be
    interested in fixing that someday...
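    As a sketch, assuming the Hadoop 0.x configuration format, the setting for quad-core nodes might look like this (the value of 4 is an example; match it to your own core count):

    ```xml
    <configuration>
      <!-- Run up to 4 tasks concurrently on each node, e.g. one per core.
           Note: this is cluster-wide, not configurable per node. -->
      <property>
        <name>mapred.tasktracker.tasks.maximum</name>
        <value>4</value>
      </property>
    </configuration>
    ```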

    Doug
  • Ted Dunning at Oct 1, 2007 at 6:18 pm
    That someone should be me, since I have just such a cluster, but I find
    that splitting the difference (a value a bit higher than ideal for the
    weak nodes and a bit lower than ideal for the fast nodes) works too
    well to get me going on this.

    I blame it on Doug and Co for making the map and reduce tasks take so little
    memory!

    On 10/1/07 9:37 AM, "Doug Cutting" wrote:

    > you should set mapred.tasktracker.tasks.maximum
    > according to the number of cores per node. ... at this
    > point, this parameter is global for the cluster, and not independently
    > configurable per node. Someone with a heterogeneous cluster might be
    > interested in fixing that someday...

Discussion Overview
group: common-user
categories: hadoop
posted: Sep 30, '07 at 10:58a
active: Oct 1, '07 at 6:18p
posts: 4
users: 4
website: hadoop.apache.org...
irc: #hadoop
