Hi!
I was wondering whether it would be a good idea to parallelize some of
Hadoop's algorithms, such as MergeSorter, for the case where a single job is
run on a multicore system. If so, I would like to propose a patch that does
it, by adding a parameter like io.sort.parallelism.
If fewer sorting threads than this value are running, Hadoop would start
additional sorting threads.

Thanks
Brice
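A minimal sketch of what such an option might control, assuming a single in-memory buffer of int keys. The `parallelism` argument stands in for the suggested io.sort.parallelism, which is only a proposal in this thread and not an existing Hadoop setting, and the class name is hypothetical (this is not Hadoop's MergeSorter). The buffer is split into one chunk per thread, the chunks are sorted concurrently, and the sorted chunks are then merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSortSketch {

    // Sorts `data` with up to `parallelism` threads: each thread sorts one
    // contiguous chunk, then the sorted chunks are merged one after another.
    // `parallelism` plays the role of the hypothetical io.sort.parallelism.
    public static int[] parallelSort(int[] data, int parallelism) throws Exception {
        int n = data.length;
        int chunks = Math.max(1, Math.min(parallelism, n));
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<int[]>> futures = new ArrayList<>();
            for (int c = 0; c < chunks; c++) {
                int from = (int) ((long) n * c / chunks);
                int to = (int) ((long) n * (c + 1) / chunks);
                // Each task copies and sorts its own chunk on a pool thread.
                futures.add(pool.submit(() -> {
                    int[] part = Arrays.copyOfRange(data, from, to);
                    Arrays.sort(part);
                    return part;
                }));
            }
            int[] result = new int[0];
            for (Future<int[]> f : futures) {
                result = merge(result, f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    // Standard two-way merge of two sorted arrays.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) throws Exception {
        int[] data = {9, 3, 7, 1, 8, 2, 6, 4, 5, 0};
        // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(Arrays.toString(parallelSort(data, 4)));
    }
}
```

The final merge here is sequential, so only the chunk sorts run in parallel; a real patch would also have to decide how the option interacts with other sort settings.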


  • Doug Cutting at May 5, 2008 at 6:29 pm

    Brice Arnould wrote:
    > I was wondering whether it would be a good idea to parallelize some of
    > Hadoop's algorithms, such as MergeSorter, for the case where a single job is
    > run on a multicore system.
    One can already exploit parallelism on a multicore system by using
    "pseudo-distributed" mode and increasing
    mapred.tasktracker.map.tasks.maximum and
    mapred.tasktracker.reduce.tasks.maximum.

    LocalRunner should also someday be enhanced to run multiple maps and
    reduces in separate threads, which would be more efficient, since
    intermediate data would not need to travel through the loopback network
    interface. But I don't see an urgent case for making the sort code
    itself multi-threaded, since MapReduce itself performs parallel sorting.

    Doug
  • Brice Arnould at May 7, 2008 at 8:45 am

    Sorry, I had really misunderstood how it works. Thanks for your
    explanations; I'm going to look at LocalJobRunner.

    Brice
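Doug's point that MapReduce already sorts in parallel, and his suggestion that LocalRunner could run maps in separate threads, can be illustrated with a small in-process sketch, assuming a word-count job. This is not Hadoop's LocalJobRunner or its API; the class and method names (`ThreadedLocalRunnerSketch`, `mapTask`, `reduce`, `run`) are hypothetical. Each map task sorts its own output on a pool thread, and the reduce side performs a k-way merge of those sorted runs, so sorting proceeds in parallel even though each individual sort is single-threaded, and intermediate data stays in memory.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadedLocalRunnerSketch {

    // A "map task": emits (word, 1) pairs for its split, then sorts them by key,
    // so each task produces one sorted run.
    static List<Map.Entry<String, Integer>> mapTask(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : split.trim().split("\\s+")) {
            if (!w.isEmpty()) out.add(new SimpleEntry<>(w, 1));
        }
        out.sort(Map.Entry.comparingByKey());
        return out;
    }

    // A cursor over one sorted run, used by the k-way merge.
    static final class Cursor {
        final Iterator<Map.Entry<String, Integer>> it;
        Map.Entry<String, Integer> head;
        Cursor(Iterator<Map.Entry<String, Integer>> it) { this.it = it; this.head = it.next(); }
        boolean advance() {
            if (it.hasNext()) { head = it.next(); return true; }
            return false;
        }
    }

    // The "reduce" side: k-way merge of the sorted runs, summing values per key.
    static Map<String, Integer> reduce(List<List<Map.Entry<String, Integer>>> runs) {
        PriorityQueue<Cursor> pq =
            new PriorityQueue<>((a, b) -> a.head.getKey().compareTo(b.head.getKey()));
        for (List<Map.Entry<String, Integer>> run : runs) {
            if (!run.isEmpty()) pq.add(new Cursor(run.iterator()));
        }
        Map<String, Integer> result = new LinkedHashMap<>(); // keys arrive in sorted order
        while (!pq.isEmpty()) {
            Cursor c = pq.poll();
            result.merge(c.head.getKey(), c.head.getValue(), Integer::sum);
            if (c.advance()) pq.add(c);
        }
        return result;
    }

    // Runs all map tasks concurrently in one JVM; their outputs stay in memory,
    // so nothing crosses a network interface before the reduce.
    static Map<String, Integer> run(List<String> splits, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Future<List<Map.Entry<String, Integer>>>> futures = new ArrayList<>();
            for (String split : splits) {
                futures.add(pool.submit(() -> mapTask(split)));
            }
            List<List<Map.Entry<String, Integer>>> runs = new ArrayList<>();
            for (Future<List<Map.Entry<String, Integer>>> f : futures) runs.add(f.get());
            return reduce(runs);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // prints {a=3, b=2, c=2, d=2}
        System.out.println(run(Arrays.asList("b a c a", "c b d", "a d"), 3));
    }
}
```

The thread count here is analogous to raising the per-node task maxima Doug mentions: more concurrent tasks means more sorts running at once, without making any single sort multi-threaded.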

Discussion Overview
group: common-dev@
category: hadoop
posted: May 5, '08 at 9:12a
active: May 7, '08 at 8:45a
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop

