optimize allocation of tasks w/ local data
------------------------------------------

Key: HADOOP-173
URL: http://issues.apache.org/jira/browse/HADOOP-173
Project: Hadoop
Type: Improvement

Components: mapred
Versions: 0.2
Reporter: Doug Cutting
Assigned to: Doug Cutting


When a job first starts, all task trackers ask the job tracker for tasks at once. With many task trackers, the job tracker becomes very slow. The first kind of task the job tracker looks for is one with some of its input data stored on the same node as the requesting task tracker. This case currently loops through tasks blindly, which, on average, requires numHosts/(replication*2) iterations to find a match (I think). This could be optimized by adding a table mapping from host to task.
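
The host-to-task table proposed above might look roughly like the following. This is a minimal sketch, not the code in the attached patch: the class name LocalTaskIndex, the methods addTask and pollLocalTask, and the use of integer task ids are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: index pending tasks by the hosts that hold their input data. */
public class LocalTaskIndex {
    // host name -> ids of tasks with at least one input replica on that host
    private final Map<String, Deque<Integer>> tasksByHost = new HashMap<>();
    // ids of tasks already handed out; lets us skip stale entries queued under other hosts
    private final Set<Integer> assigned = new HashSet<>();

    /** Registers a task under every host that stores a replica of its input. */
    public void addTask(int taskId, List<String> replicaHosts) {
        for (String host : replicaHosts) {
            tasksByHost.computeIfAbsent(host, h -> new ArrayDeque<>()).add(taskId);
        }
    }

    /** Returns the id of an unassigned task local to the given host, or -1 if there is none. */
    public int pollLocalTask(String host) {
        Deque<Integer> queue = tasksByHost.get(host);
        if (queue == null) {
            return -1;
        }
        while (!queue.isEmpty()) {
            int taskId = queue.poll();
            // Set.add returns false if the task was already assigned via another host
            if (assigned.add(taskId)) {
                return taskId;
            }
        }
        return -1;
    }
}
```

In this sketch a request from a task tracker becomes a single map lookup plus the removal of any stale queue entries, instead of a scan over all tasks. Each queue entry is removed at most once, so the total work over a job is proportional to the number of (task, replica host) pairs.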


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira


  • Doug Cutting (JIRA) at Apr 27, 2006 at 9:00 pm
    [ http://issues.apache.org/jira/browse/HADOOP-173?page=all ]

    Doug Cutting updated HADOOP-173:
    --------------------------------

    Attachment: fast-local-task.patch

    This patch optimizes the jobtracker's allocation of tasks to nodes that have local data. I have tested it, but not yet on a large cluster.
  • Doug Cutting (JIRA) at Apr 28, 2006 at 5:07 pm
    [ http://issues.apache.org/jira/browse/HADOOP-173?page=all ]

    Doug Cutting resolved HADOOP-173:
    ---------------------------------

    Fix Version: 0.2
    Resolution: Fixed

    I committed this.

Discussion Overview
group: common-dev @
categories: hadoop
posted: Apr 27, '06 at 9:00p
active: Apr 28, '06 at 5:07p
posts: 3
users: 1
website: hadoop.apache.org...
irc: #hadoop
