[ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-2141:
--------------------------------

Attachment: 2141.5.patch

A well-tested patch. Results are pretty good w.r.t. slot utilization relative to job run time. Tests are still running.
speculative execution start up condition based on completion time
-----------------------------------------------------------------

Key: HADOOP-2141
URL: https://issues.apache.org/jira/browse/HADOOP-2141
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Affects Versions: 0.21.0
Reporter: Koji Noguchi
Assignee: Andy Konwinski
Fix For: 0.21.0

Attachments: 2141.4.patch, 2141.5.patch, 2141.patch, HADOOP-2141-v2.patch, HADOOP-2141-v3.patch, HADOOP-2141-v4.patch, HADOOP-2141-v5.patch, HADOOP-2141-v6.patch, HADOOP-2141.patch, HADOOP-2141.v7.patch, HADOOP-2141.v8.patch


We had one job hang with speculative execution enabled.
4 reduce tasks were stuck at 95% completion because of a bad disk.
Devaraj pointed out
bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
It would be nice if speculative execution also started up when tasks stop making progress.
Devaraj suggested
bq. Maybe we should introduce a condition for average completion time for tasks in the speculative execution check.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
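
To make the reported failure concrete, here is a minimal sketch of the "20% behind the average progress" condition quoted above. The class and method names are hypothetical, and reading "20% behind" as an absolute gap of 0.20 in progress is an assumption; this is not the scheduler's actual code. With four finished reduces and four stalled at 95%, the average is 0.975, so the stalled tasks are only 0.025 behind and never qualify for speculation.

```java
final class ProgressGapCheck {
    // Assumption: "20% behind" means an absolute gap of 0.20 in progress.
    static final double SPECULATIVE_GAP = 0.20;

    // Mean progress across all tasks of the job (0.0 for an empty job).
    static double average(double[] progress) {
        double sum = 0.0;
        for (double p : progress) {
            sum += p;
        }
        return progress.length == 0 ? 0.0 : sum / progress.length;
    }

    // True when the task trails the average by at least the gap.
    static boolean isBehindAverage(double taskProgress, double averageProgress) {
        return averageProgress - taskProgress >= SPECULATIVE_GAP;
    }

    public static void main(String[] args) {
        // Four reduces finished, four stuck at 95% (the scenario in this issue).
        double[] reduces = {1.0, 1.0, 1.0, 1.0, 0.95, 0.95, 0.95, 0.95};
        double avg = average(reduces);
        for (double p : reduces) {
            System.out.printf("progress=%.2f behindAverage=%b%n",
                    p, isBehindAverage(p, avg));
        }
    }
}
```

Because the check looks only at the gap to the average, a task that stops making progress late in its run is indistinguishable from a healthy one, which is why the issue asks for a time-based condition.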


  • Devaraj Das (JIRA) at Jun 9, 2009 at 9:11 am
    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-2141:
    --------------------------------

    Status: Patch Available (was: Open)
  • Devaraj Das (JIRA) at Jun 9, 2009 at 9:11 am
    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-2141:
    --------------------------------

    Fix Version/s: 0.21.0
    Status: Open (was: Patch Available)
  • Devaraj Das (JIRA) at Jun 12, 2009 at 8:42 am
    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-2141:
    --------------------------------

    Attachment: 2141.6.patch

    Attaching a patch with minor changes. For a sort job on a ~200 node cluster, the number of speculative tasks launched with this patch is only ~10% of the number of task launches with the trunk. The job run time is almost the same. Of the tasks that are chosen, I've seen at least 30% accuracy in correctly choosing tasks for speculation. In some cases, I even saw 100%.
  • Devaraj Das (JIRA) at Jun 12, 2009 at 11:48 am
    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-2141:
    --------------------------------

    Attachment: 2141.7.patch

    Attached patch addresses the concerns.
  • Devaraj Das (JIRA) at Jun 15, 2009 at 5:28 am
    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-2141:
    --------------------------------

    Attachment: 2141.8.2.patch

    This patch has an improved test case and also fixes some Javadoc.
    ant run-test-mapred/test-patch passed with this patch.
  • Devaraj Das (JIRA) at Jun 15, 2009 at 10:55 am
    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-2141:
    --------------------------------

    Attachment: 2141.8.3.patch

    Attached patch addresses the minor concerns.
  • Devaraj Das (JIRA) at Jun 16, 2009 at 4:16 am
    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-2141:
    --------------------------------

    Resolution: Fixed
    Release Note: Improves the speculative execution heuristic. The heuristic is now based on the progress rates of tasks and their expected time of completion. Also, statistics about trackers are collected, and speculative tasks are not given to the ones deduced to be slow. (was: Updated speculative execution scheduler)
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks Andy!
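
The release note above describes the heuristic only in words. The following is a rough sketch of that idea, not the committed patch: every name here (`SpeculationSketch`, `RunningTask`, `pickCandidate`) is hypothetical, the progress rate is assumed to be progress divided by elapsed time, and the set of slow trackers is taken as given rather than derived from collected statistics.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class SpeculationSketch {
    /** Hypothetical snapshot of one running task attempt. */
    static final class RunningTask {
        final String id;
        final double progress;     // fraction complete, 0.0 to 1.0
        final long elapsedMillis;  // time since the attempt started

        RunningTask(String id, double progress, long elapsedMillis) {
            this.id = id;
            this.progress = progress;
            this.elapsedMillis = elapsedMillis;
        }

        // Assumed definition: progress made per millisecond so far.
        double progressRate() {
            return elapsedMillis <= 0 ? 0.0 : progress / elapsedMillis;
        }

        // Remaining work divided by the observed rate; infinite if no progress yet.
        double estimatedMillisLeft() {
            double rate = progressRate();
            return rate <= 0.0 ? Double.POSITIVE_INFINITY : (1.0 - progress) / rate;
        }
    }

    // Returns the unfinished task expected to finish last, unless the tracker
    // asking for work is in the (assumed, precomputed) slow set.
    static Optional<RunningTask> pickCandidate(List<RunningTask> running,
                                               String requestingTracker,
                                               Set<String> slowTrackers) {
        if (slowTrackers.contains(requestingTracker)) {
            return Optional.empty();
        }
        return running.stream()
                .filter(t -> t.progress < 1.0)
                .max(Comparator.comparingDouble(RunningTask::estimatedMillisLeft));
    }

    public static void main(String[] args) {
        List<RunningTask> running = Arrays.asList(
                new RunningTask("r_stalled", 0.95, 3_600_000L),
                new RunningTask("r_healthy", 0.50, 60_000L));
        pickCandidate(running, "tracker-a", Collections.<String>emptySet())
                .ifPresent(t -> System.out.println("speculate " + t.id));
    }
}
```

Under these assumptions a task stuck at 95% for an hour has a far larger estimated time left than a task at 50% after one minute, so it is selected even though it is ahead of the average progress.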

Discussion Overview
group: common-dev @
categories: hadoop
posted: Jun 9, '09 at 9:11a
active: Jun 16, '09 at 4:16a
posts: 8
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Devaraj Das (JIRA): 8 posts
