FAQ
slow reduce copies - map output locations not being fetched even when map complete
----------------------------------------------------------------------------------

Key: HADOOP-2980
URL: https://issues.apache.org/jira/browse/HADOOP-2980
Project: Hadoop Core
Issue Type: Bug
Affects Versions: 0.15.3
Reporter: Joydeep Sen Sarma


maps are long finished. reduces are stuck looking for map locations. they make progress - but slowly. it almost seems like they get new map locations every minute or so:

2008-03-07 18:50:52,737 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 done copying task_200803041231_3586_m_004620_0 output from hadoop082.sf2p.facebook.com..
2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 0 known map output location(s); scheduling...
...
2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 50 known map output location(s); scheduling...
2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Scheduled 41 of 50 known outputs (0 slow hosts and 9 dup hosts)

they get about 50 locations at a time and this 1 minute delay pattern is surprisingly common ..

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Search Discussions

  • Devaraj Das (JIRA) at Mar 9, 2008 at 11:29 am
    [ https://issues.apache.org/jira/browse/HADOOP-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576752#action_12576752 ]

    Devaraj Das commented on HADOOP-2980:
    -------------------------------------

    There is a fixed delay between two consecutive polls to the JobTracker. But as of 0.16, the way it works is if tasks run out of map output locations, the tasktracker polls the JT (minimum delay between two consecutive polls is set as 5 secs).
    slow reduce copies - map output locations not being fetched even when map complete
    ----------------------------------------------------------------------------------

    Key: HADOOP-2980
    URL: https://issues.apache.org/jira/browse/HADOOP-2980
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.15.3
    Reporter: Joydeep Sen Sarma

    maps are long finished. reduces are stuck looking for map locations. they make progress - but slowly. it almost seems like they get new map locations every minute or so:
    2008-03-07 18:50:52,737 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 done copying task_200803041231_3586_m_004620_0 output from hadoop082.sf2p.facebook.com..
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 0 known map output location(s); scheduling...
    ...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 50 known map output location(s); scheduling...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Scheduled 41 of 50 known outputs (0 slow hosts and 9 dup hosts)
    they get about 50 locations at a time and this 1 minute delay pattern is surprisingly common ..
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Joydeep Sen Sarma (JIRA) at Mar 12, 2008 at 7:46 am
    [ https://issues.apache.org/jira/browse/HADOOP-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577740#action_12577740 ]

    Joydeep Sen Sarma commented on HADOOP-2980:
    -------------------------------------------

    any guesses on what might be going wrong or what u might want us to log/look at? the tasktracker logs look clean to me (at least i dont see any exceptions and such). but this is happening all the time and i want to get to the bottom of it ..
    slow reduce copies - map output locations not being fetched even when map complete
    ----------------------------------------------------------------------------------

    Key: HADOOP-2980
    URL: https://issues.apache.org/jira/browse/HADOOP-2980
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.15.3
    Reporter: Joydeep Sen Sarma

    maps are long finished. reduces are stuck looking for map locations. they make progress - but slowly. it almost seems like they get new map locations every minute or so:
    2008-03-07 18:50:52,737 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 done copying task_200803041231_3586_m_004620_0 output from hadoop082.sf2p.facebook.com..
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 0 known map output location(s); scheduling...
    ...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 50 known map output location(s); scheduling...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Scheduled 41 of 50 known outputs (0 slow hosts and 9 dup hosts)
    they get about 50 locations at a time and this 1 minute delay pattern is surprisingly common ..
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Devaraj Das (JIRA) at Mar 12, 2008 at 9:14 am
    [ https://issues.apache.org/jira/browse/HADOOP-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577767#action_12577767 ]

    Devaraj Das commented on HADOOP-2980:
    -------------------------------------

    The '50' can be explained since the default probe size (for mapcompletionevents) is 50. The probe size increases with the number of copier threads via the calculation: _max(numCopiers*5, 50)_ So one thing that you should do is to increase the number of copier threads (mapred.reduce.parallel.copies to something like 20).
    But I am not able to clearly explain the 1 minute interval between the queries. This is one scenario when it can happen - discussed in HADOOP-1719 (https://issues.apache.org/jira/browse/HADOOP-1719?focusedCommentId=12522954#action_12522954). The other scenario where this can happen is when there is a failure to fetch output from a host and the task backs off between 1 & 5 minutes from that host. But you say that you don't see any exceptions...
    BTW the above issues are fixed in 0.16.
    slow reduce copies - map output locations not being fetched even when map complete
    ----------------------------------------------------------------------------------

    Key: HADOOP-2980
    URL: https://issues.apache.org/jira/browse/HADOOP-2980
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.15.3
    Reporter: Joydeep Sen Sarma

    maps are long finished. reduces are stuck looking for map locations. they make progress - but slowly. it almost seems like they get new map locations every minute or so:
    2008-03-07 18:50:52,737 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 done copying task_200803041231_3586_m_004620_0 output from hadoop082.sf2p.facebook.com..
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 0 known map output location(s); scheduling...
    ...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 50 known map output location(s); scheduling...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Scheduled 41 of 50 known outputs (0 slow hosts and 9 dup hosts)
    they get about 50 locations at a time and this 1 minute delay pattern is surprisingly common ..
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Joydeep Sen Sarma (JIRA) at Mar 13, 2008 at 11:15 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578530#action_12578530 ]

    Joydeep Sen Sarma commented on HADOOP-2980:
    -------------------------------------------

    yeah - it's not the 1719 jira. the copies are all done and no new locations come back before 1 additional minute. there are no slow copies and no backoffs (which is the 5 minute thing) in the logs.

    doesn't seem like there are any changes in 0.16 that would seem to resolve the situation then. i am sure i can find this with enough additional instrumentation. just haven't had the time to ..

    is there an ipc timeout of 60 seconds somewhere lower down the stack?


    slow reduce copies - map output locations not being fetched even when map complete
    ----------------------------------------------------------------------------------

    Key: HADOOP-2980
    URL: https://issues.apache.org/jira/browse/HADOOP-2980
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.15.3
    Reporter: Joydeep Sen Sarma

    maps are long finished. reduces are stuck looking for map locations. they make progress - but slowly. it almost seems like they get new map locations every minute or so:
    2008-03-07 18:50:52,737 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 done copying task_200803041231_3586_m_004620_0 output from hadoop082.sf2p.facebook.com..
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 0 known map output location(s); scheduling...
    ...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 50 known map output location(s); scheduling...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Scheduled 41 of 50 known outputs (0 slow hosts and 9 dup hosts)
    they get about 50 locations at a time and this 1 minute delay pattern is surprisingly common ..
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Devaraj Das (JIRA) at Mar 16, 2008 at 2:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579191#action_12579191 ]

    Devaraj Das commented on HADOOP-2980:
    -------------------------------------

    Before you spend time instrumenting the code, could you please apply https://issues.apache.org/jira/secure/attachment/12372447/1719.patch and see whether it helps. I checked that the patch applies cleanly to 0.15.3. Thanks!
    slow reduce copies - map output locations not being fetched even when map complete
    ----------------------------------------------------------------------------------

    Key: HADOOP-2980
    URL: https://issues.apache.org/jira/browse/HADOOP-2980
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.15.3
    Reporter: Joydeep Sen Sarma

    maps are long finished. reduces are stuck looking for map locations. they make progress - but slowly. it almost seems like they get new map locations every minute or so:
    2008-03-07 18:50:52,737 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 done copying task_200803041231_3586_m_004620_0 output from hadoop082.sf2p.facebook.com..
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 0 known map output location(s); scheduling...
    ...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 50 known map output location(s); scheduling...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Scheduled 41 of 50 known outputs (0 slow hosts and 9 dup hosts)
    they get about 50 locations at a time and this 1 minute delay pattern is surprisingly common ..
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Sameer Al-Sakran (JIRA) at Aug 22, 2008 at 12:07 am
    [ https://issues.apache.org/jira/browse/HADOOP-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624523#action_12624523 ]

    Sameer Al-Sakran commented on HADOOP-2980:
    ------------------------------------------

    I'm seeing this behaviour in 0.17.0, only waits are anywhere from 1-60 minutes. Happens sporadically, but relatively frequently.
    slow reduce copies - map output locations not being fetched even when map complete
    ----------------------------------------------------------------------------------

    Key: HADOOP-2980
    URL: https://issues.apache.org/jira/browse/HADOOP-2980
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.15.3
    Reporter: Joydeep Sen Sarma

    maps are long finished. reduces are stuck looking for map locations. they make progress - but slowly. it almost seems like they get new map locations every minute or so:
    2008-03-07 18:50:52,737 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 done copying task_200803041231_3586_m_004620_0 output from hadoop082.sf2p.facebook.com..
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
    2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 0 known map output location(s); scheduling...
    ...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 50 known map output location(s); scheduling...
    2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Scheduled 41 of 50 known outputs (0 slow hosts and 9 dup hosts)
    they get about 50 locations at a time and this 1 minute delay pattern is surprisingly common ..
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupcommon-dev @
categorieshadoop
postedMar 8, '08 at 3:05a
activeAug 22, '08 at 12:07a
posts7
users1
websitehadoop.apache.org...
irc#hadoop

1 user in discussion

Sameer Al-Sakran (JIRA): 7 posts

People

Translate

site design / logo © 2022 Grokbase