Update thread in FairScheduler runs too frequently
--------------------------------------------------

Key: HADOOP-5185
URL: https://issues.apache.org/jira/browse/HADOOP-5185
Project: Hadoop Core
Issue Type: Bug
Components: contrib/fair-share
Reporter: Vinod K V


The UpdateThread in FairScheduler runs every 500 ms (hardcoded). This proves very costly on large clusters: the UpdateThread tries to acquire the lock on the JT object that often, which seriously slows heartbeat processing, among other things. The update interval should be a function of the cluster size. At a minimum, it should be configurable, with a reasonably high default value.
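
As a minimal sketch of the two suggestions above, and not the scheduler's actual code: use an explicitly configured interval when one is set, otherwise scale the interval with the cluster size. The class name `UpdateIntervalPolicy`, the step of 100 nodes, and the 5-second cap are assumptions made for illustration; only the 500 ms base comes from this report.

```java
/** Hypothetical helper for illustration; not part of FairScheduler. */
final class UpdateIntervalPolicy {
    // The hardcoded interval reported in this issue.
    static final long BASE_INTERVAL_MS = 500;
    // Assumed upper bound so the interval does not grow without limit.
    static final long MAX_INTERVAL_MS = 5000;
    // Assumed step: add one base interval per 100 nodes.
    static final int NODES_PER_STEP = 100;

    /**
     * @param configuredMs interval from configuration, or <= 0 if unset
     * @param clusterSize  number of nodes in the cluster
     * @return the update interval in milliseconds
     */
    static long intervalMs(long configuredMs, int clusterSize) {
        if (configuredMs > 0) {
            return configuredMs;               // explicit setting wins
        }
        long steps = Math.max(0, clusterSize) / NODES_PER_STEP;
        long scaled = BASE_INTERVAL_MS * (1 + steps);
        return Math.min(scaled, MAX_INTERVAL_MS);
    }
}
```

With these assumed constants, a 50-node cluster keeps the 500 ms interval, a 400-node cluster gets 2500 ms, and very large clusters are capped at 5000 ms.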

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • Vinod K V (JIRA) at Feb 6, 2009 at 1:30 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671125#action_12671125 ]

    Vinod K V commented on HADOOP-5185:
    -----------------------------------

    As Hemanth commented offline, the update thread doesn't lock the JT as such. But it does lock the 'taskTrackers' HashMap via the getTotalSlots method. This can still become a problem and needs to be fixed.
  • Matei Zaharia (JIRA) at Feb 9, 2009 at 7:33 am
    [ https://issues.apache.org/jira/browse/HADOOP-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671754#action_12671754 ]

    Matei Zaharia commented on HADOOP-5185:
    ---------------------------------------

    As a temporary fix, feel free to submit a patch that scales up the interval based on cluster size or heartbeat interval. Or, if there's a way to make getTotalSlots non-synchronized or cache its result, we should do that, as there is no reason to call this method all the time.

    Incidentally, if we change the fair scheduler logic to not use deficits anymore (which I'm proposing in HADOOP-4803, and which seems like a better idea the more I think about it), the update thread could run much less frequently. The reason it runs so often now is to keep the deficit computations accurate, so that we don't have too many tasks per job starting or finishing between update calls. If we removed deficits, I think the main reason we'd need periodic updates would be preemption, and that check can happen much less frequently.
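
    A minimal sketch of the caching idea mentioned above, not the actual scheduler code: wrap the expensive, synchronized slot count in a small cache that refreshes only after a maximum age. The class `CachedSlotCount`, its constructor parameters, and the use of `IntSupplier` (Java 8+) are assumptions for illustration; the real getTotalSlots signature is not shown in this thread.

    ```java
    import java.util.function.IntSupplier;

    /** Hypothetical wrapper for illustration; not part of FairScheduler. */
    final class CachedSlotCount {
        private final IntSupplier source; // stands in for a call that reaches getTotalSlots
        private final long maxAgeMs;      // how long a cached value stays valid
        private long lastRefreshMs;
        private int cached;
        private boolean loaded;

        CachedSlotCount(IntSupplier source, long maxAgeMs) {
            this.source = source;
            this.maxAgeMs = maxAgeMs;
        }

        /**
         * Returns the cached count, calling the source only when the cache
         * is empty or older than maxAgeMs. The caller passes the current
         * time so the behavior is easy to test.
         */
        synchronized int get(long nowMs) {
            if (!loaded || nowMs - lastRefreshMs >= maxAgeMs) {
                cached = source.getAsInt();
                lastRefreshMs = nowMs;
                loaded = true;
            }
            return cached;
        }
    }
    ```

    The lock taken here belongs to the cache object only, so readers contend with each other rather than with whatever lock guards the underlying map, and the source is consulted at most once per maxAgeMs.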
  • Vinod K V (JIRA) at Feb 11, 2009 at 4:47 am
    [ https://issues.apache.org/jira/browse/HADOOP-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672506#action_12672506 ]

    Vinod K V commented on HADOOP-5185:
    -----------------------------------

    Agreed, Matei. In any case, making this configurable can only help. An update: running the JT with a 5-second update interval (instead of the default 500 ms) on a 400-node cluster seemed to improve the JT's serving of requests compared to previous runs. I will submit a patch making the interval configurable.
  • Matei Zaharia (JIRA) at Mar 20, 2009 at 12:18 am
    [ https://issues.apache.org/jira/browse/HADOOP-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683691#action_12683691 ]

    Matei Zaharia commented on HADOOP-5185:
    ---------------------------------------

    Is anyone from Yahoo working on this? It's pretty straightforward to add a config option.
  • dhruba borthakur (JIRA) at Jun 2, 2009 at 7:21 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715632#action_12715632 ]

    dhruba borthakur commented on HADOOP-5185:
    ------------------------------------------

    Does anybody have a patch for this one? I am seeing this issue in our cluster. I am thinking of running it with a value of 1000 ms (instead of 500 ms).
