FAQ
DataNode : idle rebalancing threads need not take up threads.
-------------------------------------------------------------

Key: HADOOP-4116
URL: https://issues.apache.org/jira/browse/HADOOP-4116
Project: Hadoop Core
Issue Type: Improvement
Components: dfs
Affects Versions: 0.17.0
Reporter: Raghu Angadi



The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.

These waiting threads need not take up a thread.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Search Discussions

  • Raghu Angadi (JIRA) at Sep 8, 2008 at 4:26 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Raghu Angadi updated HADOOP-4116:
    ---------------------------------

    Description:
    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.

    These operations waiting for active rebalancing threads to finish need not take up a thread.

    was:

    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.

    These waiting threads need not take up a thread.

    Summary: DataNode : idle rebalancing operations need not take up threads. (was: DataNode : idle rebalancing threads need not take up threads.)
    DataNode : idle rebalancing operations need not take up threads.
    ----------------------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi

    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 12, 2008 at 6:29 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630647#action_12630647 ]

    Hairong Kuang commented on HADOOP-4116:
    ---------------------------------------

    The more close investigation of the problem shows the balancer needs additional improvements:

    (1) The balancer needs to better handle block move timeout well. Currently it simply assumes that
    the timeouted move is failed but does not take the effort to make sure the move is interrupted and the resources the
    move takes is released. The next phase of scheduling may schedule more blocks to move from the same DataNode thus using
    more and more resources.

    (2) Resource control for the balancing purpose at DataNodes should use a fair Semaphore. Currently
    it uses an unfair Semaphore that makes no guarantees about the order in which threads acquire permits. A
    thread invoking acquire() can be allocated a permit ahead of a thread that has been waiting. Therefore, if a dfs
    cluster has many DataNodes that has a long queue of block move requests, it is very likely to enter the
    following state: A thread in DataNode A holding a permit and asks DataNode B to receive a block, while DataNode B has a
    thread holding a Semaphore and asking DataNode A to receive a block. Although the block move from B to A was scheduled
    much later than the move from A to B, they may be executed simultaneously. Both block receives are blocks on acquiring
    a permit assuming only one permit can be issued. Therefore, a deadlock occurs.
    DataNode : idle rebalancing operations need not take up threads.
    ----------------------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi

    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 12, 2008 at 6:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Assignee: Hairong Kuang
    Summary: Balancer should provide better resource management (was: DataNode : idle rebalancing operations need not take up threads.)
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang

    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Sep 12, 2008 at 6:39 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630652#action_12630652 ]

    Raghu Angadi commented on HADOOP-4116:
    --------------------------------------

    Thanks for the detailed analysis Hairong. I filed the original jira just looking at one symptom.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang

    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 13, 2008 at 12:10 am
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630713#action_12630713 ]

    Hairong Kuang commented on HADOOP-4116:
    ---------------------------------------

    Proposed changes to the Balancer:
    1. Remove the use of Semaphor at DataNodes. Instead a DataNode uses a counter to manages the number of concurrent block moves. On receiving a block move request while maximum block moves are in progress, reject the request immediately.
    2. Let the receiver initiate the block move; The sender rejects the request when the maximum number has already reached. As a result when either the sender or the receiver does not have resource to handle block move, the block content will not get transfered across network.
    3. The balancer does not set a timeout on a socket. Instead, it sets the option KeepAlive on the socket. So a block move does not timeout no matter how slow it goes and next phrase of scheduling does not get started when there is a pending block move.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang

    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 18, 2008 at 8:15 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Attachment: balancerRM.patch

    A patch for review.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Improvement
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Attachments: balancerRM.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 19, 2008 at 12:17 am
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Priority: Blocker (was: Major)
    Fix Version/s: 0.19.0
    0.18.2
    Issue Type: Bug (was: Improvement)
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Bo Shi (JIRA) at Sep 19, 2008 at 10:55 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632858#action_12632858 ]

    Bo Shi commented on HADOOP-4116:
    --------------------------------

    Hi Hairong, I'm interested in testing out your fix - is there, by any chance, a patch against the 0.18.x series?

    I've only been able to partially apply your patch to to the official 0.18.1 release and it seems there was a major code reorg between 0.18 and 0.19 branches.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Konstantin Shvachko (JIRA) at Sep 20, 2008 at 1:10 am
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632906#action_12632906 ]

    Konstantin Shvachko commented on HADOOP-4116:
    ---------------------------------------------

    - I do not understand what was the reason for changing the push block model to the pull model.
    Previously balancer would send OP_COPY_BLOCK request to the proxy node, which then sent OP_REPLACE_BLOCK to the target.
    Now it is reversed: the balancer sends OP_REPLACE_BLOCK to the target, which sends OP_COPY_BLOCK to the proxy.
    In both cases DataXceiver is supposed to check the XceiverCount and throw an exception if it is exceeded.
    So in both cases the transfer will fail if any of the two data-nodes are busy.
    I don't see a mistake here, but don't see a reason for this rather radical change either, may be I am missing something.
    - BalanceManager
    -- should probably be called {{BlockBalanceThrottler}}.
    -- It makes sense to derive it from {{BlockTransferThrottler}}.
    -- It should be a static class.
    -- And it should rather be a member of {{DataXceiveServer}} than {{DataNode}}.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 22, 2008 at 6:02 am
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633149#action_12633149 ]

    Hairong Kuang commented on HADOOP-4116:
    ---------------------------------------

    In the push model, if the proxy is allowed to transfer the block but the destination does not allow to receive it. The whole/partial block will still be transferred to the destination's system buffer but then be thrown away. Although the block is not delivered to the target's datanode buffer, network resources are still wasted.

    Bo, I will create a patch to 0.18 the first thing when I get to the office tomorrow.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 23, 2008 at 5:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Attachment: balancerRM1.patch

    The patch incorporates Konstantin's comments.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Sep 23, 2008 at 5:39 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633816#action_12633816 ]

    Raghu Angadi commented on HADOOP-4116:
    --------------------------------------
    3. The balancer does not set a timeout on a socket. Instead, it sets the option KeepAlive on the socket. So a block move does not timeout no matter how slow it goes and next phrase of scheduling does not get started when there is a pending block move.
    How does keep-alive matter? TCP has no inherent timeout for a connection in normal state.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 23, 2008 at 6:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633828#action_12633828 ]

    Hairong Kuang commented on HADOOP-4116:
    ---------------------------------------

    As I said, my intention is that receiveResponse never times out in normal state no matter how slow the other side is. Setting KeepAlive is for detecting the other side's machine gets crashed suddenly so it won't wait there forever. But for all other cases, it will return eventually. Does it make sense?
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Sep 23, 2008 at 7:12 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633849#action_12633849 ]

    Raghu Angadi commented on HADOOP-4116:
    --------------------------------------

    Oh.. it is to detect if the other side has crashed. It make sense. The keepalive might be in the order of hours I guess.

    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 23, 2008 at 8:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633874#action_12633874 ]

    Hairong Kuang commented on HADOOP-4116:
    ---------------------------------------

    Yes, the default is 2 hours. My argument is that the performance of Balancer is not of importance comparing to using too many resources. Anyway the administrator can stop the balancer anytime if she sees that the balancer has not made any progress for hours.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Sep 23, 2008 at 9:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633913#action_12633913 ]

    Raghu Angadi commented on HADOOP-4116:
    --------------------------------------

    Yes. I think hours is fine in this case. thanks for the clarification.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 23, 2008 at 11:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Attachment: (was: balancerRM1.patch)
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 23, 2008 at 11:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Attachment: balancerRM1.patch
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Konstantin Shvachko (JIRA) at Sep 24, 2008 at 12:52 am
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633984#action_12633984 ]

    Konstantin Shvachko commented on HADOOP-4116:
    ---------------------------------------------

    +1.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Steve Loughran (JIRA) at Sep 24, 2008 at 9:28 am
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634074#action_12634074 ]

    Steve Loughran commented on HADOOP-4116:
    ----------------------------------------

    keepAlives are a fairly weak way of assessing "liveness" because
    -it works at the network stack level, so your app may still be dead but the KA packets are happy
    -if there are a lot of (idle) connections between two hosts, a lot of KA traffic can be generated, rather than one packet per host, which is how a lot of protocols (CORBA and DCOM, for example) communicate "we are still alive".

    I think this proposal is better than nothing, but we need to be aware of limitations. It will detect a network partition, but not a hung far end if the network stack is still up
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 24, 2008 at 6:42 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634249#action_12634249 ]

    Hairong Kuang commented on HADOOP-4116:
    ---------------------------------------

    I agree that KA does not detect dead apps on the other side. Besides timeouts, is there any good solution to this problem?

    Balancer limits at most 5 concurrent connections from itself to a DataNode, so too many KAs should not be a concern.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 24, 2008 at 8:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Attachment: balancerRM-b18.patch

    Here is a patch for branch 18.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM-b18.patch, balancerRM.patch, balancerRM1.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 24, 2008 at 9:52 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Attachment: (was: balancerRM-b18.patch)
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 24, 2008 at 9:52 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Attachment: balancerRM2-b18.patch

    This is a patch to branch 18 that has fixed the findbug warning.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 24, 2008 at 10:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Hadoop Flags: [Reviewed]
    Status: Patch Available (was: Open)
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 24, 2008 at 10:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Attachment: balancerRM2.patch

    Attach one patch for the trunk, which also has fixed the findbugs warning introduced by the previous trunk patch.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hadoop QA (JIRA) at Sep 25, 2008 at 1:18 am
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634358#action_12634358 ]

    Hadoop QA commented on HADOOP-4116:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12390875/balancerRM2.patch
    against trunk revision 698721.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3365/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3365/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3365/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3365/console

    This message is automatically generated.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Steve Loughran (JIRA) at Sep 25, 2008 at 11:10 am
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634441#action_12634441 ]

    Steve Loughran commented on HADOOP-4116:
    ----------------------------------------

    The ping operation we are proposing for the lifecycle in HADOOP-3628 and HADOOP-3969 does a better health check, as it asks the far end if it thinks it is happy, and can detect a dead end (with suitable timeouts) and a machine that thinks it is unwell. But the most reliable way to check system health is to give that node real work and see if it completes it within time. That's something that could be done as a low priority job across a cluster: queue work and check the results, though you'd need to direct the work to specific nodes somehow.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 25, 2008 at 6:58 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634579#action_12634579 ]

    Hairong Kuang commented on HADOOP-4116:
    ---------------------------------------

    Yes, I agree. I did similar stuff in HADOOP-2188 in the context of IPC. For this issue, I do not want to take the effort to add a Ping interface to DataNode. I will use KeepAlive for now.

    The failed unit test seems not related to this jira.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Sep 25, 2008 at 8:30 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I've committed this.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Tsz Wo (Nicholas), SZE (JIRA) at Sep 25, 2008 at 9:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634627#action_12634627 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-4116:
    ------------------------------------------------

    It is not clear to me that the test (TestDatanodeDeath) failed is not related since the patch changed quite a lot Datanode codes. Created HADOOP-4278 for TestDatanodeDeath.
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hudson (JIRA) at Sep 26, 2008 at 5:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634927#action_12634927 ]

    Hudson commented on HADOOP-4116:
    --------------------------------

    Integrated in Hadoop-trunk #615 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/615/])
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hairong Kuang (JIRA) at Oct 28, 2008 at 10:35 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4116:
    ----------------------------------

    Hadoop Flags: [Incompatible change, Reviewed] (was: [Reviewed])
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Hudson (JIRA) at Oct 29, 2008 at 2:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643498#action_12643498 ]

    Hudson commented on HADOOP-4116:
    --------------------------------

    Integrated in Hadoop-trunk #646 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/646/])
    Fix the change log to reflect that is reverted in 0.18.2

    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Robert Chansler (JIRA) at Oct 30, 2008 at 7:21 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Chansler updated HADOOP-4116:
    ------------------------------------

    Release Note: Changed DataNode protocol version without impact to clients other than to compel use of current version of client application.
    Hadoop Flags: [Incompatible change, Reviewed] (was: [Reviewed, Incompatible change])
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.18.2, 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 23, 2009 at 11:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Raghu Angadi updated HADOOP-4116:
    ---------------------------------

    Fix Version/s: (was: 0.18.2)
    0.19.0
    Hadoop Flags: [Incompatible change, Reviewed] (was: [Reviewed, Incompatible change])

    Changed 'fix version' to 0.19.0 since 0.18 change was reverted [later|http://svn.apache.org/viewvc?view=rev&revision=708637].
    Balancer should provide better resource management
    --------------------------------------------------

    Key: HADOOP-4116
    URL: https://issues.apache.org/jira/browse/HADOOP-4116
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Affects Versions: 0.17.0
    Reporter: Raghu Angadi
    Assignee: Hairong Kuang
    Priority: Blocker
    Fix For: 0.19.0

    Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch


    The number of threads are currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DOS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
    These operations waiting for active rebalancing threads to finish need not take up a thread.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupcommon-dev @
categorieshadoop
postedSep 8, '08 at 4:24p
activeFeb 23, '09 at 11:36p
posts37
users1
websitehadoop.apache.org...
irc#hadoop

1 user in discussion

Raghu Angadi (JIRA): 37 posts

People

Translate

site design / logo © 2022 Grokbase