FAQ
Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
----------------------------------------------------------------------------------------

Key: HADOOP-4584
URL: https://issues.apache.org/jira/browse/HADOOP-4584
Project: Hadoop Core
Issue Type: Bug
Reporter: Hairong Kuang
Fix For: 0.20.0


sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.

It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Search Discussions

  • Hairong Kuang (JIRA) at Nov 4, 2008 at 12:54 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-4584:
    ----------------------------------

    Component/s: dfs
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Fix For: 0.20.0


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Brian Bockelman (JIRA) at Jan 15, 2009 at 6:19 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664205#action_12664205 ]

    Brian Bockelman commented on HADOOP-4584:
    -----------------------------------------

    I can add that we are starting to run into this problem as we convert over our pre-existing servers to HDFS!

    The largest has the capacity to store 650k, 64-MB blocks. At some point, it's going to "kill itself" just waiting for the block report -- even under light load!
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Fix For: 0.20.0


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Jan 22, 2009 at 1:10 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666020#action_12666020 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    +1. This has been long standing problem with DataNodes and has been mentioned multiple times in user mailing lists and other jiras.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Fix For: 0.20.0


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Jan 30, 2009 at 1:08 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas reassigned HADOOP-4584:
    ---------------------------------------

    Assignee: Suresh Srinivas
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 2, 2009 at 6:52 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4584:
    ------------------------------------

    Attachment: 4584.patch

    Current loop in {{Datanode.OfferService()}} performs multiple steps as follows:
    1. If in the next heartbeat interval {{sendHeartbeat}}. Process the {{DatanodeCommand}} from the namenode
    2. If there is a block received send {{blockReceived}} request to the namenode
    3. If in the next blockreport interval build and send {{blockReport}}. Process the {{DatanodeCommand}} from the namenode.
    4. Wait till the next heartbeat interval or until another block is received
    5. go back to 1.

    With the changes we have two threads.
    Heartbeat Thread:
    1. New thread sends heartbeat and receives {{DatanodeCommand}} in response. Queues the command to an arraylist.

    Main thread does the following without the previous heartbeat functionality:
    1. If there are commands in the queue, process all of them.
    2. If there is a block received send {{blockReceived}} request to the namenode
    3. If in the next blockreport interval build and send {{blockReport}}. Process the {{DatanodeCommand}} from the namenode.
    4. If there are no blocks recieved or commands to process wait for 1 second or until another block is received
    5. go back to 1.


    Questions:
    1. In step 4. should we wait for receiving a command or for receiving another block?
    2. In OfferService we process all the commands that are in the queue at once. Do you see any issues with it?

    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 2, 2009 at 7:50 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669719#action_12669719 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    I think this approach might work ok for now. It makes sure the data node is not marked dead. But this should be considered mostly a work around. We should note the fundamental problem still remains (a little less lethal). e.g. a) new blocks are not reported, b) no new blocks can be written during this time c) (not sure) not blocks can be read? etc.

    If all the nodes are taking very long to process the block report, many operations on HDFS will fail. An admin can increase the block report period to reduce the effect of this problem. The current fix works fine for occasional delays.
    In step 4. should we wait for receiving a command or for receiving another block?
    both would be better.
    In OfferService we process all the commands that are in the queue at once. Do you see any issues with it?
    Not fundamentally different. One main issue would be that there might be thousands of blocks to delete sometimes.. But that is same problem as long block report.

    Regd more complete fix, I could file another jira to propose a fix that I discussed with Sameer and Hairong, that satisfies all the requirements on current block report.


    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • dhruba borthakur (JIRA) at Feb 2, 2009 at 8:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669724#action_12669724 ]

    dhruba borthakur commented on HADOOP-4584:
    ------------------------------------------

    +1.
    In step 4. should we wait for receiving a command or for receiving another block?
    we should wait for either one of them to occur
    In OfferService we process all the commands that are in the queue at once.
    This might introduce delays in sending blockReceived/blockReports (especially whole processing block deletions).
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • dhruba borthakur (JIRA) at Feb 2, 2009 at 8:08 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669724#action_12669724 ]

    dhruba edited comment on HADOOP-4584 at 2/2/09 12:06 PM:
    -------------------------------------------------------------------

    +1.
    In step 4. should we wait for receiving a command or for receiving another block?
    we should wait for either one of them to occur
    In OfferService we process all the commands that are in the queue at once.
    This might introduce delays in sending blockReceived/blockReports (especially while processing block deletions).

    was (Author: dhruba):
    +1.
    In step 4. should we wait for receiving a command or for receiving another block?
    we should wait for either one of them to occur
    In OfferService we process all the commands that are in the queue at once.
    This might introduce delays in sending blockReceived/blockReports (especially whole processing block deletions).
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 2, 2009 at 8:12 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4584:
    ------------------------------------

    Attachment: 4584.patch

    Updated patch
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 2, 2009 at 8:16 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669727#action_12669727 ]

    Suresh Srinivas commented on HADOOP-4584:
    -----------------------------------------

    @Raghu, should we just go for a permanent solution instead of this workaround? If so, lets talk about how it should be done.

    In step 4, we cannot wait for both. We are dong {{wait(1000}} against the {{receivedBlockList}}. I am not sure how we can at the same time wait against {{commandQueue}}. We have to chose one of them. Which object might get {{notify()}} of the two? BTW this wait is currently 1 second. So we keep looking for work to do every second.

    @dhruba
    Is is it better to process one command at a time instead of all the queued ones at once?
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 2, 2009 at 8:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669728#action_12669728 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------
    should we just go for a permanent solution instead of this workaround? If so, lets talk about how it should be done.
    I am fine either way. I don't know the time table for a fix.
    Which object might get notify() of the two?
    you can use a different object to wait or HeartBeat thread can notify {{receivedBlockList}};
    Is is it better to process one command at a time instead of all the queued ones at once?
    The loop should process all the commands before next block report. These commands are more time sensitive than blockReport. For 'blocks received', you could check the blockReceivedList, between the commands.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • dhruba borthakur (JIRA) at Feb 2, 2009 at 10:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669764#action_12669764 ]

    dhruba borthakur commented on HADOOP-4584:
    ------------------------------------------
    For 'blocks received', you could check the blockReceivedList, between the commands.
    +1. This should work best.

    Also, is it convenient for CommandQueue to implement the java.util.Queue interface? Or even use something like java.util.LinkedBlockingQueue.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 6, 2009 at 3:30 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4584:
    ------------------------------------

    Attachment: 4584.patch

    I have made the changes to report received block after processing every command from the queue.

    @Dhruba
    I have retained {{CommandQueue}} based on LinkedList. The reasons - 1. The required interface is simple 2.{{CommandQueue}} notifies based on a common object {{dataAvailable}} along with receivedBlockList.

    Currently the following unit tests are failing (know failures):
    - TestMapReduceLocal
    - TestKillCompletedJob

    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • dhruba borthakur (JIRA) at Feb 6, 2009 at 5:10 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671005#action_12671005 ]

    dhruba borthakur commented on HADOOP-4584:
    ------------------------------------------

    Sounds good to me. thanks.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 6, 2009 at 7:34 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671031#action_12671031 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------


    Synchronization is not correct.

    E.g. : The condition should be checked inside the lock that you wait on. It is not fatal in this case only because it waits only for 1 sec, but still it is better not to write that way. The block report thread should ideally do "{{wait(time till next report)}}", then the bug shows up more. A correct synchronization should work irrespective of wait time.

    Similarly commandQueue and receviedBlockList sizes are checked outside the locks. We don't need to synchronize on these two any more (since the patch uses common "dataAvailable" for synchronization).

    I don't think new class for CommandQueue is required. Also no need for conversion between list and arrays etc. I would synchronize the following way :

    {code}
    // Block receiver thread :

    while (1) {
    cmd = null;
    synchronized (dataAvaliable) {
    cmd = commandQ.dequeue();
    if (cmd == null) {
    break;
    }
    processCmd(cmd);
    // similarly receivedBlockList. RPC should called outside lock.
    }

    // ...
    synchronized (dataAvailable) {
    if (cmdQ.size() == 0 || receivedBlockList.size() == 0) {
    dataAvailable.wait (time till next report);
    }
    }

    // heartBeat thread :
    synchronize (dataAvailable) {
    cmdQ.addAll(cmds);
    dataAvailable.notify();
    }

    // writers :
    synchronized (dataAvailable) {
    receivedBlockArray.add(block);
    dataAvailable.notify();
    }

    // etc.. may be locking around delHints can also be removed.. though not related here.

    {code}


    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 6, 2009 at 7:07 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671244#action_12671244 ]

    Suresh Srinivas commented on HADOOP-4584:
    -----------------------------------------

    1. {{dataAvailable.wait (time till next report);}}

    because currently the loop exit happens when {{shouldRun}} is to {{false}} or {{shutdown()}} is called. Assuming that "time till next report" is time till next block report, the {{offerService()}} does not end for a long time. This delays shutdown in some of the cases and results in few unit testcase failures that assumes datanode shuts down quickly. Alternatively we could notify {{dataAvailable}} when {{shouldRun}} is set to false. But I think that makes the code quite ugly. Hence the 1 second {{wait}} time.

    2. Using dataAvailable for synchronizing commandQ and receivedBlockList
    I think currently dataAvailable is used to wake up a thread that is waiting for either a command or a received block. Data can be added to commandQ or receivedBlock independent of each other. Doing this assumes that a command cannot added while received block is being added to receivedBlockList.


    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 8, 2009 at 7:57 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4584:
    ------------------------------------

    Attachment: 4584.patch

    Based on discussion with Raghu, updating the patch incorporating changes suggested by his previous comment
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 11, 2009 at 12:55 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672470#action_12672470 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    Thanks Suresh.

    couple of things :

    - 'blocksReceived()' RPC is called with 'dataAvailable' lock held.
    - The test uses "NNBench" log.
    - (minor): The test set log level to "ALL" for many classes. Not sure if that is required or useful. It usually tends to write too much output and might even mask future bugs in the tests.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 11, 2009 at 4:35 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4584:
    ------------------------------------

    Attachment: 4584.patch

    Raghu, thanks for the review. Here is the latest patch...
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 11, 2009 at 11:20 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672824#action_12672824 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    Regd more complete solution, a much simpler fix could be :

    * BlockReport does not list the directories at all, but just sends the in-memory list of blocks
    * Any mysteriously missing blocks would be caught by DataBlockScanner (default period 3 weeks)

    Pros :
    * The simplest fix with maximum benefit
    * Good precursor to eventually removing or drastically less frequent block reports

    Cons :
    * DataBlockScanner is rather slow (though the period could configured)
    ** But we have rarely seen blocks disappear. more likely there are truncated or corrupt.

    This is preferred fix for this jira.

    Separating heartbeat from block reports and deletions that this attached patch does could still be useful. I am +1 having that too.. but not a requirement.

    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 23, 2009 at 9:00 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4584:
    ------------------------------------

    Attachment: 4584.patch

    New patch with following additional changes:
    - Use in-memory data for building block report instead of building it by listing files and its size on the disk
    - Has a new test case that compares and makes sure that the block report from in-memory data matches the block report generated from the disk
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • dhruba borthakur (JIRA) at Feb 24, 2009 at 5:32 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676189#action_12676189 ]

    dhruba borthakur commented on HADOOP-4584:
    ------------------------------------------

    I do not understand this patch very clearly, can you pl explain.

    Block reports are meant to find inconsistencies between what is on disk versus what is in the namenode metadata. A typical scenario can be that an administrator mistakenly ran a "rm -rf <datadir>" on a datanode. The next block report would make the namenode detect this situation.

    Does this patch suggest that a block report is created by the datanode without scanning the data directory on disk? more info HADOOP-1079.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 24, 2009 at 5:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676344#action_12676344 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    Ideally there is no requirement for block reports. It is essentially to used as 'catch all' for various bugs and errors. (of course, it is now overloaded with job of informing about deletions to NameNode, this should removed).

    Yes, it specifically removes disk scan without fundamentally changing meaning of block reports. Now DN informs NameNode about the the block that it thinks it had. because :

    * 'rm -r' by admin is just one form of many many things that can go wrong with blocks on datanode. There is no perticular reason we should have this very costly disk scan (with a global lock held) just for this.
    ** In fact 'rm -r' is probably the least probable error (haven't seen even once in practice).

    * We have periodic block verification that does handle various things that can go wrong with a block (it can improve further).
    ** So 'rm -r' will be handled, just at the rate of rest of the block problems.

    * on the other hand many users have complained about datanode scans taking 10s of minutes and making datanodes lose heartbeats.
    ** This makes the system pretty unusable and a major obstruction for graceful degradation under load and for scalability.
    ** One can argue that those users should not have so many blocks. But I think DN should still handle it to the best of it abilities and not die on them.
    ** Disks might be slow for many other reasons (other tasks on the machine, etc).

    * I think this is orthogonal to HADOOP-1079 since it addresses RPC and NameNode overhead of block reports. This jira is only about DataNode side.

    Yes, this is a bigger change in semantics than what we proposed earlier : to scan the directories slowly, without holding the global lock... but offline scan looks like a workaround for a problem that does not need to be solved. Not scanning is much simpler than handling offline scan.

    Eventually we need to reduce the frequency of block reports.. this can be done as soon as we add acks for block deletions. This JIRA is major step in that direction.



    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Jason hadoop at Feb 24, 2009 at 6:19 pm
    Any reason for not using an internal or external agent that receives
    notification from the operating system about filesystem operations in the
    block storage subtree?

    On Tue, Feb 24, 2009 at 9:36 AM, Raghu Angadi (JIRA) wrote:


    [
    https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676344#action_12676344]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    Ideally there is no requirement for block reports. It is essentially to
    used as 'catch all' for various bugs and errors. (of course, it is now
    overloaded with job of informing about deletions to NameNode, this should
    removed).

    Yes, it specifically removes disk scan without fundamentally changing
    meaning of block reports. Now DN informs NameNode about the the block that
    it thinks it had. because :

    * 'rm -r' by admin is just one form of many many things that can go wrong
    with blocks on datanode. There is no perticular reason we should have this
    very costly disk scan (with a global lock held) just for this.
    ** In fact 'rm -r' is probably the least probable error (haven't seen
    even once in practice).

    * We have periodic block verification that does handle various things that
    can go wrong with a block (it can improve further).
    ** So 'rm -r' will be handled, just at the rate of rest of the block
    problems.

    * on the other hand many users have complained about datanode scans taking
    10s of minutes and making datanodes lose heartbeats.
    ** This makes the system pretty unusable and a major obstruction for
    graceful degradation under load and for scalability.
    ** One can argue that those users should not have so many blocks. But I
    think DN should still handle it to the best of it abilities and not die on
    them.
    ** Disks might be slow for many other reasons (other tasks on the
    machine, etc).

    * I think this is orthogonal to HADOOP-1079 since it addresses RPC and
    NameNode overhead of block reports. This jira is only about DataNode side.

    Yes, this is a bigger change in semantics than what we proposed earlier :
    to scan the directories slowly, without holding the global lock... but
    offline scan looks like a workaround for a problem that does not need to be
    solved. Not scanning is much simpler than handling offline scan.

    Eventually we need to reduce the frequency of block reports.. this can be
    done as soon as we add acks for block deletions. This JIRA is major step in
    that direction.



    Slow generation of blockReport at DataNode causes delay of sending
    heartbeat to NameNode

    ----------------------------------------------------------------------------------------
    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch,
    4584.patch, 4584.patch

    sometimes due to disk or some other problems, datanode takes minutes or
    tens of minutes to generate a block report. It causes the datanode not able
    to send heartbeat to NameNode every 3 seconds. In the worst case, it makes
    NameNode to detect a lost heartbeat and wrongly decide that the datanode is
    dead.
    It would be nice to have two threads instead. One thread is for scanning
    data directories and generating block report, and executes the requests sent
    by NameNode; Another thread is for sending heartbeats, block reports, and
    picking up the requests from NameNode. By having these two threads, the
    sending of heartbeats will not get delayed by any slow block report or slow
    execution of NameNode requests.

    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi at Feb 24, 2009 at 7:37 pm

    jason hadoop wrote:
    Any reason for not using an internal or external agent that receives
    notification from the operating system about filesystem operations in the
    block storage subtree?
    lack of a patch to do so, may be?

    Raghu.
    On Tue, Feb 24, 2009 at 9:36 AM, Raghu Angadi (JIRA) wrote:

    [
    https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676344#action_12676344]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    Ideally there is no requirement for block reports. It is essentially to
    used as 'catch all' for various bugs and errors. (of course, it is now
    overloaded with job of informing about deletions to NameNode, this should
    removed).

    Yes, it specifically removes disk scan without fundamentally changing
    meaning of block reports. Now DN informs NameNode about the the block that
    it thinks it had. because :

    * 'rm -r' by admin is just one form of many many things that can go wrong
    with blocks on datanode. There is no perticular reason we should have this
    very costly disk scan (with a global lock held) just for this.
    ** In fact 'rm -r' is probably the least probable error (haven't seen
    even once in practice).

    * We have periodic block verification that does handle various things that
    can go wrong with a block (it can improve further).
    ** So 'rm -r' will be handled, just at the rate of rest of the block
    problems.

    * on the other hand many users have complained about datanode scans taking
    10s of minutes and making datanodes lose heartbeats.
    ** This makes the system pretty unusable and a major obstruction for
    graceful degradation under load and for scalability.
    ** One can argue that those users should not have so many blocks. But I
    think DN should still handle it to the best of it abilities and not die on
    them.
    ** Disks might be slow for many other reasons (other tasks on the
    machine, etc).

    * I think this is orthogonal to HADOOP-1079 since it addresses RPC and
    NameNode overhead of block reports. This jira is only about DataNode side.

    Yes, this is a bigger change in semantics than what we proposed earlier :
    to scan the directories slowly, without holding the global lock... but
    offline scan looks like a workaround for a problem that does not need to be
    solved. Not scanning is much simpler than handling offline scan.

    Eventually we need to reduce the frequency of block reports.. this can be
    done as soon as we add acks for block deletions. This JIRA is major step in
    that direction.



    Slow generation of blockReport at DataNode causes delay of sending
    heartbeat to NameNode
    ----------------------------------------------------------------------------------------
    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch,
    4584.patch, 4584.patch
    sometimes due to disk or some other problems, datanode takes minutes or
    tens of minutes to generate a block report. It causes the datanode not able
    to send heartbeat to NameNode every 3 seconds. In the worst case, it makes
    NameNode to detect a lost heartbeat and wrongly decide that the datanode is
    dead.
    It would be nice to have two threads instead. One thread is for scanning
    data directories and generating block report, and executes the requests sent
    by NameNode; Another thread is for sending heartbeats, block reports, and
    picking up the requests from NameNode. By having these two threads, the
    sending of heartbeats will not get delayed by any slow block report or slow
    execution of NameNode requests.

    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi at Feb 24, 2009 at 7:54 pm

    Raghu Angadi wrote:
    jason hadoop wrote:
    Any reason for not using an internal or external agent that receives
    notification from the operating system about filesystem operations in the
    block storage subtree?
    lack of a patch to do so, may be?
    Please let us know if there is a Java prototype implementation of this.
    I think NIO2 has some interface for this.. but not sure if there is some
    equivalent solution for JDK 1.6. Once this is available, it could be
    optionally enabled.

    thanks
    Raghu.

    Raghu.
    On Tue, Feb 24, 2009 at 9:36 AM, Raghu Angadi (JIRA)
    wrote:
    [
    https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676344#action_12676344]


    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    Ideally there is no requirement for block reports. It is essentially to
    used as 'catch all' for various bugs and errors. (of course, it is now
    overloaded with job of informing about deletions to NameNode, this
    should
    removed).

    Yes, it specifically removes disk scan without fundamentally changing
    meaning of block reports. Now DN informs NameNode about the the block
    that
    it thinks it had. because :

    * 'rm -r' by admin is just one form of many many things that can go
    wrong
    with blocks on datanode. There is no perticular reason we should have
    this
    very costly disk scan (with a global lock held) just for this.
    ** In fact 'rm -r' is probably the least probable error (haven't seen
    even once in practice).

    * We have periodic block verification that does handle various
    things that
    can go wrong with a block (it can improve further).
    ** So 'rm -r' will be handled, just at the rate of rest of the
    block
    problems.

    * on the other hand many users have complained about datanode scans
    taking
    10s of minutes and making datanodes lose heartbeats.
    ** This makes the system pretty unusable and a major obstruction for
    graceful degradation under load and for scalability.
    ** One can argue that those users should not have so many blocks.
    But I
    think DN should still handle it to the best of it abilities and not
    die on
    them.
    ** Disks might be slow for many other reasons (other tasks on the
    machine, etc).

    * I think this is orthogonal to HADOOP-1079 since it addresses RPC and
    NameNode overhead of block reports. This jira is only about DataNode
    side.

    Yes, this is a bigger change in semantics than what we proposed
    earlier :
    to scan the directories slowly, without holding the global lock... but
    offline scan looks like a workaround for a problem that does not need
    to be
    solved. Not scanning is much simpler than handling offline scan.

    Eventually we need to reduce the frequency of block reports.. this
    can be
    done as soon as we add acks for block deletions. This JIRA is major
    step in
    that direction.



    Slow generation of blockReport at DataNode causes delay of sending
    heartbeat to NameNode
    ----------------------------------------------------------------------------------------
    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch,
    4584.patch, 4584.patch
    sometimes due to disk or some other problems, datanode takes minutes or
    tens of minutes to generate a block report. It causes the datanode
    not able
    to send heartbeat to NameNode every 3 seconds. In the worst case, it
    makes
    NameNode to detect a lost heartbeat and wrongly decide that the
    datanode is
    dead.
    It would be nice to have two threads instead. One thread is for
    scanning
    data directories and generating block report, and executes the
    requests sent
    by NameNode; Another thread is for sending heartbeats, block reports,
    and
    picking up the requests from NameNode. By having these two threads, the
    sending of heartbeats will not get delayed by any slow block report
    or slow
    execution of NameNode requests.

    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 24, 2009 at 5:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676344#action_12676344 ]

    rangadi edited comment on HADOOP-4584 at 2/24/09 9:39 AM:
    ---------------------------------------------------------------

    Ideally there is no requirement for block reports. It is essentially to used as 'catch all' for various bugs and errors. (of course, it is now overloaded with job of informing about deletions to NameNode and should be separated).

    Yes, it specifically removes disk scan without fundamentally changing meaning of block reports. Now DN informs NameNode about the the block that it thinks it has. because :

    * 'rm -r' by admin is just one form of many many things that can go wrong with blocks on datanode. There is no perticular reason we should have this very costly disk scan (with a global lock held) just for this.
    ** In fact 'rm -r' is probably the least probable error (haven't seen even once in practice).

    * We have periodic block verification that does handle various things that can go wrong with a block (it can improve further).
    ** So 'rm -r' will be handled, just at the rate of rest of the block problems.

    * on the other hand many users have complained about datanode scans taking 10s of minutes and making datanodes lose heartbeats.
    ** This makes the system pretty unusable and a major obstruction for graceful degradation under load and for scalability.
    ** One can argue that those users should not have so many blocks. But I think DN should still handle it to the best of it abilities and not die on them.
    ** Disks might be slow for many other reasons (other tasks on the machine, etc).

    * I think this is orthogonal to HADOOP-1079 since it addresses RPC and NameNode overhead of block reports. This jira is only about DataNode side.

    Yes, this is a bigger change in semantics than what we proposed earlier : to scan the directories slowly, without holding the global lock... but offline scan looks like a workaround for a problem that does not need to be solved. Not scanning is much simpler than handling offline scan.

    Eventually we need to reduce the frequency of block reports.. this can be done as soon as we add acks for block deletions. This JIRA is major step in that direction.




    was (Author: rangadi):
    Ideally there is no requirement for block reports. It is essentially to used as 'catch all' for various bugs and errors. (of course, it is now overloaded with job of informing about deletions to NameNode, this should removed).

    Yes, it specifically removes disk scan without fundamentally changing meaning of block reports. Now DN informs NameNode about the the block that it thinks it had. because :

    * 'rm -r' by admin is just one form of many many things that can go wrong with blocks on datanode. There is no perticular reason we should have this very costly disk scan (with a global lock held) just for this.
    ** In fact 'rm -r' is probably the least probable error (haven't seen even once in practice).

    * We have periodic block verification that does handle various things that can go wrong with a block (it can improve further).
    ** So 'rm -r' will be handled, just at the rate of rest of the block problems.

    * on the other hand many users have complained about datanode scans taking 10s of minutes and making datanodes lose heartbeats.
    ** This makes the system pretty unusable and a major obstruction for graceful degradation under load and for scalability.
    ** One can argue that those users should not have so many blocks. But I think DN should still handle it to the best of it abilities and not die on them.
    ** Disks might be slow for many other reasons (other tasks on the machine, etc).

    * I think this is orthogonal to HADOOP-1079 since it addresses RPC and NameNode overhead of block reports. This jira is only about DataNode side.

    Yes, this is a bigger change in semantics than what we proposed earlier : to scan the directories slowly, without holding the global lock... but offline scan looks like a workaround for a problem that does not need to be solved. Not scanning is much simpler than handling offline scan.

    Eventually we need to reduce the frequency of block reports.. this can be done as soon as we add acks for block deletions. This JIRA is major step in that direction.



    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • dhruba borthakur (JIRA) at Feb 24, 2009 at 6:33 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676356#action_12676356 ]

    dhruba borthakur commented on HADOOP-4584:
    ------------------------------------------

    I think it is imperative that missing blocks be detected by the system more aggresively that what is proposed by this JIRA. This JIRA assumes that if a block file disappears from the disk, then it will be handled by the periodic block scanner, but it might be a couple of weeks before the detection occurs. This could reduce reliability of HDFS, isn't it? Bit rot does not happen that often, but an rougue program deleting lots of block files can occur, especially when arbitrary user-written map-reduce code can execute on cluster nodes.

    One option would be to do a a brute-force-block-report (that scans the entire disk) once every day or so. The hourly block reports may skip scanning the disk. This might alleviate the problem to some extent, while at the same time detecting missing blocks much much earlier than what is proposed by the JIRA.

    In many cases, a datanode has three or four disk devices where it stores data blocks. What happens if one out of the four configured data directories go bad? if the block scanner never gets to processing a block from that data directory, then all the blocks in that data directory might be inaccesible for a long time without being detected, isn't it?


    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Brian Bockelman (JIRA) at Feb 24, 2009 at 6:33 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676355#action_12676355 ]

    Brian Bockelman commented on HADOOP-4584:
    -----------------------------------------

    Hey Raghu,

    - Regarding your above point about periodic block verification handling the various things that can go wrong with a block: Currently, it's woefully insufficient, especially on large data noes, to replace the directory scan. If we wait 3 weeks (or several months for some of our large nodes) before we find a block is missing, we're going to see lots and lots of issues crop up!

    - I have seen the 'rm -r' in practice, by the way :).

    - With a reasonably sized block, we've had 48TB servers be able to only take a few minutes for a scan: no heartbeats lost. That said, I do like your argument that the DN should handle things to the best of its abilities and not die.

    I like the idea of the patch, but only if it's combined with an occasional offline scan (even once a day!). Creeping inconsistency bugs in the NN seem to make very accurate block reports a precious commodity, one that I'd gladly pay an expensive scan for (though I agree that once an hour is probably excessive).
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 24, 2009 at 7:01 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676363#action_12676363 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------
    I have seen the 'rm -r' in practice, by the way .
    A human error like this has a quick work around : to restart the datanode.

    Improvements to block verification are required (mainly to catch more errors) and should be part of of block verifier, block report does not need to compensate for that.

    The scan of directories once a day is fine might be more assuring to people. But it is very important not to hold the lock (as DN does now) even if the scan is less frequent. i.e., whether we scan once a day or once every block report, the implementation of the scan looks the same. So it still needs to be implemented ...

    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 24, 2009 at 7:03 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676364#action_12676364 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------
    Creeping inconsistency bugs in the NN seem to make very accurate block reports a precious commodity, one that I'd gladly pay an expensive scan for (though I agree that once an hour is probably excessive).
    FYI : none of the NN inconsistencies in recent times we caught or fixed by disk scans in block reports.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • dhruba borthakur (JIRA) at Feb 24, 2009 at 7:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676375#action_12676375 ]

    dhruba borthakur commented on HADOOP-4584:
    ------------------------------------------

    is it difficult to not hold the global lock while generating block reports? I guess it would require the namenode to handle more subtle race conditions. Is it possible to break up the datanode lock into read/write lock? A block report generation and a block read request can keep the read lock while block creation may acquire the write lock. This might alleviate the problem to some extent. Combined with the fact that a full scan occurs less frequently (maybe once a day), it might be an workable solution.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 24, 2009 at 7:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676384#action_12676384 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    I don't think read/write helps much (since we can't make writes wait for that long) or necessary for this.

    It is not very complicated to scan without locking.. DN just needs to keep track of changes (additions and deletions) that happen from the time scan starts to the time it ends and appy those changes at the end. That way scan can proceed without locking the storage.

    Another way to achieve the same to keep a modification timestamp with each block and at the end of the scan, if a in memory block has a timestamp older than start of the scan but not found in scan, then it should be deleted.. etc.

    None the less it not as easy as not scanning :).

    I think Suresh had more elaborate description of the above with examples.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Konstantin Shvachko (JIRA) at Feb 24, 2009 at 7:57 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676391#action_12676391 ]

    Konstantin Shvachko commented on HADOOP-4584:
    ---------------------------------------------

    So what is wrong with just going with the original proposal in this jira that is: prepare block reports in a separate thread without delaying heartbeats and other commands, and sending them as soon as they are ready by {{offerService()}}. This seem to be the mission declared by the issue, and changing block reports to be memory based is an add-on, which is not required to solve the problem stated.
    I understand Dhruba's concerns about reliability. I can add to this that memory based reports can also slow down cleaning up disks from unnecessary blocks, which may be critical if the data-node is close to running out of disk space.
    My approach would be to drop the in-memory block report part and commit the rest. The in-memory reports can be discussed in a subsequent issue.
    I think that would be enough of a change by itself, because there may be a dangerous race condition between {{blockReceived()}} and {{blockReport()}} if it is not done right, as we had seen before.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 24, 2009 at 8:05 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676396#action_12676396 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------


    Just moving heartbeat to a different thread is useful by itself. My quote from 4th comment above :

    bq. "I think this approach might work ok for now. It makes sure the data node is not marked dead. But this should be considered mostly a work around. We should note the fundamental problem still remains (a little less lethal). e.g. a) new blocks are not reported, b) no new blocks can be written during this time c) (not sure) not blocks can be read? etc. "

    If that is what everyone wants then it is ok. But we will continue to see problems with datanodes with lot of blocks. My impression is that it is about time we fix the fundamental problem with the scan in block reports.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 24, 2009 at 8:47 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676407#action_12676407 ]

    Suresh Srinivas commented on HADOOP-4584:
    -----------------------------------------

    Currently one of the version of the patches (Feb 10) introduces separate heartbeat thread without the other changes.

    I would like to get consensus on if we need to detect missing files faster than what block verification can do. Once we agree to that, we can go for long term solution, which should include:

    Block deleted report:
    Sending block deleted (much like block received) from datanode to namenode. Currently block report is the way namenode learns about the deleted blocks. With this change, we can send block reports less frequently.

    Faster block scanning functionality for missing/lingering files:
    - We could have a thread that lists the files (without holding a global lock) and reconciles the blocks on the disk with blocks maintained in the FSDataset. This could be done by deleting the blocks from FSDataset map under the following conditions:
    - block file or a block meta data file is missing on the disk but exists in FSDataset map
    - block meta data does not match block information (block size and generation stamp) in FSDataset map
    - block file or block meta data file exists on the disk and does not exist in FSDataset map

    I was thinking of doing this in a separate jira for long term.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 26, 2009 at 3:01 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676857#action_12676857 ]

    Suresh Srinivas commented on HADOOP-4584:
    -----------------------------------------

    Can you guys vote on your preferred fix:
    # Heartbeat in a separate thread with block report as it is *(patch is ready)*
    # Heartbeat in a separate thread with block report generated from in-memory data *(patch is ready)*
    # Heartbeat in a separate thread with block report generated in a separate thread by listing block files from the disk *(will be addressed in another jira)*
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • dhruba borthakur (JIRA) at Feb 26, 2009 at 5:47 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676887#action_12676887 ]

    dhruba borthakur commented on HADOOP-4584:
    ------------------------------------------

    Option 1: +1. This sounds like the approach that will alleviate this problem to a certain extent.
    Option 2: -1
    Option 3: +1, but we can do this as a separate JIRA
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Konstantin Shvachko (JIRA) at Feb 26, 2009 at 8:03 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676910#action_12676910 ]

    Konstantin Shvachko commented on HADOOP-4584:
    ---------------------------------------------

    As I said I propose to isolate in-memory block reports into a separate issue. Does anybody disagree with that?

    As for the heartbeat thread, I would like to propose an alternative to the approach and discuss pros and cons of the two.

    # Now we have a single thread (call it offerServer thread) which does all three operations: heartbeat with processing command returned from the name-node, blockReceived and blockReport.
    # Current Suresh's proposal is to separate heartbeats into a new thread (heartbeat thread), which also means creating a queue of commands returned from name-node for processing by the offerServer thread later on.
    # My proposal is to separate block report preparation into a new thread (blockReport thread), which wakes up once an hour and prepares a block report. Once the report is ready the offerService thread sends it to the name-node.

    I think the last proposal (3) may have an advantage over (2) because in (2) we still delay blockReceived and the processing of commands from the name-node until the block report is getting composed.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 26, 2009 at 4:55 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677059#action_12677059 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    I don't see any advantage to (3), based on more details, it might not even be correct. The requirement is that BlockReport should have "exact snapshot" of blocks... i.e. no changes changes can happen to FSDataset from the time block report starts until it ends. Which thread does it does not matter. Processing commands and block report in one thread makes sense since those need to happen serially.

    May be (3) still has some advantage : could you give a specific example that shows the advantage?

    Fixing the block reports properly (with a directory scan once a day or so), i.e. "Option 3" in a seperate jira is ok. But I would like to see that marked as blocker at least for 0.20 or 0.21. I for one am pretty tired of replying "oh, that is a known issue and we need fix" every time users complain about it. Some users even had a separate process to constant scan the directory tree to to keep the inode info in kernel memory.

    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 26, 2009 at 5:07 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677061#action_12677061 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------
    Option 1: +1. This sounds like the approach that will alleviate this problem to a certain extent.
    Not in the current form. HeartBeat still needs to be fixed not to acquire global volumes lock. Currently it locks to get disk stats. So blockreport still blocks heart beat. I think should be fixed in this jira.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Konstantin Shvachko (JIRA) at Feb 27, 2009 at 8:25 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677280#action_12677280 ]

    Konstantin Shvachko commented on HADOOP-4584:
    ---------------------------------------------
    May be (3) still has some advantage : could you give a specific example that shows the advantage?
    The example is as I mentioned before: In (2) when blockReport is scanning directories, which may take minutes according to Suresh, blockRecieved can not be processed, and the commands returned from the name-node
    in reply to heartbeats like replicate and delete blocks will just accumulate on the command queue and wait until block report is done. True, the data-node will not die, but it will still be frozen in offerService thread.

    I am just proposing to do with block reports the same we did with received blocks: when they arrive we place them into {{receivedBlockList}}, and offerService sends blockRecieved when the list is not empty. Block reports are prepared by a separate thread and placed into {{readyBlockReport}} member. offerService sends it whenever the member is not null.
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 27, 2009 at 5:41 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677429#action_12677429 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    Konstantin,

    does it satisfy the requirement for block reports I mentioned :

    bq. The requirement is that BlockReport should have "exact snapshot" of blocks... i.e. no changes changes can happen to FSDataset from the time block report starts until it ends.

    If yes, how? If not, why isn't it required?

    Anyway, next version of the patch can can include the improvement you are proposing, and we can easily check if there is a clear advantage.



    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Feb 27, 2009 at 5:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677431#action_12677431 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------

    Also, "Option 3" that Suresh proposed above does scan in a different thread and it does it correctly.. Thus avoids freezing offerService thread. Did you mean to implement "Option 3" in this patch? I am all for it... that is the proper fix I have been lobbying for for long :)



    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Feb 27, 2009 at 11:13 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4584:
    ------------------------------------

    Attachment: 4584.hbthread.patch

    Updated patch for separating heartbeat thread. Methods {{FSDataset.getCapacity(), FSDataset.getDfsUsed(), FSDataset.getRemaining()}} use commands such as {{df}} and {{du}} on data directories and does not need to be synchronized with FSVolumeSet.

    Changes are:
    * Removed synchronizing {{FSVolumeSet.getCapacity()}} on {{FSVolumeSet}}
    * Removed synchronizing {{FSVolumeSet.getRemaining()}} on {{FSVolumeSet}}
    * Access to {{FSDataset.getCapacity(), FSDataset.getRemaining(), FSDataset.getDfsUsed()}} are synchronized

    Note if {{df}} operation blocks, the heartbeat thread will be blocked. However assuming it will be much shorter than 10 minutes and the datanodes will not be marked dead by namenodes.

    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Mar 3, 2009 at 11:39 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678512#action_12678512 ]

    Suresh Srinivas commented on HADOOP-4584:
    -----------------------------------------

    Separating out heartbeat thread is a step in the direction of more permanent fix. Latest patch separates out the heartbeat thread. It removes the change of generating block report from in-memory data and reverts back to generating it from the disk. Should we commit this change?
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Konstantin Shvachko (JIRA) at Mar 4, 2009 at 2:29 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678553#action_12678553 ]

    Konstantin Shvachko commented on HADOOP-4584:
    ---------------------------------------------

    Separating a HB thread from the main offerService thread has the following disadvantages:
    # This does not remove contention on processing blocks reports.
    That is, the data-node is still blocked preparing block report and cannot do anything useful like send blockReceived or process commands from the name-node. The only good thing is that it does not die.
    # We loose automatic data-node activity throttling with this.
    Meaning that while the data-node is busy it still sends heartbeats and name-node replies with commands, which are piled up in the queue because the DN cannot process them.
    This can probably be solved with a smart command queue maintenance or by adjusting of heartbeat frequency with respect to the length of the queue, but will require more work and very thorough tuning.
    # Related to previous. Administrators will no longer be able to judge that a data-node is in trouble by just looking at its heartbeat interval.

    So I would argue to keep HB processing in the main offerService loop, but rather separate the block report processing into a separate thread.
    In general we should keep all heavy-weight operation like delete-blocks away from the offer service loop. They can be done in separate threads.
    Does that make me a supporter of "Option 3"?
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Raghu Angadi (JIRA) at Mar 4, 2009 at 8:20 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678896#action_12678896 ]

    Raghu Angadi commented on HADOOP-4584:
    --------------------------------------
    Does that make me a supporter of "Option 3"?
    It looks like.

    It is a good point about loosing "throttling" and "indiating to admin about slow datanodes". But fundamentally that is not job of a heartbeat. Those are couple of useful things piggy backed on current heartbeats. Strictly, it is better to make HB report some number indicating backlog at the DataNode rather than delaying HB. HB should only only mean "can this datanode be used or not".

    That said, I am fine with not having a seperate HB threads, as long as we are going to move hardware dependent expensive operations like deleting blocks out of offerservice/HB thread (here or in a new jira).

    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Mar 10, 2009 at 12:58 am
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680343#action_12680343 ]

    Suresh Srinivas commented on HADOOP-4584:
    -----------------------------------------

    Based on the discussions so far, here is a proposal:
    # DataBlockScanner will be enhanced to periodically check to see if the blocks on the disk matches blocks in memory.
    # Block list is compiled from disk and in-memory map. The two lists are compared to find the following inconsistencies:
    ## Block is in memory and not on the disk
    ## Block is on the disk and not in memory
    ## Block on the disk does not match the block in memory
    # Reconciling differences is done one difference at a time. FSDataset lock is held to prevent further block changes and a check is done to ensure inconsistency found still exists (to account for changes that might have happened while checking the disk for block files):
    ## If a block file is missing on the disk, block is deleted in memory
    ## If a block metadata file is missing on the disk, block in memory is updated with generation stamp as zero (as done in block reports currently)
    ## If a block is missing in memory, then it is added to FSDataset
    ## If blocks do not match, the in-memory block is updated to reflect the block on the disk
    ## A block metafile that does not have corresponding block file will be deleted from the disk
    # Block report will be generated from the in-memory data
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Suresh Srinivas (JIRA) at Mar 12, 2009 at 10:43 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Suresh Srinivas updated HADOOP-4584:
    ------------------------------------

    Attachment: 4584.brthread.2.patch

    Couple of changes that requires careful review:
    # Currently metadata file exists without a corresponding block is deleted. Could this cause a problem (that is if metadata file was created before a block file is created)?
    # DirectoryScanner is added to DataBlockScanner. While reconciling the difference between the disk and in-memory blocks, existing scanner functionality is suspended. Would this cause problem for DataBlockScanner in ensuring its constraints such as scanning with in certain period of time and using configured bandwidth?
    Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
    ----------------------------------------------------------------------------------------

    Key: HADOOP-4584
    URL: https://issues.apache.org/jira/browse/HADOOP-4584
    Project: Hadoop Core
    Issue Type: Bug
    Components: dfs
    Reporter: Hairong Kuang
    Assignee: Suresh Srinivas
    Fix For: 0.20.0

    Attachments: 4584.brthread.2.patch, 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch


    sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
    It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupcommon-dev @
categorieshadoop
postedNov 4, '08 at 12:52a
activeApr 10, '09 at 8:17p
posts75
users3
websitehadoop.apache.org...
irc#hadoop

People

Translate

site design / logo © 2022 Grokbase