Corrupted blocks get deleted but not replicated
-----------------------------------------------

Key: HADOOP-1349
URL: https://issues.apache.org/jira/browse/HADOOP-1349
Project: Hadoop
Issue Type: Bug
Components: dfs
Reporter: Hairong Kuang
Fix For: 0.14.0


When I test the patch to HADOOP-1345 on a two-node dfs cluster, I see that dfs correctly deletes the corrupted replica and successfully retries reading from the other, correct replica, but the block does not get re-replicated. The block remains with only one replica until the next block report comes in.

In my test case, since the dfs cluster has only two datanodes, the target of replication is the same as the target of block invalidation. After poking through the logs, I found that the namenode sent the replication request before the block invalidation request.

This is because the namenode does not invalidate a block correctly. In FSNamesystem.invalidateBlock, it first puts the invalidate request in a queue and then immediately removes the replica from its state, which triggers choosing a replication target for the block. When requests are sent back to the target datanode in the reply to a heartbeat message, replication requests have higher priority than invalidate requests.

This problem could be solved if the namenode removed an invalidated replica from its state only after the invalidate request has been sent to the datanode.
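
A minimal sketch of the difference (illustrative names and simplified types, not the actual FSNamesystem code): the only change between the buggy and proposed orderings is when the replica disappears from namenode state.

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Queue;
    import java.util.Set;
    import java.util.HashSet;

    public class InvalidateOrderingSketch {
        // block id -> datanodes believed to hold a replica
        // (simplified stand-in for the namenode's blocksMap)
        private final Map<String, Set<String>> blocksMap = new HashMap<>();
        // per-datanode queue of blocks waiting to be invalidated
        private final Map<String, Queue<String>> invalidateQueues = new HashMap<>();

        // Current (buggy) ordering: the replica is forgotten before the
        // delete request ever reaches the datanode, so the block immediately
        // looks under-replicated and a replication target (possibly the same
        // datanode) is chosen ahead of the invalidate.
        public void invalidateBlockBuggy(String block, String datanode) {
            invalidateQueues.computeIfAbsent(datanode, d -> new ArrayDeque<>()).add(block);
            Set<String> holders = blocksMap.get(block);
            if (holders != null) holders.remove(datanode); // too early
        }

        // Proposed ordering: only enqueue here; the replica is removed from
        // namenode state when the invalidate request is actually handed out
        // with a heartbeat reply.
        public void invalidateBlockFixed(String block, String datanode) {
            invalidateQueues.computeIfAbsent(datanode, d -> new ArrayDeque<>()).add(block);
        }

        public void sendInvalidatesInHeartbeatReply(String datanode) {
            Queue<String> q = invalidateQueues.getOrDefault(datanode, new ArrayDeque<>());
            for (String block; (block = q.poll()) != null; ) {
                // the delete request is now on its way; safe to forget the replica
                Set<String> holders = blocksMap.get(block);
                if (holders != null) holders.remove(datanode);
            }
        }
    }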


  • dhruba borthakur (JIRA) at May 10, 2007 at 11:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494913 ]

    dhruba borthakur commented on HADOOP-1349:
    ------------------------------------------

    This is a good one! This might be one of the reasons why we sometimes see under-replicated blocks in a cluster. When you restart the namenode, does this problem get rectified automatically?
  • Hairong Kuang (JIRA) at May 14, 2007 at 6:54 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang reassigned HADOOP-1349:
    -------------------------------------

    Assignee: Hairong Kuang
  • Hairong Kuang (JIRA) at May 15, 2007 at 12:19 am
    [ https://issues.apache.org/jira/browse/HADOOP-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495817 ]

    Hairong Kuang commented on HADOOP-1349:
    ---------------------------------------

    Yes, it seems that the under-replicated blocks are gone after restarting the namenode.
  • Hairong Kuang (JIRA) at May 15, 2007 at 12:19 am
    [ https://issues.apache.org/jira/browse/HADOOP-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-1349:
    ----------------------------------

    Attachment: blockInvalidate.patch

This patch makes sure that blocks are removed from the namespace only after they have been removed from the datanode.

    It adds a new data structure to FSNamesystem, pendingDeleteSets, which keeps track of all the blocks that have been deleted from datanodes but not yet removed from the namespace.

Functionally it makes four changes (a sketch follows this list):
    1. invalidateBlock no longer removes a block from the namespace.
    2. When processing a heartbeat, if the namenode instructs the datanode to remove blocks, all of these blocks are moved to pendingDeleteSets.
    3. When the ReplicationMonitor, the background computation thread, wakes up, it removes any blocks in pendingDeleteSets from the namespace.
    4. This patch also exposed a bug in the ChecksumException handling. Currently the code calls seekToNewSource to select a different replica, but a subsequent seek/read still tried to select a replica, and sometimes it happened to be the problematic one. So this patch makes sure that a seek/read following seekToNewSource does not select a new source.
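
    A minimal sketch of changes 1 through 3 (illustrative names and simplified types, not the actual patch; change 4 is a client-side read-path fix and is not shown):

        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.HashMap;
        import java.util.HashSet;
        import java.util.List;
        import java.util.Map;
        import java.util.Set;

        public class PendingDeleteSketch {
            // Change 1: invalidateBlock only enqueues here and no longer
            // touches the namespace.
            private final Map<String, Set<String>> recentInvalidateSets = new HashMap<>();
            // Blocks whose delete request has been handed to a datanode but
            // which are still present in the namespace.
            private final Map<String, Set<String>> pendingDeleteSets = new HashMap<>();
            // Simplified stand-in for the block -> replica-locations map.
            private final Map<String, Set<String>> blocksMap = new HashMap<>();

            public void invalidateBlock(String block, String datanode) {
                recentInvalidateSets.computeIfAbsent(datanode, d -> new HashSet<>()).add(block);
            }

            // Change 2: while processing a heartbeat, the blocks the datanode
            // is told to remove migrate into pendingDeleteSets.
            public List<String> buildDeleteCommand(String datanode) {
                Set<String> toDelete = recentInvalidateSets.remove(datanode);
                if (toDelete == null) {
                    return Collections.emptyList();
                }
                pendingDeleteSets.computeIfAbsent(datanode, d -> new HashSet<>()).addAll(toDelete);
                return new ArrayList<>(toDelete);
            }

            // Change 3: the ReplicationMonitor thread periodically removes
            // the pending-delete replicas from the namespace.
            public void replicationMonitorPass() {
                for (Map.Entry<String, Set<String>> entry : pendingDeleteSets.entrySet()) {
                    for (String block : entry.getValue()) {
                        Set<String> holders = blocksMap.get(block);
                        if (holders != null) {
                            holders.remove(entry.getKey());
                        }
                    }
                    entry.getValue().clear();
                }
            }
        }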

  • dhruba borthakur (JIRA) at May 15, 2007 at 6:55 am
    [ https://issues.apache.org/jira/browse/HADOOP-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495875 ]

    dhruba borthakur commented on HADOOP-1349:
    ------------------------------------------

I will look at the code more closely, but the approach sounds pretty good. As we discussed, this approach still cannot eliminate the race condition entirely: we ensure that the namenode has sent out the delete-block request before attempting to allocate the same block on the same datanode, but these requests could still get *processed* on the datanode out of order. This fix reduces the race window to a minimum.

The other side effect is that a client that opens the file might try a bad block replica for longer (because the block does not get deleted from the blocksMap for a longer time), but this irritant should be minor at best and can be ignored.
  • Marco Nicosia (JIRA) at Jun 18, 2007 at 10:25 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Marco Nicosia updated HADOOP-1349:
    ----------------------------------

    Priority: Blocker (was: Major)
  • Sameer Paranjpye (JIRA) at Jun 18, 2007 at 11:43 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sameer Paranjpye updated HADOOP-1349:
    -------------------------------------

    Priority: Major (was: Blocker)

    It's unclear whether this problem actually exists.
