Datanode 'alive' but with its disk failed, Namenode thinks it's alive
---------------------------------------------------------------------

Key: HDFS-1234
URL: https://issues.apache.org/jira/browse/HDFS-1234
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: Datanode 'alive' but with its disk failed, Namenode still thinks it's alive

- Setup:
+ Replication = 1
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ Failure type = bad disk
+ When/where failure happens = first phase of the pipeline

- Details:
In this experiment we have two datanodes, each with one disk.
If the disk on one datanode fails while the node itself stays up, the datanode
does not keep track of the failure. From the perspective of the namenode,
that datanode is still alive, so the namenode hands the same datanode back
to the client. The client retries 3 times, each time asking the namenode for
a new set of datanodes, and always gets the same datanode.
Every write attempt to that datanode fails with an exception.
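
The retry loop described above can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not HDFS code: the class, the method names, and the deterministic "first live node" placement are all hypothetical. The exclude-list variant shows one way a client could avoid the loop by telling the namenode which datanodes already failed; whether this matches the fix tracked in HDFS-630 is an assumption, since the thread only states that the issue was resolved there.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of the scenario in this report; none of these names exist in HDFS.
public class PipelineRetrySketch {
    // The report says the client retries 3 times after the first failed attempt.
    static final int MAX_RETRIES = 3;

    // Simplified namenode placement: return the first live datanode that is not excluded.
    // The namenode has no knowledge of failed disks, only of liveness.
    static String allocate(List<String> liveNodes, Set<String> excluded) {
        for (String node : liveNodes) {
            if (!excluded.contains(node)) {
                return node;
            }
        }
        return null;
    }

    // Returns the datanode that accepted the write, or null if every attempt failed.
    static String writeBlock(List<String> liveNodes, Set<String> badDisks, boolean useExcludeList) {
        Set<String> excluded = new HashSet<>();
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            String target = allocate(liveNodes, excluded);
            if (target == null) {
                return null; // no candidate datanodes left
            }
            if (!badDisks.contains(target)) {
                return target; // write succeeded
            }
            // Write failed with an exception; optionally remember the bad node.
            if (useExcludeList) {
                excluded.add(target);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> live = List.of("dn1", "dn2"); // both nodes look alive to the namenode
        Set<String> bad = Set.of("dn1");           // dn1 has a failed disk
        System.out.println(writeBlock(live, bad, false)); // prints "null": same node every time
        System.out.println(writeBlock(live, bad, true));  // prints "dn2": failed node is skipped
    }
}
```

Without the exclude list, every allocation returns dn1 and all attempts fail even though dn2 could have served the write; with it, the second attempt lands on dn2.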

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Todd Lipcon (JIRA) at Jun 17, 2010 at 5:38 pm
    [ https://issues.apache.org/jira/browse/HDFS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Todd Lipcon resolved HDFS-1234.
    -------------------------------

    Resolution: Duplicate

    Resolved by HDFS-630
    Datanode 'alive' but with its disk failed, Namenode thinks it's alive
    ---------------------------------------------------------------------

    This bug was found by our Failure Testing Service framework:
    http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
    For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
    Haryadi Gunawi (haryadi@eecs.berkeley.edu)

Discussion Overview
group: hdfs-dev @
categories: hadoop
posted: Jun 17, '10 at 6:37a
active: Jun 17, '10 at 5:38p
posts: 2
users: 1
website: hadoop.apache.org...
irc: #hadoop
