[ https://issues.apache.org/jira/browse/HDFS-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-1223.

Resolution: Duplicate

This is fixed as Todd mentions. BTW sometimes the behavior you describe here is desirable, see HDFS-1158 and HDFS-1161.
DataNode fail-stops due to a bad disk (or storage directory)

Key: HDFS-1223
URL: https://issues.apache.org/jira/browse/HDFS-1223
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do

A datanode can store block files in multiple volumes.
If a datanode sees a bad volume during startup (i.e., faces an exception
when accessing that volume), it simply fail-stops, making all block files
stored in the other, healthy volumes inaccessible. Consequently, these lost
replicas will later be regenerated on other datanodes.
If a datanode were able to mark the bad disk and continue working with the
healthy ones, this would increase availability and avoid unnecessary
regeneration. As an extreme example, consider one datanode with
2 volumes, V1 and V2, each containing about 10000 64MB block files.
During startup, the datanode gets an exception when accessing V1; it then
fail-stops, so 20000 block files must be regenerated later.
If the datanode instead marked V1 as bad and continued working with V2, the
number of replicas needing regeneration would be cut in half.
This bug was found by our Failure Testing Service framework.
For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
Haryadi Gunawi (haryadi@eecs.berkeley.edu)
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

Discussion Overview
group: hdfs-dev
posted: Jun 17, '10 at 1:04p
active: Jun 23, '10 at 2:29a
1 user in discussion: Konstantin Shvachko (JIRA), 2 posts
