[ https://issues.apache.org/jira/browse/HDFS-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-795.

Resolution: Duplicate

HDFS-101 duplicates this, and the fix is under way there.
DFS Write pipeline does not detect defective datanode correctly in some cases (HADOOP-3339)

Key: HDFS-795
URL: https://issues.apache.org/jira/browse/HDFS-795
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.20.1
Reporter: Raghu Angadi
Priority: Critical
Fix For: 0.20.2

Attachments: toreproduce-5796.patch

The HDFS write pipeline does not remove the correct datanode in some error cases. One example: say DN2 is the second datanode in the pipeline and the write to it times out because it is in a bad state; the pipeline actually removes the first datanode instead. If the defective datanode happens to be the last one in the pipeline, the write is aborted completely with a hard error.
Essentially, the error occurs when writing to a downstream datanode fails, rather than when reading from it. This bug was actually fixed in 0.18 (HADOOP-3339), but HADOOP-1700 essentially reverted the fix. I am not sure why.
It is absolutely essential for HDFS to handle failures on a subset of the datanodes in a pipeline. At the very least, we should not have known bugs that lead to hard failures.
I will attach a patch for a hack that illustrates this problem. I am still thinking about what an automated test for this would look like.
My preferred target for this fix is 0.20.1.
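The failure mode described above can be sketched as follows. This is a hypothetical, simplified model of pipeline recovery, not the actual Hadoop client code; the function names and the list-of-strings pipeline are illustrative assumptions.

```python
# Simplified illustration of the bug: on a downstream write failure,
# the client should eject the datanode the write failed on, but the
# buggy path ejects the first datanode in the pipeline instead.

def rebuild_pipeline(nodes, failed_index):
    """Correct behavior: drop the datanode the write actually failed on."""
    return nodes[:failed_index] + nodes[failed_index + 1:]

def buggy_rebuild_pipeline(nodes, failed_index):
    """Behavior described in this issue: a write timeout to a downstream
    datanode is misattributed, and the first datanode is removed."""
    return nodes[1:]  # wrong node ejected; the bad node stays in the pipeline

pipeline = ["DN1", "DN2", "DN3"]
# The write to DN2 (index 1) times out:
print(rebuild_pipeline(pipeline, 1))        # ['DN1', 'DN3'] - recovery can proceed
print(buggy_rebuild_pipeline(pipeline, 1))  # ['DN2', 'DN3'] - bad node kept
```

In the buggy version the defective node survives each recovery round, so if it ends up as the last node in the pipeline the write has nowhere left to go and fails hard, which matches the behavior reported above.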
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

Discussion Overview
group: hdfs-dev
posted: Dec 18, '09 at 2:00a
active: Dec 18, '09 at 2:00a

1 user in discussion
Todd Lipcon (JIRA): 1 post