Corrupted block if a crash happens before writing to checksumOut but after writing to dataOut
---------------------------------------------------------------------------------------------

Key: HDFS-1232
URL: https://issues.apache.org/jira/browse/HDFS-1232
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: block is corrupted if a crash happens before writing to checksumOut but
after writing to dataOut.

- Setup:
+ # available datanodes = 1
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ when/where failure happens = (see below)

- Details:
The order of processing a packet during a client write/append at the datanode is:
first forward the packet downstream, then write to the block (data) file, and
finally write to the checksum file. Hence, if a crash happens BEFORE the write
to the checksum file but AFTER the write to the data file, the block is corrupted.
Worse, if this is the only available replica, the block is lost.

We also found this problem in the case where there are 3 replicas of a particular
block and two failures occur during an append. (see HDFS-1231)
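The write ordering above can be sketched as follows. This is a hypothetical, simplified model (not the actual DataNode code); the class and method names are illustrative, the "files" are in-memory buffers, and the crash is simulated by returning early from the crash window between the data write and the checksum write.

```java
import java.util.zip.CRC32;

// Hypothetical sketch of the packet-handling order the report describes:
// 1) forward the packet downstream, 2) write the data file, 3) write the
// checksum file. A crash between steps 2 and 3 leaves data without a
// matching checksum, so later verification flags the block as corrupt.
public class WriteOrderSketch {
    static byte[] blockFile = new byte[0];  // simulates the block data file
    static long checksumFile = -1;          // simulates the .meta checksum

    static long crc(byte[] b) {
        CRC32 c = new CRC32();
        c.update(b);
        return c.getValue();
    }

    // Processes one packet; crashBeforeChecksum simulates a crash in the
    // window after dataOut is written but before checksumOut is written.
    static void receivePacket(byte[] packet, boolean crashBeforeChecksum) {
        // 1) forward to downstream datanode (no-op: single-datanode setup)
        // 2) write to dataOut (append packet to the block file)
        byte[] merged = new byte[blockFile.length + packet.length];
        System.arraycopy(blockFile, 0, merged, 0, blockFile.length);
        System.arraycopy(packet, 0, merged, blockFile.length, packet.length);
        blockFile = merged;
        if (crashBeforeChecksum) {
            return;                         // <-- the crash window
        }
        // 3) write to checksumOut (update the meta file)
        checksumFile = crc(blockFile);
    }

    // Block-scanner-style verification: data must match stored checksum.
    static boolean blockIsValid() {
        return checksumFile == crc(blockFile);
    }

    public static void main(String[] args) {
        receivePacket("packet-1".getBytes(), false);
        System.out.println("after clean write: valid=" + blockIsValid());
        receivePacket("packet-2".getBytes(), true); // crash in the window
        System.out.println("after crash:       valid=" + blockIsValid());
    }
}
```

With a single replica there is no other copy to recover from, which is why the single-datanode setup above turns this window into block loss rather than just one corrupt replica.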

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
Haryadi Gunawi (haryadi@eecs.berkeley.edu)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Todd Lipcon (JIRA) at Jun 17, 2010 at 5:38 pm

    Todd Lipcon resolved HDFS-1232.
    -------------------------------

    Resolution: Duplicate

    This has already been discussed elsewhere. The primary assumption is that a pipeline has more than one DN in it, and this is unlikely to happen on all of the DNs simultaneously. So one replica will get corrupt, but we have others that are fine.

Discussion Overview
group: hdfs-dev
categories: hadoop
posted: Jun 17, '10 at 12:39p
active: Jun 17, '10 at 5:38p
posts: 2
users: 1
website: hadoop.apache.org...
irc: #hadoop
