FAQ

I had assumed that if a replica became corrupt that it would be replaced
by a non-corrupt copy.
Is this not the case?
Yes, it is. Usually a random block gets corrupted for one reason or
another, and it is replaced by another replica of the block.

A block might stay in a corrupt state if there are no good replicas left
or new replicas could not be created. The actual reason might be
hardware related (say, a lot of nodes die) or a genuine software bug.

If you use Hadoop 0.20 or later, you will notice a warning in red on the
NameNode front page if any blocks are left with no good replicas, so you
don't need to run fsck (which can be costly) each time.

If you are interested, you could try to trace one of these block ids in
the NameNode log to see what happened to it. We are always eager to hear
about irrecoverable errors; please mention the Hadoop version you are
using.
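To make that concrete, here is a minimal sketch of tracing one of the
reported block ids through a NameNode log. The default log path is an
assumption; pass your installation's actual log file as the first
argument.

```shell
# Sketch: trace one of the corrupt block ids through the NameNode log.
# The default log path below is an assumption; adjust for your install.
trace_block() {
  log=${1:-/var/log/hadoop/hadoop-namenode.log}
  block=${2:-blk_1697509332927954816}   # one of the ids fsck reported
  grep "$block" "$log"
}

# Example (hypothetical path):
# trace_block /var/log/hadoop/hadoop-namenode.log blk_1697509332927954816
```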

If the data is corrupt (rather than truncated or missing), you can still
fetch it by passing the "-ignoreCrc" option to 'fs -get'.
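As a sketch, using one of the paths from the fsck report below: the
helper only prints the command (remove the 'echo' to run it against a
live cluster), and the local destination name is an assumption.

```shell
# Sketch: fetch a corrupt-but-present file, skipping client-side
# checksum verification. This prints the command rather than running it;
# remove 'echo' on a real cluster.
fetch_ignore_crc() {
  src=$1
  dst=$2
  echo hadoop fs -get -ignoreCrc "$src" "$dst"
}

# fetch_ignore_crc /user/data/2009-07-01/165_2009-07-01.log ./165_2009-07-01.log
```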

Raghu.

Mayuran Yogarajah wrote:
Hello all,

What can cause HDFS to become corrupt? I was running some jobs which
were failing. When I checked the logs I saw that some files were
corrupt, so I ran 'hadoop fsck /', which showed that a few files were
corrupt:

/user/data/2009-07-01/165_2009-07-01.log: CORRUPT block
blk_1697509332927954816
/user/data/2009-07-21/060_2009-07-21.log: CORRUPT block
blk_8841160612810933777
/user/data/2009-07-26/173_2009-07-26.log: CORRUPT block
blk_-6669973789246139664

I had backups of these files, so I deleted them and reloaded them; the
file system is OK now. What I'm wondering is how these files became
corrupt. There are 6 nodes in the cluster, and I have a replication
factor of 3.

I had assumed that if a replica became corrupt that it would be replaced
by a non-corrupt copy.
Is this not the case?

Would there have been some way to recover the files if I didn't have any
backups?

Another concern is that I only found out HDFS was corrupt by accident.
I suppose I should have a script run every few minutes to parse the
output of 'hadoop fsck /' and send an email if anything becomes corrupt.
How are people currently handling this?
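A minimal sketch of such a check, reading the fsck report on stdin so it
can be tried without a cluster; the cron wiring, 'mail' setup, and the
address are all assumptions.

```shell
# Sketch: print any CORRUPT lines from an fsck report; exits non-zero
# when the report is clean. Reads the report on stdin so it can be
# exercised without a live cluster.
check_fsck() {
  grep 'CORRUPT'
}

# Hypothetical cron usage (mail command and address are assumptions):
# hadoop fsck / | check_fsck && echo "HDFS corruption found" \
#   | mail -s "HDFS alert" admin@example.com
```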

thank you very much
M

Discussion Overview
group: common-user
categories: hadoop
posted: Aug 10, '09 at 10:08p
active: Aug 11, '09 at 4:44p
posts: 5
users: 3
website: hadoop.apache.org...
irc: #hadoop
