What can cause HDFS to become corrupt? I was running some jobs, and when
I checked the logs I saw that some files were corrupt, so I ran 'hadoop
fsck /', which confirmed that a few files were corrupt:
/user/data/2009-07-01/165_2009-07-01.log: CORRUPT block
/user/data/2009-07-21/060_2009-07-21.log: CORRUPT block
/user/data/2009-07-26/173_2009-07-26.log: CORRUPT block
I had backups of these files, so I deleted them and reloaded the copies;
the file system is OK now. What I'm wondering is how the files became
corrupt in the first place. There are 6 nodes in the cluster and the
replication factor is 3.
I had assumed that if a replica became corrupt, it would automatically be
replaced by a non-corrupt copy. Is this not the case?
Would there have been some way to recover the files if I didn't have any
backups?
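In case it helps anyone in the same spot without backups: fsck itself has a couple of salvage options. This is a sketch from memory for the 0.20-era 'hadoop fsck' client, so double-check the flags against the usage output on your version:

```shell
# Inspect which datanodes hold each block of a suspect file:
#   hadoop fsck /user/data -files -blocks -locations
#
# Salvage options (both act on files containing corrupt blocks):
#   hadoop fsck / -move     # move affected files to /lost+found
#   hadoop fsck / -delete   # delete affected files outright
#
# With -move, the still-readable blocks survive in /lost+found, so a
# partially corrupt file is not necessarily a total loss.
```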
Another concern is that I only found out HDFS was corrupt by accident.
I suppose I should have a script run every few minutes to parse the
output of 'hadoop fsck /' and email me if anything becomes corrupt.
How are people currently handling this?
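The cron idea could be sketched roughly like this. It's only an outline, assuming a POSIX shell, a 'hadoop' client on PATH, and a working 'mail' command; the admin address is a placeholder. The parsing step is factored into a function so it can be tried against saved fsck output first:

```shell
#!/bin/sh
# Sketch of a periodic HDFS health check for cron.

corrupt_lines() {
    # Keep only the lines fsck flags as CORRUPT.
    grep 'CORRUPT'
}

main() {
    report=$(hadoop fsck / 2>&1)
    bad=$(printf '%s\n' "$report" | corrupt_lines)
    if [ -n "$bad" ]; then
        # Hypothetical recipient; replace with a real alias.
        printf '%s\n' "$bad" | mail -s 'HDFS corruption detected' admin@example.com
    fi
}

# Uncomment when installing under cron on a node with the hadoop client:
# main
```

A crontab entry like `*/10 * * * * /path/to/hdfs_check.sh` would then mail only when fsck reports corruption.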
Thank you very much.