Jul 30, 2008 at 6:37 pm
You should always have more than one location (preferably on different
disks) for fsimage and the edits log.
A few months back I proposed keeping a checksum for each record in
fsimage and the edits log, so that the NameNode could recover
transparently from such corruption whenever more than one copy is
available. It was never prioritized, since no such failures had been
observed.
You should certainly report these cases; that will help the feature gain
priority.
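A minimal sketch of the per-record checksum idea, assuming a hypothetical record layout of (4-byte length, 4-byte CRC32, payload). This is not Hadoop's actual fsimage/edits format, and `encode_records`, `decode_records` and `recover` are illustrative names, not Hadoop APIs. Each record carries its own checksum, and when several copies of the log exist, every record is taken from the first copy whose checksum validates:

```python
import struct
import zlib

# Hypothetical record header: big-endian payload length and CRC32 of the payload.
HEADER = struct.Struct(">II")


def encode_records(records):
    """Serialize byte-string records, each prefixed with its length and CRC32."""
    out = bytearray()
    for payload in records:
        out += HEADER.pack(len(payload), zlib.crc32(payload))
        out += payload
    return bytes(out)


def decode_records(data):
    """Yield (payload, ok) per record; ok is False when the CRC does not match.

    Decoding stops at a truncated header or payload, e.g. a write that was
    cut short because the disk filled up.
    """
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        payload = data[pos:pos + length]
        if len(payload) < length:
            return  # truncated record: treat as end of this copy
        pos += length
        yield payload, zlib.crc32(payload) == crc


def recover(copies):
    """Merge several copies of the same log record by record.

    Each record is taken from the first copy in which it validates. Records
    past the last valid record of every copy (e.g. garbage after a corrupted
    length field) are ignored. Raises ValueError if some record before that
    point is corrupt in every copy.
    """
    decoded = [list(decode_records(c)) for c in copies]

    # Per copy: number of records up to and including its last valid one.
    def valid_prefix(recs):
        return max((i + 1 for i, (_, ok) in enumerate(recs) if ok), default=0)

    n = max((valid_prefix(d) for d in decoded), default=0)
    result = []
    for i in range(n):
        for d in decoded:
            if i < len(d) and d[i][1]:
                result.append(d[i][0])
                break
        else:
            raise ValueError(f"record {i} is corrupt in every copy")
    return result
```

With one copy that has a flipped byte in its second record and another that was truncated mid-write, `recover` still rebuilds the full log, because each damaged record is intact in the other copy. Taking records positionally assumes all copies were written in the same order, which is what writing the same log to several directories gives you.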
Torsten Curdt wrote:
Just a bit of feedback here.
One of our Hadoop 0.16.4 namenodes had a disk-full incident
today. No second backup namenode was in place. Both the fsimage and
edits files seem to have been corrupted. After quite a bit of debugging and
fiddling with a hex editor we managed to resurrect the files and continue
with only minor loss.
Thankfully this only happened on a development cluster, not on
production. But isn't that something that should NEVER happen?