I had the same thing happen to me a few weeks ago. The solution was to modify one of the classes a bit (FSEdits.java or some such) and simply catch and swallow one of the exceptions. This let the NN come up again (at the expense of some data loss). Lohit helped me out and filed a bug. I don't have the issue number handy, but it is in JIRA and was still open as of a few days ago. NN HA seems to be a requirement for a lot of people... I suppose because the NN is (the only?) SPOF. :)
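For illustration, here is a minimal sketch of the "catch and swallow" idea. It assumes a made-up edit log in which each record is a 4-byte length followed by that many payload bytes. The class name, record format, and size limit are all hypothetical and are NOT Hadoop's actual FSEditLog code. Replay stops at the first corrupt or truncated record instead of aborting startup, and everything after that point is lost.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only, not Hadoop's FSEditLog format or API.
 * Assumed record layout: [int length][length bytes of payload].
 */
class TolerantEditReplay {

    /** Hypothetical sanity limit on a single record's size. */
    static final int MAX_RECORD_BYTES = 1 << 20;

    /** Replays records until the first corrupt or truncated one, then stops. */
    static List<byte[]> replay(InputStream in) {
        List<byte[]> applied = new ArrayList<>();
        DataInputStream din = new DataInputStream(in);
        while (true) {
            try {
                int len = din.readInt(); // EOFException at the end of the log
                if (len < 0 || len > MAX_RECORD_BYTES) {
                    throw new IOException("corrupt record length " + len);
                }
                byte[] payload = new byte[len];
                din.readFully(payload); // EOFException if the record was cut off (e.g. disk full)
                applied.add(payload);
            } catch (EOFException e) {
                // Clean end of log, or a truncated final record: stop here.
                break;
            } catch (IOException e) {
                // The workaround: log and keep going with what was replayed
                // instead of failing startup. Later records are lost.
                System.err.println("Skipping rest of edit log: " + e.getMessage());
                break;
            }
        }
        return applied;
    }
}
```

The trade-off is the one described above: the node comes back up, but silently drops whatever edits followed the damaged record.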

Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Torsten Curdt <tcurdt@apache.org>
To: core-user@hadoop.apache.org
Sent: Wednesday, July 30, 2008 2:09:15 PM
Subject: corrupted fsimage and edits

Just a bit of feedback here.

One of our Hadoop 0.16.4 namenodes ran into a full disk
today. No backup namenode was in place. Both the fsimage and
edits files seem to have been corrupted. After quite a bit of debugging
and fiddling with a hex editor, we managed to resurrect the files and
continue with only minor data loss.

Thankfully this only happened on a development cluster, not on
production. But isn't that something that should NEVER happen?


Posted to common-user by Otis Gospodnetic, Aug 1, 2008 at 2:24 PM