For some reason my secondary namenode process died 10 days ago, and that
has left me with an edits and an edits.new file in my
dfs/name/current directory. The fsimage file is also there, but it is old
and does not contain the merged changes from either edits or edits.new.
The cluster had been running fine since the last startup, which was 2
weeks ago.
Today I restarted the cluster, and now the namenode fails with a
NullPointerException. The last saved checkpoint is the same size as the
fsimage in the current directory, so replacing it will not help.
This is a test cluster, so the worst case is that I lose the changes that
were never merged into the fsimage. I can remove edits.new and bring the
cluster up with a clean edits file. I then have to force the namenode out
of safe mode, and running fsck afterwards reports that HDFS is corrupt:
obviously there are missing blocks/files etc.
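For reference, the recovery steps I described look roughly like this. This is a sketch from memory, not a verified procedure; the metadata path (dfs/name) is whatever dfs.name.dir points to in your hdfs-site.xml, and you should back everything up before touching it:

```shell
# Stop HDFS first, then back up the entire namenode metadata
# directory before modifying anything (path is an assumption;
# use your actual dfs.name.dir).
cp -r dfs/name dfs/name.backup

# Drop the unmerged edits.new so the namenode starts from the
# old fsimage plus a clean edits file, accepting the data loss.
rm dfs/name/current/edits.new

# After starting HDFS, force the namenode out of safe mode.
hadoop dfsadmin -safemode leave

# Check the damage: fsck reports the missing blocks/files.
hadoop fsck /
```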
The question I have is whether there is any way to salvage such a
situation. I read that one can perhaps tamper with the edits and
edits.new files to bring the namenode up with minimal data loss. Would
this require editing these files in a hex editor?
Is there any documentation or an example of how to do this, or is it
simply not possible and not worth the effort? It would be good to know
whether there is a way out of such a situation.
I have a 3-node test cluster running Hadoop 0.20.2+737.
I would appreciate any help/pointers.