I'm glad to report that it is fixed. I copied the namenode dir to a
local machine and debugged it in a patched cdh3u4 environment. I added
a try-catch-continue statement around FSEditLog:693 so that it prints
the affected path and continues reading the edits file. Around 20,000
skips were done, all of them more-or-less old disposable HBase logs. I
guess I created my own 'continue' option. A lot of edits were saved,
judging by the log:
2012-06-19 13:14:46,706 INFO common.Storage - Edits file /home/ferdy/nn/current/edits of size 6965105 edits # 63903 loaded in 1 seconds
...
2012-06-19 13:16:05,980 INFO namenode.FSNamesystem - Invalid opcode, reached end of edit log Number of transactions found: 6874928. Bytes read: 893828591
2012-06-19 13:16:05,980 INFO common.Storage - Edits file /home/ferdy/nn/current/edits.new of size 894554862 edits # 6874928 loaded in 79 seconds.
I copied the newly created image back to the cluster, removed some
files with missing blocks (also mostly just old HBase logs) and
everything seems to be working fine now.
Thanks again for the responses.
On Jun 19, 9:40 am, Ferdy Galema wrote:
That's right, the corruption is early on in the file. This is what nn/current looks like:
6965105 May 18 16:54 edits
894554862 Jun 18 09:31 edits.new
283050614 May 18 15:54 fsimage
8 May 18 15:54 fstime
114 May 18 15:54 VERSION
The nn2 is exactly the same (we checked hashes), except that it does
not have the edits.new file. Cutting away the entry from the log file
makes it fail on the next one. I am very curious what exactly the cause
of the NPE is. Looking at the source code, it seems it fails when
adding a certain child node:
Line 1099 of FSDirectory:
T addedNode = ((INodeDirectory)pathComponents[pos-1]).addChild(child, inheritPermission);
pathComponents is initialized (at line 1011, as 'inodes') but an
element of it is null. (That is the only way to throw an NPE at this
point.) Knowing the internals will definitely help with cutting away
all the faulty entries. Will look into it some more.
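To make the failure mode concrete, it boils down to something like this toy example (a standalone demo with fake classes, not the actual Hadoop types):

// Toy classes standing in for the real INode/INodeDirectory types.
public final class NullParentDemo {
  static class INode {}
  static class INodeDirectory extends INode {
    INode addChild(INode child, boolean inheritPermission) { return child; }
  }

  public static void main(String[] args) {
    // One slot of the resolved path is null because an ancestor is missing.
    INode[] pathComponents = { new INodeDirectory(), null, null };
    int pos = 2;   // slot for the new child; its parent is pathComponents[pos-1]
    INode child = new INode();
    // Dereferencing the null parent throws the same NullPointerException
    // as the one seen at FSDirectory line 1099 during edit replay.
    ((INodeDirectory) pathComponents[pos - 1]).addChild(child, false);
  }
}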
Thanks for your help so far. It is greatly appreciated.
On Jun 19, 12:09 am, Colin Patrick McCabe wrote:
I re-read your post more carefully, and now I see that the corruption
is happening rather early in the edit log file.
Unfortunately, you can't upgrade to CDH4 and use recovery mode at the
same time. There are actually good technical reasons for this... the
edit log format changed between CDH3 and CDH4. The new format is a
lot easier to work with from recovery's point of view, because it
includes things like transaction IDs.
Do you have multiple copies of the edit log and fsimage? It's highly
discouraged to run with no backups, for exactly this reason. If you
do, you might try moving the known-corrupt edit log out of the way and
seeing if it starts after that.
I'm assuming that the secondary NN has nothing useful on it, since you
say it has not been functioning for a month.
When was the last fsimage written out?
Colin
Software Engineer, Cloudera
On Jun 18, 1:05 pm, Colin Patrick McCabe wrote:
Hi Ferdy,
In CDH3 you only get two options: stop reading the edits file at the
current offset, or quit. CDH4 also has the "continue" option. In
general, CDH4 has a lot of great new stuff, and this is just one
example. This is mentioned somewhere in the blog post, I believe, but I
don't blame you for asking for clarification.
In your specific case, it might be interesting to compare the offset
of the first corrupt edit it is giving you with the end of the file.
If your corruption happened because of an out-of-disk-space scenario,
I imagine that those two numbers should be pretty close together, so
you shouldn't lose too many edits by snipping off the edit log at that
point. It's best to check this before you do it, of course, but that
seems pretty likely to me.
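A quick standalone check would be something like this (the path and offset below are placeholders; plug in your own edits file and the offset your NameNode log reports):

import java.io.File;

public class EditLogTailCheck {
  public static void main(String[] args) {
    // Placeholder path and offset; substitute your own edits file and the
    // "Error replaying edit log at offset ..." value from the log.
    File edits = new File("/path/to/dfs/name/current/edits");
    long firstCorruptOffset = 40548L;
    long bytesAfterCorruption = edits.length() - firstCorruptOffset;
    System.out.println("Bytes after the first corrupt edit: " + bytesAfterCorruption);
    // If this number is small relative to the file size, the damage is near
    // the tail (consistent with a disk-full truncation) and snipping there
    // should lose very few edits.
  }
}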
cheers,
Colin
Software Engineer, Cloudera
On Jun 18, 12:20 pm, Ferdy Galema wrote:
Pretty sure that is not the case. We ran 'hadoop namenode -recover'
while pointing to the right namenode directory. On the errors (one for
each edits file) we had only two options: stop reading the edits file
(abandoning later edits) or quit altogether. Looks like we are missing
the 'c' option... (Then again, that might be due to our specific case.)
On Jun 18, 9:15 pm, Vinithra Varadharajan <vinit...@cloudera.com>
wrote:
Ferdy,
I believe this feature is available with CDH3u4. So you should be fine
without upgrading.
-Vinithra
On Mon, Jun 18, 2012 at 12:09 PM, Ferdy Galema wrote:
Do you suggest we upgrade our entire cluster, or can we just run the
cdh4 recovery tool binary against the namenode directory? (We'd rather
not upgrade yet.)
Thanks.
On Jun 18, 8:46 pm, Colin Patrick McCabe wrote:
Hi Ferdy,
This would be a good place to use NameNode recovery. I wrote a blog
post about it at:
You should be careful to back up your fsimage and edit log directories
before running name node recovery, of course.
sincerely,
Colin
On Jun 18, 7:36 am, Ferdy Galema wrote:
We have a corrupt edits and edits.new. Also, we did not notice that
our nn2 had not been functioning for over a month. Long story short, we
now have an fsimage that is a month old, a small edits (just the
standard 1 hour) and a huge edits.new. The cause of the corruption is
not certain, but it was probably the disk filling up at some point.
When we start the nn from the cm, we see the following log in the nn:
2012-06-18 16:07:41,452 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1883781
2012-06-18 16:08:05,669 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 23
2012-06-18 16:08:05,674 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 283050614 loaded in 24 seconds.
2012-06-18 16:08:05,675 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedSetTimes: failed to setTimes /hbase/.logs/myServerName,60020,1333115929432-splitting/myServerName%3A60020.1337297156177 because source does not exist
... SOME MORE FILES ...
2012-06-18 16:08:05,682 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /hbase/.logs/myServerName,60020,1333115929432-splitting/myServerName%3A60020.1337221541572 to /hbase/.oldlogs/myServerName%3A60020.1337221541572 because source does not exist
... SOME MORE FILES ...
2012-06-18 16:08:05,707 ERROR org.apache.hadoop.hdfs.server.common.Storage: Error replaying edit log at offset 40548
Recent opcode offsets: 39902 40055 40214 40367
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:...)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:...)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:...)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:...)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:...)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:...)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:...)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:...)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:...)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:...)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:...)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:467)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1330)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1339)
Starting the nn in recovery mode just skips all edits from the corrupt
entries onward. (They are very early on in the file, so recovery like
that is definitely not usable.)
So, can we upgrade to cdh4 to use the new recovery functions? Is it
even possible to do that in the current state?
As far as we know, HBase and other files on the HDFS were all healthy
and functioning until we restarted the HDFS (which put it in the error
state it is in now).