Hello,
I recently hit a snag during a CDH3 to CDH4.2.1 upgrade:
2012-12-13 00:21:03,259 INFO org.apache.hadoop.hdfs.server.namenode.NNStorage: Using clusterid: CID-76ce587d-0eef-43f8-b8b8-385cde0a3e47
2012-12-13 00:21:03,280 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /var/lib/hadoop/dfs/name/current
2012-12-13 00:21:03,294 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /var/lib/hadoop/dfs/name/current/fsimage using no compression
2012-12-13 00:21:03,294 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 43
2012-12-13 00:21:03,310 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 0
2012-12-13 00:21:03,311 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.AssertionError: Should have reached the end of image file /var/lib/hadoop/dfs/name/current/fsimage
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:185)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:757)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:654)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:342)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:255)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:534)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:424)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:398)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:432)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
2012-12-13 00:21:03,314 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2012-12-13 00:21:03,316 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/192.168.1.60
************************************************************/
I instrumented the code around the exception and found that the loader had
read all but the last 16 bytes of the file, and that those remaining 16
bytes were all zeroes. Chopping the 16 bytes of zero padding off the end of
the fsimage turned out to be a workable workaround, i.e.:
fsimage=/var/lib/hadoop/dfs/name/current/fsimage
cp "$fsimage"{,~}                # keep a backup copy as fsimage~
size=$(stat -c %s "$fsimage")
# copy everything except the last 16 bytes back over the original
dd if="$fsimage"~ of="$fsimage" bs=$((size - 16)) count=1
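In case it helps anyone hitting the same assertion: here's a small helper I
used (my own sketch, not part of Hadoop) to confirm that the tail of the
image really is zero padding before truncating anything:

```shell
#!/bin/sh
# is_zero_tail FILE N: succeed (exit 0) iff the last N bytes of FILE are
# all NUL.  tr -d '\000' deletes NUL bytes, so an all-zero tail leaves
# nothing behind for wc -c to count.
is_zero_tail() {    # usage: is_zero_tail FILE N
    [ "$(tail -c "$2" "$1" | tr -d '\000' | wc -c)" -eq 0 ]
}

# Quick self-check against a throwaway file: 4 data bytes + 16 NULs.
tmp=$(mktemp)
printf 'abcd' > "$tmp"
dd if=/dev/zero bs=1 count=16 >> "$tmp" 2>/dev/null
is_zero_tail "$tmp" 16 && echo "tail is zero padding"
rm -f "$tmp"
```

Against the real image that would be
`is_zero_tail /var/lib/hadoop/dfs/name/current/fsimage 16`
before running the dd above.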
Is this a known issue? I ran all of these tests in a scratch CDH3u5 VM and
can replicate the failure at will if needed.
-Bob