Hi Bob,

This is not a known issue as far as I'm aware, but it is very interesting.
Can you reproduce an fsimage with trailing zeros starting from a
previously working fsimage? Were the zeros appended during the upgrade from
CDH 3u5 to CDH 4.1.2? If you're comfortable doing so, would you mind
sending me (perhaps off list) the complete contents of your dfs.name.dir
directories from before and after the upgrade, but before you trimmed the
trailing zeros?

Thanks a lot for reporting this.

--
Aaron T. Myers
Software Engineer, Cloudera

On Wed, Dec 19, 2012 at 9:16 AM, Bob Copeland wrote:

Hello,

I recently hit a snag during a cdh3 to cdh4.1.2 upgrade:

2012-12-13 00:21:03,259 INFO org.apache.hadoop.hdfs.server.namenode.NNStorage: Using clusterid: CID-76ce587d-0eef-43f8-b8b8-385cde0a3e47
2012-12-13 00:21:03,280 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /var/lib/hadoop/dfs/name/current
2012-12-13 00:21:03,294 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /var/lib/hadoop/dfs/name/current/fsimage using no compression
2012-12-13 00:21:03,294 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 43
2012-12-13 00:21:03,310 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 0
2012-12-13 00:21:03,311 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.AssertionError: Should have reached the end of image file /var/lib/hadoop/dfs/name/current/fsimage
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:185)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:757)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:654)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:342)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:255)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:534)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:424)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:398)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:432)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:608)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
2012-12-13 00:21:03,314 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2012-12-13 00:21:03,316 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/192.168.1.60
************************************************************/

I instrumented the code around the exception and found that the loader had
read all but 16 bytes of the file, and the remaining 16 bytes are all zeroes.
So chopping off the last 16 bytes of padding was a suitable workaround, i.e.:

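fsimage=/var/lib/hadoop/dfs/name/current/fsimage
# keep a backup copy of the original image as fsimage~
cp "$fsimage" "$fsimage~"
size=$(stat -c %s "$fsimage")
# rewrite the image from the backup, minus its last 16 bytes
dd if="$fsimage~" of="$fsimage" bs=$((size-16)) count=1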
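
A slightly more defensive variant of the same workaround, as a rough sketch only: it checks that the tail of the file really is zero padding before cutting anything. The function names tail_is_zero and trim_zero_tail are made up for illustration and are not part of Hadoop or CDH; the sketch assumes bash and the standard tail, head, tr and wc tools, and that the NameNode is stopped.

```shell
# Hypothetical helpers, not part of Hadoop or CDH.

# Succeeds only if FILE has at least N bytes and its last N bytes are all 0x00.
tail_is_zero() {
    local file=$1 n=$2 size nonzero
    size=$(wc -c < "$file") || return 1
    [ "$size" -ge "$n" ] || return 1
    # Delete NUL bytes from the tail; anything left over is non-zero data.
    nonzero=$(tail -c "$n" "$file" | tr -d '\000' | wc -c)
    [ "$nonzero" -eq 0 ]
}

# Backs up FILE to FILE~ and drops its last N bytes, but only if they are all zero.
trim_zero_tail() {
    local file=$1 n=$2 size
    if ! tail_is_zero "$file" "$n"; then
        echo "last $n bytes of $file are not all zero; leaving it untouched" >&2
        return 1
    fi
    size=$(wc -c < "$file")
    cp -p "$file" "$file~"
    head -c "$((size - n))" "$file~" > "$file"
}
```

With these defined, the workaround above would be run as: trim_zero_tail /var/lib/hadoop/dfs/name/current/fsimage 16. If the last 16 bytes contain anything other than zeros, the file is left alone and no backup is written.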

Is this a known issue? I did all these tests in a scratch cdh3u5 VM and
can replicate at will if needed.

-Bob

Discussion Overview
group: cdh-user
categories: hadoop
posted: Dec 19, '12 at 5:16p
active: Dec 21, '12 at 2:01p
posts: 5
users: 4
website: cloudera.com
irc: #hadoop
