Hello,

I recently hit a snag during a CDH3 to CDH 4.1.2 upgrade:

2012-12-13 00:21:03,259 INFO org.apache.hadoop.hdfs.server.namenode.NNStorage: Using clusterid: CID-76ce587d-0eef-43f8-b8b8-385cde0a3e47
2012-12-13 00:21:03,280 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /var/lib/hadoop/dfs/name/current
2012-12-13 00:21:03,294 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /var/lib/hadoop/dfs/name/current/fsimage using no compression
2012-12-13 00:21:03,294 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 43
2012-12-13 00:21:03,310 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 0
2012-12-13 00:21:03,311 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.AssertionError: Should have reached the end of image file /var/lib/hadoop/dfs/name/current/fsimage
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:185)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:757)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:654)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:342)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:255)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:534)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:424)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:398)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:432)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:608)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
2012-12-13 00:21:03,314 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2012-12-13 00:21:03,316 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/192.168.1.60
************************************************************/

I instrumented the code around the exception and found that the loader had
read all but 16 bytes of the file, and the remaining 16 bytes are all zeroes.
So chopping off the last 16 bytes of padding was a suitable workaround, i.e.:

fsimage=/var/lib/hadoop/dfs/name/current/fsimage
cp "$fsimage"{,~}                                            # keep a backup copy as fsimage~
size=$(stat -c %s "$fsimage")                                # current size in bytes
dd if="$fsimage~" of="$fsimage" bs=$((size - 16)) count=1    # copy everything but the last 16 bytes
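
For anyone hitting the same assertion, it may be worth confirming that the
trailing bytes really are zero padding before truncating anything. A minimal
check with standard tools (a sketch only, reusing the path from the log above;
verify against your own dfs.name.dir first):

fsimage=/var/lib/hadoop/dfs/name/current/fsimage
size=$(stat -c %s "$fsimage")
tail -c 16 "$fsimage" | od -An -tx1            # should print sixteen 00 bytes
cp "$fsimage" "$fsimage.trimmed"               # operate on a copy, not the live image
truncate -s $((size - 16)) "$fsimage.trimmed"  # drop the suspected padding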

Is this a known issue? I did all these tests in a scratch cdh3u5 VM and can
replicate at will if needed.

-Bob

--


  • Aaron T. Myers at Dec 19, 2012 at 5:30 pm
    Hi Bob,

    This is not a known issue that I'm aware of, but it is very interesting.
    Can you reliably reproduce the trailing zeros being added to a
    previously-working fsimage? Were the zeros appended during the upgrade from
    CDH 3u5 to CDH 4.1.2? If you'd be comfortable doing so, would you mind
    sending me (perhaps off list) the complete contents of your dfs.name.dirs
    before and after upgrade, but before you trimmed the trailing zeros?
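
    One way to capture those artifacts (a sketch only; the path is the one from
    the log above, and your dfs.name.dir may differ) is to tar up the name
    directory before and after the upgrade attempt:

    namedir=/var/lib/hadoop/dfs/name
    tar czf name-dir-pre-upgrade.tar.gz  -C "$(dirname "$namedir")" "$(basename "$namedir")"
    # ...run the upgrade and let it fail...
    tar czf name-dir-post-upgrade.tar.gz -C "$(dirname "$namedir")" "$(basename "$namedir")"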

    Thanks a lot for reporting this.

    --
    Aaron T. Myers
    Software Engineer, Cloudera

    --
  • Bob Copeland at Dec 21, 2012 at 2:01 pm
    Data sent off-list.

    --
    Bob Copeland %% www.bobcopeland.com

    --
  • Andy Isaacson at Dec 19, 2012 at 8:20 pm

    What was the size of the file (before you chopped off the 0s)?
    What filesystem is your fsimage stored on, and what kernel are you running?

    -andy

    --
  • Bob Copeland at Dec 21, 2012 at 2:01 pm

    What was the size of the file (before you chopped off the 0s)?
    What filesystem is your fsimage stored on, and what kernel are you running?
    Hi Andy,

    I actually hit this on three different HDFS filesystems, but I most recently
    did it on a minimal FS in qemu with these specs:

    -rw-r--r-- 1 bob bob 4463 Dec 19 17:19 fsimage
    /dev/sda1 on / type ext4 (rw,errors=remount-ro)
    Linux ubuntu 3.2.0-34-generic #53-Ubuntu SMP Thu Nov 15 10:48:16 UTC 2012
    x86_64 x86_64 x86_64 GNU/Linux
    (Ubuntu 12.04.1)
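
    Those lines are simply the output of ls -l on the fsimage, mount, and
    uname -a on this VM; to collect the same details elsewhere, roughly:

    ls -l /var/lib/hadoop/dfs/name/current/fsimage   # image size and timestamp
    mount | grep ' / '                               # filesystem type for /
    uname -a                                         # kernel and architecture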

    Interestingly, it seems doing "hdfs dfsadmin -saveNamespace" on the old
    image might figure in, since I have an image prior to doing that which
    upgraded fine.
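
    If saveNamespace is in fact the trigger, the suspected sequence on the
    scratch VM would be roughly the following (a sketch, not a confirmed
    recipe; saveNamespace only works while the namenode is in safemode):

    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace    # writes a fresh fsimage under dfs.name.dir
    hdfs dfsadmin -safemode leave
    # then stop the old services, install the new packages, and start the
    # namenode with the -upgrade flag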

    --
    Bob Copeland %% www.bobcopeland.com

    --
