Hadoop NameNode Startup Problem
Hi All,

I'm running a 3-node cluster. The NameNode ran out of disk space and the
cluster crashed. We freed up space and tried to restart the NameNode, but it
won't come up.

It throws the following exception during startup:

STARTUP_MSG: args = []
STARTUP_MSG: version = 0.18.3
STARTUP_MSG: build =
https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250;
compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
************************************************************/
2010-10-30 12:07:57,626 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=54310
2010-10-30 12:07:57,632 INFO org.apache.hadoop.dfs.NameNode: Namenode up at:
red.hoonur.com/192.168.100.122:54310
2010-10-30 12:07:57,635 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-10-30 12:07:57,651 INFO org.apache.hadoop.dfs.NameNodeMetrics:
Initializing NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-10-30 12:07:57,739 INFO org.apache.hadoop.fs.FSNamesystem:
fsOwner=hadoop,hadoop
2010-10-30 12:07:57,740 INFO org.apache.hadoop.fs.FSNamesystem:
supergroup=supergroup
2010-10-30 12:07:57,740 INFO org.apache.hadoop.fs.FSNamesystem:
isPermissionEnabled=false
2010-10-30 12:07:57,755 INFO org.apache.hadoop.dfs.FSNamesystemMetrics:
Initializing FSNamesystemMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-10-30 12:07:57,756 INFO org.apache.hadoop.fs.FSNamesystem: Registered
FSNamesystemStatusMBean
2010-10-30 12:07:57,900 INFO org.apache.hadoop.dfs.Storage: Number of files
= 2988433
2010-10-30 12:09:05,014 INFO org.apache.hadoop.dfs.Storage: Number of files
under construction = 49
2010-10-30 12:09:05,315 INFO org.apache.hadoop.dfs.Storage: Image file of
size 395864924 loaded in 67 seconds.
2010-10-30 12:09:05,351 INFO org.apache.hadoop.dfs.Storage: Edits file edits
of size 22024 edits # 215 loaded in 0 seconds.
2010-10-30 12:09:05,379 ERROR org.apache.hadoop.fs.FSNamesystem:
FSNamesystem initialization failed.
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.io.Text.readString(Text.java:412)
at org.apache.hadoop.fs.permission.PermissionStatus.readFields(PermissionStatus.java:84)
at org.apache.hadoop.fs.permission.PermissionStatus.read(PermissionStatus.java:98)
at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:483)
at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:849)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
2010-10-30 12:09:05,380 INFO org.apache.hadoop.ipc.Server: Stopping server
on 54310
2010-10-30 12:09:05,384 ERROR org.apache.hadoop.dfs.NameNode:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.io.Text.readString(Text.java:412)
at org.apache.hadoop.fs.permission.PermissionStatus.readFields(PermissionStatus.java:84)
at org.apache.hadoop.fs.permission.PermissionStatus.read(PermissionStatus.java:98)
at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:483)
at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:849)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)


I have the following files from the previous successful checkpoint:

Datadir/hadoop/dfs/name/current/
total 378M

22K 2010-10-30 14:00 edits
4.0K 2010-10-30 14:23 edits.new
378M 2010-10-30 13:05 fsimage
8 2010-10-30 13:05 fstime
101 2010-10-30 13:05 VERSION

The fsimage currently under /hadoop/dfs/name/image/ is only 157 bytes.
Presumably an edit log got truncated mid-record when the disk filled up,
which would explain the EOFException. I believe that if I replace the
current edit log with the older one (the 22K edits file in my case) and
restore the fsimage from the previous checkpoint, it would do the trick and
I'd be able to bring the cluster back online.
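
Roughly, the swap I have in mind (just a sketch; paths other than the dfs
directory are examples, not my real locations):

    # stop everything first
    bin/stop-all.sh

    # keep an untouched copy of the current (possibly corrupt) metadata
    cp -a /hadoop/dfs/name /root/name.bak

    # drop the in-progress edits.new, which was presumably being written
    # (and truncated) when the disk filled up
    rm /hadoop/dfs/name/current/edits.new

    # put the 22K edits and the fsimage from the previous checkpoint in place
    cp /path/to/checkpoint/edits /hadoop/dfs/name/current/edits
    cp /path/to/checkpoint/fsimage /hadoop/dfs/name/current/fsimage

    # try bringing HDFS up again
    bin/start-dfs.sh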

If this is not a valid way to get the cluster up, please suggest the correct
way to get it up and running.

- m1nish


  • Sudhir Vallamkondu at Oct 30, 2010 at 11:09 pm
    Do you run a secondary name node (SNN)? You can use the copy of the
    fsimage and edit log from the SNN to recover. Remember that it will be
    roughly an hour old (the default SNN checkpoint interval). The recovery
    process is to copy the fsimage and edit log to a new machine, place them
    in the dfs.name.dir/current directory, and start all the daemons; a rough
    sketch follows. For cases like this you should configure the NameNode to
    write its metadata to multiple directories, including one on a network
    filesystem or SAN, so that you always have a fresh copy.
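
    A minimal sketch of that recovery, assuming dfs.name.dir is
    /hadoop/dfs/name; snn-host and /path/to/namesecondary below are
    placeholders for the SNN host and its fs.checkpoint.dir, not defaults:

        # on the recovery machine, with all daemons stopped
        mkdir -p /hadoop/dfs/name/current

        # pull the SNN's last checkpoint files over
        scp snn-host:/path/to/namesecondary/current/* /hadoop/dfs/name/current/

        # bring the daemons back up
        bin/start-all.sh

    For the multiple-directory setup, dfs.name.dir in hadoop-site.xml takes a
    comma-separated list of paths (e.g. /hadoop/dfs/name,/mnt/nfs/dfs/name)
    and the NameNode writes its image and edits to every one of them.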

    - Sudhir



