Hi,

This morning the namenode of my hadoop cluster shut itself down after the
logs/ directory had filled itself with job configs, log files and all the
other fun things hadoop leaves there. It had been running for a few months.
I deleted all of the job configs and attempt log directories and tried to
restart the namenode, but it failed with many LeaseManager errors.

Does anyone know what needs to be done to fix this and get the namenode back
up?

Here's what the logs report. I'm using Cloudera's 0.18.3 distro.

STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = my-host-name.com/10.15.137.204
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.18.3-2
STARTUP_MSG: build = -r ; compiled by 'httpd' on Fri Jun 12 15:27:43 PDT
2009
************************************************************/
2010-02-02 13:38:31,199 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=9000
2010-02-02 13:38:31,208 INFO org.apache.hadoop.dfs.NameNode: Namenode up at:
my-host-name.com/10.15.137.204:9000
2010-02-02 13:38:31,212 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-02-02 13:38:31,218 INFO org.apache.hadoop.dfs.NameNodeMetrics:
Initializing NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-02-02 13:38:31,318 INFO org.apache.hadoop.fs.FSNamesystem:
fsOwner=app,app
2010-02-02 13:38:31,319 INFO org.apache.hadoop.fs.FSNamesystem:
supergroup=supergroup
2010-02-02 13:38:31,319 INFO org.apache.hadoop.fs.FSNamesystem:
isPermissionEnabled=true
2010-02-02 13:38:31,329 INFO org.apache.hadoop.dfs.FSNamesystemMetrics:
Initializing FSNamesystemMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-02-02 13:38:31,331 INFO org.apache.hadoop.fs.FSNamesystem: Registered
FSNamesystemStatusMBean
2010-02-02 13:38:31,375 INFO org.apache.hadoop.dfs.Storage: Number of files
= 248675
2010-02-02 13:38:36,932 INFO org.apache.hadoop.dfs.Storage: Number of files
under construction = 2
2010-02-02 13:38:37,008 INFO org.apache.hadoop.dfs.Storage: Image file of
size 42924164 loaded in 5 seconds.
2010-02-02 13:38:37,020 ERROR org.apache.hadoop.dfs.LeaseManager:
/path/on/hdfs/_logs/history/my-host-name.com_1261508934685_job_200912221108_15967_conf.xml
not found in lease.paths
(=[/path/on/hdfs/_logs/history/my-host-name.com_1261508934685_job_200912221108_15967_app_MyJobName_20100202_10_59])

[[ a bunch more errors like the one above ]]

2010-02-02 13:38:37,076 ERROR org.apache.hadoop.fs.FSNamesystem:
FSNamesystem initialization failed.
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:585)
at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
2010-02-02 13:38:37,077 INFO org.apache.hadoop.ipc.Server: Stopping server
on 9000
2010-02-02 13:38:37,081 ERROR org.apache.hadoop.dfs.NameNode:
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:585)
at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)

2010-02-02 13:38:37,082 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at my-host-name.com/10.15.137.204
************************************************************/

thanks,
Bill


  • Bill Graham at Feb 2, 2010 at 11:30 pm
    I was able to fix this by restoring my namenode from the last checkpoint of
    the secondary namenode. Searching the list, I saw that others have struggled
    with this issue, so I'll share my steps.

    I did it by following Tom White's excellent instructions in Hadoop - The
    Definitive Guide:

    1. Stop the secondary namenode. (The namenode itself was already stopped.)
    2. Move the namenode directory (configured as dfs.name.dir) aside.
    3. Start the namenode with the -importCheckpoint option, like so:

    bin/hadoop-daemon.sh start namenode -importCheckpoint
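
    In case it helps anyone searching later, the three steps map onto shell
    commands roughly as below. This is only a sketch: restore_from_checkpoint
    is a name I made up, /data/dfs/name stands in for whatever your
    dfs.name.dir actually points to, and the function only prints the
    commands (a dry run) so you can review them before running anything
    for real.

```shell
# Dry-run sketch of the checkpoint-restore steps above (Hadoop 0.18.x).
# The function echoes each command instead of executing it; pass your
# real dfs.name.dir path as the first argument.
restore_from_checkpoint() {
    NAME_DIR=${1:?usage: restore_from_checkpoint /path/to/dfs.name.dir}

    # 1. Stop the secondary namenode (the primary is already down).
    echo "bin/hadoop-daemon.sh stop secondarynamenode"

    # 2. Move the damaged namenode directory aside instead of deleting
    #    it, so nothing is lost if the import fails.
    echo "mv $NAME_DIR $NAME_DIR.bad"

    # 3. Start the namenode from the secondary's last checkpoint.
    echo "bin/hadoop-daemon.sh start namenode -importCheckpoint"
}

restore_from_checkpoint /data/dfs/name
```

    Moving the old directory aside rather than deleting it means you can
    still get back to the broken image if the import fails for some reason.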


    On Tue, Feb 2, 2010 at 1:54 PM, Bill Graham wrote:

    [[ quoted original message ]]

    thanks,
    Bill

Discussion Overview
group: common-user @ hadoop
posted: Feb 2, '10 at 9:55p
active: Feb 2, '10 at 11:30p
posts: 2
users: 1 (Bill Graham: 2 posts)
website: hadoop.apache.org...
irc: #hadoop
