Hi All,
I'm running a 3-node cluster; the NameNode basically ran out of space and
the cluster crashed. We freed up space and tried to start the NameNode, but
it won't come up. It throws the following exception during startup:
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.18.3
STARTUP_MSG: build =
https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250;
compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
************************************************************/
2010-10-30 12:07:57,626 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=54310
2010-10-30 12:07:57,632 INFO org.apache.hadoop.dfs.NameNode: Namenode up at:
red.hoonur.com/192.168.100.122:54310
2010-10-30 12:07:57,635 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-10-30 12:07:57,651 INFO org.apache.hadoop.dfs.NameNodeMetrics:
Initializing NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-10-30 12:07:57,739 INFO org.apache.hadoop.fs.FSNamesystem:
fsOwner=hadoop,hadoop
2010-10-30 12:07:57,740 INFO org.apache.hadoop.fs.FSNamesystem:
supergroup=supergroup
2010-10-30 12:07:57,740 INFO org.apache.hadoop.fs.FSNamesystem:
isPermissionEnabled=false
2010-10-30 12:07:57,755 INFO org.apache.hadoop.dfs.FSNamesystemMetrics:
Initializing FSNamesystemMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-10-30 12:07:57,756 INFO org.apache.hadoop.fs.FSNamesystem: Registered
FSNamesystemStatusMBean
2010-10-30 12:07:57,900 INFO org.apache.hadoop.dfs.Storage: Number of files
= 2988433
2010-10-30 12:09:05,014 INFO org.apache.hadoop.dfs.Storage: Number of files
under construction = 49
2010-10-30 12:09:05,315 INFO org.apache.hadoop.dfs.Storage: Image file of
size 395864924 loaded in 67 seconds.
2010-10-30 12:09:05,351 INFO org.apache.hadoop.dfs.Storage: Edits file edits
of size 22024 edits # 215 loaded in 0 seconds.
2010-10-30 12:09:05,379 ERROR org.apache.hadoop.fs.FSNamesystem:
FSNamesystem initialization failed.
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.io.Text.readString(Text.java:412)
at org.apache.hadoop.fs.permission.PermissionStatus.readFields(PermissionStatus.java:84)
at org.apache.hadoop.fs.permission.PermissionStatus.read(PermissionStatus.java:98)
at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:483)
at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:849)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
at org.apache.hadoop.dfs.FSNamesystem.&lt;init&gt;(NameNode.java:148)
at org.apache.hadoop.dfs.NameNode.&lt;init&gt;(NameNode.java:179)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
2010-10-30 12:09:05,380 INFO org.apache.hadoop.ipc.Server: Stopping server
on 54310
2010-10-30 12:09:05,384 ERROR org.apache.hadoop.dfs.NameNode:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.io.Text.readString(Text.java:412)
at org.apache.hadoop.fs.permission.PermissionStatus.readFields(PermissionStatus.java:84)
at org.apache.hadoop.fs.permission.PermissionStatus.read(PermissionStatus.java:98)
at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:483)
at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:849)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
at org.apache.hadoop.dfs.FSNamesystem.&lt;init&gt;(NameNode.java:148)
at org.apache.hadoop.dfs.NameNode.&lt;init&gt;(NameNode.java:179)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
I have the following files from a previous successful checkpoint:
Datadir/hadoop/dfs/name/current/
total 378M
22K 2010-10-30 14:00 edits
4.0K 2010-10-30 14:23 edits.new
378M 2010-10-30 13:05 fsimage
8 2010-10-30 13:05 fstime
101 2010-10-30 13:05 VERSION
The current fsimage under /hadoop/dfs/name/image/ is only 157 bytes. I
believe that if I replace the current edits log with the older one (the 22K
edits file in my case) and restore the fsimage from the previous checkpoint,
that would do the trick and I'd be able to bring the cluster back online.
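To make sure I'm describing the same thing you'd suggest, here is a rough sketch of the steps I have in mind. It runs against a throwaway sandbox here; on the real cluster NAME_DIR would be dfs.name.dir (/hadoop/dfs/name above) and CKPT_DIR whatever directory holds the last good checkpoint's fsimage and edits (both placeholders, not actual paths from my cluster):

```shell
#!/bin/sh
set -e

# Sandbox stand-in for the real directories (placeholder layout only).
SANDBOX=$(mktemp -d)
NAME_DIR="$SANDBOX/name"        # would be dfs.name.dir on the cluster
CKPT_DIR="$SANDBOX/checkpoint"  # would hold the last good fsimage/edits

# Fake the corrupt current/ dir and the good checkpoint for this demo.
mkdir -p "$NAME_DIR/current" "$CKPT_DIR"
echo corrupt     > "$NAME_DIR/current/fsimage"
echo partial     > "$NAME_DIR/current/edits.new"
echo good-image  > "$CKPT_DIR/fsimage"
echo good-edits  > "$CKPT_DIR/edits"

# 1. Stop the NameNode first (not run here):
#    bin/hadoop-daemon.sh stop namenode

# 2. Preserve the corrupt state before overwriting anything.
cp -a "$NAME_DIR/current" "$NAME_DIR/current.bad"

# 3. Drop in the checkpoint image and edits, and remove the
#    half-written edits.new that the crash left behind.
cp "$CKPT_DIR/fsimage" "$NAME_DIR/current/fsimage"
cp "$CKPT_DIR/edits"   "$NAME_DIR/current/edits"
rm -f "$NAME_DIR/current/edits.new"

# 4. Restart the NameNode and watch the log (not run here):
#    bin/hadoop-daemon.sh start namenode
```

I understand this would lose any namespace changes made after that checkpoint, which I can live with.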
If this is not a valid way to recover, please suggest the correct way to get
the cluster up & running.
- m1nish