Hi.

We are using a cluster of 2 computers (1 namenode and 2 secondarynodes)
to store a large number of text files in HDFS. The process had been
running for at least a couple of weeks when a power failure suddenly
reset the server, so HDFS was not shut down cleanly. When I tried to
restart the cluster, the namenode failed with a NullPointerException and
the following stack trace (from the logs).

2011-05-18 06:57:39,313 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=YYYYY
2011-05-18 06:57:39,321 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: master/172.XXX.XXX.XXX:YYYYY
2011-05-18 06:57:39,326 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-05-18 06:57:39,329 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-05-18 06:57:39,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=vishaal,vishaal
2011-05-18 06:57:39,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-05-18 06:57:39,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-05-18 06:57:39,459 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-05-18 06:57:39,461 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-05-18 06:57:39,521 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-05-18 06:57:39,531 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-05-18 06:57:39,531 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 97 loaded in 0 seconds.
2011-05-18 06:57:39,532 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/vishaal/hadoop-0.20.2/tmp/dfs/name/current/edits of size 0 edits # 0 loaded in 0 seconds.
2011-05-18 06:57:39,535 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1320)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1309)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:776)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:997)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(NameNode.java:201)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:956)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

2011-05-18 06:57:39,537 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 172.XXX.XXX.XXX
************************************************************/

Though this was just an experiment to test the reliability of HDFS
storage, I would love to get it running again, assuming the data can be
recovered (if it is corrupted at all). A few more questions:

* Is this a common problem? Is there a patch available? (I couldn't
find one even after a lot of Googling.)
* If the servers are prone to power failures, is it a good choice to
continue with HDFS for storage of data?
* If this occurs, does it mean that all the data is corrupt, or only
some of it? Can the corrupted data be recovered?

I would appreciate a prompt reply, as this was an attempt to prove the
concept of using a distributed file system to store a large amount of
text as opposed to a relational database. (I hope you understand that I
am in the line of fire.)

Thanks in advance.
Vishaal Jatav.
(vishaal[dot]iitb04[at]gmail[dot]com)

Group: hdfs-user
Category: hadoop
Posted: May 18, '11 at 11:17a