The only way to get past that was to touch the missing tmp file on each
JournalNode so the system would stop throwing that exception.
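A minimal sketch of that workaround, run on each affected node; the storage path and the transaction id 4180828 come from the FileNotFoundException in the logs below, and running it as the hdfs user is an assumption:

```shell
#!/bin/sh
# Hedged sketch: recreate the missing paxos temp file that
# persistPaxosData's AtomicFileOutputStream fails to open.

recreate_paxos_tmp() {
  # $1 = journal storage dir, $2 = segment start txid
  paxos_dir="$1/current/paxos"
  mkdir -p "$paxos_dir"          # the paxos dir itself may be missing
  touch "$paxos_dir/$2.tmp"      # an empty file is enough for open() to succeed
}

# On each node that logs the FileNotFoundException (as the hdfs user):
# recreate_paxos_tmp /mnt/dd1/journalnode/journalhdfs 4180828
```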

Then I turned High Availability off (to simplify debugging; it's easier to
deal with issues on one machine than on two...) and had to modify by hand the
cTime in the VERSION file on each datanode, because they didn't match
the NameNode's :/
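A sketch of that VERSION edit; the file locations shown in the comments are assumptions based on a typical CDH layout, and the cTime value is a hypothetical example:

```shell
#!/bin/sh
# Hedged sketch: align a DataNode's cTime with the NameNode's.

fix_ctime() {
  # $1 = path to a VERSION file, $2 = cTime value read from the NameNode
  version_file="$1"
  new_ctime="$2"
  # Replace the cTime line in place; keep a .bak backup just in case.
  sed -i.bak "s/^cTime=.*/cTime=$new_ctime/" "$version_file"
}

# Read the authoritative value on the NameNode, e.g.:
#   grep ^cTime /dfs/nn/current/VERSION
# then on each DataNode (paths and value are illustrative):
#   fix_ctime /dfs/dn/current/VERSION 1392745918000
```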

I assume I lost some data, but at least the cluster seems to be back up.

On Tuesday, February 18, 2014 at 18:51:58 UTC+1, Laurent Edel wrote:
Hi

CDH 5 beta 2 HDFS is down: the NameNode can't start.
The cause appears to be a gap in the recovery process.

Here are the logs on the NameNode:

2014-02-18 18:46:10,224 FATAL
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error:
recoverUnfinalizedSegments failed for required journal
(JournalAndStream(mgr=QJM to [x.x.x.x:8485, ...:8485], stream=null))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
exceptions to achieve quorum size 3/5. 4 exceptions thrown:
x.x.x.x:8485: /mnt/dd1/journalnode/journalhdfs/current/paxos/4180828.tmp
(No such file or directory)
at java.io.FileOutputStream.open(Native Method)
[...]
y.y.y.y:8485: /mnt/dd1/journalnode/journalhdfs/current/paxos/4180828.tmp
(No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
[...]
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

at
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
[...]
2014-02-18 18:46:10,227 INFO org.apache.hadoop.util.ExitUtil: Exiting with
status 1
2014-02-18 18:46:10,228 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at xxxx
************************************************************/


I looked into one JournalNode's logs:

2014-02-18 18:27:38,768 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: Updating lastPromisedEpoch
from 53 to 54 for client /x.x.x.x
2014-02-18 18:27:38,770 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: Scanning storage
FileJournalManager(root=/mnt/dd1/journalnode/journalhdfs)
2014-02-18 18:27:38,793 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: Latest log is
EditLogFile(file=/mnt/dd1/journalnode/journalhdfs/current/edits_inprogress_0000000000004180828,first=0000000000004180828,last=0000000000004181139,inProgress=true,hasCorruptHeader=false)
2014-02-18 18:27:38,820 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: getSegmentInfo(4180828):
EditLogFile(file=/mnt/dd1/journalnode/journalhdfs/current/edits_inprogress_0000000000004180828,first=0000000000004180828,last=0000000000004181139,inProgress=true,hasCorruptHeader=false)
-> startTxId: 4180828 endTxId: 4181139 isInProgress: true
2014-02-18 18:27:38,821 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: Prepared recovery for
segment 4180828: segmentState { startTxId: 4180828 endTxId: 4181139
isInProgress: true } lastWriterEpoch: 1 lastCommittedTxId: 4181138
2014-02-18 18:27:38,869 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: getSegmentInfo(4180828):
EditLogFile(file=/mnt/dd1/journalnode/journalhdfs/current/edits_inprogress_0000000000004180828,first=0000000000004180828,last=0000000000004181139,inProgress=true,hasCorruptHeader=false)
-> startTxId: 4180828 endTxId: 4181139 isInProgress: true
2014-02-18 18:27:38,869 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: Skipping download of log
startTxId: 4180828 endTxId: 4181139 isInProgress: true: already have
up-to-date logs
2014-02-18 18:27:38,870 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hdfs (auth:SIMPLE) *cause:java.io.FileNotFoundException:
/mnt/dd1/journalnode/journalhdfs/current/paxos/4180828.tmp (No such file or
directory)*
2014-02-18 18:27:38,870 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 8485, call
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.acceptRecovery
from x.x.x.x:34134 Call#27 Retry#0: error: java.io.FileNotFoundException:
/mnt/dd1/journalnode/journalhdfs/current/paxos/4180828.tmp (No such file or
directory)
java.io.FileNotFoundException:
/mnt/dd1/journalnode/journalhdfs/current/paxos/4180828.tmp (No such file or
directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
at
org.apache.hadoop.hdfs.util.AtomicFileOutputStream.<init>(AtomicFileOutputStream.java:56)
at
org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:963)
at
org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:838)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:242)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:24288)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

Yes, the .tmp file that seems to be requested doesn't exist.

Maybe this is unrecoverable, but even if it is, is there any way to
bypass this error? Even if I lose some data here, my priority is to
get the cluster back up ASAP.

JournalNodes are all up.

Discussion Overview
group: scm-users
category: hadoop
posted: Feb 18, '14 at 5:52p
active: Feb 19, '14 at 10:50a
posts: 2
users: 1 (Laurent Edel: 2 posts)
website: cloudera.com
irc: #hadoop
