Hi

CDH 5 beta 2 HDFS is down: the NameNode can't start.
The reason appears to be a failure in the recovery process.

Here are the logs from the NameNode:

2014-02-18 18:46:10,224 FATAL
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error:
recoverUnfinalizedSegments failed for required journal
(JournalAndStream(mgr=QJM to [x.x.x.x:8485, ...:8485], stream=null))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
exceptions to achieve quorum size 3/5. 4 exceptions thrown:
x.x.x.x:8485: /mnt/dd1/journalnode/journalhdfs/current/paxos/4180828.tmp
(No such file or directory)
         at java.io.FileOutputStream.open(Native Method)
[...]
y.y.y.y:8485: /mnt/dd1/journalnode/journalhdfs/current/paxos/4180828.tmp
(No such file or directory)
         at java.io.FileOutputStream.open(Native Method)
         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
[...]
         at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

         at
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
         at
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
[...]
2014-02-18 18:46:10,227 INFO org.apache.hadoop.util.ExitUtil: Exiting with
status 1
2014-02-18 18:46:10,228 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at xxxx
************************************************************/


I looked into one JournalNode's logs:

2014-02-18 18:27:38,768 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: Updating lastPromisedEpoch
from 53 to 54 for client /x.x.x.x
2014-02-18 18:27:38,770 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: Scanning storage
FileJournalManager(root=/mnt/dd1/journalnode/journalhdfs)
2014-02-18 18:27:38,793 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: Latest log is
EditLogFile(file=/mnt/dd1/journalnode/journalhdfs/current/edits_inprogress_0000000000004180828,first=0000000000004180828,last=0000000000004181139,inProgress=true,hasCorruptHeader=false)
2014-02-18 18:27:38,820 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: getSegmentInfo(4180828):
EditLogFile(file=/mnt/dd1/journalnode/journalhdfs/current/edits_inprogress_0000000000004180828,first=0000000000004180828,last=0000000000004181139,inProgress=true,hasCorruptHeader=false)
-> startTxId: 4180828 endTxId: 4181139 isInProgress: true
2014-02-18 18:27:38,821 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: Prepared recovery for
segment 4180828: segmentState { startTxId: 4180828 endTxId: 4181139
isInProgress: true } lastWriterEpoch: 1 lastCommittedTxId: 4181138
2014-02-18 18:27:38,869 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: getSegmentInfo(4180828):
EditLogFile(file=/mnt/dd1/journalnode/journalhdfs/current/edits_inprogress_0000000000004180828,first=0000000000004180828,last=0000000000004181139,inProgress=true,hasCorruptHeader=false)
-> startTxId: 4180828 endTxId: 4181139 isInProgress: true
2014-02-18 18:27:38,869 INFO
org.apache.hadoop.hdfs.qjournal.server.Journal: Skipping download of log
startTxId: 4180828 endTxId: 4181139 isInProgress: true: already have
up-to-date logs
2014-02-18 18:27:38,870 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hdfs (auth:SIMPLE) *cause:java.io.FileNotFoundException:
/mnt/dd1/journalnode/journalhdfs/current/paxos/4180828.tmp (No such file or
directory)*
2014-02-18 18:27:38,870 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 8485, call
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.acceptRecovery
from x.x.x.x:34134 Call#27 Retry#0: error: java.io.FileNotFoundException:
/mnt/dd1/journalnode/journalhdfs/current/paxos/4180828.tmp (No such file or
directory)
java.io.FileNotFoundException:
/mnt/dd1/journalnode/journalhdfs/current/paxos/4180828.tmp (No such file or
directory)
         at java.io.FileOutputStream.open(Native Method)
         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
         at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
         at
org.apache.hadoop.hdfs.util.AtomicFileOutputStream.<init>(AtomicFileOutputStream.java:56)
         at
org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:963)
         at
org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:838)
         at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
         at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:242)
         at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:24288)
         at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:415)
         at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

Yes, the .tmp file that is being requested doesn't exist.

Maybe this is unrecoverable, but even so, is there any way to bypass this
error? Even if I lose some data here, my priority is to get the cluster back
up ASAP.

JournalNodes are all up.

To unsubscribe from this group and stop receiving emails from it, send an email to scm-users+unsubscribe@cloudera.org.


  • Laurent Edel at Feb 19, 2014 at 10:50 am
    The only way to get past that was to touch the missing .tmp file on each
    JournalNode, so the recovery could get rid of that exception.
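A minimal sketch of that step, to be run on each JournalNode. The directory and txid below are taken from the FileNotFoundException in the logs above; adjust them for your deployment, and run as (or chown to) the user the JournalNode runs as:

```shell
# Recreate the paxos temp file the acceptRecovery RPC expects.
# Path and txid come from the exception in the logs; adjust as needed.
JN_PAXOS=/mnt/dd1/journalnode/journalhdfs/current/paxos
TXID=4180828

mkdir -p "$JN_PAXOS"          # make sure the paxos directory itself exists
touch "$JN_PAXOS/$TXID.tmp"   # an empty file is enough to get past the exception
```

Make sure the resulting file is owned by the hdfs user so the JournalNode can write to it.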

    Then I turned High Availability off (to simplify debugging: it's easier
    to deal with one machine's issues than two...) and had to modify by hand
    the cTime in the VERSION file on each DataNode, because it didn't match
    the NameNode's :/

    I assume I lost some data, but at least the cluster seems to be back up.
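A sketch of that cTime fix. VERSION is a plain key=value properties file, so a targeted sed is safer than rewriting it wholesale. The two paths below are illustrative only (they are not from this cluster); use your actual dfs.namenode.name.dir and dfs.datanode.data.dir values, and stop the DataNode before editing:

```shell
# Copy the NameNode's storage cTime into a DataNode's VERSION file.
NN_VERSION=/data/dfs/nn/current/VERSION   # assumed NameNode storage dir
DN_VERSION=/data/dfs/dn/current/VERSION   # assumed DataNode storage dir

# Read cTime from the NameNode's VERSION, then rewrite it on the DataNode.
CTIME=$(grep '^cTime=' "$NN_VERSION" | cut -d= -f2)
sed -i "s/^cTime=.*/cTime=${CTIME}/" "$DN_VERSION"
```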


Discussion Overview
group: scm-users
categories: hadoop
posted: Feb 18, '14 at 5:52p
active: Feb 19, '14 at 10:50a
posts: 2
users: 1
website: cloudera.com
irc: #hadoop

1 user in discussion

Laurent Edel: 2 posts
