Hi


We are running RabbitMQ v2.8.4 in a two-node cluster configuration.


We had an unplanned power outage and both servers went down. When we
tried to restart the RabbitMQ servers, only the rabbit2 node started up; the
rabbit1 node crashes on start.
We are running several mirrored queues between these nodes. One such queue,
"Aiken", contained more than 65K messages before the outage. Now rabbit1 won't
start, and rabbit2 starts fine but shows only 109 old
messages in the "Aiken" queue. We are afraid that we have lost the messages
because of the rabbit1 crash.


The rabbit1 node crashes on startup in both cases: when rabbit2 is down
and when rabbit2 is up.


We see the following message in the startup log:


BOOT FAILED
==========

Error description:


{badmatch,{error,{"/var/lib/rabbitmq/mnesia/rabbit@rabbit1/queues/1NGZF3JZJR0SU2C0VE2S25JRP/clean.dot",
eacces}}}


We could not understand what it actually means, but I am guessing rabbit1
and rabbit2 went out of sync.


rabbitmqctl status yields the following message:
[root@rabbit1 ~]# rabbitmqctl status
Status of node rabbit@rabbit1 ...
Error: unable to connect to node rabbit@rabbit1: nodedown


DIAGNOSTICS
==========

nodes in question: [rabbit@rabbit1]


hosts, their running nodes and ports:
- rabbit1: [{rabbitmqctl7856,46808}]


current node details:
- node name: rabbitmqctl7856@rabbit1
- home dir: /var/lib/rabbitmq
- cookie hash: WYsTAr/DZ8KD7QQhMu5SSg=



Logs are available here:
rabbit@rabbit1-sasl.log -->
https://docs.google.com/open?id 2mCr6qtz2xOS01YZndfTy1DWms
rabbit@rabbit1.log -->
https://docs.google.com/open?id 2mCr6qtz2xOcEYwcWl2RDV3YTg
startup_log --> https://docs.google.com/open?id 2mCr6qtz2xOOHI2bXQ5OWw4TUE


Can anyone help me with ideas to recover rabbit1?
Is there a way to tweak the startup of rabbit1 so that it starts as an
independent node?


The data at stake is really important. I would appreciate any help.


- Aravindh

  • Simon MacMullen at Dec 17, 2012 at 10:59 am

    On 15/12/2012 8:22PM, Aravindh S wrote:
    Hi

    Hi.

    We are running RabbitMQ v2.8.4 in a two-node cluster configuration.

    We had an unplanned power outage and both servers went down. When we
    tried to restart the RabbitMQ servers, only the rabbit2 node started up;
    the rabbit1 node crashes on start.
    We are running several mirrored queues between these nodes. One such
    queue, "Aiken", contained more than 65K messages before the outage. Now
    rabbit1 won't start, and rabbit2 starts fine but shows only
    109 old messages in the "Aiken" queue. We are afraid that we have lost the
    messages because of the rabbit1 crash.

    At the risk of asking something obvious: were all the messages published
    to "Aiken" published with delivery_mode=2 (persistent)? Any
    non-persistent messages will be removed from the queue after restart.

    The rabbit1 node crashes on startup in both cases: when rabbit2 is
    down and when rabbit2 is up.

    We see the following message in the startup log:

    BOOT FAILED
    ===========

    Error description:

    {badmatch,{error,{"/var/lib/rabbitmq/mnesia/rabbit@rabbit1/queues/1NGZF3JZJR0SU2C0VE2S25JRP/clean.dot",
    eacces}}}

    "eacces" is the key here - for some reason the server is not being
    permitted to read the file by the operating system. Assuming you have
    installed via debs / RPMs, all files under /var/lib/rabbitmq/mnesia
    should be owned by the "rabbitmq" user - are they?
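
A minimal sketch of that ownership check, assuming a Unix host and Python being available on it. The function name `find_wrong_owner` is hypothetical and not part of RabbitMQ; it only lists entries whose owner differs from an expected uid, and it does not change anything on disk.

```python
import os


def find_wrong_owner(root, expected_uid):
    """Return paths at or under root whose owner is not expected_uid."""
    wrong = []
    if os.lstat(root).st_uid != expected_uid:
        wrong.append(root)
    # os.walk silently skips directories it cannot list, so run this
    # as root to be sure every entry is visited.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects a symlink itself rather than its target.
            if os.lstat(path).st_uid != expected_uid:
                wrong.append(path)
    return wrong
```

To use it on the node in question, the uid of the "rabbitmq" user can be looked up with `pwd.getpwnam("rabbitmq").pw_uid` and passed together with `/var/lib/rabbitmq/mnesia`. Any path it returns (for example the `clean.dot` file from the error) is a candidate for the eacces failure.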

    Logs are available here:

    Looking at the logs, it looks like you made several attempts to start
    rabbit1 before that error message showed up, but they were stymied by a
    bug in the management plugin startup code that has been fixed since 2.8.4...

    Can anyone help me with ideas to recover rabbit1?
    Is there a way to tweak the startup of rabbit1 so that it starts as
    an independent node?

    ...however, even if you start rabbit1 as part of the cluster it will
    start its mirrored queues from scratch (see
    http://www.rabbitmq.com/ha.html#unsynchronised-slaves).


    It's not easy to start such a node independently in 2.x I'm afraid (this
    was improved in 3.0). I wrote some rather ad-hoc instructions here:
    http://rabbitmq.1065348.n5.nabble.com/Repairing-a-a-crashed-cluster-td22466.html


    But I'm afraid that if the messages were originally published in
    non-persistent mode you won't get them back - they would never even have
    made it to disc.


    Cheers, Simon
