On 15/12/2012 8:22PM, Aravindh S wrote:
Hi

Hi.

We are running RabbitMQ v2.8.4 in a two-node cluster configuration.

We had an unplanned power outage and both servers went down. When we
tried to restart the RabbitMQ servers, only the rabbit2 node started up;
the rabbit1 node crashes on start.
We are running several mirrored queues between these nodes. One such
queue, "Aiken", contained more than 65K messages before the outage. Now
rabbit1 won't start, and rabbit2 starts fine but shows only
109 old messages in the "Aiken" queue. We are afraid that we have lost the
messages in the rabbit1 crash.

At the risk of asking something obvious: were all the messages published
to "Aiken" published with delivery_mode=2 (persistent)? Any
non-persistent messages will be removed from the queue after restart.
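
To make the distinction concrete, here is a toy model in Python of what a restart does to a queue's contents. It is only a sketch: `Message`, `survivors_after_restart` and the sample data are hypothetical names invented for illustration, not part of RabbitMQ or of any client library. The one rule it encodes is that a message is kept across a broker restart only if it was published with delivery_mode=2 and the queue itself is durable.

```python
from dataclasses import dataclass
from typing import List

PERSISTENT = 2      # AMQP delivery_mode value for persistent messages
NON_PERSISTENT = 1  # AMQP delivery_mode value for transient messages


@dataclass
class Message:
    body: str
    delivery_mode: int = NON_PERSISTENT


def survivors_after_restart(messages: List[Message], queue_durable: bool) -> List[Message]:
    """Return the messages a restarted broker would still hold (toy model)."""
    if not queue_durable:
        # A non-durable queue is itself gone after a restart.
        return []
    return [m for m in messages if m.delivery_mode == PERSISTENT]


# Hypothetical sample data: three messages, only one published as persistent.
queue = [
    Message("a"),
    Message("b", delivery_mode=PERSISTENT),
    Message("c"),
]
kept = survivors_after_restart(queue, queue_durable=True)
print([m.body for m in kept])
```

In a real publisher the flag is set per message, in the basic properties passed at publish time, so it is worth checking the publishing code rather than the queue declaration.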

The rabbit1 node crashes on startup in both cases: when rabbit2 is
down and when rabbit2 is up.

We could see the following message in the startup log:

BOOT FAILED
===========

Error description:

{badmatch,{error,{"/var/lib/rabbitmq/mnesia/rabbit@rabbit1/queues/1NGZF3JZJR0SU2C0VE2S25JRP/clean.dot",
eacces}}}

"eacces" is the key here - for some reason the server is not being
permitted to read the file by the operating system. Assuming you have
installed via debs / RPMs, all files under /var/lib/rabbitmq/mnesia
should be owned by the "rabbitmq" user - are they?
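
One way to answer that question quickly is a small script that walks the directory and lists anything not owned by the expected user. The following is a minimal sketch, not an official tool: `uid_for` and `find_foreign_files` are hypothetical helper names, and the "rabbitmq" user and the /var/lib/rabbitmq/mnesia path are the package-install defaults assumed above. Run it as root so that every subdirectory can be read.

```python
import os
import pwd
from typing import Iterator, Tuple


def uid_for(user: str) -> int:
    """Look up the numeric uid of a local user (raises KeyError if absent)."""
    return pwd.getpwnam(user).pw_uid


def find_foreign_files(root: str, expected_uid: int) -> Iterator[Tuple[str, int]]:
    """Yield (path, owner_uid) for root and every entry below it
    that is not owned by expected_uid."""
    root_owner = os.lstat(root).st_uid
    if root_owner != expected_uid:
        yield root, root_owner
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            owner = os.lstat(path).st_uid  # lstat: do not follow symlinks
            if owner != expected_uid:
                yield path, owner


def report(root: str = "/var/lib/rabbitmq/mnesia", user: str = "rabbitmq") -> None:
    """Print every path under root whose owner differs from user."""
    uid = uid_for(user)
    for path, owner in find_foreign_files(root, uid):
        print(f"{path}: owned by uid {owner}, expected {uid} ({user})")
```

Calling `report()` with no arguments checks the assumed default location. An empty result does not rule out a permissions problem, since "eacces" can also come from restrictive mode bits on a file or a parent directory, which this sketch does not inspect.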

logs are available here:

From the logs, it looks like you made several attempts to start
rabbit1 before that error message showed up, but they were stymied by a
bug in the management plugin startup code that has been fixed since 2.8.4...

Can anyone help me with ideas to recover rabbit1?
Is there a way to tweak the startup of rabbit1 so that it starts as
an independent node?

...however, even if you start rabbit1 as part of the cluster it will
start its mirrored queues from scratch (see
http://www.rabbitmq.com/ha.html#unsynchronised-slaves).


It's not easy to start such a node independently in 2.x I'm afraid (this
was improved in 3.0). I wrote some rather ad-hoc instructions here:
http://rabbitmq.1065348.n5.nabble.com/Repairing-a-a-crashed-cluster-td22466.html


But I'm afraid that if the messages were originally published in
non-persistent mode you won't get them back - they would never even have
made it to disc.


Cheers, Simon

2 users in discussion

Simon MacMullen: 1 post Aravindh S: 1 post