Alright, thanks. I'll try to contact you on IRC next time this happens. I updated the gist with some logs from another node; the third node produced *a lot*, so I'll have to email those to you.

On Thursday 6 June 2013 at 18:07, Tim Watson-5 [via RabbitMQ] wrote:

Carl
On 5 Jun 2013, at 13:56, carlhoerberg <[hidden email]> wrote:
On a three-node cluster, one EC2 machine reboots unexpectedly, and when it
starts up again RabbitMQ fails to start. I've put all logs here:
https://gist.github.com/carlhoerberg/ff6c6bd4f7639bf4b2f5

That seems to contain only the logs from one node. What about the others?
When the troubled node is restarted manually again it's unable to join,
stopping at "adding mirrors", staying there forever.

The other nodes now start to behave weirdly too: new queues can't be declared,
but existing queues seem to continue delivering messages. They also can't
respond to "rabbitmqctl status" or /api/overview. I'm forced to stop them
with "kill -9". Only when all nodes are stopped can the cluster be brought
up again normally.

If you kill -9 the nodes, it's a bit tricky to get live info for diagnosis, assuming there's nothing in the logs. If the logs are available, please post them. Next time this happens, jump on IRC (the #rabbitmq channel on freenode) and we can try a few things to diagnose what's going on. If you can arrange for me to have SSH access to these nodes whilst the symptoms are present, I'll be more likely to solve the issue quickly. We might be able to sign some kind of privacy agreement if necessary.

Also, please post your full setup whenever possible, detailing which plugins you're using (if any) and what kind of HA setup you're using.

Cheers,
Tim
--
View this message in context: http://rabbitmq.1065348.n5.nabble.com/Node-crash-then-cluster-collapse-tp27206p27239.html
Sent from the RabbitMQ mailing list archive at Nabble.com.
