Hi,


We are trying to restart our RabbitMQ cluster after an unexpected
environment failure.


We are running:


    - rabbitmq_server-3.1.0 on Windows
    - erl5.10.1


Our cluster is configured like so:


    - web01, web02, web03, web04, web05, app05, app06


During earlier testing we were able to take down any and all of the nodes
with a Windows restart and the cluster would recover. However, after the
unexpected crash that brought down the entire cluster, the RabbitMQ services
will no longer start.


We receive the following error:


    C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.1.0\sbin>rabbitmq-server.bat

                  RabbitMQ 3.1.0. Copyright (C) 2007-2013 VMware, Inc.
      ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
      ##  ##
      ##########  Logs: C:/RabbitMQ/log/rabbit@OTLABWEB02.log
      ######  ##        C:/RabbitMQ/log/rabbit@OTLABWEB02-sasl.log
      ##########
                  Starting broker...

    BOOT FAILED
    ===========
    Timeout contacting cluster nodes: [rabbit@OTLABWEB05,rabbit@OTLABWEB04,
                                       rabbit@OTLABWEB03,rabbit@OTLABWEB01,
                                       rabbit@OTLABAPP06,rabbit@OTLABAPP05].

    DIAGNOSTICS
    ===========
    nodes in question: [rabbit@OTLABWEB05,rabbit@OTLABWEB04,rabbit@OTLABWEB03,
                        rabbit@OTLABWEB01,rabbit@OTLABAPP06,rabbit@OTLABAPP05]

    hosts, their running nodes and ports:
    - OTLABAPP05: []
    - OTLABAPP06: []
    - OTLABWEB01: []
    - OTLABWEB03: []
    - OTLABWEB04: []
    - OTLABWEB05: []

    current node details:
    - node name: rabbit@OTLABWEB02
    - home dir: U:\
    - cookie hash: j9x9r680xF6JzFI7IVDLew==

    BOOT FAILED
    ===========
    Error description:
       {could_not_start,rabbit,
        {bad_return,
         {{rabbit,start,[normal,[]]},
          {'EXIT',
           {rabbit,failure_during_boot,
            {error,
             {timeout_waiting_for_tables,
              [rabbit_user,rabbit_user_permission,rabbit_vhost,
               rabbit_durable_route,rabbit_durable_exchange,
               rabbit_runtime_parameters,
               rabbit_durable_queue]}}}}}}}

    Log files (may contain more information):
       C:/RabbitMQ/log/rabbit@OTLABWEB02.log
       C:/RabbitMQ/log/rabbit@OTLABWEB02-sasl.log

    {"init terminating in do_boot",{rabbit,failure_during_boot,{could_not_start,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot,{error,{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}}}}}}}}}

    Crash dump was written to: erl_crash.dump
    init terminating in do_boot ()




I have attached the log files from web02.


By reading the mailing list and searching online we have previously managed
to recreate the cluster, but only at the cost of losing the queues. We would
like to retain our queues and the information they contained. We hope that
this is easy to solve, since servers do go down unexpectedly. :(


Any help would be greatly appreciated.


Thanks
Brendan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit@OTLABWEB02.log
Type: application/octet-stream
Size: 1994 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130609/55356de1/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit@OTLABWEB02-sasl.log
Type: application/octet-stream
Size: 965 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130609/55356de1/attachment-0001.obj>

  • Simon MacMullen at Jun 10, 2013 at 2:15 pm

    On 10/06/13 06:32, Brendan Fry wrote:
    During earlier testing we were able to take down any and all of the
    nodes with a Windows restart and the cluster would recover. However,
    after the unexpected crash that brought down the entire cluster, the
    RabbitMQ services will no longer start.

    We receive the following error:

    C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.1.0\sbin>rabbitmq-server.bat

                  RabbitMQ 3.1.0. Copyright (C) 2007-2013 VMware, Inc.
      ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
      ##  ##
      ##########  Logs: C:/RabbitMQ/log/rabbit@OTLABWEB02.log
      ######  ##        C:/RabbitMQ/log/rabbit@OTLABWEB02-sasl.log
      ##########
                  Starting broker...

    BOOT FAILED
    ===========
    Timeout contacting cluster nodes: [rabbit@OTLABWEB05,rabbit@OTLABWEB04,
                                       rabbit@OTLABWEB03,rabbit@OTLABWEB01,
                                       rabbit@OTLABAPP06,rabbit@OTLABAPP05].

    Hi. When starting a cluster from scratch, RabbitMQ will want the last
    node stopped to be the first node started (since the last node stopped
    may have seen changes that no other node saw).


    So if your nodes were shut down correctly then you would just need to
    make sure you start the last node first (after that you can start them
    in any order). Starting any other node first will lead to an error
    message similar to the one you posted.
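
    As a rough illustration of the ordering rule above, here is a minimal
    Python sketch. It assumes you have recorded when each node was stopped;
    the timestamps below are made up for the example and the helper name
    start_order is hypothetical, not part of RabbitMQ.

```python
from datetime import datetime


def start_order(stop_times):
    """Return node names with the most recently stopped node first."""
    return sorted(stop_times, key=stop_times.get, reverse=True)


# Hypothetical shutdown times, for illustration only.
stop_times = {
    "rabbit@OTLABWEB01": datetime(2013, 6, 9, 22, 0, 5),
    "rabbit@OTLABWEB02": datetime(2013, 6, 9, 22, 3, 40),
    "rabbit@OTLABAPP05": datetime(2013, 6, 9, 22, 1, 12),
}

order = start_order(stop_times)
print("Start first:", order[0])             # the last node to be stopped
print("Then, in any order:", order[1:])     # the remaining nodes
```

    Only the first entry matters; once that node is up, the rest can be
    started in any order.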


    However, if all nodes were shut down abruptly and simultaneously then
    they can all decide that they were not the last one to shut down and
    display this error. In that case, make sure you start all the nodes
    simultaneously (well, within the 30 second timeout anyway).
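
    One way to get all the nodes starting inside that window is to issue
    the start requests in parallel from a single machine. The Python sketch
    below is only an outline under some assumptions: that the Windows
    service is registered under the name "RabbitMQ", that you have rights
    to control services remotely with sc, and that the host names match
    those in the diagnostics output. The functions start_service and
    start_all are hypothetical helpers, not part of RabbitMQ.

```python
import subprocess
import threading

# Host names taken from the diagnostics output in the original post.
NODES = ["OTLABWEB01", "OTLABWEB02", "OTLABWEB03", "OTLABWEB04",
         "OTLABWEB05", "OTLABAPP05", "OTLABAPP06"]
SERVICE = "RabbitMQ"  # assumption: the Windows service name on each host


def start_service(host, service=SERVICE):
    """Ask the service manager on `host` to start `service` via sc."""
    result = subprocess.run(["sc", rf"\\{host}", "start", service],
                            capture_output=True, text=True)
    return result.returncode


def start_all(hosts, starter=start_service):
    """Call `starter` for every host at roughly the same moment."""
    results = {}
    barrier = threading.Barrier(len(hosts))

    def worker(host):
        barrier.wait()  # release all workers together
        results[host] = starter(host)

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

    Calling start_all(NODES) returns a dict mapping each host to the exit
    code of its sc call; a non-zero value means that start request was
    not accepted and the host needs a closer look.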


    Cheers, Simon


    --
    Simon MacMullen
    RabbitMQ, Pivotal
