Hello


We have a cluster of 6 Rabbit Nodes.
All disk nodes.


One keeps crashing (around 3 times a day), the others are fine.


From what I can see, it restarts itself and carries on.


The last time, however, it didn't come up because it said "already running".


When the process was killed manually, the entire cluster came down...


I cannot interpret the crash report.


I have attached it.


Can anyone see what the problem is?


To read FirstRand Bank's Disclaimer for this email click on the following address or copy into your Internet browser:
https://www.fnb.co.za/disclaimer.html


If you are unable to access the Disclaimer, send a blank e-mail to
firstrandbankdisclaimer at fnb.co.za and we will send you a copy of the Disclaimer.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rabbit.txt
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130422/d58e8e61/attachment.txt>


  • Simon MacMullen at Apr 22, 2013 at 10:04 am
    Which version of RabbitMQ are you running?


    Also: is there nothing in the logs before these {error,not_found}
    reports? It looks like a side effect of something else already having
    gone wrong. Could you post the complete logs (regular and sasl ones)
    somewhere?


    Cheers, Simon
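
One way to act on the request above is to pull out whatever was logged just before the first {error,not_found} report. The following is a minimal Python sketch, not something posted in the thread: the function name `context_before_first`, the `keep` parameter, and matching on the literal text `{error,not_found}` are assumptions for illustration, and the code treats the log as plain lines of text without relying on any particular RabbitMQ log layout.

```python
from collections import deque
from typing import Iterable, List

# Literal text to look for; taken from the error named in the reply above.
MARKER = "{error,not_found}"


def context_before_first(
    lines: Iterable[str], marker: str = MARKER, keep: int = 20
) -> List[str]:
    """Return up to `keep` lines that precede the first line containing `marker`.

    Returns an empty list if the marker never appears.
    """
    recent = deque(maxlen=keep)  # sliding window of the most recent lines
    for line in lines:
        if marker in line:
            return list(recent)
        recent.append(line.rstrip("\n"))
    return []
```

Passing an open log file (for example `with open(path) as f: context_before_first(f)`, where `path` is wherever that node writes its log) returns the preceding lines for inspection. Running it against both the regular and the sasl log would cover the two files asked for.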

    _______________________________________________
    rabbitmq-discuss mailing list
    rabbitmq-discuss at lists.rabbitmq.com
    https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss



    --
    Simon MacMullen
    RabbitMQ, VMware
  • Jeffery, Mark at Apr 22, 2013 at 10:23 am
    Thanks.


    It is Version 2.8.4


    Attached are the logs.


    I really appreciate any help.


    Jeff


    -------------- next part --------------
    A non-text attachment was scrubbed...
    Name: logs.tar.gz
    Type: application/x-gzip
    Size: 583245 bytes
    Desc: logs.tar.gz
    URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130422/94ad5276/attachment.bin>
  • Simon MacMullen at Apr 22, 2013 at 12:27 pm
    Hi. That's a reasonably old version of RabbitMQ; quite a few bugs in HA
    have been fixed since then, so an upgrade is recommended. Having said
    that, none of those fixes matches the exact stack trace from your logs,
    but it's possible that it's a side effect of something else that we have fixed.


    Cheers, Simon

  • Jeffery, Mark at Apr 22, 2013 at 1:38 pm
    Ok, thanks.


    Upgrading our live boxes will be a lengthy process.


    Is there anything I can do in the meantime?


    Where can I look? Is it network related? OS related?


    I will schedule the upgrade in the meantime.


  • Jeffery, Mark at Apr 23, 2013 at 8:00 am
    Hello


    Sorry, one last question.


    Could the fact that all the nodes (6) are disk nodes be the reason that the entire cluster came to a halt?


    Should I reduce the number of disk nodes?


    Thanks


    Jeff
  • Tim Watson at Apr 23, 2013 at 8:21 am
    Hi

    On 23 Apr 2013, at 09:00, "Jeffery, Mark" wrote:

    > Could the fact that all the nodes (6) are disk nodes be the reason that the entire cluster came to a halt?

    That should not cause the entire cluster to come to a halt, as it's usually quite reasonable to stop and start clustered nodes (or join/leave a cluster) at runtime.


    Examining the logs from all nodes around the time of the incident might yield some useful hints, but as Simon pointed out, it's most likely that the best initial course of action is to upgrade.


    Cheers,
    Tim
  • Jeffery, Mark at Jul 16, 2013 at 6:53 am
    Just for information.


    An upgrade to 3.0.4 solved this problem for us.


    Thanks!




