Hi RabbitMQ Team,


This morning, while checking the RabbitMQ status through the web admin, I
found that one of the RabbitMQ nodes had stopped.


RabbitMQ v2.8.7
Erlang v*R14B04*
*Cluster: Yes, 3 RabbitMQs*


Attached is the log for your reference.


After I restarted the service, everything went back to normal.


I wonder whether the problem is related to a partitioned network:


http://rabbitmq.1065348.n5.nabble.com/Statistics-database-could-not-be-contacted-Message-rates-and-queue-lengths-will-not-be-shown-td22331.html


Thanks & Regards,
Wong
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit.zip
Type: application/zip
Size: 4446 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20121016/85d36ff9/attachment.zip>

  • Simon MacMullen at Oct 16, 2012 at 9:10 am
    Hi. There's nothing in that log to indicate why the node shut down - can
    you post the sasl log somewhere?


    I don't know if it's related to the network partition. But please bear
    in mind that network partitions are really bad for RabbitMQ clusters.


    Cheers, Simon

    _______________________________________________
    rabbitmq-discuss mailing list
    rabbitmq-discuss at lists.rabbitmq.com
    https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss



    --
    Simon MacMullen
    RabbitMQ, VMware
  • Wong Kam Hoong at Oct 16, 2012 at 9:32 am
    Hi Simon,


    I checked the sasl log, but it has not been updated for a long time: the
    issue happened on 16-Oct-2012, but the latest entry in the log is from
    2-Oct-2012.


    Attached is the requested sasl log for the server.


    Yes, I remember you mentioned before that running RabbitMQ across a
    partitioned network is not recommended. We are still waiting for the network
    team to tell us whether these RabbitMQ nodes really are deployed in a
    partitioned network.


    Just curious: how does RabbitMQ identify whether the nodes are deployed
    in a partitioned network? I am asking so that I can discuss this better
    with the network team.


    Regards,
    Wong




    -------------- next part --------------
    A non-text attachment was scrubbed...
    Name: sasl.zip
    Type: application/zip
    Size: 11352 bytes
    Desc: not available
    URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20121016/e7af806d/attachment.zip>
  • Simon MacMullen at Oct 16, 2012 at 10:33 am
    Hmm. The SASL log does not necessarily contain anything; it is
    really more of an error log. So I guess there was no error.


    Aha! The log contains:


    =INFO REPORT==== 16-Oct-2012::00:48:36 ===
    Halting Erlang VM


    We only log that after invocation of "rabbitmqctl stop". So the reason
    that node shut down was, umm, someone told it to.
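
As a rough illustration of that check, the sketch below scans log text for INFO reports whose body contains "Halting Erlang VM" and returns their timestamps. The report header layout is taken from the excerpt quoted in this thread; the function names are hypothetical, and the month abbreviation is assumed to parse under an English locale.

```python
import re
from datetime import datetime

# Header of a report as it appears in this thread, e.g.
# "=INFO REPORT==== 16-Oct-2012::00:48:36 ==="
REPORT_HEADER = re.compile(
    r"^=(?P<level>[A-Z]+) REPORT==== "
    r"(?P<stamp>\d{1,2}-[A-Za-z]{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===$"
)


def parse_reports(log_text):
    """Split log text into (level, timestamp, body) tuples (hypothetical helper)."""
    reports = []
    current = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = REPORT_HEADER.match(line)
        if match:
            # Assumes an English locale for the "%b" month abbreviation.
            stamp = datetime.strptime(match.group("stamp"), "%d-%b-%Y::%H:%M:%S")
            current = [match.group("level"), stamp, []]
            reports.append(current)
        elif current is not None and line:
            # Non-empty lines after a header belong to that report's body.
            current[2].append(line)
    return [(level, stamp, " ".join(body)) for level, stamp, body in reports]


def find_halts(log_text):
    """Return the timestamps of INFO reports that mention the VM halting."""
    return [
        stamp
        for level, stamp, body in parse_reports(log_text)
        if level == "INFO" and "Halting Erlang VM" in body
    ]
```

Feeding the main node log to `find_halts` would list every time the node was stopped this way, which can then be compared against known maintenance windows.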


    And regarding network partitions, we get information about that from
    Mnesia. Mnesia will log something like:


    =ERROR REPORT==== 16-Oct-2012::00:04:19 ===
    Mnesia(nplay@app2): ** ERROR ** mnesia_event got
    {inconsistent_database, running_partitioned_network, nplay@web2}


    when it has detected a network partition. Note the
    "running_partitioned_network" - it will also log a very similar message
    with "starting_partitioned_network" the first time it starts *after* a
    partition.
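
A minimal sketch of how such entries could be pulled out of a log, assuming the message layout shown in this thread. The function and constant names are hypothetical. Node names are accepted whether they are written with `@` or with the list archive's ` at ` obfuscation, and the message may wrap across lines.

```python
import re

# Matches the Mnesia event shown in this thread; whitespace (including
# line breaks) between the parts is allowed because the log wraps it.
PARTITION_EVENT = re.compile(
    r"Mnesia\((?P<local>[^)]+)\):\s+\*\* ERROR \*\*\s+mnesia_event\s+got\s+"
    r"\{inconsistent_database,\s*"
    r"(?P<kind>running_partitioned_network|starting_partitioned_network),\s*"
    r"(?P<peer>[^}]+)\}"
)

# Meanings as described in the post above.
MEANING = {
    "running_partitioned_network": "partition detected while the node was running",
    "starting_partitioned_network": "first start after a partition",
}


def normalize_node(name):
    """Turn 'nplay at app2' (archive obfuscation) back into 'nplay@app2'."""
    return re.sub(r"\s+at\s+", "@", name.strip())


def partition_events(log_text):
    """Return (kind, local_node, peer_node) for each partition event found."""
    return [
        (
            match.group("kind"),
            normalize_node(match.group("local")),
            normalize_node(match.group("peer")),
        )
        for match in PARTITION_EVENT.finditer(log_text)
    ]
```

Looking up each returned `kind` in `MEANING` gives a one-line summary per event that can be handed to a network team alongside the peer node name.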


    Future versions of RabbitMQ will make this information more accessible.


    Cheers, Simon

  • Wong Kam Hoong at Oct 17, 2012 at 2:04 am
    Hi Simon,


    I checked the log and noticed that every time this particular RabbitMQ
    node starts, the log shows error messages like the following:


    =ERROR REPORT==== 4-Oct-2012::05:46:57 ===
    Mnesia(nplay@app2): ** ERROR ** mnesia_event got {inconsistent_database,
    starting_partitioned_network, nplay@web1}


    =ERROR REPORT==== 4-Oct-2012::05:46:57 ===
    Mnesia(nplay@app2): ** ERROR ** mnesia_event got {inconsistent_database,
    starting_partitioned_network, nplay@web2}


    Why are these errors logged? Are they related to my cluster settings?


    This is a production environment; I can guarantee that no one would dare
    to execute the command "rabbitmqctl stop".


    Regards,
    Wong


  • Simon MacMullen at Oct 17, 2012 at 9:40 am

    On 17/10/12 03:04, Wong Kam Hoong wrote:
    > I checked the log and noticed that every time this particular RabbitMQ
    > node starts, the log shows error messages like the following:
    >
    > =ERROR REPORT==== 4-Oct-2012::05:46:57 ===
    > Mnesia(nplay@app2): ** ERROR ** mnesia_event got {inconsistent_database,
    > starting_partitioned_network, nplay@web1}
    >
    > =ERROR REPORT==== 4-Oct-2012::05:46:57 ===
    > Mnesia(nplay@app2): ** ERROR ** mnesia_event got {inconsistent_database,
    > starting_partitioned_network, nplay@web2}
    >
    > Why are these errors logged? Are they related to my cluster settings?

    That log message ("starting_partitioned_network") means that Mnesia
    detected it was partitioned last time it was up, and is recovering from
    it now (where the recovery is fairly brutal - the node logging that
    message has thrown away any local changes, and has resynchronised with
    whichever nodes it found running).

    > This is a production environment; I can guarantee that no one would dare
    > to execute the command "rabbitmqctl stop".

    Well, I can promise that we only log that message "Halting Erlang VM" in
    response to "rabbitmqctl stop". If you don't believe me you can check -
    it's emitted in rabbit:stop_and_halt():


    http://hg.rabbitmq.com/rabbitmq-server/file/rabbitmq_v2_8_x/src/rabbit.erl#l310


    which is only invoked here:


    http://hg.rabbitmq.com/rabbitmq-server/file/rabbitmq_v2_8_x/src/rabbit_control.erl#l161


    - the handler for "rabbitmqctl stop".


    Cheers, Simon


  • Wong Kam Hoong at Oct 17, 2012 at 9:58 am
    Hi Simon,


    I usually use the command "/sbin/service rabbitmq-server stop" to stop
    RabbitMQ.

    Is it possible that this command invokes "rabbitmqctl stop"?

    If yes, then that explains the situation.


    Regards,
    Wong


  • Simon MacMullen at Oct 17, 2012 at 9:59 am

    On 17/10/12 10:58, Wong Kam Hoong wrote:
    > I usually use the command "/sbin/service rabbitmq-server stop" to stop
    > RabbitMQ.
    >
    > Is it possible that this command invokes "rabbitmqctl stop"?
    >
    > If yes, then that explains the situation.

    Oh, yes, it does. Sorry if I didn't make that clear.


    Cheers, Simon


