This is admittedly a shot in the dark, with limited repro details, but
I'm asking in case somebody else has seen this.


We run our clustered RabbitMQ 3.0.2 brokers in production without restarts
for months at a time. Everything looks good: no memory leaks, no
out-of-disk-space conditions, etc.


However, in looking at the rabbit@<node>.log and rabbit@<node>-sasl.log
files, I noticed that at some point (about a month earlier) new entries
stopped being added to the log files. Normally there's a steady stream of
new connections and connection drops.


I've listed the open file handles from the rabbitmq beam instance and I
still see live file handles for the log files.


Anybody have any idea what might be happening? And better still, how we
might kick things without restarting the broker?


Thanks,


Matt

  • Jeffery, Mark at Jul 16, 2013 at 7:19 pm
    Hello


    Assuming Linux, what does ls -latr show for the fd directory under the rabbitmq process's directory in /proc (i.e. /proc/<pid>/fd)?
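
For anyone who would rather script this check than read ls output by eye, here is a minimal Python sketch of the same idea. It assumes Linux, where /proc/<pid>/fd holds one symlink per open descriptor. The function names and the ".log" suffix filter are illustrative choices, not anything RabbitMQ provides.

```python
import os


def list_fd_targets(fd_dir):
    """Return (fd_name, target) pairs for each symlink in fd_dir, ordered by fd number."""
    entries = []
    for name in os.listdir(fd_dir):
        path = os.path.join(fd_dir, name)
        try:
            target = os.readlink(path)
        except OSError:
            # The descriptor was closed in the meantime, or the entry is not a symlink.
            continue
        entries.append((name, target))
    # Sort numerically so that "10" comes after "9"; non-numeric names go first.
    entries.sort(key=lambda e: int(e[0]) if e[0].isdigit() else -1)
    return entries


def log_file_fds(fd_dir, suffix=".log"):
    """Keep only the descriptors whose target path ends with the given suffix."""
    return [(fd, target) for fd, target in list_fd_targets(fd_dir)
            if target.endswith(suffix)]
```

Typical use would be something like log_file_fds("/proc/16050/fd"), run as a user allowed to read that directory (root or the broker's own account).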




  • Matt Pietrek at Jul 16, 2013 at 11:21 pm
    Here you go, Mark. The file dates and times line up with when I last saw
    logging in the files:


    23:14 PROD mpietrek@foomq1:/proc/16050$ sudo ls -latr fd
    total 0
    l-wx------ 1 foobar foobar 64 2013-06-02 11:50 2 -> /foobar/logs/foomq1.foo.bar.com/rabbitmq-server.log
    dr-xr-xr-x 7 foobar foobar 0 2013-06-02 11:50 ..
    dr-x------ 2 foobar foobar 0 2013-06-02 11:50 .
    lr-x------ 1 foobar foobar 64 2013-06-02 11:50 9 -> pipe:[122339989]
    l-wx------ 1 foobar foobar 64 2013-06-02 11:50 8 -> /foobar/logs/foomq1.foo.bar.com/rabbit@foomq1sasl.log
    l-wx------ 1 foobar foobar 64 2013-06-02 11:50 7 -> /foobar/logs/foomq1.foo.bar.com/rabbit@foomq1.log
    lrwx------ 1 foobar foobar 64 2013-06-02 11:50 6 -> socket:[122339982]
    lrwx------ 1 foobar foobar 64 2013-06-02 11:50 5 -> socket:[122339980]
    l-wx------ 1 foobar foobar 64 2013-06-02 11:50 4 -> pipe:[122339970]
    lr-x------ 1 foobar foobar 64 2013-06-02 11:50 3 -> pipe:[122339970]
    lr-x------ 1 foobar foobar 64 2013-06-02 11:50 17 -> socket:[122340310]
    lrwx------ 1 foobar foobar 64 2013-06-02 11:50 16 -> socket:[122340285]
    lrwx------ 1 foobar foobar 64 2013-06-02 11:50 15 -> /foobar/var/lib/rabbit@foomq1/msg_store_persistent/397.rdq
    lrwx------ 1 foobar foobar 64 2013-06-02 11:50 14 -> /foobar/var/lib/rabbit@foomq1/msg_store_transient/0.rdq
    lrwx------ 1 foobar foobar 64 2013-06-02 11:50 13 -> /foobar/var/lib/rabbit@foomq1/queues/D1KLLKOJLVJ4YZGYLP2D32A9W/journal.jif
    l-wx------ 1 foobar foobar 64 2013-06-02 11:50 12 -> pipe:[122339990]
    lrwx------ 1 foobar foobar 64 2013-06-02 11:50 11 -> /foobar/var/lib/rabbit@foomq1/LATEST.LOG
    lrwx------ 1 foobar foobar 64 2013-06-02 11:50 10 -> socket:[122339997]
    l-wx------ 1 foobar foobar 64 2013-06-02 11:50 1 -> /foobar/logs/foomq1.foo.bar.com/rabbitmq-server.log
    lr-x------ 1 foobar foobar 64 2013-06-02 11:50 0 -> /dev/null
    lrwx------ 1 foobar foobar 64 2013-06-02 11:51 18 -> socket:[122340314]
    l-wx------ 1 foobar foobar 64 2013-06-02 12:28 21 -> /foobar/var/lib/rabbit@foomq1/queues/ETWUA9HXDWBIXFTEF0DVH5FBZ/journal.jif
    lr-x------ 1 foobar foobar 64 2013-06-02 12:28 19 -> /foobar/var/lib/rabbit@foomq1/queues/22TABTKBIJ6DQCA5GC6EZ0B8L/journal.jif
    l-wx------ 1 foobar foobar 64 2013-06-02 13:12 22 -> /foobar/var/lib/rabbit@foomq1/queues/A9AWP930R556ERL81W9PYEELL/journal.jif
    lrwx------ 1 foobar foobar 64 2013-06-02 13:12 20 -> /foobar/var/lib/rabbit@foomq1/queues/A8EPKFQ4VCO5A3APTTWYEQRFT/journal.jif
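
A complementary check is to read the modification time of the log files themselves and flag any that have not been written to recently. The following is only a sketch: the function name, the one-hour threshold in the usage note, and the decision to skip unreadable files are all assumptions made for illustration.

```python
import os
import time


def stale_files(paths, max_age_seconds, now=None):
    """Return (path, age_in_seconds) for files not modified within max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            # Missing or unreadable file; this sketch simply skips it.
            continue
        age = now - mtime
        if age > max_age_seconds:
            stale.append((path, age))
    return stale
```

On a broker that normally logs a steady stream of connections, calling stale_files(log_paths, 3600) from a periodic job would be one way to notice a node that has gone quiet long before a month has passed.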




  • Jeffery, Mark at Jul 17, 2013 at 5:51 am
    Hello


    You mentioned a cluster.
    Is this happening on every box?


    Also, is the rabbit process running as "foobar"?


    Jeff


  • Matt Pietrek at Jul 17, 2013 at 5:55 pm
    Jeff,


    The "foobar" in the listing I provided is the result of my search/replace
    obfuscation. I don't want internal server names, account names, etc.
    revealed unnecessarily.


    The logging stopped on only one node of the cluster. The other node
    continued to log as expected.


    Interestingly, on the broker that still logged, I see this message at the
    time of the last log entry of the non-logging machine:


    =INFO REPORT==== 2-Jun-2013::11:50:57 ===
    rabbit on node rabbit@foobar up


    (again, where foobar is obfuscated).


    Digging around some other logs from that time, I see there was a
    mnesia/network split issue just preceding this. However, the broker now
    looks to be happily a part of the cluster, participating in mirrored
    queues, and with a reported uptime matching that of the "INFO REPORT" above.




  • Matthias Radestock at Jul 18, 2013 at 2:28 am
    Matt,

    On 17/07/13 18:55, Matt Pietrek wrote:
    > Interestingly, on the broker that still logged, I see this message at
    > the time of the last log entry of the non-logging machine:
    >
    > =INFO REPORT==== 2-Jun-2013::11:50:57 ===
    > rabbit on node rabbit@foobar up
    >
    > (again, where foobar is obfuscated).
    >
    > Digging around some other logs at the time, I see there was a
    > mnesia/network split issue just preceding this. However, the broker now
    > looks to be happily a part of the cluster, participating in mirrored
    > queues, and with a reported uptime matching that of the "INFO REPORT" above.

    Hmm. I wonder whether the cluster didn't fully heal. Since you are running
    3.0.x, you do not have the new (>=3.1.0) cluster_partition_handling
    strategy setting available to you, so you almost certainly have a
    half-formed cluster now. This is also borne out by the fact that...

    > 23:14 PROD mpietrek@foomq1:/proc/16050$ sudo ls -latr fd
    > total 0
    > l-wx------ 1 foobar foobar 64 2013-06-02 11:50 2 -> /foobar/logs/foomq1.foo.bar.com/rabbitmq-server.log
    > l-wx------ 1 foobar foobar 64 2013-06-02 11:50 8 -> /foobar/logs/foomq1.foo.bar.com/rabbit@foomq1sasl.log
    > l-wx------ 1 foobar foobar 64 2013-06-02 11:50 7 -> /foobar/logs/foomq1.foo.bar.com/rabbit@foomq1.log
    > lrwx------ 1 foobar foobar 64 2013-06-02 11:50 15 -> /foobar/var/lib/rabbit@foomq1/msg_store_persistent/397.rdq
    > lrwx------ 1 foobar foobar 64 2013-06-02 11:50 14 -> /foobar/var/lib/rabbit@foomq1/msg_store_transient/0.rdq

    ...*no* files got written to after 2013-06-02 11:50, not just log files
    but also none of the files associated with storing persistent and paged
    messages.

    > how we might kick things without restarting the broker?

    It looks like that node is not participating fully in the cluster, so
    you have little to lose by restarting it. You may in fact have to reset
    it and re-join it to the cluster.


    Regards,


    Matthias.
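
For reference, the reset-and-rejoin route suggested above usually comes down to four rabbitmqctl subcommands: stop_app, reset, join_cluster and start_app. The sketch below only assembles and, optionally, runs that sequence. The helper names and the node name rabbit@foomq2 in the usage note are hypothetical. Note that reset returns the node to a blank state and wipes its local data, so this is a last resort to be run only on the misbehaving node.

```python
import subprocess


def rejoin_commands(cluster_node, rabbitmqctl="rabbitmqctl"):
    """Build the command sequence that resets the local node and joins it to cluster_node."""
    return [
        [rabbitmqctl, "stop_app"],                    # stop the RabbitMQ application, keep the Erlang VM up
        [rabbitmqctl, "reset"],                       # wipe local state and leave any cluster
        [rabbitmqctl, "join_cluster", cluster_node],  # join the cluster via a healthy member
        [rabbitmqctl, "start_app"],                   # start the RabbitMQ application again
    ]


def run_rejoin(cluster_node, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in rejoin_commands(cluster_node):
        print(" ".join(argv))
        if not dry_run:
            subprocess.check_call(argv)
```

Calling run_rejoin("rabbit@foomq2") prints the four commands without touching the broker; passing dry_run=False would actually execute them on the local node.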
