I'm having a couple problems with the management plugin on
2.2.0/Erlang 13B/Ubuntu 10.04.

First, I have several independent rabbitmq clusters, but they all seem
to wind up reporting back to the same stats database. I assume this
is some Erlang network service advertising feature, but I'd much
rather have each cluster use its own separate stats db. How can I
force that?

Second, it keeps dying. That is, the nodes are running, but the
statistics database goes down, and I have to restart the node that
contains it to get the management API back. I'm seeing messages like
this in the log:

=CRASH REPORT==== 18-Jan-2011::17:12:02 ===
crasher:
initial call: rabbit_mgmt_db:init/1
pid: <0.289.0>
registered_name: []
exception exit: {{badmatch,[]},
[{rabbit_mgmt_db,augment_queue_pid,2},
{rabbit_mgmt_db,augment,4},
{rabbit_mgmt_db,'-augment/3-lc$^0/1-0-',3},
{rabbit_mgmt_db,'-augment/3-lc$^0/1-0-',3},
{rabbit_mgmt_db,augment,3},
{rabbit_mgmt_db,'-handle_call/3-lc$^4/1-4-',4},
{rabbit_mgmt_db,'-handle_call/3-fun-2-',4},
{rabbit_mgmt_db,handle_call,3}]}
in function gen_server:terminate/6
ancestors: [<0.288.0>,rabbit_sup,<0.122.0>]
messages: []
links: [<0.288.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 377
stack_size: 24
reductions: 2235592
neighbours:

Can anyone advise as to next steps in troubleshooting?



--
Mark J. Reed <markjreed at gmail.com>

  • Simon MacMullen at Jan 19, 2011 at 11:54 am

    On 18/01/11 23:20, Mark J. Reed wrote:
    I'm having a couple problems with the management plugin on
    2.2.0/Erlang 13B/Ubuntu 10.04.

    First, I have several independent rabbitmq clusters, but they all seem
    to wind up reporting back to the same stats database. I assume this
    is some Erlang network service advertising feature, but I'd much
    rather have each cluster use its own separate stats db. How can I
    force that?
    Gosh, that's quite unexpected (and almost certainly the source of your
    second problem). How are these clusters set up? Do nodes from any two
    clusters end up on the same machine?
    Second, it keeps dying. That is, the nodes are running, but the
    statistics database goes down, and I have to restart the node that
    contains it to get the management API back. I'm seeing messages like
    this in the log:
    Unfortunately there's a bug in released versions of the management
    plugin that can cause the stats database to die if it sees details from
    a queue it can't find in Mnesia (which would certainly happen in your case).

    This is fixed on the default branch, so if you can build from source
    (either the default branch of *everything*, or the rabbitmq_v2_2_0 tag
    of rabbitmq-management with the following patch applied)

    http://hg.rabbitmq.com/rabbitmq-management/diff/802c2abe4387/src/rabbit_mgmt_db.erl

    then this will fix *that* problem, but I'd really like to figure out how
    the stats database is getting shared across clusters.

    Cheers, Simon

    --
    Simon MacMullen
    Staff Engineer, RabbitMQ
    SpringSource, a division of VMware
  • Mark J. Reed at Jan 19, 2011 at 5:40 pm

    On Wed, Jan 19, 2011 at 6:54 AM, Simon MacMullen wrote:
    First, I have several independent rabbitmq clusters, but they all seem
    to wind up reporting back to the same stats database. I assume this
    is some Erlang network service advertising feature, but I'd much
    rather have each cluster use its own separate stats db. How can I
    force that?
    Gosh, that's quite unexpected (and almost certainly the source of your
    second problem). How are these clusters set up? Do nodes from any two
    clusters end up on the same machine?
    No.

    The clusters are set up via /etc/rabbitmq/rabbitmq.config files
    listing the hosts that belong in a cluster together. There are three
    different clusters. Well, one is a "cluster" of one node on one host,
    but regardless. I've verified that the config files are correct; the
    nodes have all been reset (actually, the mnesia directory deleted and
    rabbitmq-server package purged and reinstalled).
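
    [For readers following along, a config of the shape described above
    might look roughly like this. This is a sketch only: the node names
    are taken from this thread, the cluster_nodes form is the one used by
    RabbitMQ 2.x, and /tmp stands in for /etc/rabbitmq.]

```shell
# Sketch: write a minimal cluster config listing the members of the
# two-node refq cluster. On a real node this file would live at
# /etc/rabbitmq/rabbitmq.config; /tmp is used here for illustration.
cat > /tmp/rabbitmq.config <<'EOF'
[
  {rabbit, [
    {cluster_nodes, ['rabbit@refq1', 'rabbit@refq2']}
  ]}
].
EOF
cat /tmp/rabbitmq.config
```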

    They all came up in their proper clusters.

    dev1: stats db on dev1

    refq1 and refq2: stats db on refq1

    prod3q1, prod3q2, prods1q1, and prods1q2: stats db on prod3q1

    However, then I got these messages on prods1q1:

    =WARNING REPORT==== 19-Jan-2011::12:18:16 ===
    global: rabbit@prods1q1 failed to connect to rabbit@dev1

    =WARNING REPORT==== 19-Jan-2011::12:18:16 ===
    global: rabbit@prods1q1 failed to connect to rabbit@refq1

    =INFO REPORT==== 19-Jan-2011::12:18:32 ===
    application: mnesia
    exited: stopped
    type: permanent


    I tried restarting s1q1 but it hung on "starting management statistics
    database". After 5 minutes I killed the start process, nuked the
    mnesia directory and tried again. It came up fine. But now the prod
    cluster is using the dev box as its statistics database.

    So I don't know where the crosstalk is coming from.


    --
    Mark J. Reed <markjreed at gmail.com>
  • Matthias Radestock at Jan 19, 2011 at 6:11 pm
    Mark,
    On 19/01/11 17:40, Mark J. Reed wrote:
    all came up in their proper clusters.

    dev1: stats db on dev1

    refq1 and refq2: stats db on refq1

    prod3q1, prod3q2, prods1q1, and prods1q2: stats db on prod3q1

    However, then I got these messages on prods1q1:

    =WARNING REPORT==== 19-Jan-2011::12:18:16 ===
    global: rabbit@prods1q1 failed to connect to rabbit@dev1

    =WARNING REPORT==== 19-Jan-2011::12:18:16 ===
    global: rabbit@prods1q1 failed to connect to rabbit@refq1

    [...]
    I don't know where the crosstalk is coming from.
    I am pretty sure you are the first person trying to run multiple
    independent rabbit clusters on the same machine.

    To do that you'll probably have to change some settings for each of the
    clusters. See http://www.erlang.org/doc/man/epmd.html

    ERL_EPMD_PORT
    This environment variable can contain the port number epmd will use. The
    default port will work fine in most cases. A different port can be
    specified to allow several instances of epmd, representing independent
    clusters of nodes, to co-exist on the same host. All nodes in a cluster
    must use the same epmd port number.
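
    [As a sketch of how that setting could be applied per cluster: the
    port value below is illustrative, not from the thread, and the
    setting would normally go in each node's rabbitmq-env.conf or init
    environment rather than an interactive shell.]

```shell
# Give each cluster its own epmd port; all nodes *within* a cluster
# must agree on it. 4370 is an illustrative value (epmd's default
# port is 4369).
export ERL_EPMD_PORT=4370
echo "epmd port for this cluster's nodes: $ERL_EPMD_PORT"
```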


    Giving each cluster a different erlang cookie might be sufficient too,
    though I haven't tried that.


    Regards,

    Matthias.
  • Mark J. Reed at Jan 19, 2011 at 6:38 pm

    On Wed, Jan 19, 2011 at 1:11 PM, Matthias Radestock wrote:
    Mark,
    I don't know where the crosstalk is coming from.
    I am pretty sure you are the first person trying to run multiple independent
    rabbit clusters on the same machine.
    They're not on the same machine!

    There are seven machines, grouped into three clusters: one of four
    nodes, one of two nodes, and one standalone. Nowhere do I have even
    two nodes on the same machine, much less two clusters.

    --
    Mark J. Reed <markjreed at gmail.com>
  • Matthias Radestock at Jan 19, 2011 at 6:43 pm
    Mark,
    On 19/01/11 18:38, Mark J. Reed wrote:
    There are seven machines, grouped into three clusters: one of four
    nodes, one of two nodes, and one standalone. Nowhere do I have even
    two nodes on the same machine, much less two clusters.
    Interesting. Are you using rabbitmqctl across machines from different
    clusters? That might be enough for them to find out about each other.

    Try my suggestions from the previous email, i.e. use different cookies
    or ERL_EPMD_PORT settings for each cluster.


    Matthias.
  • Mark J. Reed at Jan 20, 2011 at 3:33 pm

    On Wed, Jan 19, 2011 at 1:43 PM, Matthias Radestock wrote:
    Interesting. Are you using rabbitmqctl across machines from different
    clusters? That might be enough for them to find out about each other.
    Nope.

    The clusters were all using the same Erlang cookie; that's the only
    connection between them. Giving them distinct cookie values has
    resolved the issue.
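
    [For anyone hitting the same crosstalk: the cookie is just a string
    stored in a file on each node, so per-cluster cookies amount to
    writing a cluster-specific value to that file on each member node
    before starting rabbit. A sketch, where the cookie value is
    illustrative and /tmp stands in for the usual
    /var/lib/rabbitmq/.erlang.cookie:]

```shell
# Sketch: give this cluster its own cookie. The file must be identical
# on every node *within* the cluster, different across clusters, and
# readable only by the user rabbit runs as.
COOKIE_FILE=/tmp/.erlang.cookie
printf 'PRODCLUSTERCOOKIE' > "$COOKIE_FILE"
chmod 400 "$COOKIE_FILE"
cat "$COOKIE_FILE"
```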

    I thought the cookie was just an authentication key; I didn't realize
    that the stats module went door-to-door trying the key out and walking
    through whichever ones opened. :)

    Anyway, my issues are resolved. Thanks!

    --
    Mark J. Reed <markjreed at gmail.com>

Discussion Overview
group: rabbitmq-discuss
categories: rabbitmq
posted: Jan 18, '11 at 11:20p
active: Jan 20, '11 at 3:33p
posts: 7
users: 3
website: rabbitmq.com
irc: #rabbitmq
