Hi,

Are there any best practices for achieving broker failover?

We are currently using two clustered nodes with durable queues and
exchanges. The clients are configured to connect to the first node. In the
event that this node dies, I would like both existing consumers as well as
newly started ones to connect to the other node. Are there standard patterns
or recipes to achieve this?

Thanks!
Niko

  • Matthew Sackman at Aug 19, 2009 at 2:41 pm
    Hi Niko,
    On Wed, Aug 19, 2009 at 03:06:50PM +0100, Niko Felger wrote:
    > Are there any best practices for achieving broker failover?
    >
    > We are currently using two clustered nodes with durable queues and
    > exchanges. The clients are configured to connect to the first node. In the
    > event that this node dies, I would like both existing consumers as well as
    > newly started ones to connect to the other node. Are there standard patterns
    > or recipes to achieve this?

    There's nothing standard just yet, but we're getting a lot of interest
    in this area and are working on solutions. Just at the moment the
    situation is as follows:

    Due to the way mnesia works, you can't just transfer the files from one
    machine to another and start the broker up. To make this work, both
    machines must have the same hostname, as mnesia records this in the
    database. To get around this, you can just use the nodename
    rabbit@localhost on every machine. However, this prevents you from
    clustering, which is a shame.

    Therefore, if HA and failover are important to you, we'd recommend the
    following:

    1) Put a simple TCP/IP load balancer in front of the rabbit nodes,
    but do this only for producers. The load balancer needs to be able to
    cope dynamically with nodes going down, reappearing etc.
    2) For consumers you really want them to all try and consume from all
    the nodes at the same time. They also need to be able to silently cope
    with nodes going down and reappearing. Obviously the exact details of
    this vary between application.
    3) Have a SAN with some shared storage which is not partitioned. All the
    rabbit nodes need access to this.
    4) Use Linux-HA or equivalent to monitor your rabbit nodes, and
    start up all the brokers with the nodename rabbit@localhost.

    Now, when a node fails, Linux-HA will notice, and should tell a spare
    node to start up, setting the RABBITMQ_MNESIA_DIR to the location on the
    SAN of the files for the failed node. It should all just start up.
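As a rough illustration of that restart step, here is a minimal Python sketch of what a Linux-HA resource script might do on the spare machine. RABBITMQ_NODENAME and RABBITMQ_MNESIA_DIR are the broker's own environment variables; the function names and the SAN path in the usage note are made up for the example, and it assumes `rabbitmq-server` is on the PATH.

```python
import os
import subprocess


def spare_node_env(failed_node_mnesia_dir, base_env=None):
    """Build the environment for starting a spare broker on a failed node's data."""
    env = dict(os.environ if base_env is None else base_env)
    # Every broker uses the same node name, so the Mnesia files stay valid on any host.
    env["RABBITMQ_NODENAME"] = "rabbit@localhost"
    # Point the spare broker at the failed node's database directory on the SAN.
    env["RABBITMQ_MNESIA_DIR"] = failed_node_mnesia_dir
    return env


def start_spare_node(failed_node_mnesia_dir):
    """Launch the broker with that environment (assumes rabbitmq-server is on PATH)."""
    return subprocess.Popen(["rabbitmq-server"], env=spare_node_env(failed_node_mnesia_dir))
```

For example, `start_spare_node("/san/rabbit/node1/mnesia")` would bring the failed node's data back up on the spare, where the path is a hypothetical SAN mount.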

    Obviously, this depends on the reliability and availability of your SAN,
    and the lack of clustering complicates things, at least for consumers.
    However, if HA and failover are more important to you, then this may
    be a tradeoff you're willing to make just at the moment.
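To make point 2 a bit more concrete, a consumer that talks to every node and quietly copes with nodes disappearing could be structured roughly like the sketch below. It is client-library agnostic: `connect(node)` is a hypothetical function you would supply, returning an object whose `poll()` returns any pending messages, and both are assumed to raise OSError when the node is unreachable. The class name and retry delay are likewise illustrative.

```python
import time


class MultiNodeConsumer:
    """Keeps one consumer per node and quietly retries nodes that are down."""

    def __init__(self, nodes, connect, retry_delay=5.0, clock=time.monotonic):
        self.nodes = list(nodes)
        self.connect = connect            # hypothetical: connect(node) -> object with .poll()
        self.retry_delay = retry_delay    # seconds to wait before retrying a dead node
        self.clock = clock
        self.consumers = {}               # node -> live consumer
        self.retry_at = {node: 0.0 for node in self.nodes}

    def poll_all(self):
        """Poll every node once and return the messages received in this pass."""
        messages = []
        for node in self.nodes:
            consumer = self.consumers.get(node)
            if consumer is None:
                if self.clock() < self.retry_at[node]:
                    continue              # still backing off for this node
                try:
                    consumer = self.connect(node)
                except OSError:
                    self.retry_at[node] = self.clock() + self.retry_delay
                    continue
                self.consumers[node] = consumer
            try:
                messages.extend(consumer.poll())
            except OSError:
                # The node went away: drop the consumer and retry it later.
                del self.consumers[node]
                self.retry_at[node] = self.clock() + self.retry_delay
        return messages
```

Calling `poll_all()` in a loop gives the application messages from whichever nodes are currently up, while a dead node only costs one failed connection attempt per retry interval.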

    Also, be aware that with this solution, non persistent messages can be
    lost as a node goes down, and even persistent messages which are not
    part of a transaction can also be lost.
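For producers that cannot afford that loss, the publish can be wrapped in an AMQP transaction. The sketch below assumes a channel object exposing `tx_select`, `basic_publish` and `tx_commit` in the style of the Python pika client's blocking channel; the function name is made up, and the message properties (for example a persistent delivery mode) are left to the caller.

```python
def publish_in_transaction(channel, exchange, routing_key, bodies, properties=None):
    """Publish all bodies inside one AMQP transaction on the given channel."""
    # Put the channel into transactional mode.
    channel.tx_select()
    for body in bodies:
        channel.basic_publish(
            exchange=exchange,
            routing_key=routing_key,
            body=body,
            properties=properties,  # caller supplies e.g. a persistent delivery mode
        )
    # Only once the commit returns should the batch be treated as handed over.
    channel.tx_commit()
```

If anything raises before `tx_commit()` returns, the safe assumption is that the batch was not accepted and should be resent once a connection is available again.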

    Needless to say, a more comprehensive solution is on our TODO list, but
    may be a little way off just at the moment.

    I hope this helps,

    Matthew
  • Matthew Sackman at Aug 19, 2009 at 2:45 pm
    Niko,
    On Wed, Aug 19, 2009 at 03:41:59PM +0100, Matthew Sackman wrote:
    > Therefore, if HA and failover is important to you, we'd recommend the
    > following:
    > ...

    One further issue with this is that all the nodes really need to be
    configured identically, by hand, in terms of queues, exchanges and
    bindings. As producers don't know which node they're connected to, this
    really demands that:
    a) Every producer can attempt configuration whenever it connects; or
    b) As consumers may need to be connected to every node, they could do
    the configuration, as they're not in front of the load balancer; or
    c) You have some other process that does configuration.

    This is definitely one area where the clustered setup saves you effort
    as all nodes implicitly get configured in the same way.
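Options a) and b) both come down to re-declaring the same queues, exchanges and bindings on every connection, which is safe because re-declaring with identical arguments is idempotent. A minimal sketch, assuming a channel with pika-style `exchange_declare`, `queue_declare` and `queue_bind` methods; the exchange, queue and routing key names are placeholders, not anything from this thread.

```python
def declare_topology(channel, exchange="events", queue="events.work", routing_key="events.#"):
    """Declare the same durable exchange, queue and binding on whichever node we reached."""
    # Re-declaring with identical arguments is a no-op on a node that already has them.
    channel.exchange_declare(exchange=exchange, exchange_type="topic", durable=True)
    channel.queue_declare(queue=queue, durable=True)
    channel.queue_bind(queue=queue, exchange=exchange, routing_key=routing_key)
```

Calling this right after every (re)connect means it does not matter which node the load balancer picked, or whether that node has been configured before.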

    Matthew
  • Niko Felger at Aug 20, 2009 at 10:09 am
    Matthew,

    Thanks a lot for all this info!

    Is there a way to achieve some of this in a clustered setup? I guess our
    requirements are not so much HA of the whole messaging subsystem, but rather
    that an as-large-as-possible proportion of messages gets processed
    _eventually_. The scenario I am mainly worried about is when producers
    suddenly cannot publish anymore because the server has gone away and thus
    any messages are lost at that point.

    We tried using a dumb load balancer (in front of both producers and
    consumers) to achieve this, but so far this has caused us more trouble than
    it saved, see here:
    http://www.nabble.com/RabbitMQ-load-balancing-failover-with-LVS-td24683230.html#a24683230

    Thanks!
    niko

    _______________________________________________
    rabbitmq-discuss mailing list
    rabbitmq-discuss at lists.rabbitmq.com
    http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
  • Matthew Sackman at Aug 20, 2009 at 10:18 am
    Hi Niko,
    On Thu, Aug 20, 2009 at 11:09:32AM +0100, Niko Felger wrote:
    > Is there a way to achieve some of this in a clustered setup? I guess our
    > requirements are not so much HA of the whole messaging subsystem, but rather
    > that an as-large-as-possible proportion of messages gets processed
    > _eventually_. The scenario I am mainly worried about is when producers
    > suddenly cannot publish anymore because the server has gone away and thus
    > any messages are lost at that point.
    >
    > We tried using a dumb load balancer (in front of both producers and
    > consumers) to achieve this, but so far this has caused us more trouble than
    > it saved, see here:
    > http://www.nabble.com/RabbitMQ-load-balancing-failover-with-LVS-td24683230.html#a24683230

    Ahh, interesting.

    We do have some suspicions that the failover can be made to work with
    clustering - provided that when the new node comes up it takes over the
    IP / hostname of the failed node, it *might* just work. However, be
    aware this pretty much came out of a 5 minute conversation in the office
    yesterday and we've not even attempted it let alone fully tested it.
    However, we think it might work! :D

    Linux-HA can indeed do MAC address stealing, and thus take over the IP
    etc. So I would suggest, if you have the time to spare, you start down
    that route.

    Matthew
  • Jason J. W. Williams at Aug 20, 2009 at 3:26 pm
    Hey Niko,

    Can you set your SLB to persistent mapping based on client IP? That
    should keep each client on the server they are initially mapped to
    until that server fails.
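To show the idea, here is a hash-based approximation of that kind of mapping in Python. It is only a sketch of the selection logic, not how any particular load balancer implements persistence; `pick_server` and the `is_alive` health check are hypothetical names.

```python
import hashlib


def pick_server(client_ip, servers, is_alive):
    """Map a client IP to a server, keeping the choice stable while that server is up."""
    if not servers:
        raise ValueError("no servers configured")
    # Hash the client IP so the same client always starts from the same server.
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    # Walk the list from the preferred server so a dead one falls through to the next.
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if is_alive(candidate):
            return candidate
    raise RuntimeError("no live servers")
```

One difference from a real persistence table: with pure hashing, new connections go back to the preferred server as soon as it reappears, whereas a table could keep the client on the server it failed over to.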

    That being said, I still believe HA should be done in Rabbit. SLB is
    not the right hammer in my opinion.

    -J


Discussion Overview
group: rabbitmq-discuss
categories: rabbitmq
posted: Aug 19, '09 at 2:06p
active: Aug 20, '09 at 3:26p
posts: 6
users: 3
website: rabbitmq.com
irc: #rabbitmq
