Here at work, we've been having a spirited debate about the RabbitMQ
descriptions of how HA/mirrored durable queues work. I apologize in advance
if this seems like I'm splitting hairs below, but our team has some very
specific questions regarding the exact wording. While nobody doubts the end
result of mirroring, the mechanics of *exactly* how this happens are the
subject of debate here. As the designated point person for RabbitMQ in our
company, I'm responsible for driving these questions to closure.

First, I'll mention that we've read and re-read "Highly Available Queues"
and "High Availability in RabbitMQ: solving part of the puzzle" multiple
times.

http://www.rabbitmq.com/ha.html
http://www.rabbitmq.com/blog/2011/10/25/high-availability-in-rabbitmq-solving-part-of-the-puzzle/

The crux of the questions comes down to these quotes:

--
"The slaves apply the operations that occur to the master in exactly the
same order as the master and thus maintain the same state. All actions other
than publishes go only to the master, and the master then broadcasts the
effect of the actions to the slaves. Thus clients consuming from a mirrored
queue are in fact consuming from the master."

and

"messages published to a mirrored-queue are always published directly to the
master and all slaves."
--

One interpretation of the 2nd quote above ("published directly to...") is
that clients are responsible for writing their messages to the master *as
well as* to all slaves. Perhaps there's some mechanism where at connect
time, the master sends a list of servers to the connecting client, and the
client writes to all instances in the list.

Other people have a different interpretation of how this works, which is
that publishes to *any* broker instances are forwarded by the receiving
slave to the master instance, which in turn pushes the publish requests to
all slaves. (Of course, if the master received the message initially,
there's no need for forwarding.)

Simple question: Which of the above interpretations is correct?

Follow up question: What is the exact flow for a publish that goes to a
slave? Does the slave do anything other than push the message to the master,
which in turn handles the message as if it came to the master in the first
place? Perhaps it writes the message to the local store first, then pushes
it to the master? (Like I said, spirited debate.)

Thanks much for your help,

Matt




  • Emile Joubert at Feb 7, 2012 at 5:18 pm
    Hi Matt,
    On 06/02/12 19:27, Matt Pietrek wrote:
    > "The slaves apply the operations that occur to the master in exactly the
    > same order as the master and thus maintain the same state. All actions
    > other than publishes go only to the master, and the master then
    > broadcasts the effect of the actions to the slaves. Thus clients
    > consuming from a mirrored queue are in fact consuming from the master."
    >
    > and
    >
    > "messages published to a mirrored-queue are always published directly to
    > the master and all slaves."
    > --
    >
    > One interpretation of the 2nd quote above ("published directly to...")
    > is that clients are responsible for writing their messages to the master
    > *as well as* to all slaves. Perhaps there's some mechanism where at
    > connect time, the master sends a list of servers to the connecting
    > client, and the client writes to all instances in the list.

    No, that won't work because the number of slaves can change over the
    lifetime of the queue. Publishing clients are not required to alter
    their behaviour if their messages are routed to mirrored queues.

    > Other people have a different interpretation of how this works, which is
    > that publishes to *any* broker instances are forwarded by the receiving
    > slave to the master instance, which in turn pushes the publish requests
    > to all slaves. (Of course, if the master received the message initially,
    > there's no need for forwarding.)
    >
    > Simple question: Which of the above interpretations is correct?

    Neither. RabbitMQ uses a separate Erlang process per channel and this
    channel process is responsible for sending publish messages to each
    slave as well as the master. The slaves and master make use of a
    separate fault-tolerant framework (Guaranteed Multicast) to communicate.
    The master uses GM to communicate all messages (including publish
    messages) to slaves. Slaves therefore receive publish messages from the
    channel as well as GM, and all other messages from GM only.

    > Follow up question: What is the exact flow for a publish that goes to a
    > slave? Does the slave do anything other than push the message to the
    > master, which in turn handles the message as if it came to the master in
    > the first place? Perhaps it writes the message to the local store first,
    > then pushes it to the master? (Like I said, spirited debate.)

    Slaves do not push messages to the master. Slaves simply mimic the
    actions of the master in order to have an identical copy of the queue.
    The messages from the master over GM impose a consistent ordering across
    all the queues.
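
To make the paths described in this reply concrete, here is a minimal Python sketch. It is not RabbitMQ's actual Erlang code: the function name `delivery_paths` and the hop labels are hypothetical, and it only lists who sends to whom for a single client action, following the description above.

```python
def delivery_paths(action, master, slaves):
    """List (sender, receiver) hops for one client action on a mirrored queue.

    Toy model only: names and labels are assumptions, not RabbitMQ APIs.
    """
    hops = [("channel", master)]  # every action reaches the master
    if action == "publish":
        # Publishes are also sent by the channel process to each slave directly.
        hops += [("channel", slave) for slave in slaves]
    # The master broadcasts over GM to all slaves, publishes included.
    hops += [("gm:" + master, slave) for slave in slaves]
    return hops


if __name__ == "__main__":
    for action in ("publish", "ack"):
        print(action, delivery_paths(action, "master", ["slave1", "slave2"]))
```

Note that no hop ever has a slave as the sender: in this model, as in the reply above, slaves never push anything to the master.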


    The sources contain some further detail which may be of interest:
    http://hg.rabbitmq.com/rabbitmq-server/file/default/src/rabbit_mirror_queue_coordinator.erl
    http://hg.rabbitmq.com/rabbitmq-server/file/default/src/gm.erl



    -Emile
  • Matthew Sackman at Feb 7, 2012 at 5:41 pm
    Hi Matt,
    On Tue, Feb 07, 2012 at 05:18:54PM +0000, Emile Joubert wrote:
    > Neither. RabbitMQ uses a separate Erlang process per channel and this
    > channel process is responsible for sending publish messages to each
    > slave as well as the master.

    Indeed. It's actually no different from a message being published to an
    exchange which then routes the message to several different queues - the
    channel process on the broker is responsible for finding out which
    queues the message is destined for and forwarding the message to all
    those queues. In the case of a mirrored queue, the expansion step to go
    from "queue name" to "queue process ID" returns several process IDs.
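
The expansion step can be pictured with a small Python sketch. This is an illustration under assumptions, not broker code: the `BINDINGS` and `QUEUE_PIDS` tables, the queue names, and the `route` function are all hypothetical stand-ins for the broker's internal lookups.

```python
# Hypothetical lookup tables standing in for the broker's routing state.
BINDINGS = {"ex": ["orders", "events"]}  # exchange -> bound queue names
QUEUE_PIDS = {
    "orders": ["orders_pid"],  # ordinary queue: one process
    "events": ["events_master", "events_slave1", "events_slave2"],  # mirrored
}


def route(exchange, message):
    """Return (pid, message) pairs the channel process would send."""
    deliveries = []
    for queue_name in BINDINGS.get(exchange, []):
        # For a mirrored queue the name expands to several process IDs,
        # so the same forwarding loop covers master and slaves alike.
        for pid in QUEUE_PIDS[queue_name]:
            deliveries.append((pid, message))
    return deliveries


if __name__ == "__main__":
    for pid, msg in route("ex", "hello"):
        print(pid, msg)
```

The point of the sketch is that the channel needs no special case for mirroring: a mirrored queue simply contributes more targets to the same fan-out.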

    > The slaves and master make use of a
    > separate fault-tolerant framework (Guaranteed Multicast) to communicate.
    > The master uses GM to communicate all messages (including publish
    > messages) to slaves. Slaves therefore receive publish messages from the
    > channel as well as GM, and all other messages from GM only.

    ...and the purpose of this is as follows.

    Because you can have multiple channels publishing messages to the same
    mirrored queue at the same time, there is the possibility that different
    members of the mirrored queue see the publishes in different orders when
    they receive the publishes directly from the channel processes. This
    will not do - the messages *must* be in the same order in all members of
    the mirrored queue. This is why publishes *also* go via GM - the master
    pushes each publish onto GM and the slaves receive that and use it to
    derive the correct order.

    But, if you *only* had publishes being sent to the master and then the
    master pushes them via GM to all the slaves, then, in the event of the
    death of the master, there's a window of time before any of the slaves
    notice the death of the master during which there could be in-flight
    publishes going from the channels to the old master which will be lost -
    the master is dead so will not be able to process those publishes and
    push them onto GM.

    So as a result, publishes go via both routes - directly to all the
    members of the mirrored queue to ensure that no publishes ever get lost,
    and secondly via GM, pushed by the master, so that the slaves can
    actually enqueue the messages in the right order.
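
A toy Python model of a slave may help show why both routes are needed. It rests on several assumptions that are not from the thread: the class and method names are hypothetical, the GM notification is simplified to carry only a message id, and on promotion the slave enqueues leftover direct publishes in arrival order.

```python
from collections import deque


class Slave:
    """Toy slave: bodies arrive from channels, ordering arrives via GM."""

    def __init__(self):
        self.pending = {}         # msg_id -> body, received directly from channels
        self.gm_order = deque()   # ids announced by the master, in master order
        self.queue = []           # bodies enqueued in the master's order

    def from_channel(self, msg_id, body):
        self.pending[msg_id] = body
        self._drain()

    def from_gm(self, msg_id):
        self.gm_order.append(msg_id)
        self._drain()

    def _drain(self):
        # Enqueue strictly in GM order, and only once the body has arrived.
        while self.gm_order and self.gm_order[0] in self.pending:
            self.queue.append(self.pending.pop(self.gm_order.popleft()))

    def promote(self):
        # Master died: publishes seen directly but never via GM are not lost.
        for msg_id in list(self.pending):
            self.queue.append(self.pending.pop(msg_id))


if __name__ == "__main__":
    s1, s2 = Slave(), Slave()
    s1.from_channel("a", "A"); s1.from_channel("b", "B")  # sees a then b
    s2.from_channel("b", "B"); s2.from_channel("a", "A")  # sees b then a
    for s in (s1, s2):
        s.from_gm("a"); s.from_gm("b")                    # master's order
    print(s1.queue, s2.queue)
```

Both slaves end up with the same queue contents even though the direct publishes reached them in different orders, and `promote` shows how a publish that was in flight when the master died can still be recovered from the direct route.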
    Indeed. In general, it's much more complex than you can imagine. For
    example, a publishing client that's using publisher confirms could be
    publishing to a mirrored queue. Each confirm will only be issued once
    all members of the mirrored queue have received *and* correctly
    enqueued the message. And even if the master and any-but-not-all slaves
    fail, the code will ensure that not only are no messages lost, but all
    the confirms will still be correctly issued, assuming both the node to
    which the client is connected survives and at least one member of the
    mirrored queue survives.
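
The confirm rule described above can be sketched roughly as follows. This is a simplified Python model under assumptions, not the broker's logic: `ConfirmTracker` and its methods are hypothetical names, and the real code handles many more cases (for example, new slaves joining).

```python
class ConfirmTracker:
    """Toy model: confirm a message once every live member has enqueued it."""

    def __init__(self, members):
        self.members = set(members)
        self.waiting = {}     # msg_id -> members that have not yet enqueued it
        self.confirmed = []   # msg_ids confirmed to the publisher, in order

    def published(self, msg_id):
        self.waiting[msg_id] = set(self.members)

    def enqueued(self, msg_id, member):
        if msg_id in self.waiting:
            self.waiting[msg_id].discard(member)
            self._check(msg_id)

    def member_died(self, member):
        # A dead member can no longer hold up confirms for the survivors.
        self.members.discard(member)
        for msg_id in list(self.waiting):
            self.waiting[msg_id].discard(member)
            self._check(msg_id)

    def _check(self, msg_id):
        # Assumption: at least one member must survive for a confirm to be issued.
        if self.members and not self.waiting[msg_id]:
            del self.waiting[msg_id]
            self.confirmed.append(msg_id)


if __name__ == "__main__":
    t = ConfirmTracker(["master", "slave1", "slave2"])
    t.published(1)
    t.enqueued(1, "master")
    t.enqueued(1, "slave1")
    print(t.confirmed)        # still waiting on slave2
    t.member_died("slave2")
    print(t.confirmed)        # confirm issued once slave2 is gone
```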

    Most if not all of the complexity arises from a) we do almost everything
    regarding publishing asynchronously to take advantage of parallelism and
    ensure performance, but this means some nodes could fall a long way
    behind others; and b) failures and births can occur at any time and we
    try pretty hard to cope transparently with almost everything. And some
    of it we even get right...

    Matthew

Discussion Overview
group: rabbitmq-discuss
categories: rabbitmq
posted: Feb 6, '12 at 7:27p
active: Feb 7, '12 at 5:41p
posts: 3
users: 3
website: rabbitmq.com
irc: #rabbitmq
