Avoiding for now the technical details of queues, exchanges, and
routing, so as not to bias the discussion: we have an application
where we want this to happen:

1. A producer publishes a single message.

2. N copies of that message get distributed to N consumers.

3. These N consumers are selected from a total of M consumers, each of
which has declared which of the N sets of messages it is interested in
receiving.

That's it. Basically, there are N "topics", and each message gets
published to all of those topics - but without the producer having to
do N publishes, and with only one consumable copy of each message per
topic.

Our initial design had a single exchange with N queues bound to it;
each of the M consumers then subscribes to one of the N queues.
Simple.
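
For concreteness, here's a minimal sketch of that design in the
pre-0.9 Bunny style used in the transcript later in this thread; the
'events' exchange name, the topic queue names, and N = 3 are made up
for illustration:

    require 'rubygems'
    require 'bunny'

    b = Bunny.new
    b.start

    # One fanout exchange: every message published to it is copied to
    # every queue bound to it, so the producer publishes only once.
    fanout = b.exchange('events', :type => :fanout)

    # N "topic" queues (N = 3 here), one per set of consumers.  Each
    # message lands once in each queue; consumers sharing a queue
    # compete for messages, so there is one consumable copy per topic.
    %w[topic1 topic2 topic3].each { |name| b.queue(name).bind(fanout) }

    # The producer's single publish.
    fanout.publish('hello')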

But not reliable. The problem is that a queue lives on a single node,
and if that node goes down, the queue is gone. It's not the loss of
data from the queue - no big deal in our use case - but the fact that
the queue itself becomes unusable within the cluster, and that entire
set of consumers stops working.

We don't need durable queues or persistent messages and we certainly
don't want to get into Pacemaker; I don't care if some messages are
lost. I just want the overall operation to survive the loss of any
given node in the cluster. That makes me think that queues are the
wrong entity to use for the "topics".

We could do N exchanges, where each node has its own local queue bound
to one of them (and use exchange-to-exchange binding to set up a
single exchange for the producer to publish to), but then *all* the
nodes would get copies of all the messages sent to their "topic",
instead of just one.
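
For reference, a sketch of that alternative.  Exchange-to-exchange
binding is a RabbitMQ extension that the Bunny version in the
transcript below predates, so this uses the newer (0.9+) Bunny channel
API; all the names here are made up:

    require 'bunny'

    conn = Bunny.new
    conn.start
    ch = conn.create_channel

    # Single fan-in exchange for the producer to publish to.
    ingress = ch.fanout('ingress')

    # A per-topic exchange, bound to the fan-in exchange
    # (exchange-to-exchange binding, a RabbitMQ extension).
    topic = ch.fanout('topic1')
    topic.bind(ingress)

    # Each node's own local queue, bound to its topic exchange.  This
    # is the drawback: every node bound to 'topic1' gets its own copy
    # of every message, instead of competing for a single copy.
    q = ch.queue('node1-topic1-local')
    q.bind(topic)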

So I'm kind of at a loss as to how best to fit our workflow into the
RabbitMQ/AMQP model in a moderately resilient fashion. Any
suggestions appreciated.

--
Mark J. Reed <markjreed at gmail.com>

  • Matthias Radestock at Mar 3, 2011 at 7:24 am
    Mark,

    Mark J. Reed wrote:
    > That's it. Basically, there are N "topics", and each message gets
    > published to all of those topics - but without the producer having
    > to do N publishes, and with only one consumable copy of each
    > message per topic.
    >
    > Our initial design had a single exchange with N queues bound to
    > it; each of the M consumers then subscribes to one of the N
    > queues.  Simple.

    How do you avoid the producer having to publish the message N times in
    this setting? I am asking because there is a new feature appearing in
    the next release that addresses that very problem :)

    > But not reliable. The problem is that a queue lives on a single
    > node, and if that node goes down, the queue is gone. It's not the
    > loss of data from the queue - no big deal in our use case - but
    > the fact that the queue itself becomes unusable within the
    > cluster, and that entire set of consumers stops working.

    When a node goes down, taking all its queues with it, the clients
    connected to that node and consuming from the queues will have their
    connections torn down. They would notice that and could simply reconnect
    and redeclare the queues & bindings & their consumers.

    The problem you are describing arises for clients that are connected to
    the other, surviving, nodes. They will have no idea that anything is
    amiss and that their consumers (in the AMQP sense, i.e. the things
    declared by basic.consume) are no longer functional.

    We have yet another new feature in development which will inform clients
    when their consumers have been cancelled by the server, e.g. when the
    queue from which they were consuming has been deleted or destroyed.

    In your setup, when a client receives that notification it could
    redeclare the queue & bindings & their consumers.

    Until that feature makes it into a release, as a stopgap clients
    could periodically check whether the queue they are consuming from
    still exists, e.g. by issuing passive queue.declares.
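
    As a rough sketch of that stopgap, in the Bunny style from the
    transcript below - the QUEUE name, the 'events' exchange, the
    connect_and_declare helper, and the broad rescue are all
    assumptions, since the exact exception raised on a failed passive
    declare varies by client version:

        require 'rubygems'
        require 'bunny'

        QUEUE = 'jiggleflop'

        def connect_and_declare
          b = Bunny.new
          b.start
          q = b.queue(QUEUE)
          q.bind(b.exchange('events', :type => :fanout))
          # ... re-issue basic.consume (e.g. q.subscribe) here ...
          [b, q]
        end

        b, q = connect_and_declare

        loop do
          begin
            # Passive declare: succeeds only if the queue already
            # exists; otherwise the broker signals an error (404).
            b.queue(QUEUE, :passive => true)
          rescue StandardError
            # The queue (or its node) is gone: reconnect and redeclare
            # the queue, its bindings, and the consumer from scratch.
            b, q = connect_and_declare
          end
          sleep 30
        end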


    Regards,

    Matthias.
  • Mark J. Reed at Mar 3, 2011 at 2:29 pm

    On Thu, Mar 3, 2011 at 2:24 AM, Matthias Radestock wrote:
    >> Our initial design had a single exchange with N queues bound to
    >> it; each of the M consumers then subscribes to one of the N
    >> queues.  Simple.
    >
    > How do you avoid the producer having to publish the message N
    > times in this setting? I am asking because there is a new feature
    > appearing in the next release that addresses that very problem :)

    Fanout exchange.

    > When a node goes down, taking all its queues with it, the clients
    > connected to that node and consuming from the queues will have
    > their connections torn down. They would notice that and could
    > simply reconnect and redeclare the queues & bindings & their
    > consumers.

    The problem I'm seeing is that the surviving nodes seem to remember that
    the queue was declared on the dead node and fail when attempting to
    declare it again.
  • Matthias Radestock at Mar 3, 2011 at 2:32 pm
    Mark,
    On 03/03/11 14:29, Mark J. Reed wrote:
    > On Thu, Mar 3, 2011 at 2:24 AM, Matthias Radestock wrote:
    >> How do you avoid the producer having to publish the message N
    >> times in this setting?
    >
    > Fanout exchange.

    Ah, I didn't realise that each message goes to the same set of
    "topics". Yes, fanout is fine for that.

    > The problem I'm seeing is that the surviving nodes seem to
    > remember that the queue was declared on the dead node and fail
    > when attempting to declare it again.

    That should only happen when the queue is durable. Given that losing the
    odd message isn't a concern for you, it would make sense to declare the
    queues as non-durable.
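
    (Non-durable is in fact Bunny's default, but it can be made
    explicit; a one-line sketch, reusing the queue name from the
    transcript below:)

        # Transient queue: the cluster forgets it when its node goes
        # down, so it can be redeclared elsewhere without conflict.
        q = b.queue('jiggleflop', :durable => false)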

    Matthias.
  • Mark J. Reed at Mar 3, 2011 at 2:39 pm

    On Thu, Mar 3, 2011 at 9:32 AM, Matthias Radestock wrote:
    >> The problem I'm seeing is that the surviving nodes seem to
    >> remember that the queue was declared on the dead node and fail
    >> when attempting to declare it again.
    >
    > That should only happen when the queue is durable. Given that
    > losing the odd message isn't a concern for you, it would make
    > sense to declare the queues as non-durable.

    Hm; I'm not declaring the queue as durable. Let me dig up some
    detailed error/log output...



    --
    Mark J. Reed <markjreed at gmail.com>
  • Mark J. Reed at Mar 3, 2011 at 10:02 pm

    On Thu, Mar 3, 2011 at 9:32 AM, Matthias Radestock wrote:
    > Hm; I'm not declaring the queue as durable. Let me dig up some
    > detailed error/log output...

    OK, I tried a very simple test case. Am I hitting a bug, or am I
    wrong about how this should work?

    I have a two-node cluster, both disk nodes, freshly reset. I declare
    a queue on one and publish to it, then pop the message off on the
    other. Then I bring the node with the queue down and try to declare
    the queue on the other node. This triggers an internal error (AMQP
    code 541, internal-error) and causes Rabbit to close the connection
    to the client that tried to declare the queue.

    Transcript with interleaved commentary and log contents below (sorry
    if I should have attached them instead; not sure what the guidelines
    are for this list).

    This is 2.3.1 running on Ubuntu 10.04 x86_64.

    ubuntu at rabbit1:~$ sudo rabbitmqctl status
    Status of node rabbit at rabbit1 ...
    [{running_applications,
    [{rabbit,"RabbitMQ","2.3.1"},
    {mnesia,"MNESIA CXC 138 12","4.4.12"},
    {os_mon,"CPO CXC 138 46","2.2.4"},
    {rabbit_management,"RabbitMQ Management Console","2.3.1"},
    {webmachine,"webmachine","1.7.0"},
    {rabbit_management_agent,"RabbitMQ Management Agent","2.3.1"},
    {amqp_client,"RabbitMQ AMQP Client","2.3.1"},
    {sasl,"SASL CXC 138 11","2.1.8"},
    {rabbit_mochiweb,"RabbitMQ Mochiweb Embedding","2.3.1"},
    {mochiweb,"MochiMedia Web Server","1.3"},
    {inets,"INETS CXC 138 49","5.2"},
    {stdlib,"ERTS CXC 138 10","1.16.4"},
    {kernel,"ERTS CXC 138 10","2.13.4"}]},
    {nodes,[{disc,[rabbit at rabbit2,rabbit at rabbit1]}]},
    {running_nodes,[rabbit at rabbit2,rabbit at rabbit1]}]
    ...done.
    ubuntu at rabbit1:~$ sudo rabbitmqctl list_queues
    Listing queues ...
    ...done.

    ubuntu at rabbit2:~$ sudo rabbitmqctl status
    Status of node rabbit at rabbit2 ...
    [{running_applications,
    [{rabbit,"RabbitMQ","2.3.1"},
    {mnesia,"MNESIA CXC 138 12","4.4.12"},
    {os_mon,"CPO CXC 138 46","2.2.4"},
    {rabbit_management,"RabbitMQ Management Console","2.3.1"},
    {webmachine,"webmachine","1.7.0"},
    {rabbit_management_agent,"RabbitMQ Management Agent","2.3.1"},
    {amqp_client,"RabbitMQ AMQP Client","2.3.1"},
    {sasl,"SASL CXC 138 11","2.1.8"},
    {rabbit_mochiweb,"RabbitMQ Mochiweb Embedding","2.3.1"},
    {mochiweb,"MochiMedia Web Server","1.3"},
    {inets,"INETS CXC 138 49","5.2"},
    {stdlib,"ERTS CXC 138 10","1.16.4"},
    {kernel,"ERTS CXC 138 10","2.13.4"}]},
    {nodes,[{disc,[rabbit at rabbit2,rabbit at rabbit1]}]},
    {running_nodes,[rabbit at rabbit1,rabbit at rabbit2]}]
    ...done.
    ubuntu at rabbit2:~$ sudo rabbitmqctl list_queues
    Listing queues ...
    ...done.

    Declare and publish to a new queue:

    ubuntu at rabbit1:~$ irb -rubygems -rbunny
    irb(main):001:0> (b=Bunny.new).start
    => :connected
    irb(main):002:0> b.queue('jiggleflop').publish("hello")
    => nil
    irb(main):003:0> exit
    ubuntu at rabbit1:~$ sudo rabbitmqctl list_queues name messages durable
    Listing queues ...
    jiggleflop 1 false
    ...done.
  • Matthias Radestock at Mar 3, 2011 at 10:11 pm
    Mark,

    Mark J. Reed wrote:
    > OK, I tried a very simple test case. Am I hitting a bug, or am I
    > wrong about how this should work?
    >
    > I have a two-node cluster, both disk nodes, freshly reset. I
    > declare a queue on one and publish to it, then pop the message off
    > on the other. Then I bring the node with the queue down and try to
    > declare the queue on the other node. This triggers an internal
    > error (AMQP code 541, internal-error) and causes Rabbit to close
    > the connection to the client that tried to declare the queue.

    Ah yes. Same bug as
    http://old.nabble.com/Re%3A-Pika-reconnection-error-p30975002.html

    Sorry, forgot about that one.

    Matthias
  • Mark J. Reed at Mar 3, 2011 at 10:46 pm

    On Thu, Mar 3, 2011 at 5:11 PM, Matthias Radestock wrote:
    > Ah yes. Same bug as
    > http://old.nabble.com/Re%3A-Pika-reconnection-error-p30975002.html

    OK, good, it's a bug. So does that mean that, once that bug is fixed,
    my app will work as designed? Or will there still be problems with
    clients not noticing that they need to redeclare the queue?



    --
    Mark J. Reed <markjreed at gmail.com>
  • Matthias Radestock at Mar 3, 2011 at 11:03 pm
    Mark,

    Mark J. Reed wrote:
    > OK, good, it's a bug. So does that mean that, once that bug is
    > fixed, my app will work as designed? Or will there still be
    > problems with clients not noticing that they need to redeclare the
    > queue?

    Until we introduce the new feature I mentioned there is no way for
    clients to find out when their consumers become ineffective - whether
    that is because a queue got deleted or the node on which it resided went
    down, or any other reason.

    Regards,

    Matthias.
  • Matthew Sackman at Mar 3, 2011 at 11:38 pm

    On Thu, Mar 03, 2011 at 11:03:21PM +0000, Matthias Radestock wrote:
    > Mark J. Reed wrote:
    >> OK, good, it's a bug. So does that mean that, once that bug is
    >> fixed, my app will work as designed? Or will there still be
    >> problems with clients not noticing that they need to redeclare
    >> the queue?
    >
    > Until we introduce the new feature I mentioned there is no way for
    > clients to find out when their consumers become ineffective -
    > whether that is because a queue got deleted or the node on which
    > it resided went down, or any other reason.

    But that feature has been written, is pending a bit of tidying, and then
    should make it for the next release.

    Matthew
  • Mark J. Reed at Mar 3, 2011 at 11:54 pm

    On Thu, Mar 3, 2011 at 6:03 PM, Matthias Radestock wrote:
    > Until we introduce the new feature I mentioned there is no way for
    > clients to find out when their consumers become ineffective -
    > whether that is because a queue got deleted or the node on which
    > it resided went down, or any other reason.

    So basically: if a consumer declares queue A and blocks waiting to
    receive a message from it, and the queue goes away before a message
    comes along, the consumer will never wake up. Got it.
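
    (One interim workaround, sketched in the same pre-0.9 Bunny style:
    poll with Queue#pop instead of blocking in subscribe, so the
    consumer gets regular chances to notice a vanished queue.  The
    :queue_empty sentinel is old-Bunny behaviour, and
    connect_and_declare is the hypothetical helper from the stopgap
    sketch earlier in the thread.)

        loop do
          begin
            msg = q.pop
            if msg[:payload] == :queue_empty
              sleep 1                # nothing queued; back off briefly
            else
              handle(msg[:payload])  # 'handle' is application code
            end
          rescue StandardError
            # pop failed: the queue (or its node) is likely gone, so
            # reconnect and redeclare, then resume polling.
            b, q = connect_and_declare
          end
        end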

    Thanks for the replies. Very helpful.

    --
    Mark J. Reed <markjreed at gmail.com>
