We've been using Rabbit successfully for about a year. We recently
upgraded to v2.6.1 because we want to use clusters with replicated
message queues.

My testing has hit a puzzling behavior that smells like a Rabbit bug
to me. The test that uncovers it runs against a two-node cluster.
Both nodes are running v2.6.1, both are disk nodes, and both are
running on Mac OS, though I doubt that is pertinent.

I'm also running Alice on the node that runs the test. The test uses
it to programmatically do a stop_app on one of the nodes, because it
is trying to validate that if the cluster master fails and a slave is
promoted to take its place, we don't lose messages.

So, the test has a small thread pool, which is given tasks that
periodically 1) publish messages, and 2) toggle the state of the
Rabbit master node (stopped if running; started if stopped). Other
threads are consuming messages from queues.

I'm using publisher confirms, and I'm also acknowledging the messages
in the consumers (using autoAck=false for channel.basicConsume()).
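For readers new to manual acknowledgements, here is a toy, in-memory model of what autoAck=false implies on the broker side. The classes ToyQueue and ManualAckChannel are hypothetical and are not the RabbitMQ Java client API. The sketch relies on two documented AMQP behaviors: a delivered message stays unacknowledged until basicAck reaches the broker, and unacknowledged messages are requeued if the channel closes first. That is one plausible way a consumer could see the same UUID-tagged message again after a reconnect.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of broker-side manual acknowledgements.
// NOT the RabbitMQ client API; it only mimics the semantics of
// basicConsume(queue, autoAck=false, ...) followed by basicAck(tag, false).
class ToyQueue {
    final Deque<String> ready = new ArrayDeque<>();
}

class ManualAckChannel {
    private final ToyQueue queue;
    private long nextTag = 1;  // delivery tags count up per channel
    private final Map<Long, String> unacked = new LinkedHashMap<>();

    ManualAckChannel(ToyQueue queue) {
        this.queue = queue;
    }

    // Hand the next ready message to the consumer; it stays unacked until basicAck.
    long deliver() {
        String body = queue.ready.pollFirst();
        if (body == null) {
            throw new IllegalStateException("queue empty");
        }
        long tag = nextTag++;
        unacked.put(tag, body);
        return tag;
    }

    // Only an ack that actually reaches the broker removes the message for good.
    void basicAck(long tag) {
        if (unacked.remove(tag) == null) {
            throw new IllegalStateException("unknown delivery tag " + tag);
        }
    }

    // If the channel dies (e.g. its node is stopped) before the ack arrives,
    // every unacked message goes back on the queue. In this model they return
    // to the head in their original order; real broker ordering is not modeled.
    void closeAbruptly() {
        Long[] tags = unacked.keySet().toArray(new Long[0]);
        for (int i = tags.length - 1; i >= 0; i--) {
            queue.ready.addFirst(unacked.get(tags[i]));
        }
        unacked.clear();
    }
}
```

In this model, the message is gone only after a successful ack. A consumer that was mid-basicAck when the connection dropped should therefore expect to see the same message again on its next channel.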

When the master node is stopped, both the producers and the consumers
catch ShutdownSignalException. They handle this by reconnecting to the
cluster, which works fine; once reconnected, they carry on with their
business.
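The reconnect-on-ShutdownSignalException handling described above might look roughly like the following. This is a sketch under stated assumptions. FakeShutdownSignal is a hypothetical stand-in for the client's ShutdownSignalException, so the code compiles without the RabbitMQ jar. `connect` stands for whatever code opens a fresh connection and channel and re-registers the consumer. The retry count and fixed backoff are arbitrary.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for com.rabbitmq.client.ShutdownSignalException, so the
// sketch compiles without the RabbitMQ client on the classpath.
class FakeShutdownSignal extends RuntimeException {
    FakeShutdownSignal(String message) {
        super(message);
    }
}

final class Reconnect {
    private Reconnect() {}

    // Call `connect` until it succeeds or maxAttempts is used up, sleeping
    // backoffMillis between failed attempts. In real code `connect` would open
    // a new Connection and Channel and re-register the consumer.
    static <T> T withRetry(Supplier<T> connect, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return connect.get();
            } catch (FakeShutdownSignal e) {
                if (attempt == maxAttempts) {
                    throw e;  // give up and surface the last failure
                }
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();  // preserve interrupt status
                    throw e;
                }
            }
        }
    }
}
```

One consequence of this design is that everything the old channel handed out, in particular delivery tags, has no meaning on the channel returned by the retry.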

Sometimes, what I see is that a consumer has successfully fetched a
message from the broker, and is calling channel.basicAck() when it
gets that ShutdownSignalException.

Later, when the consumer has reconnected, it again pulls down the same
message. (The message bodies are tagged with a UUID, so I know it is
the same one.) This time, when the consumer attempts to basicAck() the
message, it again gets ShutdownSignalException, but this one has the
following text in it: "reply-text=PRECONDITION_FAILED - unknown
delivery tag 7".

In fact, that is the same delivery tag that was offered to the
consumer by the broker before the master went down and the consumer
reconnected.

Googling suggests that this event means that the consumer is
attempting to ack the same message more than once.

But, how can this be so? If the first ack succeeded, then the message
should have been removed from the broker's queues, and the consumer
shouldn't see the same message again.

Yet, if the first ack did not succeed, then the consumer shouldn't be
dinged for attempting to re-ack the message.

Anyone seen this before? It smells like a bug in Rabbit's replicated
queues to me, but I'm still new to Rabbit, and so am willing to
believe there's a subtlety here in consuming from a clustered broker
that I haven't yet grokked!

Thanks, --Steve


  • Matthias Radestock at Sep 30, 2011 at 5:59 pm
    Steve,
    On 30/09/11 18:31, Steve Rehrauer wrote:
    > Later, when the consumer has reconnected, it again pulls down the same
    > message. (The message bodies are tagged with a UUID, so I know it is
    > the same one.) This time, when the consumer attempts to basicAck() the
    > message, it again gets ShutdownSignalException, but this one has the
    > following text in it: "reply-text=PRECONDITION_FAILED - unknown
    > delivery tag 7".

    Delivery tags are allocated by server-side channel processes; nothing to
    do with queues and hence certainly nothing to do with replication.

    > In fact, that is the same delivery tag that was offered to the
    > consumer by the broker before the master went down and the consumer
    > reconnected.

    The symptoms you are describing are most likely caused by some misrouting
    in the app or, less likely, the rabbitmq client library.

    Are you 100% sure that you are issuing the channel.basicAck on the
    channel on which the message was received, particularly when there is
    some reconnect going on?

    Connecting via the tracer (http://www.rabbitmq.com/examples.html#tracer)
    might shed some light on what's going on.


    Regards,

    Matthias.
  • Steve Rehrauer at Sep 30, 2011 at 6:15 pm

    On Fri, Sep 30, 2011 at 1:59 PM, Matthias Radestock wrote:

    > Delivery tags are allocated by server-side channel processes; nothing to do
    > with queues and hence certainly nothing to do with replication.

    Thanks for the clarification!

    > The symptoms you are describing are most likely caused by some misrouting in
    > the app or, less likely, the rabbitmq client library.
    >
    > Are you 100% sure that you are issuing the channel.basicAck on the channel
    > on which the message was received, particularly when there is some reconnect
    > going on?

    I'm positive the same channel is used, with the caveat that my consumer
    client reconnects, so of course the second basicAck goes out on a new
    channel.

    > Connecting via the tracer (http://www.rabbitmq.com/examples.html#tracer)
    > might shed some light on what's going on.

    Thanks, I'll try that!

    --Steve

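Matthias points out that delivery tags belong to the channel, and Steve notes that the second basicAck goes out on a new channel. Together these suggest one possible (unconfirmed) way the PRECONDITION_FAILED could arise: a tag obtained on the old channel being acked on the new one. The sketch that follows is a self-contained model of pairing each tag with its channel and dropping acks for deliveries whose channel has since been replaced. The classes TagScopedChannel, Delivery and Acker are hypothetical and are not part of the RabbitMQ client.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a channel: each one allocates its own delivery tags,
// starting at 1, and only recognizes tags it issued itself. Not the RabbitMQ API.
final class TagScopedChannel {
    private long nextTag = 1;
    private final Set<Long> outstanding = new HashSet<>();

    Delivery deliver(String body) {
        long tag = nextTag++;
        outstanding.add(tag);
        return new Delivery(this, tag, body);
    }

    void basicAck(long tag) {
        if (!outstanding.remove(tag)) {
            throw new IllegalStateException("PRECONDITION_FAILED - unknown delivery tag " + tag);
        }
    }
}

// A delivery remembers which channel it arrived on.
final class Delivery {
    final TagScopedChannel channel;
    final long tag;
    final String body;

    Delivery(TagScopedChannel channel, long tag, String body) {
        this.channel = channel;
        this.tag = tag;
        this.body = body;
    }
}

final class Acker {
    private Acker() {}

    // Ack only on the channel that delivered the message. If a reconnect has
    // replaced that channel, the broker requeues the unacked message when the
    // old channel dies, so the stale ack is dropped instead of being sent on
    // the new channel (where the tag would be unknown or, worse, could refer
    // to a different message).
    static boolean ack(Delivery d, TagScopedChannel current) {
        if (d.channel != current) {
            return false;
        }
        current.basicAck(d.tag);
        return true;
    }
}
```

Keeping the channel reference alongside the tag is one way to make "ack on the channel the message was received on" hard to get wrong. The tracer Matthias mentions would show whether something like this is actually happening.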

Discussion Overview
group: rabbitmq-discuss
categories: rabbitmq
posted: Sep 30, '11 at 5:31p
active: Sep 30, '11 at 6:15p
posts: 3
users: 2
website: rabbitmq.com
irc: #rabbitmq
