We've been using Rabbit successfully for about a year. We recently
upgraded to v2.6.1 because we want to use clusters with replicated queues.
My testing has hit a puzzling behavior that smells like a Rabbit bug
to me. The test that uncovers it runs against a two-node cluster.
Both nodes are running v2.6.1. Both are disk nodes. Both are running
on Mac OS, though I doubt that's pertinent.
I'm also running Alice on the node that runs the test. The test uses
it to programmatically do a stop_app on one of the nodes, because the
test is trying to validate that if the cluster master fails and a
slave is promoted to take its place, we don't lose messages.
So, the test has a small thread pool, which is given tasks that
periodically 1) publish messages, and 2) toggle the state of the
Rabbit master node (stopped if running; started if stopped). Other
threads are consuming messages from queues.
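For concreteness, here's the rough shape of that test loop, with the publish and node-toggle actions stubbed out as counters (the real versions use the RabbitMQ Java client and go through Alice to stop_app/start_app; everything below is a sketch, not our actual code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class FailoverHarness {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        AtomicBoolean masterUp = new AtomicBoolean(true);
        AtomicInteger published = new AtomicInteger();

        // Task 1: periodically publish a UUID-tagged message
        // (stubbed here as a counter increment).
        pool.scheduleAtFixedRate(
            published::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);

        // Task 2: toggle the master node -- stopped if running,
        // started if stopped (stubbed as flipping a flag).
        pool.scheduleAtFixedRate(
            () -> masterUp.set(!masterUp.get()), 50, 50, TimeUnit.MILLISECONDS);

        Thread.sleep(300);
        pool.shutdownNow();
        System.out.println("published=" + published.get()
            + " masterUp=" + masterUp.get());
    }
}
```

The consumer threads run separately from this pool, each in a loop around `channel.basicConsume()`.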
I'm using publisher confirms, and I'm also acknowledging the messages
in the consumers (using autoAck=false for channel.basicConsume()).
When the master node is stopped, I see both the producers and
consumers catching ShutdownSignalException. They handle this by
attempting to reconnect to the cluster. This works fine. When
reconnected, they continue with their business.
Sometimes, what I see is that a consumer has successfully fetched a
message from the broker, and is calling channel.basicAck() when it
gets that ShutdownSignalException.
Later, when the consumer has reconnected, it again pulls down the same
message. (The message bodies are tagged with a UUID, so I know it is
the same one.) This time, when the consumer attempts to basicAck() the
message, it again gets ShutdownSignalException, but this one has the
following text in it: "reply-text=PRECONDITION_FAILED - unknown
delivery tag 7".
In fact, that is the same delivery tag the broker offered to the
consumer before the master went down and the consumer lost its
connection. Googling suggests that this error means the consumer is
attempting to ack the same message more than once.
But, how can this be so? If the first ack succeeded, then the message
should have been removed from the broker's queues, and the consumer
shouldn't see the same message again.
Yet, if the first ack did not succeed, then the consumer shouldn't be
dinged for attempting to re-ack the message.
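As far as I understand AMQP 0-9-1, delivery tags are scoped to a channel: each channel counts deliveries from 1, and an ack is only valid for a tag that channel itself issued and hasn't yet settled. Here's a toy model of that bookkeeping in plain Java (no broker; the class names are mine, not the client library's) to make my mental model explicit. Note that in this model, acking the redelivered message with the new channel's tag *succeeds*, which is exactly why the failure I'm seeing confuses me:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Toy model of per-channel delivery tags, as I understand the spec.
class ToyChannel {
    private long nextTag = 1;                        // tags are channel-local
    private final Map<Long, String> unacked = new HashMap<>();
    private final ArrayDeque<String> queue;

    ToyChannel(ArrayDeque<String> queue) { this.queue = queue; }

    // Deliver the next message; the "broker" assigns this channel's next tag.
    long deliver() {
        String body = queue.poll();
        long tag = nextTag++;
        unacked.put(tag, body);
        return tag;
    }

    // Ack fails with "unknown delivery tag" if the tag was never issued
    // on THIS channel, or was already settled.
    void basicAck(long tag) {
        if (unacked.remove(tag) == null)
            throw new IllegalStateException(
                "PRECONDITION_FAILED - unknown delivery tag " + tag);
    }

    // Channel/connection death: all unacked messages are requeued.
    void die() {
        unacked.values().forEach(queue::addFirst);
        unacked.clear();
    }
}

public class TagScope {
    public static void main(String[] args) {
        ArrayDeque<String> queue = new ArrayDeque<>();
        queue.add("msg-uuid-1234");

        ToyChannel ch1 = new ToyChannel(queue);
        long tag1 = ch1.deliver();  // consumer gets the message...
        ch1.die();                  // ...but the node stops before the ack lands

        ToyChannel ch2 = new ToyChannel(queue);  // reconnect: fresh channel
        long tag2 = ch2.deliver();  // same body redelivered, fresh tag count
        System.out.println("old tag=" + tag1 + " new tag=" + tag2);

        try {
            ch2.basicAck(999);      // a tag ch2 never issued -> rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        ch2.basicAck(tag2);         // acking the new channel's tag is fine
        System.out.println("acked ok");
    }
}
```

So my re-ack is happening on a new channel with a tag that channel did issue for the redelivered message, and yet the broker still rejects it.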
Anyone seen this before? It smells like a bug in Rabbit's replicated
queues to me, but I'm still new to Rabbit, and so am willing to
believe there's a subtlety here in consuming from a clustered broker
that I haven't yet grokked!