I'm load testing my RabbitMQ setup and I'm having problems I can't get
my head around. After publishing a lot of messages (millions) it seems
like the cluster is put into a bad state. The performance completely
rots and all connections are marked as blocked -- and even when all
queues have eventually been drained (after I've shut down the
producers) and all connections closed, at least one of the nodes is
still showing memory usage well above the high watermark. This has
happened multiple times during my testing, and seems completely
reproducible. Please have a look at this screenshot:
(the screenshot shows the web-based management console: zero messages
queued, but very high memory usage on one of the cluster nodes).
At the same time that I took the screenshot the Connections tab showed
no connections. It was several minutes (perhaps ten) since the last
connection was closed.
If I start a producer, or a consumer, at this point the connection is
immediately marked as blocked in the Connections tab, and the message
rate numbers on the Overview tab show zero, even though my code is
reporting that it's sending thousands of messages (the number of
queued/ready messages does increase, though).
Removing all queues seems to resolve the problem, but that is not a
feasible workaround. It feels like I should be able to run the cluster
continuously without having to stop and clean up from time to time.
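As I understand it, the blocked connections are RabbitMQ's memory
alarm kicking in once a node's memory use crosses
vm_memory_high_watermark (a fraction of installed RAM, 0.4 by
default), which blocks all publishing connections until usage drops
again. For reference, this is roughly how the threshold is set in
rabbitmq.config (a sketch; the path and value are just illustrative):

```erlang
%% rabbitmq.config (location depends on the install)
%% vm_memory_high_watermark is the fraction of system RAM at which the
%% node raises its memory alarm and blocks publishing connections.
[
  {rabbit, [
    {vm_memory_high_watermark, 0.4}  %% the 2.x default
  ]}
].
```

What I can't explain is why the node stays above the watermark long
after the queues are empty and the connections are gone.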
More specifics on the setup: the cluster consists of 4 EC2 instances
with 8 CPUs and 7 GB RAM each (I forget the exact instance name)
running RabbitMQ 2.4. The producers and consumers are Ruby processes
running the latest RC of the AMQP gem. Each node has three queues
bound to one exchange with a single routing key each, the producers
connect to a random cluster node and publish "hello world" with a
random routing key, so that each message will end up in exactly one
queue. The consumers connect to one of the cluster nodes (one consumer
per cluster node, in this test setup), and subscribe to all of the
queues on that node. The consumers do nothing besides ack the messages.
The idea behind the setup is to get load balancing and high
availability. Before the cluster rots, we publish, deliver and ack
15-20K messages per second.
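To make the topology concrete, here is a plain-Ruby model of the
routing, with no broker involved (the DirectExchange class and the key
names are made up for illustration): each message is published with one
randomly chosen routing key, and a direct exchange delivers it to
exactly the queue bound with that key.

```ruby
# Simplified model of direct-exchange routing: a message published with
# a routing key is delivered only to queues bound with that exact key.
ROUTING_KEYS = %w[key.a key.b key.c]

class DirectExchange
  def initialize
    @bindings = Hash.new { |h, k| h[k] = [] }  # routing key => queues
  end

  def bind(queue, routing_key)
    @bindings[routing_key] << queue
  end

  def publish(message, routing_key)
    @bindings[routing_key].each { |q| q << message }
  end
end

exchange = DirectExchange.new

# One queue per routing key, as in the test setup (three queues per node).
queues = ROUTING_KEYS.map do |key|
  [].tap { |q| exchange.bind(q, key) }
end

# Publish with a random routing key, like the producers do.
1000.times { exchange.publish("hello world", ROUTING_KEYS.sample) }

# Every message lands in exactly one queue, so the sizes add up.
puts queues.map(&:size).sum  # => 1000
```

In the real setup the exchange spans twelve queues across four nodes,
but the routing principle is the same.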
Thanks in advance for any tips,