In order to do this in our setup, we use HAProxy <http://haproxy.1wt.eu/> in
front of three virtual machines (our systems are all deployed on Rackspace
Cloud VMs). All of our software only knows about the Virtual IP that
points at our HAProxy installation; HAProxy knows about the three VMs, each
of which is running one node of a RabbitMQ cluster. All of our queues are
created as HA queues, so messages are replicated across all nodes and
survive a single node failure.
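
For anyone who wants a starting point, a layout like this could be sketched in haproxy.cfg roughly as follows. The server names, IP addresses, check intervals, and timeout values are illustrative assumptions, not values from the setup described here.

```
# Hypothetical haproxy.cfg fragment; all names, addresses, and timings are placeholders.
defaults
    mode tcp                  # AMQP is plain TCP as far as HAProxy is concerned
    timeout connect 5s
    timeout client  1h        # idle client connections are closed after this interval
    timeout server  1h

listen rabbitmq
    bind *:5672               # clients connect here (the address behind the Virtual IP)
    balance roundrobin
    # "check" enables health checks so a failed node stops receiving connections
    server rabbit1 10.0.0.11:5672 check inter 5s rise 2 fall 3
    server rabbit2 10.0.0.12:5672 check inter 5s rise 2 fall 3
    server rabbit3 10.0.0.13:5672 check inter 5s rise 2 fall 3
```

Adding a fourth node would then mean adding one more `server` line and reloading HAProxy, with no change on the client side.
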
This method allows us to simply add another VM and bring it into the
cluster if we want to expand capacity, and no updates are required to our
code base as it continues to interact with HAProxy for connections. The
trick was making sure our code automatically reconnects after a
disconnection, as HAProxy times out idle connections at an interval
determined by your timeout settings.
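
The reconnect logic can be sketched independently of any particular AMQP client. This is a minimal sketch, not our actual code: `connect_with_retry` and its parameters are hypothetical names, and `connect` stands in for whatever call your client library uses to open a connection to the HAProxy Virtual IP.

```python
import time


def connect_with_retry(connect, retries=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially.

    `connect` is any zero-argument callable that returns a connection or
    raises OSError (ConnectionError is a subclass of OSError). The last
    failure is re-raised once `retries` attempts have been used up.
    """
    delay = base_delay
    for attempt in range(1, retries + 1):
        try:
            return connect()
        except OSError:
            if attempt == retries:
                raise  # out of attempts; let the caller decide what to do
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap
```

The same helper can be called again whenever an established connection drops, for example when HAProxy closes it after the idle timeout. Passing `sleep` as a parameter is only there to make the backoff easy to test.
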
Having said all that, we have yet to move this into production, so I can't
speak to whether or not it will behave as anticipated in the wild.
However, initial testing is very promising, and when we force a node to
fail, HAProxy automatically detects that it has gone down, and just stops
handing out connections to that node. Just in case anyone complains about
the single point of failure that is HAProxy, that has also been set up to
fail over on a Virtual IP; should our main HAProxy machine ever go down, the
other HAProxy instance will take over and continue handing out node
connections.
This has been an interesting experiment; feel free to bug me if you want
more details on our setup.