(cc-ing rabbitmq-discuss so that other folks who've been around Rabbit longer than I have can chime in as well).
Thanks for the additional motivation... I think variations on this request have come up in discussion before, so let's keep some other eyes on the topic...
So in this system I'm developing, a message from a persistent queue like RabbitMQ will trigger a computation across the cluster, creating potentially many messages and involving many machines. I don't want to ack the message until all the computations it has triggered has finished. I have a way of detecting this efficiently and reliably, and it's a different machine than the one that read the message off the queue that will know the message is ready to be acked.
Out of curiosity what are you using as your distributed coordination mechanism over the other machines and processes involved in your cluster? I wonder if perhaps there's some information available to or through it that your Rabbit-facing processes might be able to use straightforwardly...
If there's a failure at any point (like an exception or machine going down), the system will never detect that the computations for the message has finished. So I want to handle this by putting a time limit on how long the system has to process the message. This is where setting timeouts on acking would come in, and it would be very useful.
Gotcha... I follow where you're going...
What I'm going to have to do to get RabbitMQ viable as a source for this system is handle acking/timeouts on the node that reads from RabbitMQ. When the downstream node detects that the message is finished, it will have to send a message to the reader node to ack the message. The reader node will also have to cache messages in memory and have a separate thread explicitly reject messages that are in the cache for too long. Not terrible, but would be nice if this was native in RabbitMQ.
Right. As stated your design sketch sounds totally viable although, as you say, it does put work into your application that one might prefer to have done elsewhere... :-)
You get correctness out of the deal in that your intermediate reader process that owns the channel on which the original do-work message was de-queued will make the message available for later consumption by someone else if it dies, but of course it has to implement its own safekeeping and timeout logic for deciding when to give up on things. Via basic.reject it also gets the option of throwing the message back to be handled again later or condemning it to oblivion.
On Fri, Dec 10, 2010 at 9:55 AM, Jerry Kuch <jerryk at vmware.comwrote:
On Dec 9, 2010, at 5:21 PM, nathanmarz wrote:
Let's just say I'm working on the "Cascalog for stream-processing",
although the semantics of the tool will be quite different of course.
The goal is to be able to do distributed stream processing with the
details of fault tolerance and messaging abstracted away. The rest I'm
keeping secret for now :)
Very interesting... I look forward to seeing it develop!
Alternatively, is it possible to ack messages out of order from how
you read them off the queue?
It is indeed.
In AMQP, acking a message tells the broker that you have received it and taken responsibility for its contents, and that the broker no longer has to take any measures to ensure its integrity or existence.
When you take a message off a queue, say with 'basic.get', it still exists on the server, but is unavailable to other consumers until such time as you 'basic.ack' it. If you basic.ack the message, the message is removed from the server. If your channel or connection dies, the message is again made available for other consumers.
If you pulled a smattering of messages from a queue and dispatched them to separate workers in your application, and your workers finished and sent acks in some order other than the order in which the messages were enqueued, there's no problem at all.
Also, is it possible to set a timeout on
acking a message, so that if a message isn't acked within X secs it's
automatically considered failed and scheduled for redelivery?
Such a mechanism doesn't currently exist. In practice, this usually won't be a problem since failure to ack (modulo the case of a consumer that's alive but paralyzed for some reason) is typically the result of network troubles or client death, both of which will result in the connection being closed. If the connection closes, the un-acked message on the server becomes available for other clients to get or consume.
Are there specific aspects of your desired use case that make such a timeout-waiting-for-ack super important?