Hey list,

Curious for feedback, as this design idea I have would possibly extend
to several places in our application, and I'd love to make it generic
enough to do so.

The basic problem:

We have a queue of work to be done, represented as a named queue. For
redundancy, we have 2->many processes (in Erlang) that will read from
the queue and do the work. Easy so far.

However, we also have a list of resources needed to process the work
queue, which needs to be shared across all worker processes. When one
worker retrieves a job from the work queue, it must also get the list of
available resources, match resource to job, and return the list of
resources (with the used resource flagged as such) back to a globally
accessible place (hoping to avoid the DB, but it may be a good-enough
sync point).

So, for example, there are four jobs, two that require a paper shredder,
two that require a wood chipper. There is only one paper shredder and
one wood chipper. There are also 10 workers available to actually shred
stuff.

Worker one consumes the list of available equipment (one paper shredder,
one wood chipper), and then pulls the first job (paper to be shredded).
Worker one grabs the paper shredder, returns the list of resources
having flagged the paper shredder as in use, and ACKs the job. When
worker one finishes shredding paper, it will wait until the resources
list is available and update it with 1 more paper shredder available.

Once a job is actually started, I don't care if the worker crashes or
not; that will be handled later. And I'll have a supervising gen_server
pull the resource list, add the resource its using to its state, and
spawn the actual worker (catching exits to return the resource). I'm
sure other measures will be needed to ensure the resource is added back
eventually, but I'm trying to get the basic idea down.

So once worker one is done retrieving a resource, worker two can next
get the resources list and the next job. If the job is paper shredding,
worker two will nack the job (as there are no paper shredders at the
moment) and return the resource list. I will eventually want to support
being able to requeue the job to the back of the job queue, but for now,
work stalls until the first item in the queue can be processed successfully.

So while the resources list is out of queue, no workers will consume
jobs from the work queue. As soon as it reappears in queue, I want the
next worker in line to grab it and try to process the next job.


Some initial concerns:

No resources available for the job at the front of the queue means a lot
of fetching and nacking of jobs, and fetching/returning the resource list.

Losing the resources list: if whatever node (our Erlang VMs only
communicate via AMQP) goes down while handling the resources list, it
will obviously never be republished. We do usually have a timeout window
for the worker to get a resource for a job (sometimes the actual finding
of the resource takes a while), so if I could coordinate all workers
consuming from the resources_available queue, I could know after timeout
+ a small fudge factor, that someone else should have begun processing.
So maybe the resources_available queue is more a try_to_process_a_job
queue, and each worker process maintains an identical list of resources
available, with a worker that pairs a job to resource publishing the
resource's in-use status.

So perhaps the solution is:

Jobs queue round robins amongst all workers, but only one worker can
fetch/(n)ack. A worker can only fetch from the jobs queue by consuming a
"next" message from a shared coordination queue. All workers publish
updates to resources available to a routing key so all workers'
resources state is in sync. If a worker can't find a resource for a job,
it can (configurable) 1) nack the job and send a 'next' to the
coordination queue; 2) requeue the job at the end of the jobs queue and
send a 'next'; 3) sit on the job (meaning no other worker can fetch from
the jobs queue) until an appropriate resource comes available.

Concerns here:

Starting/recovering the 'next' token.

I'm sure others but am realizing this is getting long.

Sorry for the wall of text and stream of consciousness writing. We have
a lot of places where this pattern is showing up and would love to get a
module or library written to ease doing this kind of coordination. Any
ideas or directions towards appropriate messaging patterns warmly welcome :)

- --
James Aimonetti
Distributed Systems Engineer / DJ MC_

2600hz | http://2600hz.com
sip:james at 2600hz.com
tel: 415.886.7905

Search Discussions

  • Jerry Kuch at Feb 28, 2012 at 7:53 pm
    Hi, James:

    Some of this appears to go beyond messaging per se and into more
    general distributed systems/services coordination. Have you contemplated
    using something like Apache Zookeeper or Heroku's Doozer? The primitives
    provided by those may lend themselves well to integrating-with/supporting
    your application, while the messaging heavy lifting continues using Rabbit
    and its features/guarantees in the ways they work best without strange
    fetch-peek-ack-nack trickery that will likely make your application
    pretty complex and hard to reason about.

    http://zookeeper.apache.org/

    http://xph.us/2011/04/13/introducing-doozer.html

    Doozer is quite new; Zookeeper by virtue of its position in the
    Hadoop ecology is older, more mature and well documented both online
    and in books...

    Best regards,
    Jerry

    ----- Original Message -----
    From: "James Aimonetti" <james at 2600hz.com>
    To: rabbitmq-discuss at lists.rabbitmq.com
    Sent: Monday, February 20, 2012 9:04:12 AM
    Subject: [rabbitmq-discuss] Design Help?

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    Hey list,

    Curious for feedback, as this design idea I have would possibly extend
    to several places in our application, and I'd love to make it generic
    enough to do so.

    The basic problem:

    We have a queue of work to be done, represented as a named queue. For
    redundancy, we have 2->many processes (in Erlang) that will read from
    the queue and do the work. Easy so far.

    However, we also have a list of resources needed to process the work
    queue, which needs to be shared across all worker processes. When one
    worker retrieves a job from the work queue, it must also get the list of
    available resources, match resource to job, and return the list of
    resources (with the used resource flagged as such) back to a globally
    accessible place (hoping to avoid the DB, but it may be a good-enough
    sync point).

    So, for example, there are four jobs, two that require a paper shredder,
    two that require a wood chipper. There is only one paper shredder and
    one wood chipper. There are also 10 workers available to actually shred
    stuff.

    Worker one consumes the list of available equipment (one paper shredder,
    one wood chipper), and then pulls the first job (paper to be shredded).
    Worker one grabs the paper shredder, returns the list of resources
    having flagged the paper shredder as in use, and ACKs the job. When
    worker one finishes shredding paper, it will wait until the resources
    list is available and update it with 1 more paper shredder available.

    Once a job is actually started, I don't care if the worker crashes or
    not; that will be handled later. And I'll have a supervising gen_server
    pull the resource list, add the resource its using to its state, and
    spawn the actual worker (catching exits to return the resource). I'm
    sure other measures will be needed to ensure the resource is added back
    eventually, but I'm trying to get the basic idea down.

    So once worker one is done retrieving a resource, worker two can next
    get the resources list and the next job. If the job is paper shredding,
    worker two will nack the job (as there are no paper shredders at the
    moment) and return the resource list. I will eventually want to support
    being able to requeue the job to the back of the job queue, but for now,
    work stalls until the first item in the queue can be processed successfully.

    So while the resources list is out of queue, no workers will consume
    jobs from the work queue. As soon as it reappears in queue, I want the
    next worker in line to grab it and try to process the next job.


    Some initial concerns:

    No resources available for the job at the front of the queue means a lot
    of fetching and nacking of jobs, and fetching/returning the resource list.

    Losing the resources list: if whatever node (our Erlang VMs only
    communicate via AMQP) goes down while handling the resources list, it
    will obviously never be republished. We do usually have a timeout window
    for the worker to get a resource for a job (sometimes the actual finding
    of the resource takes a while), so if I could coordinate all workers
    consuming from the resources_available queue, I could know after timeout
    + a small fudge factor, that someone else should have begun processing.
    So maybe the resources_available queue is more a try_to_process_a_job
    queue, and each worker process maintains an identical list of resources
    available, with a worker that pairs a job to resource publishing the
    resource's in-use status.

    So perhaps the solution is:

    Jobs queue round robins amongst all workers, but only one worker can
    fetch/(n)ack. A worker can only fetch from the jobs queue by consuming a
    "next" message from a shared coordination queue. All workers publish
    updates to resources available to a routing key so all workers'
    resources state is in sync. If a worker can't find a resource for a job,
    it can (configurable) 1) nack the job and send a 'next' to the
    coordination queue; 2) requeue the job at the end of the jobs queue and
    send a 'next'; 3) sit on the job (meaning no other worker can fetch from
    the jobs queue) until an appropriate resource comes available.

    Concerns here:

    Starting/recovering the 'next' token.

    I'm sure others but am realizing this is getting long.

    Sorry for the wall of text and stream of consciousness writing. We have
    a lot of places where this pattern is showing up and would love to get a
    module or library written to ease doing this kind of coordination. Any
    ideas or directions towards appropriate messaging patterns warmly welcome :)

    - --
    James Aimonetti
    Distributed Systems Engineer / DJ MC_

    2600hz | http://2600hz.com
    sip:james at 2600hz.com
    tel: 415.886.7905
    _______________________________________________
    rabbitmq-discuss mailing list
    rabbitmq-discuss at lists.rabbitmq.com
    https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
  • James Aimonetti at Feb 28, 2012 at 8:23 pm
    Jerry,

    Thanks for the suggestions. You're right that this is an abuse of the
    messaging patterns, but the way our system is structured seems to
    require building these higher level concepts on top of the primitives we
    get from AMQP/RabbitMQ.

    We have a system spread out over many servers, across datacenters, with
    redundant applications running, and we need ways to coordinate work
    amongst the redundant workers in this manner. When we don't care about
    the order of work, its really easy to have the queue round-robin the
    work to each worker.

    As we're building more advanced features into our platform, this pattern
    of only allowing one worker to process a job, across the whole platform,
    is popping up in a few places and we'd like to encapsulate that pattern
    into a reusable bit of code (a behaviour in Erlang).

    AMQP is the glue that binds all these apps together, so we look to build
    on top of its feature set.

    I'll let the list know more if I figure a tenable solution :)
    On 02/28/2012 11:53 AM, Jerry Kuch wrote:
    Hi, James:

    Some of this appears to go beyond messaging per se and into more
    general distributed systems/services coordination. Have you contemplated
    using something like Apache Zookeeper or Heroku's Doozer? The primitives
    provided by those may lend themselves well to integrating-with/supporting
    your application, while the messaging heavy lifting continues using Rabbit
    and its features/guarantees in the ways they work best without strange
    fetch-peek-ack-nack trickery that will likely make your application
    pretty complex and hard to reason about.

    http://zookeeper.apache.org/

    http://xph.us/2011/04/13/introducing-doozer.html

    Doozer is quite new; Zookeeper by virtue of its position in the
    Hadoop ecology is older, more mature and well documented both online
    and in books...

    Best regards,
    Jerry

    ----- Original Message -----
    From: "James Aimonetti" <james at 2600hz.com>
    To: rabbitmq-discuss at lists.rabbitmq.com
    Sent: Monday, February 20, 2012 9:04:12 AM
    Subject: [rabbitmq-discuss] Design Help?

    Hey list,

    Curious for feedback, as this design idea I have would possibly extend
    to several places in our application, and I'd love to make it generic
    enough to do so.

    The basic problem:

    We have a queue of work to be done, represented as a named queue. For
    redundancy, we have 2->many processes (in Erlang) that will read from
    the queue and do the work. Easy so far.

    However, we also have a list of resources needed to process the work
    queue, which needs to be shared across all worker processes. When one
    worker retrieves a job from the work queue, it must also get the list of
    available resources, match resource to job, and return the list of
    resources (with the used resource flagged as such) back to a globally
    accessible place (hoping to avoid the DB, but it may be a good-enough
    sync point).

    So, for example, there are four jobs, two that require a paper shredder,
    two that require a wood chipper. There is only one paper shredder and
    one wood chipper. There are also 10 workers available to actually shred
    stuff.

    Worker one consumes the list of available equipment (one paper shredder,
    one wood chipper), and then pulls the first job (paper to be shredded).
    Worker one grabs the paper shredder, returns the list of resources
    having flagged the paper shredder as in use, and ACKs the job. When
    worker one finishes shredding paper, it will wait until the resources
    list is available and update it with 1 more paper shredder available.

    Once a job is actually started, I don't care if the worker crashes or
    not; that will be handled later. And I'll have a supervising gen_server
    pull the resource list, add the resource its using to its state, and
    spawn the actual worker (catching exits to return the resource). I'm
    sure other measures will be needed to ensure the resource is added back
    eventually, but I'm trying to get the basic idea down.

    So once worker one is done retrieving a resource, worker two can next
    get the resources list and the next job. If the job is paper shredding,
    worker two will nack the job (as there are no paper shredders at the
    moment) and return the resource list. I will eventually want to support
    being able to requeue the job to the back of the job queue, but for now,
    work stalls until the first item in the queue can be processed successfully.

    So while the resources list is out of queue, no workers will consume
    jobs from the work queue. As soon as it reappears in queue, I want the
    next worker in line to grab it and try to process the next job.


    Some initial concerns:

    No resources available for the job at the front of the queue means a lot
    of fetching and nacking of jobs, and fetching/returning the resource list.

    Losing the resources list: if whatever node (our Erlang VMs only
    communicate via AMQP) goes down while handling the resources list, it
    will obviously never be republished. We do usually have a timeout window
    for the worker to get a resource for a job (sometimes the actual finding
    of the resource takes a while), so if I could coordinate all workers
    consuming from the resources_available queue, I could know after timeout
    + a small fudge factor, that someone else should have begun processing.
    So maybe the resources_available queue is more a try_to_process_a_job
    queue, and each worker process maintains an identical list of resources
    available, with a worker that pairs a job to resource publishing the
    resource's in-use status.

    So perhaps the solution is:

    Jobs queue round robins amongst all workers, but only one worker can
    fetch/(n)ack. A worker can only fetch from the jobs queue by consuming a
    "next" message from a shared coordination queue. All workers publish
    updates to resources available to a routing key so all workers'
    resources state is in sync. If a worker can't find a resource for a job,
    it can (configurable) 1) nack the job and send a 'next' to the
    coordination queue; 2) requeue the job at the end of the jobs queue and
    send a 'next'; 3) sit on the job (meaning no other worker can fetch from
    the jobs queue) until an appropriate resource comes available.

    Concerns here:

    Starting/recovering the 'next' token.

    I'm sure others but am realizing this is getting long.

    Sorry for the wall of text and stream of consciousness writing. We have
    a lot of places where this pattern is showing up and would love to get a
    module or library written to ease doing this kind of coordination. Any
    ideas or directions towards appropriate messaging patterns warmly welcome :)
    _______________________________________________
    rabbitmq-discuss mailing list
    rabbitmq-discuss at lists.rabbitmq.com
    https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

    - --
    James Aimonetti
    Distributed Systems Engineer / DJ MC_

    2600hz | http://2600hz.com
    sip:james at 2600hz.com
    tel: 415.886.7905

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouprabbitmq-discuss @
categoriesrabbitmq
postedFeb 20, '12 at 5:04p
activeFeb 28, '12 at 8:23p
posts3
users2
websiterabbitmq.com
irc#rabbitmq

2 users in discussion

James Aimonetti: 2 posts Jerry Kuch: 1 post

People

Translate

site design / logo © 2022 Grokbase