For those just joining us, the discussion is on the implementation of topic
exchanges.
On Sat, Feb 20, 2010 at 02:52:41AM -0800, Yurii Rashkovskii wrote:
Matthias,
On 2010-02-20, at 2:49 AM, Matthias Radestock wrote:

Yurii,

Yurii Rashkovskii wrote:
Out of curiosity, how would this finite state machine work?
Let me hand you over to my colleagues Matthew Sackman and David MacIver, who have been discussing this recently.
Thanks!
Hi,

Here is a summary of what we've discussed so far (and there will
be more when we actually find time to implement it).

A couple of points to consider:

We essentially have three operations we need to optimise:

1. adding a binding
2. removing a binding
3. looking up a key against a set of bindings

Currently the first two are fast (probably O(log(n)), depending on what mnesia
is doing) but the third is O(length of key * number of bindings). This is
sub-optimal.

The naive solution would be to recompile the bindings to a finite automaton every
time they change, but that is an O(number of bindings) operation, so creating n
bindings would cost O(n^2) overall, which isn't great. The objective is therefore
to build the tree incrementally, in a way that makes each operation relatively cheap.

The version I proposed (which is almost certainly not how it will look in the end)
is that we'd implement it as a mostly-optimized NFA (only mostly because it makes
the cost of all binding modification operations way cheaper to do it this way
while not hurting binding lookup that much). This actually ends up being relatively
close to the tree of topics you proposed in your earlier email.

Essentially the way it would look is that you'd have the obvious NFA corresponding
to any given binding and then merge prefixes for these.

So e.g. the binding foo.bar.baz to queue-1 would simply look like:

1 -(foo)-> 2 -(bar)-> 3 -(baz)-> [queue-1]

while foo.#.baz to queue-2 would give

1 -(foo)-> 2 -(.)-> 2
             -(epsilon)-> 3 -(baz)-> [queue-2]

(where an epsilon transition is as usual one that doesn't consume any tokens)

And adding both would result in:

1 -(foo)-> 2 -(epsilon)-> 3 -(bar)-> 4 -(baz)-> [queue-1]
             -(epsilon)-> 5 -(.)-> 5
                            -(epsilon)-> 6 -(baz)-> [queue-2]

It's worth noting that although in principle, if both bindings were to queue-1,
we would be able to merge the two states that transition on baz, we probably
wouldn't do so: because we want these machines to be incrementally editable
there will definitely be some redundant nodes.
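To make that concrete, here is a rough Python sketch of the incremental construction and lookup described above (the real implementation would be in Erlang on top of mnesia; the Node class, the use of "*" as the label for a single-token wildcard edge, and the set-of-states simulation are purely for exposition, not how RabbitMQ actually implements it):

```python
class Node:
    """One NFA state: token-labelled edges, epsilon edges, bound queues."""
    def __init__(self):
        self.edges = {}      # token -> Node ('*' matches any one token)
        self.eps = []        # epsilon transitions (consume no token)
        self.queues = set()  # queues bound at this (terminal) state

def add_binding(root, pattern, queue):
    """Add one binding, sharing literal prefix edges where possible."""
    node = root
    for token in pattern.split("."):
        if token == "#":
            # '#' matches zero or more tokens: a self-looping state,
            # entered and bypassed via epsilon transitions. Each '#'
            # gets fresh nodes, so some redundancy remains, as noted.
            loop, nxt = Node(), Node()
            loop.edges["*"] = loop
            node.eps += [loop, nxt]  # 'nxt' directly: the zero-token case
            loop.eps.append(nxt)
            node = nxt
        else:
            node = node.edges.setdefault(token, Node())
    node.queues.add(queue)

def _eps_closure(states):
    """All states reachable from 'states' without consuming a token."""
    stack, seen = list(states), set(states)
    while stack:
        for t in stack.pop().eps:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def match(root, key):
    """Set-of-states NFA simulation: O(len(key) * live states)."""
    states = _eps_closure({root})
    for token in key.split("."):
        step = set()
        for s in states:
            if token in s.edges:
                step.add(s.edges[token])
            if "*" in s.edges:
                step.add(s.edges["*"])
        states = _eps_closure(step)
    return set.union(set(), *(s.queues for s in states))
```

With the two example bindings above (foo.bar.baz to queue-1, foo.#.baz to queue-2), match returns both queues for the key foo.bar.baz and only queue-2 for foo.baz, since '#' also matches zero tokens.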

One thing we've discussed is the choice between an NFA and a DFA. On the one hand,
the DFA will be a lot faster on lookup for the cases it's well suited to (as well
as constant factors, the DFA will be O(length of key) while the NFA will be more
like O(length of key * number of _matching_ bindings)), on the other hand it will
be slower for binding deletion (because you have to scan multiple branches of
the tree to delete a single binding).

Additionally there are pathological cases where a DFA ends up with exponentially
more nodes than the equivalent NFA. For example, the pattern

#.foo followed by k repetitions of .*  (i.e. #.foo.*.*. ... .*)

takes O(2^k) nodes to represent with a DFA but O(k) nodes with an NFA.
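The blow-up can be demonstrated concretely. The following Python sketch (a hand-rolled NFA for '#.foo' followed by k '*' tokens, over an assumed two-token alphabet; everything here is illustrative, not RabbitMQ code) counts the DFA states that subset construction produces:

```python
def dfa_size(k, alphabet=("foo", "x")):
    """Count DFA states for the NFA of '#.foo' plus k trailing '*' tokens.

    NFA states are integers: 0 has a self-loop on any token plus a
    'foo' edge to 1; states 1..k advance on any token; k+1 accepts.
    """
    def step(states, tok):
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)               # '#' self-loop
                if tok == "foo":
                    nxt.add(1)           # start matching the suffix
            elif s <= k:
                nxt.add(s + 1)           # '*' consumes any one token
        return frozenset(nxt)

    start = frozenset({0})
    seen, frontier = {start}, [start]
    while frontier:                      # explore all reachable subsets
        st = frontier.pop()
        for tok in alphabet:
            nxt = step(st, tok)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)
```

The NFA has only k+2 states, but even over a two-token alphabet the reachable DFA has 2^(k+1) states: it has to remember at which of the last k+1 positions 'foo' occurred.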

One thing that would help us to make these decisions as to the best approach would
be some feedback on how people are using topic exchanges in the real world: e.g.
if it turns out that basically all bindings are short then the space explosion
isn't really an issue. If bindings are created rarely, or typically in big blocks,
then we could optimise for that. There's not going to be a universally "best"
solution, so while the objective is of course to make everything as fast as possible
it always helps to know which operations we need to worry about the most.

Hope that helps clarify some of our thoughts so far.

Regards,
David


•  at Feb 22, 2010 at 11:46 am
Hi David,
> One thing that would help us to make these decisions as to the best
> approach would be some feedback on how people are using topic exchanges
> in the real world: e.g. if it turns out that basically all bindings are
> short then the space explosion isn't really an issue. If bindings are
> created rarely, or typically in big blocks, then we could optimise for
> that. There's not going to be a universally "best" solution, so while
> the objective is of course to make everything as fast as possible it
> always helps to know which operations we need to worry about the most.
>
> Hope that helps clarify some of our thoughts so far.
Very often the only subscription type needed is subscription to a
specific sub-tree within the topic hierarchy; in other words, a
subscription with a literal prefix and # at the end, e.g. "x.y.z.#".

If that's the case, all three fundamental operations can be implemented
in time that is O(1) in the number of topics and/or bindings and O(n) in
the length of the longest subscription.

Such an algorithm can be turned on by a specific hint passed in
Exchange.Create's 'arguments' parameter, or, more transparently, it can
be used as long as subscriptions adhere to the "x.y.z.#" pattern; once
there's a non-compliant subscription, the matching algorithm can switch
dynamically to a more generic one.

Martin
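The specialised matcher Martin describes can be sketched in Python roughly like this (the PrefixTrie class and its method names are hypothetical, invented for illustration): hash-table children make bind, unbind and lookup each cost O(length of the pattern or key), independent of the number of bindings.

```python
class PrefixTrie:
    """Matcher for bindings of the form 'x.y.z.#': literal prefix + '#'."""

    def __init__(self):
        self.children = {}   # token -> PrefixTrie
        self.queues = set()  # queues bound at exactly this prefix

    def bind(self, pattern, queue):
        tokens = pattern.split(".")
        assert tokens[-1] == "#", "only 'prefix.#' patterns supported"
        node = self
        for token in tokens[:-1]:
            node = node.children.setdefault(token, PrefixTrie())
        node.queues.add(queue)   # note: a bare '#' binds at the root

    def unbind(self, pattern, queue):
        node = self
        for token in pattern.split(".")[:-1]:
            node = node.children[token]
        node.queues.discard(queue)

    def match(self, key):
        # Every prefix node on the path matches, since the trailing '#'
        # covers the rest of the key (including zero remaining tokens).
        out, node = set(), self
        for token in key.split("."):
            out |= node.queues
            node = node.children.get(token)
            if node is None:
                return out
        return out | node.queues
```

A bare "#" binding lands in the root's queue set, so it is matched unconditionally without walking the key at all, which foreshadows the "go straight to queue" point made later in the thread.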
•  at Feb 22, 2010 at 12:02 pm

On Mon, Feb 22, 2010 at 12:43:14PM +0100, Martin Sustrik wrote:
> Hi David,
>
>> One thing that would help us to make these decisions as to the best
>> approach would be some feedback on how people are using topic exchanges
>> in the real world: [...]
>
> Very often the only subscription type needed is subscription to a
> specific sub-tree within the topic hierarchy, in other words a
> subscription with a literal prefix and # at the end, e.g. "x.y.z.#".
Ah, that makes sense. Thanks.
> If that's the case, all three fundamental operations can be
> implemented in O(1) time where n is number of topics and/or bindings

Well, O(log(branching rate for the tree)) (you need to do key lookups at
each level), but near enough.

> and O(n) time where n is length of the longest subscription.
>
> Such an algorithm can be turned on by a specific hint passed to
> Exchange.Create 'arguments' parameter, or, more transparently, it
> can be used as long as subscriptions adhere to "x.y.z.#" pattern and
> once there's a non-compliant subscription, matching algorithm can
> switch dynamically to a more generic one.

I'm not sure this actually needs quite such specific logic. It should be
possible to add rules which mean this behaviour falls out naturally,
rather than having a sharp transition between the two behaviours: e.g. in
the NFA approach suggested, the only special handling needed would be to
make sure that binding the pattern # was sensibly implemented as a "go
straight to queue, do not pass go, do not collect O(key length)
operations" node rather than doing a full scan of the rest of the key.

'though I guess that sparks another question: How long are the keys people are
actually using with this?
•  at Feb 22, 2010 at 12:10 pm
David R. MacIver wrote:
>> If that's the case, all three fundamental operations can be
>> implemented in O(1) time where n is number of topics and/or bindings
>
> Well, O(log(branching rate for the tree)) (you need to do key lookups at
> each level), but near enough.

Right. However, in the ideal case you'll use a hash table for the lookup,
which will get the average case down to a constant.

>> and O(n) time where n is length of the longest subscription.
>>
>> Such an algorithm can be turned on by a specific hint passed to
>> Exchange.Create 'arguments' parameter, [...]
>
> I'm not sure this actually needs quite such specific logic. [...] the
> only special handling needed would be to make sure that binding the
> pattern # was sensibly implemented as a "go straight to queue, do not
> pass go, do not collect O(key length) operations" node rather than
> doing a full scan of the rest of the key.

Ack.

> 'though I guess that sparks another question: How long are the keys
> people are actually using with this?

My experience from the stock trading sphere is that most keys are 3-5
items long, the longest being somewhere in the range of 15-20 items.

Martin
•  at Feb 22, 2010 at 12:22 pm

On Mon, Feb 22, 2010 at 01:07:59PM +0100, Martin Sustrik wrote:
David R. MacIver wrote:
>>> If that's the case, all three fundamental operations can be
>>> implemented in O(1) time where n is number of topics and/or bindings
>>
>> Well, O(log(branching rate for the tree)) (you need to do key lookups at
>> each level), but near enough.
>
> Right. However, in ideal case you'll use hash table for the lookup
> which will get the average case down to a constant.

True, although the worst case is O(n) [we should probably take the
resulting vim vs. emacs style trees-vs-hash-tables debate that normally
follows at this point as read].

>> 'though I guess that sparks another question: How long are the keys
>> people are actually using with this?
>
> My experience from stock trading sphere is that most keys are 3-5
> items long, the longest being somewhere in the range of 15-20 items.

Thanks, that's good to know.

Paul Jones just pointed me to an interesting use case here.

http://github.com/paulj/trapeze

The keys being used correspond to URLs + methods and seem to look something like

GET.some.domain.//.foo.bar.baz

(I'm not sure if that's exactly right, but that's the idea)

so your routing would normally look something like

*.some.domain.//.some.prefix.#

The keys are probably no longer than what you describe, but the initial * does
mean that they need to be able to gracefully transition between simple prefixes
and the more general NFA.
•  at Feb 22, 2010 at 8:50 pm

David R. MacIver wrote:
'though I guess that sparks another question: How long are the keys people are
actually using with this?
256 bytes or fewer, as limited by the AMQP transfer syntax. :-/
•  at Feb 22, 2010 at 8:55 pm

Tony Garnock-Jones wrote:
256 bytes or fewer, as limited by the AMQP transfer syntax. :-/
Uh, oops, off by one. 255 bytes or fewer. Zero is a permitted length.
FTR :-)

Tony
•  at Feb 23, 2010 at 10:32 am
Well, the byte count isn't actually that interesting (though thanks for the
reminder). The question is how many tokens are used.

e.g. "#." followed by 126 repetitions of "a." would produce quite an
unfortunately large DFA.

Discussion Overview
group: rabbitmq-discuss · categories: rabbitmq · posted: Feb 22, '10 at 11:24a · active: Feb 23, '10 at 10:32a · posts: 8 · users: 3 · website: rabbitmq.com · irc: #rabbitmq
