Multiple topics is the model I would recommend for what you have described.
At LinkedIn we have an environment with a wide mix of use cases across a lot
of different clusters. We have some topics that have one producer and one
consumer (queuing). We have some topics that are multi-producer (tracking and
metrics, mostly). Some of those are multi-consumer (tracking), and some are
mostly single consumer (metrics). Besides all of this, we have a couple of
wildcard consumers that read everything (our audit system, and mirror makers).
In your case, the rules engine sounds like a consumer similar to our audit
consumer. For that reason, I would not base the decision about how many
topics you need on that consumer alone. Since the majority of what you're
describing is consumers interested in discrete data sets, break out the
topics along those lines (all other things being equal).
While Gwen is absolutely right about her guidelines, consuming and throwing
away most of the data is a cardinal sin and should be avoided. Multi-topic
consumers are much less of a problem to deal with. Personally, I wouldn't
bother combining the messages into a separate topic for the rules engine -
I would just consume all the topics.
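To make the "just consume all the topics" suggestion concrete, here is a toy sketch (not real Kafka client code; the ToyBroker class and topic names are made up for illustration). A narrow consumer reads only its own topic, while a rules-engine-style consumer subscribes to several topics and discards nothing:

```python
# Toy stand-in for a cluster: one FIFO log per topic. Illustrates one
# consumer subscribed to a single topic and another subscribed to many,
# with per-topic ordering preserved in both cases.
from collections import defaultdict

class ToyBroker:
    """Minimal stand-in for a Kafka cluster: one append-only log per topic."""
    def __init__(self):
        self.logs = defaultdict(list)

    def produce(self, topic, message):
        self.logs[topic].append(message)

    def poll(self, topics):
        """Return all messages from the subscribed topics, per-topic order kept."""
        return [(t, m) for t in topics for m in self.logs[t]]

broker = ToyBroker()
broker.produce("orders", {"type": "order", "id": 1})
broker.produce("logins", {"type": "login", "user": "alice"})
broker.produce("orders", {"type": "order", "id": 2})

# A narrow consumer subscribes to the one topic it cares about...
order_records = broker.poll(["orders"])

# ...while the rules engine subscribes to many topics, discarding nothing.
rules_records = broker.poll(["orders", "logins"])
```

With the real Kafka consumer API this is just a subscription to a list of topics (or a topic pattern); there is no need for a separate merged topic.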
You mentioned message ordering, and that can present an issue. Now, you'd
likely have this problem regardless of how many topics you use, as ordering
is only guaranteed within a single partition. So you'd either have to use
one partition, or use a partitioning scheme under which strict ordering of
all the messages matters less. Obviously, having multiple topics is the same
as having multiple partitions in this respect. You need to decide how
important ordering within Kafka is to your application, and whether it can
be handled separately inside the application.
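A common version of such a partitioning scheme is keying messages by an entity id, so that each entity's events all land in one partition and stay in order relative to each other, even though there is no global order. A minimal sketch (the CRC32 hash mirrors the idea of a key-based default partitioner, not Kafka's actual murmur2 implementation; keys and partition count are made up):

```python
# Per-key ordering sketch: the same key always hashes to the same
# partition, so that key's events keep their relative order.
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stable hash so a given key always lands on the same partition.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

events = [("user-1", "created"), ("user-2", "created"),
          ("user-1", "updated"), ("user-1", "deleted")]

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, event in events:
    partitions[partition_for(key)].append((key, event))

# All of user-1's events share a partition, so their relative order survives.
user1_stream = [e for plist in partitions.values() for k, e in plist
                if k == "user-1"]
```

If strict ordering across *all* event types matters, though, no keying scheme helps, and you are back to a single partition or to reordering inside the application.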
On Thu, Oct 8, 2015 at 8:50 AM, Mark Drago wrote:
Thanks for your reply. I understand all of the points you've made. I
think the challenge for us is that we have some consumers that are
interested in messages of one type, but we also have a rules engine that is
checking for events of many types and acting on them.
If we put discrete event types on individual topics:
* Our rules engine would have to monitor many of these topics (10-20)
* Other consumers would only see messages they care about
If we put all event types on one topic:
* Our rules engine only has to monitor one topic
* Other consumers would parse and then discard the majority of the
messages that they see
Perhaps a better approach would be to have different topics for the
different use cases? This would be similar to an approach that merges
smaller topics together as needed. So, each event type would be on its own
topic but then we would put a subset of those messages on another topic
that is destined for the rules engine. The consumers that only care about
one message type would listen on dedicated topics and the rules engine
would just monitor one topic for all of the events that it cares about. We
would need something moving/creating messages on the rules-engine topic. We
may also run into another set of problems, as ordering across messages of
different types no longer exists once they're coming from separate topics.
I'm curious to hear if anyone else has been in a similar situation and had
to make a judgement call about the best approach to take.
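The mixed model Mark describes amounts to a small forwarder that consumes the per-type topics and re-produces a subset onto one combined topic. A hypothetical sketch with in-memory lists standing in for topics (event-type names are made up):

```python
# Forwarder sketch: copy only the event types the rules engine cares
# about from per-type source topics onto one combined topic.
RULES_ENGINE_TYPES = {"order_placed", "payment_failed"}

per_type_topics = {
    "order_placed":   [{"type": "order_placed", "id": 1}],
    "payment_failed": [{"type": "payment_failed", "id": 2}],
    "page_viewed":    [{"type": "page_viewed", "id": 3}],  # not forwarded
}

rules_engine_topic = []
for topic, messages in per_type_topics.items():
    if topic in RULES_ENGINE_TYPES:
        rules_engine_topic.extend(messages)

# Note the trade-off raised above: the relative order *across* types on the
# combined topic is whatever order the forwarder happens to produce, since
# the source topics are independent.
```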
I usually approach this question by looking at possible consumers. You
usually want each consumer to read from relatively few topics, use most
of the messages it receives, and have fairly cohesive logic for using those
messages. Signs that things went wrong with too few topics:
* Consumers that throw away 90% of the messages on topics they read
* Consumers with gigantic switch statements for handling all the different
message types they get

Signs that you have too many topics:
* Every consumer needs to read messages from 20 different topics in order
to construct the objects it actually uses

If you ever did data modeling for a data warehouse, this will look a bit
familiar :)

Gwen
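The "gigantic switch statement" smell can be kept in check with a handler registry: one small function per message type, looked up by type, with unhandled types surfacing explicitly. A short illustrative sketch (handler names and message shapes are made up, not from the thread):

```python
# Handler-registry sketch: dispatch by message type instead of one
# giant switch; a high rate of unhandled messages is Gwen's
# "throwing away 90% of the messages" smell made visible.
def handle_order(msg):
    return f"charged order {msg['id']}"

def handle_login(msg):
    return f"audited login for {msg['user']}"

HANDLERS = {
    "order": handle_order,
    "login": handle_login,
}

def dispatch(msg):
    handler = HANDLERS.get(msg["type"])
    if handler is None:
        return None  # discarded: count these to spot a mis-scoped topic
    return handler(msg)

results = [dispatch(m) for m in [
    {"type": "order", "id": 7},
    {"type": "login", "user": "alice"},
    {"type": "metric", "value": 3},  # no handler: would be discarded
]]
```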
On Tue, Oct 6, 2015 at 4:46 PM, Mark Drago wrote:
At my organization we are already using Kafka in a few areas, but we're
looking to expand our use and we're struggling with how best to map our
events onto topics.
We have on the order of 30 different kinds of events that we'd like to
distribute via kafka. We have one or two consumers that have a need to
consume many of these types of events (~20 out of the 30) and we have other
consumers that are only interested in one type of event.
We're trying to decide between a model where we have one topic carrying
many kinds of events and a model where we have many topics, each carrying
one type of event. We have also thought about mixed models, where we have
one large topic that we later break down into smaller ones, or many small
topics that we later combine into a large topic.
I'm curious to hear about best practices and past experiences from the
members of this group. What is the general best practice for reusing
topics or creating new ones? What has worked well in the past? What
should we be considering while making this decision?
Thanks in advance!