Hello,

At my organization we are already using Kafka in a few areas, but we're
looking to expand our use and we're struggling with how best to distribute
our events onto topics.

We have on the order of 30 different kinds of events that we'd like to
distribute via Kafka. We have one or two consumers that need to consume
many of these event types (~20 of the 30), and other consumers that are
only interested in a single type of event.

We're trying to decide between a model where we have one topic containing
many kinds of events or a model where we have many topics each containing
one type of event. We have also thought about mixed models where we have
one large topic that we later break down into smaller ones, or many small
topics that we later combine into a large topic.

I'm curious to hear about best practices and past experiences from the
members of this group. What is the general best practice for reusing
topics or creating new ones? What has worked well in the past? What
should we be considering while making this decision?

Thanks in advance!
Mark.


  • Gwen Shapira at Oct 7, 2015 at 12:39 am
    I usually approach this question by looking at the possible consumers.

    You usually want each consumer to read from relatively few topics, use most
    of the messages it receives and have fairly cohesive logic for using these
    messages.

    Signs that things went wrong with too few topics:
    * Consumers that throw away 90% of the messages on topics they read
    * Consumers with gigantic switch statements for handling all the different
    message types they get

    Signs that you have too many topics:
    * Every consumer needs to read messages from 20 different topics in order
    to construct the objects it actually uses

    If you ever did data modeling for a data warehouse, this will look a bit
    familiar :)
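
    For illustration, the "throw away most of the messages" and "gigantic
    switch statement" symptoms could look like the minimal sketch below
    (Java client; the topic name, group id, and event types are
    hypothetical, and the event type is assumed to be carried in the
    record key):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SharedTopicConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "billing-service");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // One shared topic carrying all ~30 event types.
                consumer.subscribe(List.of("all-events"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        String type = record.key();  // event type assumed in the key
                        if (type == null) {
                            continue;
                        }
                        switch (type) {
                            case "order-created":
                                handleOrderCreated(record.value());
                                break;
                            // ... a case for every other type this consumer uses ...
                            default:
                                // Most records are read, deserialized, and discarded.
                                break;
                        }
                    }
                }
            }
        }

        private static void handleOrderCreated(String payload) {
            System.out.println("billing: " + payload);
        }
    }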

    Gwen
  • Mark Drago at Oct 8, 2015 at 12:50 pm
    Gwen,

    Thanks for your reply. I understand all of the points you've made. I
    think the challenge for us is that we have some consumers that are
    interested in messages of one type, but we also have a rules engine that is
    checking for events of many types and acting on them.

    If we put discrete event types on individual topics:
       * Our rules engine would have to monitor many of these topics (10-20)
       * Other consumers would only see messages they care about

    If we put all event types on one topic:
       * Our rules engine only has to monitor one topic
       * Other consumers would parse and then discard the majority of the
    messages that they see

    Perhaps a better approach would be to have different topics for the
    different use cases? This would be similar to an approach that merges
    smaller topics together as needed. So, each event type would be on its own
    topic but then we would put a subset of those messages on another topic
    that is destined for the rules engine. The consumers that only care about
    one message type would listen on dedicated topics and the rules engine
    would just monitor one topic for all of the events that it cares about. We
    would need to have something moving/creating messages on the rules engine
    topic. We may also run into another set of problems, as the ordering of
    messages of different types no longer exists once they're coming from
    separate topics.
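
    A minimal sketch of that "something" (Java client, hypothetical topic
    names and configs): a small forwarder that consumes the per-event
    topics the rules engine cares about and re-produces every record onto
    a single merged topic, keyed by the source topic so the rules engine
    can tell the types apart. As noted, relative ordering between types is
    not preserved.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class RulesEngineForwarder {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "rules-engine-forwarder");
            consumerProps.put("key.deserializer", StringDeserializer.class.getName());
            consumerProps.put("value.deserializer", StringDeserializer.class.getName());

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");
            producerProps.put("key.serializer", StringSerializer.class.getName());
            producerProps.put("value.serializer", StringSerializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                // The subset of per-event topics the rules engine wants to see.
                consumer.subscribe(List.of("order-created", "payment-failed", "user-signed-up"));
                while (true) {
                    for (ConsumerRecord<String, String> record :
                            consumer.poll(Duration.ofMillis(500))) {
                        // Re-key by source topic and forward to the merged topic.
                        producer.send(new ProducerRecord<>(
                                "rules-engine-events", record.topic(), record.value()));
                    }
                }
            }
        }
    }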

    I'm curious to hear if anyone else has been in a similar situation and had
    to make a judgement call about the best approach to take.

    Thanks,
    Mark.

  • Todd Palino at Oct 8, 2015 at 2:09 pm
    Multiple topics is the model I would recommend for what you have described.
    LinkedIn has an environment where we have a wide mix, in a lot of different
    clusters. We have some topics that have one producer and one consumer
    (queuing). We have some topics that are multi-producer (tracking and
    metrics, mostly). Some of those are multi-consumer (tracking), and some are
    mostly single consumer (metrics). Besides all of this, we have a couple
    wildcard consumers that read everything (our audit system, and mirror
    makers).

    In your case, the rules engine sounds like a consumer case similar to our
    audit consumer, so I would not base the decision about how many topics you
    need on that consumer. Since the majority of what you're describing is
    consumers who are interested in discrete data sets, go with breaking out
    the topics based on that (all other things being equal).
    While Gwen is absolutely right about her guidelines, consuming and throwing
    away most of the data is a cardinal sin and should be avoided. Multi-topic
    consumers are much less of a problem to deal with. Personally, I wouldn't
    bother combining the messages into a separate topic for the rules engine -
    I would just consume all the topics.
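
    As a sketch of that approach (assuming a hypothetical "events.<type>"
    topic naming convention), the rules engine can subscribe to all of its
    topics at once with a pattern subscription in the newer Java clients,
    so newly added event topics are picked up automatically:

    import java.time.Duration;
    import java.util.Properties;
    import java.util.regex.Pattern;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class RulesEngineConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "rules-engine");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Matches events.order-created, events.payment-failed, etc.
                consumer.subscribe(Pattern.compile("events\\..*"));
                while (true) {
                    for (ConsumerRecord<String, String> record :
                            consumer.poll(Duration.ofMillis(500))) {
                        // The source topic tells the rules engine which event type it has.
                        evaluateRules(record.topic(), record.value());
                    }
                }
            }
        }

        private static void evaluateRules(String eventType, String payload) {
            System.out.println(eventType + ": " + payload);
        }
    }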

    You mentioned message ordering, and that can present an issue. Now, you'd
    likely have this problem regardless of how many topics you use, as ordering
    is only guaranteed in a single partition. So you'd either have to have one
    partition, or you would have to use a partitioning scheme for the messages
    such that strict ordering across all of them matters less.
    Obviously, when you have multiple topics it's the same as having multiple
    partitions. You need to decide how important ordering within Kafka is to
    your application, and if it can be handled separately inside of the
    application.
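
    A minimal sketch of the partitioning idea, with a hypothetical order-ID
    key and hypothetical topic names: the default partitioner hashes the
    key, so every record with the same key lands in the same partition of
    its topic and is consumed in the order it was produced. Ordering across
    topics (or across different keys) still has to be handled by the
    application.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KeyedEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                String orderId = "order-42";  // same key => same partition
                // Same topic and key: a consumer sees these two in this order.
                producer.send(new ProducerRecord<>("events.order-updated", orderId, "{\"step\":1}"));
                producer.send(new ProducerRecord<>("events.order-updated", orderId, "{\"step\":2}"));
                // Different topic: no ordering guarantee relative to the records above.
                producer.send(new ProducerRecord<>("events.order-shipped", orderId, "{}"));
            }
        }
    }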

    -Todd


