  • Taylor Gautier at Feb 24, 2015 at 8:56 pm
    Hello all,
    Just wondering if those with a good amount of experience using Kafka in production with many streams have converged on any sort of naming convention. If so, would you be willing to share?
    Thanks in advance,
    Taylor
  • Thunder Stumpges at Feb 24, 2015 at 9:13 pm
    We have a global namespace hierarchy for topics that exactly matches our Avro namespace plus class name. The template is basically:

    <root_ns>.Core.<core_data_types_shared_across_company>
    <root_ns>.<product>.<product_specific_hierarchy>

    The upside for us is that, since the topics are named after the Avro schema namespace and type, we can look up the Avro schema in the Avro Schema Repository using the topic name and the schema ID encoded in the message. Each product also has the flexibility to define whatever topics it finds useful.

    Hope this helps,
    Thunder
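
A minimal sketch of the mapping described in the post above, assuming the topic name is simply the Avro namespace joined to the record name with a dot. The schema, the `org.example` root namespace, and the in-memory `REPOSITORY` dictionary are hypothetical stand-ins, not the poster's actual setup or any real schema repository API.

```python
def topic_for_schema(schema: dict) -> str:
    """Return '<namespace>.<name>' for an Avro record schema."""
    name = schema["name"]
    namespace = schema.get("namespace", "")
    # Per the Avro spec, a dotted name is already a full name and any
    # separately specified namespace is ignored.
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


# Hypothetical schema; "org.example" stands in for <root_ns>.
PAGE_VIEW = {
    "type": "record",
    "namespace": "org.example.Core",
    "name": "PageView",
    "fields": [{"name": "url", "type": "string"}],
}

# Hypothetical in-memory stand-in for a schema repository,
# keyed by (topic name, schema ID).
REPOSITORY = {(topic_for_schema(PAGE_VIEW), 1): PAGE_VIEW}


def lookup_schema(topic: str, schema_id: int) -> dict:
    """Find the schema for a message from its topic and embedded schema ID."""
    return REPOSITORY[(topic, schema_id)]


topic = topic_for_schema(PAGE_VIEW)
print(topic)
print(lookup_schema(topic, 1)["name"])
```

Because the topic name carries the full type name, a consumer needs only the topic and the schema ID from the message to find the right schema.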

  • Gwen Shapira at Feb 25, 2015 at 4:18 am
    Nice :) I like the idea of tying topic names to Avro schemas.

    I have experience with other people's data, and until now I mostly
    recommended:
    <app type>.<app name>.<data set name>.<stage of processing>

    So we end up with things like:
    etl.onlineshop.searches.validated

    Or if I have my own test dataset that I don't want to share:
    users.gshapira.newapp.testing1

    This makes it relatively easy to share datasets across the organization, and
    also makes white-listing and black-listing relatively simple because of the
    hierarchy (until we add a real topic hierarchy to Kafka...).

    Gwen
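
The four-part pattern above and the white-listing it enables can be sketched roughly as follows. This assumes no segment contains a dot; the `topic_name` helper, the `raw` stage, and the regular expression are illustrative inventions rather than anything prescribed in the thread.

```python
import re


def topic_name(app_type: str, app_name: str, data_set: str, stage: str) -> str:
    """Join the four segments with dots, rejecting empty or dotted segments."""
    parts = [app_type, app_name, data_set, stage]
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"invalid topic segment: {part!r}")
    return ".".join(parts)


# White-list every validated ETL topic, whatever the app or data set.
ETL_VALIDATED = re.compile(r"etl\.[^.]+\.[^.]+\.validated")

topics = [
    topic_name("etl", "onlineshop", "searches", "validated"),
    topic_name("etl", "onlineshop", "searches", "raw"),  # hypothetical stage
    topic_name("users", "gshapira", "newapp", "testing1"),
]

selected = [t for t in topics if ETL_VALIDATED.fullmatch(t)]
print(selected)
```

Keeping the segment count fixed is what lets a single pattern select a whole slice of the hierarchy.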
  • Maciej Jaśkowski at Mar 3, 2015 at 10:34 am
    This approach sounds nice at first, but it would break down once you start
    sending the same message partitioned in different (orthogonal)
    ways. How would you handle that?

    Maciej


    --

    Twitter: @mjaskowski
  • Thunder Stumpges at Mar 3, 2015 at 2:57 pm
    I'm not sure who the question was addressed to, but since Gwen's scheme is a guideline rather than a hard restriction, I'll assume you meant me :)

    We have a concept of a "topic suffix property": some property in the data whose value can change dynamically. The full topic name then becomes "<avro_class>-<topic_suffix>". We agreed never to use a dash in a topic suffix, so we can split at the last dash to get back to the class name. You could pick any delimiter not used in class names or suffixes.

    The topic suffix is then where we put things like the processing stage (incoming, cleaned, duplicate, etc.) as well as any other orthogonal delineation that needs to be in a different topic.

    We use .NET, so I'm not sure of the Java terminology, but we have property attributes that declare a property as the "topic suffix property" (and also the "message key property"), and we use property getters in a partial class to compute these dynamically if necessary.

    A "message registry" then uses reflection to get the topic name and message key for any message going out through our producer. It also handles stripping the topic suffix for consumers looking up the Avro type for a given topic name.

    So far this has worked great for us.
    Cheers,
    Thunder
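
The suffix convention in the post above might look something like this in Python. It is only a sketch: the function names are made up, and the reflection-based .NET "message registry" is not reproduced here, just the compose and split steps on the topic string.

```python
from typing import Tuple

DELIMITER = "-"  # agreed never to appear inside a topic suffix


def full_topic(avro_class: str, suffix: str) -> str:
    """Build '<avro_class>-<topic_suffix>', refusing suffixes with a dash."""
    if DELIMITER in suffix:
        raise ValueError(f"suffix must not contain {DELIMITER!r}: {suffix!r}")
    return f"{avro_class}{DELIMITER}{suffix}"


def split_topic(topic: str) -> Tuple[str, str]:
    """Split at the LAST delimiter to recover (avro_class, suffix).

    Assumes every suffixed topic was built by full_topic; a topic with
    no delimiter at all is treated as a bare class name with no suffix.
    """
    avro_class, sep, suffix = topic.rpartition(DELIMITER)
    if not sep:
        return topic, ""
    return avro_class, suffix


topic = full_topic("org.ntent.addelivery.pageview", "incoming")
print(topic)
print(split_topic(topic))
```

Splitting on the last delimiter rather than the first is what makes it safe for only the suffix, not the class name, to be dash-free.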



  • Julio Castillo at Mar 3, 2015 at 6:58 pm
    Can you provide some examples of the naming patterns you described?

    Thanks

    ** julio
  • Thunder Stumpges at Mar 3, 2015 at 10:43 pm
    Sure, these are contrived, but you'll get the idea :)

    Note: the suffixes are generally an enumeration or a combination of two enumerations, so the domain of values should always be bounded (and with it the number of topics). The idea is that any time we want to use the same Avro schema but don't want the messages in the same Kafka topic, we use the suffix to separate them.

    As a phase of processing:
        org.ntent.addelivery.pageview-incoming
        org.ntent.addelivery.pageview-filtered
        org.ntent.addelivery.pageview-duplicate
        org.ntent.addelivery.pageview-clean

    To separate "instances" of a particular kind of activity:
        org.ntent.addelivery.feedrequest-feed1
        org.ntent.addelivery.feedrequest-feed2
        org.ntent.addelivery.feedrequest-feed3

    To denote the type of "statistic":
        org.ntent.addelivery.filterstats-knownoffender
        org.ntent.addelivery.filterstats-bot
        org.ntent.addelivery.filterstats-clickrate

    Hope this helps :)
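
To illustrate why enumerated suffixes keep the topic count bounded, here is a rough sketch that lists every topic a schema can map to. The `Stage` and `Feed` enums reuse values from the examples above; joining two enumerations with an underscore is purely an assumption, since the thread does not say how combined suffixes are formed.

```python
from enum import Enum
from itertools import product


class Stage(Enum):
    INCOMING = "incoming"
    FILTERED = "filtered"
    DUPLICATE = "duplicate"
    CLEAN = "clean"


class Feed(Enum):
    FEED1 = "feed1"
    FEED2 = "feed2"
    FEED3 = "feed3"


def topics_for(avro_class, *enums):
    """List every '<avro_class>-<suffix>' the given enumerations allow.

    With two enumerations the suffix parts are joined by '_'
    (a hypothetical choice; a dash would break suffix stripping).
    """
    return [
        avro_class + "-" + "_".join(member.value for member in combo)
        for combo in product(*enums)
    ]


stage_topics = topics_for("org.ntent.addelivery.pageview", Stage)
combined = topics_for("org.ntent.addelivery.feedrequest", Feed, Stage)
print(stage_topics)
print(len(combined))
```

The number of topics per schema is the product of the enumeration sizes, so it can be computed, and capped, before anything is created on the cluster.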


