Is it preferable to consolidate topology logic into fewer bolts in order to
keep the overall topology simpler, or is there no penalty associated with
having a topology which contains dozens (or hundreds) of bolts? All of the
performance benchmarks which I've read about are run against trivially
simple topologies -- a spout plus 1 or 2 bolts.

I saw the section on the wiki about "Guaranteeing message processing", and
it seems that the state kept for a tuple tree is a fixed size regardless of
how many tuples are in the tree, so this is not a concern.
However, it seems that the other overhead associated with having more bolts
could be significant:
- more zmq activity, since there is a send and receive operation required
for each bolt that a message visits
- more input and output queues required, since each task maintains its own
queues

In our testing, the performance of the topology drops off when there are
many tasks per worker (> 25), and I am wondering whether we should focus on
making our topology simpler by consolidating bolts.
(current config: 30 bolts, 300 tasks, 300 executors, and 64 workers on 4
supervisors)

Has anyone had a similar experience?
Regards,
Ed
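
The configuration quoted in the question can be turned into per-worker and per-supervisor averages with a few lines of arithmetic. This is only a rough sketch: it assumes tasks and workers are spread evenly, and the figure of two queues per task simply follows the wording of the question (one input and one output queue per task) rather than any measured Storm internals.

```python
# Figures taken from the question: 300 tasks, 64 workers, 4 supervisors.
TASKS = 300
WORKERS = 64
SUPERVISORS = 4

# Threshold from the question: performance dropped above 25 tasks per worker.
TASKS_PER_WORKER_THRESHOLD = 25

# Assumption based on the question's wording: one input and one output queue per task.
QUEUES_PER_TASK = 2


def per_unit(total, units):
    """Average number of items per unit, assuming an even spread."""
    return total / units


tasks_per_worker = per_unit(TASKS, WORKERS)
workers_per_supervisor = per_unit(WORKERS, SUPERVISORS)
tasks_per_supervisor = per_unit(TASKS, SUPERVISORS)
total_queues = TASKS * QUEUES_PER_TASK

# Worker count at which the same 300 tasks would average exactly 25 per worker.
workers_at_threshold = TASKS / TASKS_PER_WORKER_THRESHOLD

print(f"tasks per worker:       {tasks_per_worker}")        # 4.6875
print(f"workers per supervisor: {workers_per_supervisor}")  # 16.0
print(f"tasks per supervisor:   {tasks_per_supervisor}")    # 75.0
print(f"queues in total:        {total_queues}")            # 600
print(f"workers at threshold:   {workers_at_threshold}")    # 12.0
```

With these numbers the quoted configuration averages under 5 tasks per worker, and the same 300 tasks would only exceed 25 tasks per worker on fewer than 12 workers.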


  • Philip O'Toole at Dec 11, 2012 at 7:13 pm
    This is an interesting question.

    From my review of various Storm bolt code, particularly the code
    in storm-starter, there seems to be an emphasis on small bolts in
    terms of line count (and even fewer lines when using Clojure). So
    many smaller, focused bolts (executors and tasks when actually
    instantiated in a topology), rather than 3 or 4 monolithic bolts,
    seem to be the way to go. The throughput I have seen with Storm seems
    to more than compensate for any overhead.

    It also strikes me as a great way to structure development, since
    different members of a team can focus on a specific bolt, and just
    think about the tuple interface.

    Any other views out there?

    Philip

    --
    Philip O'Toole

    Senior Developer
    Loggly, Inc.
    San Francisco, CA.
    In reply to Ed Buck's message of Tue, Dec 11, 2012 at 7:24 AM.
  • Nathan Marz at Dec 11, 2012 at 8:20 pm
    Bolts take up resources, so the fewer bolts you have, the more
    resource-efficient your topology will be. The only time you require a new
    bolt is when you need to repartition your stream. Trident works this way:
    it packs as many operations as possible (functions, filters, partial
    aggregators, etc.) into a single bolt.


    --
    Twitter: @nathanmarz
    http://nathanmarz.com
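
The packing rule described in the reply above (start a new bolt only where the stream has to be repartitioned) can be illustrated with a small model. This is a hedged sketch in plain Python, not the Storm or Trident API: the `Op` class, the `pack_into_bolts` function, and the operation names are all hypothetical, and Trident's actual planner is more involved than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    """A hypothetical processing step in a linear pipeline."""
    name: str
    # True if this step needs its input regrouped first (e.g. grouped by key).
    repartitions: bool = False


def pack_into_bolts(ops: List[Op]) -> List[List[str]]:
    """Group consecutive ops into bolts, opening a new bolt only at a repartition."""
    bolts: List[List[str]] = []
    for op in ops:
        if not bolts or op.repartitions:
            bolts.append([op.name])    # first op, or a repartition boundary
        else:
            bolts[-1].append(op.name)  # same partitioning: reuse the current bolt
    return bolts


# Illustrative pipeline; only the keyed count needs a repartition.
pipeline = [
    Op("parse"),
    Op("filter_invalid"),
    Op("partial_count"),
    Op("count_by_key", repartitions=True),
    Op("format_output"),
]

one_bolt_per_op = [[op.name] for op in pipeline]
packed = pack_into_bolts(pipeline)

print(packed)
print(len(one_bolt_per_op), "bolts ->", len(packed), "bolts")
```

In this model the five-step pipeline collapses from five bolts to two, so a tuple crosses one bolt-to-bolt boundary instead of four.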

Discussion Overview
group: storm-user @
posted: Dec 11, '12 at 3:24p
active: Dec 11, '12 at 8:20p
posts: 3
users: 3
website: storm-project.net
irc: #storm-user
