FAQ
Hello!

I recently started using Storm and have managed to get a few basic
topologies working, but I have a few general questions about tuning
performance and finding bottlenecks in the system.

Let's say we have the following contrived scenario with the WordCount
topology:

100,000 messages a second are being passed into Storm from a Kafka spout,
where each message is a sentence.

The bolts are the examples from storm-starter: SplitSentence and WordCount.

So let's say we assigned a parallelism of 5 to the Kafka spout, 5 to
SplitSentence, and 5 to WordCount. Let's also say, hypothetically, that
we're now achieving a throughput of only 50,000 messages a second. My
question is:

What's the best practice for figuring out where the bottleneck lies?

The approach I thought of was to max out the parallelism of each step and
then decrease them one at a time to find the required parallelism for each
step. For example: set the parallelism to 100 for each spout/bolt. Lower
the parallelism of the Kafka spout until throughput drops below 100k; the
last value before the drop is the minimum parallelism required for the
Kafka spout. Bring the parallelism back to 100. Now lower the parallelism
of SplitSentence until throughput drops below 100k; that gives the minimum
parallelism required for SplitSentence. Repeat until we've figured out the
minimum parallelism for each step.

Is this the right way to go about doing things?
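
Editor's sketch: the search procedure described above could be written roughly as follows. This is only an illustration under stated assumptions, not a Storm API. The class `MinParallelismSearch`, the method `findMinimums`, and the toy throughput model `toyThroughput` (with made-up per-instance rates) are all hypothetical; in practice the measurement step would mean redeploying the topology and reading its observed throughput.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

final class MinParallelismSearch {

    // Hypothetical per-instance rates (tuples/sec) for spout, split bolt, count bolt.
    static final double[] PER_INSTANCE_RATE = {25_000, 12_500, 40_000};

    /** Toy model (an assumption): the slowest stage caps overall throughput. */
    static double toyThroughput(int[] parallelism) {
        double min = Double.MAX_VALUE;
        for (int i = 0; i < parallelism.length; i++) {
            min = Math.min(min, parallelism[i] * PER_INSTANCE_RATE[i]);
        }
        return min;
    }

    /**
     * For each component: set everything to maxParallelism, then lower only that
     * component until measured throughput falls below the target. The last value
     * that still met the target is that component's minimum (-1 if even the
     * maximum did not meet the target).
     */
    static int[] findMinimums(int components, int maxParallelism, double target,
                              ToDoubleFunction<int[]> measureThroughput) {
        int[] minimums = new int[components];
        for (int c = 0; c < components; c++) {
            int[] trial = new int[components];
            Arrays.fill(trial, maxParallelism); // reset all components to the maximum
            int lastGood = -1;
            for (int p = maxParallelism; p >= 1; p--) {
                trial[c] = p;
                if (measureThroughput.applyAsDouble(trial) < target) {
                    break; // throughput dropped below the target
                }
                lastGood = p;
            }
            minimums[c] = lastGood;
        }
        return minimums;
    }

    public static void main(String[] args) {
        int[] mins = findMinimums(3, 100, 100_000, MinParallelismSearch::toyThroughput);
        System.out.println(Arrays.toString(mins)); // prints [4, 8, 3]
    }
}
```

Note that each step of the inner loop stands for a full deploy-and-measure cycle, so the cost of this search grows with the maximum parallelism and the number of components.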

As a corollary to this:

I know this is somewhat dependent on a number of factors, but how many
tuples per second could we generally expect a spout instance to emit? Let's
say that the spout is just emitting a single sentence from memory [the same
sentence] as quickly as it can. How many tuples per second is it capable
of emitting? What would be the ideal number of spout instances per worker
machine? Again, I understand this is very dependent on a number of things
[hardware...], but any general advice/numbers would be very helpful.

As a last quality of life question:

Is it possible to disable the startup and shutdown messages when running a
cluster in local mode? I can disable the emit messages through the debug
parameter, but there are still a number of messages that appear, and it
would be nice to disable them.

Thanks for your time and help!

--
You received this message because you are subscribed to the Google Groups "storm-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to storm-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

  • Philip O'Toole at Mar 21, 2013 at 10:40 pm

    On Thu, Mar 21, 2013 at 10:15 AM, Justin Yan wrote:
    Hello!
    As a corollary to this:

    I know this is somewhat dependent on a number of factors, but how many
    tuples per second could we generally expect a spout instance to emit? Let's
    say that the spout is just emitting a single sentence from memory [the same
    sentence] as quickly as it can. How many tuples per second is it capable of
    emitting?

    Well, that is an easy question to answer. Simply bring up a topology
    where the spout isn't connected to anything downstream and record how
    many times per second nextTuple() is called. That will give you the
    upper limit.
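
Editor's sketch: one way to record how many times per second a method is called is a small counter that is marked on every call. The `CallRateMeter` class below is hypothetical and uses only the JDK; it is not part of Storm. The assumption is that you would call `mark()` at the top of your spout's nextTuple() and log `ratePerSecond(...)` periodically. The `main` method uses a plain loop as a stand-in for the spout.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Counts calls and reports the average calls per second since a start time. */
final class CallRateMeter {
    private final AtomicLong calls = new AtomicLong();
    private final long startNanos;

    CallRateMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Call once per invocation of the method being measured. */
    void mark() {
        calls.incrementAndGet();
    }

    long count() {
        return calls.get();
    }

    /** Average calls per second between the start time and nowNanos. */
    double ratePerSecond(long nowNanos) {
        long elapsed = nowNanos - startNanos;
        return elapsed <= 0 ? 0.0 : calls.get() * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        CallRateMeter meter = new CallRateMeter(System.nanoTime());
        long deadline = System.nanoTime() + 200_000_000L; // run for about 0.2 s
        while (System.nanoTime() < deadline) {
            meter.mark(); // stand-in for the body of nextTuple()
        }
        System.out.printf("%.0f calls/sec%n", meter.ratePerSecond(System.nanoTime()));
    }
}
```

The number this prints depends entirely on the machine, so treat it as an upper bound for that setup only.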

    Philip

  • Jason Jackson at Mar 23, 2013 at 2:56 am

    On Thursday, 21 March 2013 10:15:32 UTC-7, Justin Yan wrote:
    What's the best practice for figuring out where the bottleneck lies?
    *The capacity metric in Storm 0.9.0 helps a lot in discovering bottlenecks.*
    *Make sure setNumAckers is greater than 1.*
    *Experiment with setMaxSpoutPending.*
    *Trial and error also works well: replace each bolt with one that does
    nothing but emit tuples.*
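
Editor's sketch: as far as I know, the capacity figure shown in the Storm UI is computed as (number of tuples executed × average execute latency) / length of the measurement window, and a value near 1.0 means the bolt is busy almost all the time. Treat that formula as an assumption to check against your Storm version. The `BoltCapacity` class and the sample numbers below are hypothetical and only illustrate the arithmetic.

```java
final class BoltCapacity {

    /** capacity = executed tuples * average execute latency (ms) / window length (ms). */
    static double capacity(long executed, double avgExecuteLatencyMs, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        return executed * avgExecuteLatencyMs / windowMs;
    }

    /** Index of the component with the highest capacity, i.e. the likeliest bottleneck. */
    static int busiest(double[] capacities) {
        int best = 0;
        for (int i = 1; i < capacities.length; i++) {
            if (capacities[i] > capacities[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        long windowMs = 10_000; // hypothetical 10-second window
        String[] names = {"splitsentence", "wordcount"};
        double[] caps = {
            capacity(40_000, 0.25, windowMs), // made-up counts and latencies
            capacity(10_000, 0.25, windowMs)
        };
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " capacity = " + caps[i]);
        }
        System.out.println("likely bottleneck: " + names[busiest(caps)]);
    }
}
```

With these made-up numbers the first bolt sits at capacity 1.0 and would be the first candidate for more parallelism.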
    I know this is somewhat dependent on a number of factors, but how many
    tuples per second could we generally expect a spout instance to emit? Let's
    say that the spout is just emitting a single sentence from memory [the same
    sentence] as quickly as it can. How many tuples per second is it capable
    of emitting? What would be the ideal number of spout instances per worker
    machine? Again, I understand this is very dependent on a number of things
    [hardware...], but any general advice/numbers would be very helpful.
    *I've seen a benchmark reach 1M messages/sec per node. You can easily run
    this test yourself, as the other commenter pointed out.*

Discussion Overview
group: storm-user
posted: Mar 21, '13 at 9:44p
active: Mar 23, '13 at 2:56a
posts: 3
users: 3
website: storm-project.net
irc: #storm-user
