On Thursday, 21 March 2013 10:15:32 UTC-7, Justin Yan wrote:

I recently started using Storm and managed to get a few basic topologies to
work, but I have a few general questions about tuning performance
and discovering bottlenecks in the system.

Let's say we have the following contrived scenario, based on the word count example:

100,000 messages per second are being passed into Storm from a Kafka spout,
where each message is a sentence.

The bolts are the examples from storm-starter: SplitSentence and WordCount.

So let's say we assigned a parallelism of 5 to the Kafka spout, 5 to
SplitSentence, and 5 to WordCount. Let's also say, hypothetically, that we're
now achieving a throughput of only 50,000 messages per second. My question is:

What's the best practice for figuring out where the bottleneck lies?
*The capacity metric in Storm 0.9.0 helps a lot in discovering bottlenecks.*
*Make sure setNumAckers is greater than 1.*
*Experiment with the max spout pending setting.*
*Trial and error is also a good approach: replace each bolt with one that does
nothing but emit tuples.*
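
To make the capacity suggestion concrete, here is a minimal sketch of how that figure can be approximated from a bolt's raw stats. It assumes capacity is computed as (tuples executed x average execute latency) / window length, which is my understanding of what the Storm UI reports; the class name, method name, and all input numbers are hypothetical and only for illustration.

```java
// Sketch only: approximates a bolt's "capacity" from its raw stats.
// Assumption: capacity = (executed * avg execute latency in ms) / window in ms.
// A value near 1.0 suggests the bolt is busy for the whole window.
final class BoltCapacity {

    static double capacity(long executed, double avgExecuteLatencyMs, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        return (executed * avgExecuteLatencyMs) / windowMs;
    }

    public static void main(String[] args) {
        long windowMs = 10 * 60 * 1000L; // a 10-minute measurement window

        // Hypothetical stats, not measurements from a real topology.
        double split = capacity(6_000_000L, 0.09, windowMs);
        double count = capacity(6_000_000L, 0.02, windowMs);

        System.out.printf("splitsentence capacity: %.2f%n", split);
        System.out.printf("wordcount capacity:     %.2f%n", count);
    }
}
```

With numbers like these, the bolt whose value is closest to 1.0 would be the first candidate for more parallelism.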
The approach I thought of was: max out the parallelism of each step, and
then decrease them one by one to see what the required parallelism is for
each step. For example: set the parallelism to 100 for each spout/bolt.
Lower the parallelism of kafkaspout until we see the throughput drop below
100k, that's the minimum parallelism required for kafkaspout. Bring the
parallelism back to 100. Now lower the parallelism of splitsentence until
we see throughput drop below 100k, that's the minimum parallelism required
for splitsentence. etc., etc. until we've figured out the minimum
parallelism for each step.
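
The step-down procedure described above can be sketched as a small search loop. This is only an illustration: `measure` is a hypothetical stand-in for "redeploy the topology with this parallelism and read the observed throughput", and the throughput model in `main` is invented, not measured.

```java
import java.util.function.IntToDoubleFunction;

// Sketch only: step one component's parallelism down from a maximum and
// report the smallest value that still meets the target throughput.
final class ParallelismSweep {

    // Returns the smallest parallelism found (scanning down from max) that
    // still meets the target, or -1 if even max does not meet it.
    static int minParallelism(int max, double target, IntToDoubleFunction measure) {
        int best = -1;
        for (int p = max; p >= 1; p--) {
            if (measure.applyAsDouble(p) < target) {
                break; // throughput dropped below target, stop lowering
            }
            best = p;
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented model: each instance handles 12,000 msg/s and the
        // source caps total throughput at 100,000 msg/s.
        IntToDoubleFunction model = p -> Math.min(100_000.0, p * 12_000.0);

        System.out.println("minimum parallelism: "
                + minParallelism(100, 100_000.0, model));
    }
}
```

In practice each call to `measure` means a full topology run, so a linear scan from 100 is expensive; if throughput only rises with parallelism, a binary search over the same range would need far fewer runs.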

Is this the right way to go about doing things?

As a corollary to this:

I know this is somewhat dependent on a number of factors, but how many
tuples per second could we generally expect a spout instance to emit? Let's
say that the spout is just emitting a single sentence from memory [the same
sentence] as quickly as it can. How many tuples per second is it capable
of emitting? What would be the ideal number of spout instances per worker
machine? Again, I understand this is very dependent on a number of things
[hardware...], but any general advice/numbers would be very helpful.
*I've seen a benchmark reach 1M msg/sec per node; you can easily run this test
yourself, as another commenter pointed out.*
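
As a rough starting point for such a test, the sketch below times a tight loop that hands the same in-memory sentence to a no-op sink. It does not use Storm at all, so it only bounds the raw loop rate on a given machine; acking, serialization, and network transfer in a real topology will lower the number considerably. The class name, method name, and sample sentence are hypothetical.

```java
import java.util.function.Consumer;

// Sketch only: counts how many times one thread can hand the same sentence
// to a sink within a fixed duration. No Storm classes are involved.
final class EmitRate {

    static long countEmits(String sentence, long durationMs, Consumer<String> sink) {
        long deadline = System.nanoTime() + durationMs * 1_000_000L;
        long emitted = 0;
        while (System.nanoTime() < deadline) {
            sink.accept(sentence); // stands in for a spout's emit call
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        long[] received = {0}; // the sink just counts what it is given
        long emitted = countEmits("the cow jumped over the moon", 1000,
                s -> received[0]++);
        System.out.println(emitted + " emits in 1 second, "
                + received[0] + " received by the sink");
    }
}
```

Swapping the sink for a real spout running in a test topology, and reading the emitted count from the UI, would give the number that actually matters.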
As a last quality of life question:

Is it possible to disable the startup and shutdown messages when running a
cluster in local mode? I can disable the emit messages through the debug
parameter, but there are still a number of messages that appear, and it
would be nice to disable them.
Thanks for your time and help!

Group: storm-user
Posted: Mar 21, '13 at 9:44p
Active: Mar 23, '13 at 2:56a


