Hi,

We've been running several topologies on Storm successfully for a while now.
Recently, as message volume has increased, we've begun to notice a
higher-than-expected CPU burn from the workers.

Originally I thought it was due to code within our topology; however, even
when I ack tuples immediately at the first bolt in our topology (and don't
propagate them throughout the topology), the high CPU usage remains. This
leads me to believe that somewhere in our Kafka spout or the Storm library
there is a lot of CPU burn.
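
For reference, the "ack immediately" experiment looks roughly like this (a
sketch of the test bolt, not our production code): the first bolt acks every
tuple and emits nothing downstream, yet the worker CPU stays high.

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import java.util.Map;

public class AckOnlyBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // Ack right away and drop the tuple; nothing is propagated downstream.
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No output streams declared.
    }
}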

We are using the Kafka spout with Storm 0.8.2, running on the 1.6.41 Oracle
JVM. Our topology config looks like:

num_workers => 3
num_ackers => 3
TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE => 16384
TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE => 16384
TOPOLOGY_RECEIVER_BUFFER_SIZE => 8
TOPOLOGY_TRANSFER_BUFFER_SIZE => 32
spout_pending => 300000

ZeroMQ is 2.1.7
jzmq is from Nathan's fork
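
For what it's worth, here is roughly how those settings map onto a Storm
Config object (a sketch, not our exact submission code; the constant names
are from the 0.8.x backtype.storm.Config class):

import backtype.storm.Config;

public class TopologyConfigSketch {
    public static Config build() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setNumAckers(3);
        conf.setMaxSpoutPending(300000);
        conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
        conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
        conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);
        conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32);
        return conf;
    }
}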

Pushing about 5,000 msgs/sec at an average message size of 150 bytes
(roughly 750 KB/s, or about 6 Mbit/s of raw payload) through a single Kafka
partition (on a single host), we consume almost an entire c1.xlarge worth of
CPU when the topology is spread across three c1.xlarges. The busiest of the
three nodes is at over 50% CPU.

Connecting VisualVM to the worker consuming the most CPU and running its CPU
sampler shows that we spend the most CPU time in:

org.apache.zookeeper.ClientCnxn$SendThread.run()
and
org.zeromq.ZMQ$Socket.recv[native]()

I don't have full profiling results, though.

Any idea where we could be burning CPU? Is this level of CPU usage to be
expected with a Kafka spout configuration? Are any of our configuration
variables likely to be making the usage worse? The buffer sizes were taken
from the presentation Nathan gave at Ooyala for a high-throughput topology.

The CPU usage does seem to scale upwards with our volume, so I'm trying to
identify the bottlenecks so that we can scale this further.


Thanks!

Mike

--

Mike Heffner <[email protected]>
Librato, Inc.


  • Nathan Marz at Mar 7, 2013 at 2:51 am
    The zmq recv function there is a blocking operation, so I don't think
    you're measuring CPU there. I've found YourKit helpful for profiling. Also
    be sure to check the capacity/execute latency statistics in 0.8.2.
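
    For context, capacity is roughly the fraction of the measurement window an
    executor spends in execute(); values near 1.0 mean the bolt is saturated.
    In Java (a sketch, simplified from what the UI reports):

    public class CapacitySketch {
        // executed: tuples executed in the window; executeLatencyMs: average
        // execute latency; windowMs: sample window length in ms (e.g. 600000).
        public static double capacity(long executed, double executeLatencyMs,
                                      long windowMs) {
            return (executed * executeLatencyMs) / windowMs;
        }
    }
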
  • Mike Heffner at Mar 16, 2013 at 4:50 pm
    Nathan,

    I connected YourKit to the topology and profiling shows that about 49% of
    CPU time is spent in java.util.concurrent.locks.ReentrantLock.unlock():

    [image: Inline image 1]

    Is this typical of a topology running at several thousand tuples/second?
    Anything that could be done to reduce usage here?

    Is it correct to assume that Trident would reduce this overhead by batching
    multiple messages into a single tuple, thereby reducing inter-thread
    messaging/locking?
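
    To make the question concrete, here is roughly the kind of Trident
    pipeline I have in mind (a sketch using the stock FixedBatchSpout from
    storm.trident.testing as a stand-in for a Kafka spout; class names are
    from 0.8.x Trident as I understand them, so treat it as an illustration
    rather than a drop-in replacement for our current spout):

    import backtype.storm.generated.StormTopology;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import storm.trident.TridentTopology;
    import storm.trident.operation.BaseFunction;
    import storm.trident.operation.TridentCollector;
    import storm.trident.testing.FixedBatchSpout;
    import storm.trident.tuple.TridentTuple;

    public class TridentSketch {
        // Trivial function standing in for our real per-message processing.
        public static class Noop extends BaseFunction {
            @Override
            public void execute(TridentTuple tuple, TridentCollector collector) {
                collector.emit(new Values(tuple.getString(0)));
            }
        }

        public static StormTopology build() {
            // Batches of 1000 tuples move through the topology as a unit, and
            // Trident tracks/acks each batch as a whole, so acker traffic and
            // per-tuple bookkeeping should drop roughly by the batch size.
            FixedBatchSpout spout = new FixedBatchSpout(new Fields("msg"), 1000,
                    new Values("example message"));
            spout.setCycle(true);

            TridentTopology topology = new TridentTopology();
            topology.newStream("messages", spout)
                    .each(new Fields("msg"), new Noop(), new Fields("out"));
            return topology.build();
        }
    }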

    Thanks,

    Mike
  • Nathan Marz at Mar 17, 2013 at 4:36 am
    The actual CPU usage is happening in the bottom 2 functions in that
    screenshot. You should expand the tree more to see what's actually using
    the CPU.

  • Mike Heffner at Mar 17, 2013 at 2:46 pm
    Nathan,

    So I believe that is the reverse call stack, showing where the calls to
    ReentrantLock.unlock() originate. When I continue following the call stack
    down, it fans out to the various locations where we emit() or ack() tuples
    across our topology. I believe those are all the locations that lead to
    work being added to the DisruptorQueue?

    This is a snapshot of the actual hotspots as detected by YourKit:

    [image: Inline image 1]

    I also tried switching to the SleepingWaitStrategy, but that led to a 2-3x
    overall increase in CPU usage, mainly spinning in the waitFor() methods.
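
    For reference, this is roughly how we switched the wait strategy (a
    sketch; the topology.disruptor.wait.strategy key is the name as I recall
    it from the 0.8.x defaults.yaml, so double-check it against your Storm
    version):

    import backtype.storm.Config;

    public class WaitStrategyConfig {
        public static Config build() {
            Config conf = new Config();
            // Default is com.lmax.disruptor.BlockingWaitStrategy; this is the
            // experiment that increased our CPU usage 2-3x.
            conf.put("topology.disruptor.wait.strategy",
                     "com.lmax.disruptor.SleepingWaitStrategy");
            return conf;
        }
    }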

    Cheers,

    Mike

  • Nathan Marz at Mar 17, 2013 at 11:11 pm
    I prefer looking at the per-thread usage and walking down the call stack
    till I find what's burning the CPU in each thread.
