Hi,

I have a problem with one of my workers: I keep hitting a Java heap memory
error. My setup is that every supervisor runs 4 workers with 7GB of memory
each, and each worker has 3 executors with 1 task each.
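
In case it helps map those numbers to settings, this is roughly where they
live (the values below are illustrative, not copied from our actual config):

    # storm.yaml on each supervisor: 4 worker slots, 7GB heap per worker
    supervisor.slots.ports:
        - 6700
        - 6701
        - 6702
        - 6703
    worker.childopts: "-Xmx7g"

The 3 executors with 1 task each come from the parallelism hint we pass when
declaring each component on the TopologyBuilder.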

Every time we deploy the topology to the cluster, we get a Java heap memory
error on one of the bolts, and it is always the same bolt.

Below is the OOM stack.

java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Output.toBytes(Output.java:103)
at backtype.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:28)
at backtype.storm.daemon.worker$mk_transfer_fn$fn__4128$fn__4132.invoke(worker.clj:99)
at backtype.storm.util$fast_list_map.invoke(util.clj:771)
at backtype.storm.daemon.worker$mk_transfer_fn$fn__4128.invoke(worker.clj:99)
at backtype.storm.daemon.executor$start_batch_transfer__GT_worker_handler_BANG_$fn__3905.invoke(executor.clj:208)
at backtype.storm.disruptor$clojure_handler$reify__1585.onEvent(disruptor.clj:43)
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:81)
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:55)
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:56)
at backtype.storm.disruptor$consume_loop_STAR_$fn__1597.invoke(disruptor.clj:67)
at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:636)


Hoping someone can give me an idea of why these errors are happening, and
whether I missed something in my topology settings.


Thanks,

Vincent

  • Michael Rose at Dec 10, 2012 at 9:27 pm
    Please run your cluster with -XX:+HeapDumpOnOutOfMemoryError on the
    supervisor nodes (and make sure you have the disk space for a 7GB+ heap
    dump file). Then you can use jhat
    <http://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html>
    to explore the dump and figure out what's taking up all the memory, which
    will help Nathan & co. figure out your problem.
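
    A minimal sketch of one way to wire that in, assuming the worker JVM
    options come from worker.childopts in storm.yaml (the heap size and dump
    path are examples; adjust them for your nodes):

        # storm.yaml on each supervisor node
        worker.childopts: "-Xmx7g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/storm/heapdumps"

    Once a worker dies with an OOM, point jhat at the dump (give jhat plenty
    of memory for a 7GB heap dump; the file name will vary):

        jhat -J-mx8g /var/log/storm/heapdumps/java_pid12345.hprof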

    --
    Michael Rose (@Xorlev <https://twitter.com/xorlev>)
    Senior Platform Engineer, FullContact <http://www.fullcontact.com>
    michael@fullcontact.com

  • Christian Nardi at Dec 11, 2012 at 7:14 pm
    I can tell you the two main problems that I've had with memory:

    * If you use BaseRichBolt, you need to ack each tuple yourself (see the
    sketch after this list). This is documented neither in the interface
    (IRichBolt) nor in the abstract class (BaseRichBolt), but it is documented
    in the project wiki.
    * If you use automatic acking (for example with BaseBasicBolt), then some
    bolts can emit a lot of tuples while the downstream bolts take a while to
    process them, so all of those tuples are held in memory. This is really
    difficult to track down. You might add some monitoring so you can see the
    sizes of the "internal" queues and find out which bolts are causing the
    problem.
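
    For the first point, a minimal sketch of a bolt that acks explicitly
    (assuming the Storm 0.8-era backtype.storm API; the class and field names
    are placeholders):

        import java.util.Map;

        import backtype.storm.task.OutputCollector;
        import backtype.storm.task.TopologyContext;
        import backtype.storm.topology.OutputFieldsDeclarer;
        import backtype.storm.topology.base.BaseRichBolt;
        import backtype.storm.tuple.Fields;
        import backtype.storm.tuple.Tuple;
        import backtype.storm.tuple.Values;

        public class AckingBolt extends BaseRichBolt {
            private OutputCollector collector;

            @Override
            public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void execute(Tuple input) {
                // ... do the real work on the tuple here ...
                collector.emit(input, new Values(input.getValue(0))); // anchored emit
                collector.ack(input); // without this, pending tuples keep accumulating
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("value"));
            }
        }

    For the second point, capping the number of un-acked tuples in flight with
    topology.max.spout.pending can also keep memory bounded while you track
    down the slow bolt.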

    Hope it helps
    Christian


  • Vinz at Dec 12, 2012 at 7:44 pm
    Hi,

    Is this also true for a Trident topology?

    Thanks.


  • Christian Nardi at Dec 12, 2012 at 7:57 pm
    I don't know, because I've never used a Trident topology. I'm sorry...



  • Balázs Kossovics at Dec 13, 2012 at 5:44 am
    We also had OOM errors with our test topology using OpaqueTridentKafkaSpout
    when, out of curiosity, I set the max spout pending limit to 1,000,000.
    Running jhat on the heap dump showed a massive number of TransactionAttempt
    objects.
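
    For reference, a minimal sketch of where that limit is typically set (the
    value below just mirrors the experiment above; in practice a much smaller
    limit is usual):

        import backtype.storm.Config;

        public class PendingLimitExample {
            public static void main(String[] args) {
                Config conf = new Config();
                // Caps un-acked tuples (batches, for Trident) in flight per
                // spout task; very large values let pending state pile up on
                // the worker heap.
                conf.setMaxSpoutPending(1000000);
                // conf would then be passed to StormSubmitter.submitTopology(...)
            }
        }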


