In theory, this assert can occur when ZMQ socket.send is called after the ZMQ socket instance has been released.
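(A minimal sketch of that failure mode, assuming the jzmq bindings and a hypothetical PUSH socket: once close() has released the native handle, a later send() would be expected to trip the get_socket assert in Socket.cpp and abort the process, matching the backtrace further down.)

import org.zeromq.ZMQ;

public class UseAfterClose {
    public static void main(String[] args) {
        ZMQ.Context ctx = ZMQ.context(1);
        ZMQ.Socket push = ctx.socket(ZMQ.PUSH);
        push.connect("tcp://127.0.0.1:5555");
        push.close();                              // native socket handle released here
        push.send("late message".getBytes(), 0);   // send after close: expected to hit the native assert
        ctx.term();
    }
}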

As far as I know, Storm doesn't release the instance explicitly, so I suspect the JNI memory is getting corrupted. Try setting zmq.hwm = 1000 or so and see if the same assert occurs.
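(For reference, a hedged sketch of the socket-level knob this maps to, assuming the jzmq bindings: the high-water mark is exposed on the socket as setHWM(), and it caps how many messages ZeroMQ will buffer in native memory.)

import org.zeromq.ZMQ;

public class HwmSketch {
    public static void main(String[] args) {
        ZMQ.Context ctx = ZMQ.context(1);
        ZMQ.Socket push = ctx.socket(ZMQ.PUSH);
        push.setHWM(1000);                         // high-water mark: limit messages buffered in native memory
        push.connect("tcp://127.0.0.1:5555");
        push.send("hello".getBytes(), 0);
        push.close();                              // release the native socket before terminating the context
        ctx.term();
    }
}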

Thanks
Min

On Dec 5, 2012, at 6:43 AM, Dane Hammer wrote:

So this issue has become far more common for us. I'm experiencing it in two different scenarios, in two different environments. One appears to be related to a topology getting redeployed; the other appears to happen when an unrelated connection is dropped.

I cannot recreate it deliberately, so what could I do to try to trap, catch, or identify what code is responsible for the crash?

On Thursday, August 30, 2012 1:13:34 AM UTC-5, nathanmarz wrote:
I have no idea. It sounds pretty obviously related to your classpath issues. I haven't ever seen anything like this. If your issue is solved, then I'm not sure what exactly you're looking for.

On Tue, Aug 28, 2012 at 2:46 PM, Dane Hammer wrote:
We recently upgraded to Storm 0.8.0 and ran into a new issue when we had classpath problems. With a new version of a dependency on the classpath causing a java.lang.VerifyError in our code, we saw the expected behavior: Storm brought down the JVM that experienced the error and started it again. This, however, exposed another issue: ZeroMQ would blow up, and we kept running the disk out of space with core dump files. This is where the core dump took us:

(gdb) where
#0 0x00000031de232885 in raise () from /lib64/libc.so.6
#1 0x00000031de234065 in abort () from /lib64/libc.so.6
#2 0x00000031de22b9fe in __assert_fail_base () from /lib64/libc.so.6
#3 0x00000031de22bac0 in __assert_fail () from /lib64/libc.so.6
#4 0x00007fa4cd69fbc2 in get_socket (env=0x7fa4d88271d0, obj=0x7fa48f1f0688, do_assert=1) at Socket.cpp:543
#5 0x00007fa4cd69fe85 in Java_org_zeromq_ZMQ_00024Socket_send (env=0x7fa4d88271d0, obj=<value optimized out>, msg=0x7fa48f1f0680, flags=2) at Socket.cpp:365

I found a few posts in ZeroMQ-related threads online, but they were all questions, no answers. It feels like a potential race condition between the JVM shutting down and something trying to reuse that socket?

The interesting part is that once we resolved our classpath issue, we stopped seeing it. Thoughts?




--
Twitter: @nathanmarz
http://nathanmarz.com


  • Dane Hammer at Dec 5, 2012 at 11:11 pm
    I'm having a hard time figuring out where I configure that. Is that for
    the ZeroMQ code? JZMQ? Storm?
  • Dane Hammer at Dec 5, 2012 at 11:22 pm
    Oh sorry, found it. The master branch of Storm has that value in the
    Config. We're on Storm 0.8.1, and I'm not sure how many things would be
    affected by trying to upgrade right now. I'll see if I can reproduce what
    Storm is doing with that value (see the sketch below).
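    (A minimal sketch of that at the topology level, assuming the "zmq.hwm" key
    that Storm's master branch exposes in Config, presumably as Config.ZMQ_HWM;
    whether 0.8.1 actually reads the key is exactly what remains to be checked.)

    import backtype.storm.Config;

    public class HwmConfigSketch {
        // Hypothetical helper: build a topology Config with the ZeroMQ high-water mark set.
        public static Config withZmqHwm(int hwm) {
            Config conf = new Config();
            conf.put("zmq.hwm", hwm);   // raw key behind the master-branch Config constant
            return conf;
        }
    }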

Discussion Overview
group: storm-user
posted: Dec 5, '12 at 7:14a
active: Dec 5, '12 at 11:22p
posts: 3
users: 2 (Dane Hammer: 2 posts, Yu Dongmin: 1 post)
website: storm-project.net
irc: #storm-user
