Hi all,

I'm testing my active-active setup (2.8.1, Linux 64-bit) and I am randomly
running into crashes when I'm stopping a node. I can stop one node,
but another node fails along with it. Below is a crash log.

Is "rabbitmq_management" the UI plugin? Could it bring down the whole node?
I keep the UI open in a browser while doing my testing.

Vadim.

=INFO REPORT==== 26-Apr-2012::12:15:58 ===
Adding mirror of queue 'test_33' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4773.0>

=INFO REPORT==== 26-Apr-2012::12:15:58 ===
Adding mirror of queue 'test_34' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4777.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_35' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4782.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_36' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4786.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_37' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4790.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_38' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4795.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_39' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4799.0>

=ERROR REPORT==== 26-Apr-2012::12:15:59 ===
Discarding message
{'$gen_call',{<0.1955.0>,#Ref<0.0.0.5513>},{add_on_right,{9,<0.1955.0>}}}
from <0.1955.0> to <0.26823.834>
in an old incarnation (2) of this node (3)


=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_40' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4803.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_41' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4807.0>

=ERROR REPORT==== 26-Apr-2012::12:15:59 ===
Discarding message
{'$gen_call',{<0.1983.0>,#Ref<0.0.0.5648>},{add_on_right,{9,<0.1983.0>}}}
from <0.1983.0> to <0.26850.834>
in an old incarnation (2) of this node (3)


=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_42' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4811.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_43' in vhost '/' on node 'rabbit at rabbitmq-dev-2':
<3066.4816.0>

=INFO REPORT==== 26-Apr-2012::12:16:39 ===
rabbit on node 'rabbit at rabbitmq-dev-3' up

=WARNING REPORT==== 26-Apr-2012::12:18:15 ===
closing AMQP connection <0.1433.0> (172.16.51.100:16360 -> 10.7.1.21:5672):
connection_closed_abruptly

=INFO REPORT==== 26-Apr-2012::12:23:57 ===
application: rabbitmq_management
exited: shutdown
type: permanent

=INFO REPORT==== 26-Apr-2012::12:23:57 ===
stopped TCP Listener on 0.0.0.0:5672

=ERROR REPORT==== 26-Apr-2012::12:23:57 ===
** Generic server <0.1800.0> terminating
** Last message in was {'$gen_cast',{gm_deaths,[<0.4684.0>]}}
** When Server state == {state,
{amqqueue,
{resource,<<"/">>,queue,<<"test_29">>},
true,false,<0.1433.0>,
[{<<"x-ha-policy">>,longstr,<<"all">>},
{<<"x-message-ttl">>,signedint,600000}],
<0.1799.0>,[],all},
<0.1801.0>,
{dict,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[]}}},
#Fun<rabbit_mirror_queue_master.1.2951048>,
#Fun<rabbit_mirror_queue_master.2.72654940>}
** Reason for termination ==
** {{case_clause,{ok,<3066.9234.0>,[<0.4683.0>]}},
[{rabbit_mirror_queue_coordinator,handle_cast,2},
{gen_server2,handle_msg,2},
{proc_lib,wake_up,3}]}

=ERROR REPORT==== 26-Apr-2012::12:23:57 ===
** Generic server <0.1714.0> terminating
** Last message in was {'$gen_cast',{gm_deaths,[<0.4653.0>]}}
** When Server state == {state,
{amqqueue,
{resource,<<"/">>,queue,<<"test_23">>},
true,false,<0.1433.0>,
[{<<"x-ha-policy">>,longstr,<<"all">>},
{<<"x-message-ttl">>,signedint,600000}],
<0.1712.0>,[],all},
<0.1715.0>,
{dict,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[]}}},
#Fun<rabbit_mirror_queue_master.1.2951048>,
#Fun<rabbit_mirror_queue_master.2.72654940>}
** Reason for termination ==
** {{case_clause,{ok,<3066.9155.0>,[<0.4652.0>]}},
[{rabbit_mirror_queue_coordinator,handle_cast,2},
{gen_server2,handle_msg,2},
{proc_lib,wake_up,3}]}

=ERROR REPORT==== 26-Apr-2012::12:23:57 ===
** Generic server <0.1531.0> terminating
** Last message in was {'$gen_cast',{gm_deaths,[<0.4587.0>]}}
** When Server state == {state,
{amqqueue,
{resource,<<"/">>,queue,<<"test_10">>},
true,false,<0.1433.0>,
[{<<"x-ha-policy">>,longstr,<<"all">>},
{<<"x-message-ttl">>,signedint,600000}],
<0.1530.0>,[],all},
<0.1532.0>,
{dict,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[]}}},
#Fun<rabbit_mirror_queue_master.1.2951048>,
#Fun<rabbit_mirror_queue_master.2.72654940>}
** Reason for termination ==
** {noproc,{gen_server2,call,[worker_pool,next_free,infinity]}}

  • Matthew Sackman at Apr 26, 2012 at 10:31 pm
    Hi Vadim,
    On Thu, Apr 26, 2012 at 01:01:20PM -0700, Vadim Chekan wrote:
    I'm testing my active-active setup (2.8.1, linux 64) and I am randomly
    running into some crashes when I'm stopping a node. I can stop one node,
    but another node fails along with it. Below is a crash log.

    =ERROR REPORT==== 26-Apr-2012::12:15:59 ===
    Discarding message
    {'$gen_call',{<0.1955.0>,#Ref<0.0.0.5513>},{add_on_right,{9,<0.1955.0>}}}
    from <0.1955.0> to <0.26823.834>
    in an old incarnation (2) of this node (3)
    I'm worried about these messages. Someone else on this list has seen
    this sort of thing too and it's causing them trouble. I've not seen this
    issue myself in testing, which is frustrating. However, that's not the
    cause of your crash in this case (I think).
    ** Generic server <0.1800.0> terminating
    ** Last message in was {'$gen_cast',{gm_deaths,[<0.4684.0>]}}
    ** When Server state == {state,
    {amqqueue,
    {resource,<<"/">>,queue,<<"test_29">>},
    true,false,<0.1433.0>,
    [{<<"x-ha-policy">>,longstr,<<"all">>},
    {<<"x-message-ttl">>,signedint,600000}],
    <0.1799.0>,[],all},
    <0.1801.0>,
    {dict,0,16,16,8,80,48,

    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
    []},
    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
    [],[]}}},
    #Fun<rabbit_mirror_queue_master.1.2951048>,
    #Fun<rabbit_mirror_queue_master.2.72654940>}
    ** Reason for termination ==
    ** {{case_clause,{ok,<3066.9234.0>,[<0.4683.0>]}},
    [{rabbit_mirror_queue_coordinator,handle_cast,2},
    {gen_server2,handle_msg,2},
    {proc_lib,wake_up,3}]}
    Well, this is very odd. We fixed a bug that looked like this, but it got
    fixed in 2.7.1 (and it related to x-ha-policy = nodes). Could you just check
    that you really are running 2.8.1? We're not aware of any bug in this
    area in 2.8.1, but that's certainly not to say there isn't one! Is
    there any particular sequence of events that you can perform that
    reliably triggers this crash? Could you also check the logs of the other
    nodes (both .log and -sasl.log) to see if there are further crash reports
    in there?
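
    To help with that check, here is a minimal Python sketch (the regexes are
    ours, matching the log format quoted in this thread) that counts the
    suspicious "old incarnation" discards and gm_deaths casts in a log file:

```python
import re

# Patterns matching the two suspicious report lines quoted in this thread.
# Erlang/RabbitMQ log formats vary between versions; adjust as needed.
STALE = re.compile(r"in an old incarnation \(\d+\) of this node \(\d+\)")
GM_DEATHS = re.compile(r"\{gm_deaths,")

def scan_log(text):
    """Count the suspicious reports in the given log text."""
    return {"stale_incarnation": len(STALE.findall(text)),
            "gm_deaths": len(GM_DEATHS.findall(text))}

sample = ("in an old incarnation (2) of this node (3)\n"
          "** Last message in was {'$gen_cast',{gm_deaths,[<0.4684.0>]}}\n")
print(scan_log(sample))
```

    In practice you would read each node's .log and -sasl.log file and run
    scan_log over its contents.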

    Also, lots of bugs have been discovered relating to the code
    changes made to add DLX support in 2.8.1, especially in relation to HA.
    It's possible one of the issues I found with TTL and HA is causing this.
    2.8.2 should be out soonish and might introduce fewer new bugs than it
    fixes, but in the meantime, could you try without the TTL and see if
    that helps?
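
    Concretely, trying without TTL just means omitting the x-message-ttl
    argument from the queue declaration. A rough sketch (the helper function
    and its defaults are ours, not RabbitMQ's API) of the arguments dict
    visible in the crash log, with and without TTL:

```python
# Per-queue arguments as seen in the crash log: x-ha-policy=all mirrors the
# queue on every node (the RabbitMQ 2.x per-queue style), and x-message-ttl
# is the queue TTL in milliseconds. Pass the dict as `arguments=` to your
# AMQP client's queue_declare call (e.g. pika).
def queue_arguments(ha_policy="all", ttl_ms=None):
    args = {"x-ha-policy": ha_policy}
    if ttl_ms is not None:      # leave TTL out entirely for this test
        args["x-message-ttl"] = ttl_ms
    return args

with_ttl = queue_arguments(ttl_ms=600_000)  # matches the log above
without_ttl = queue_arguments()             # Matthew's suggested test
print(with_ttl, without_ttl)
```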

    Matthew
  • Vadim Chekan at Apr 26, 2012 at 11:27 pm
    Hi Matthew,

    Thanks for your reply. At least the web UI reports that all 3 nodes are 2.8.1,
    so I guess they are up to date.
    Reproducing this bug is difficult. I'll work on it for maybe 2 more days,
    trying to find a combination that is reproducible. Very preliminarily,
    I have the impression that the "exclusive" queue flag has something to do
    with it, but I will experiment with TTL too, as you suggested.

    I don't know if it is related, but I had a strange situation where there was
    an exclusive auto-delete queue without a connection (the connection reported
    "unknown") and it was impossible to delete it because of a "resource locked"
    error.
    Hopefully I'll be able to report more tomorrow.

    Thanks,
    Vadim.
  • Vadim Chekan at Apr 27, 2012 at 12:31 am
    Ok, I got it again.
    3 nodes. Stop slave, restart app(client), bring slave back online, stop
    master, restart app, bring master back online, BOOM!
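
    The stop/start part of that cycle can be scripted with rabbitmqctl; a
    hedged sketch (stop_app and start_app are real rabbitmqctl commands, but
    the wrapper and its dry_run flag are ours) that only builds the command
    list unless told otherwise:

```python
import subprocess

# Repro helper for the sequence above: stop a node's Rabbit application,
# then bring it back. With dry_run=True (the default) nothing is executed,
# so this never touches a live cluster; it just returns the planned commands.
def bounce_node(node, dry_run=True):
    steps = [["rabbitmqctl", "-n", node, "stop_app"],
             ["rabbitmqctl", "-n", node, "start_app"]]
    if not dry_run:
        for cmd in steps:
            subprocess.check_call(cmd)
    return steps

plan = bounce_node("rabbit@rabbitmq-dev-2")  # the master in this run
print([cmd[-1] for cmd in plan])
```

    The client restart between the two steps would still be done by hand (or
    by whatever drives the test application).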
    So here are the logs from both nodes: the one which I shut down (dev-2) and
    the one which failed unexpectedly (dev-3).

    dev-2 was the master node, and when I stopped it, it brought the dev-3 node
    down with it.
    Did everything start because of this?
    Reason: reached_max_restart_intensity
    Vadim

    dev-3 log :
    ===============================================
    =INFO REPORT==== 26-Apr-2012::16:50:38 ===
    Management agent started.

    =INFO REPORT==== 26-Apr-2012::16:50:38 ===
    Management plugin started. Port: 55672, path: /

    =INFO REPORT==== 26-Apr-2012::16:51:07 ===
    application: rabbitmq_management
    exited: shutdown
    type: permanent

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    stopped TCP Listener on 0.0.0.0:5672
    ===============================================

    dev-3-sasl.log
    ===============================================
    =SUPERVISOR REPORT==== 26-Apr-2012::16:51:07 ===
    Supervisor: {local,rabbit_mgmt_sup}
    Context: child_terminated
    Reason: shutdown
    Offender: [{pid,<0.396.0>},
    {name,mirroring},
    {mfa,
    {mirrored_supervisor,start_internal,
    [rabbit_mgmt_sup,
    [{rabbit_mgmt_db,
    {rabbit_mgmt_db,start_link,[]},
    permanent,4294967295,worker,
    [rabbit_mgmt_db]}]]}},
    {restart_type,permanent},
    {shutdown,4294967295},
    {child_type,worker}]


    =SUPERVISOR REPORT==== 26-Apr-2012::16:51:07 ===
    Supervisor: {local,rabbit_mgmt_sup}
    Context: shutdown
    Reason: reached_max_restart_intensity
    Offender: [{pid,<0.396.0>},
    {name,mirroring},
    {mfa,
    {mirrored_supervisor,start_internal,
    [rabbit_mgmt_sup,
    [{rabbit_mgmt_db,
    {rabbit_mgmt_db,start_link,[]},
    permanent,4294967295,worker,
    [rabbit_mgmt_db]}]]}},
    {restart_type,permanent},
    {shutdown,4294967295},
    {child_type,worker}]

    ===============================================

    dev-2 log
    ===============================================
    =INFO REPORT==== 26-Apr-2012::16:51:07 ===
    Stopping Rabbit

    =INFO REPORT==== 26-Apr-2012::16:51:07 ===
    application: rabbitmq_management_visualiser
    exited: stopped
    type: permanent

    =INFO REPORT==== 26-Apr-2012::16:51:07 ===
    application: rabbitmq_management
    exited: stopped
    type: permanent

    =INFO REPORT==== 26-Apr-2012::16:51:07 ===
    application: rabbitmq_management_agent
    exited: stopped
    type: permanent

    =INFO REPORT==== 26-Apr-2012::16:51:07 ===
    stopped TCP Listener on 0.0.0.0:5672

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    rabbit on node 'rabbit at rabbitmq-dev-3' down

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    Mirrored-queue (queue 'Advertiser-ONYX' in vhost '/'): Slave
    <'rabbit at rabbitmq-dev-2'.1.265.0> saw deaths of mirrors
    <'rabbit at rabbitmq-dev-3'.2.290.0>

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    Mirrored-queue (queue 'Search-JROSE6420' in vhost '/'): Slave
    <'rabbit at rabbitmq-dev-2'.1.276.0> saw deaths of mirrors
    <'rabbit at rabbitmq-dev-3'.2.328.0>

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    Mirrored-queue (queue 'Advertiser-DHOHMEYERGX280' in vhost '/'): Slave
    <'rabbit at rabbitmq-dev-2'.1.268.0> saw deaths of mirrors
    <'rabbit at rabbitmq-dev-3'.2.288.0>

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    Mirrored-queue (queue 'CustomerAdmin-DHOHMEYERGX280' in vhost '/'): Slave
    <'rabbit at rabbitmq-dev-2'.1.270.0> saw deaths of mirrors
    <'rabbit at rabbitmq-dev-3'.2.294.0>

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    Mirrored-queue (queue 'Tracking-JROSE6420' in vhost '/'): Slave
    <'rabbit at rabbitmq-dev-2'.1.250.0> saw deaths of mirrors
    <'rabbit at rabbitmq-dev-3'.2.247.0>

    =ERROR REPORT==== 26-Apr-2012::16:51:08 ===
    ** Generic server rabbit_mirror_queue_slave_sup terminating
    ** Last message in was {'EXIT',<0.127.0>,shutdown}
    ** When Server state == {state,
    {local,rabbit_mirror_queue_slave_sup},
    simple_one_for_one_terminate,
    [{child,undefined,rabbit_mirror_queue_slave,
    {rabbit_mirror_queue_slave,start_link,[]},
    temporary,4294967295,worker,
    [rabbit_mirror_queue_slave]}],
    {dict,11,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
    {{[],[],
    [[<0.265.0>,
    {amqqueue,

    {resource,<<"/">>,queue,<<"Advertiser-ONYX">>},
    false,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>},
    {<<"x-message-ttl">>,signedint,1200000}],
    <3067.289.0>,
    [<3065.334.0>],
    all}]],
    [],
    [[undefined,
    {amqqueue,
    {resource,<<"/">>,queue,
    <<"Advertiser-heartbeat-DHOHMEYERGX280">>},
    false,true,<3065.24765.0>,
    [{<<"x-ha-policy">>,longstr,<<"all">>}],
    <3065.24784.0>,
    [<3067.12370.2>],
    all}],
    [<0.6730.0>,
    {amqqueue,
    {resource,<<"/">>,queue,
    <<"Advertiser-heartbeat-DHOHMEYERGX280">>},
    false,true,<3065.14349.1>,
    [{<<"x-ha-policy">>,longstr,<<"all">>}],
    <3065.14370.1>,[],all}]],
    [[<0.268.0>,
    {amqqueue,
    {resource,<<"/">>,queue,
    <<"Advertiser-DHOHMEYERGX280">>},
    false,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>},
    {<<"x-message-ttl">>,signedint,1200000}],
    <3067.287.0>,
    [<3065.332.0>],
    all}]],
    [],
    [[<0.270.0>,
    {amqqueue,
    {resource,<<"/">>,queue,
    <<"CustomerAdmin-DHOHMEYERGX280">>},
    false,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>},
    {<<"x-message-ttl">>,signedint,1200000}],
    <3067.295.0>,
    [<3065.338.0>],
    all}]],
    [],
    [[<0.272.0>,
    {amqqueue,
    {resource,<<"/">>,queue,
    <<"Sales-DHOHMEYERGX280">>},
    false,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>},
    {<<"x-message-ttl">>,signedint,1200000}],
    <3067.317.0>,
    [<3065.356.0>],
    all}]],
    [[<0.250.0>,
    {amqqueue,
    {resource,<<"/">>,queue,
    <<"Tracking-JROSE6420">>},
    false,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>},
    {<<"x-message-ttl">>,signedint,1200000}],
    <3067.247.0>,
    [<3065.249.0>],
    all}]],
    [[<0.274.0>,
    {amqqueue,
    {resource,<<"/">>,queue,
    <<"RelatedLinks-ROGERMOORE">>},
    false,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>},
    {<<"x-message-ttl">>,signedint,1200000}],
    <3067.331.0>,
    [<3065.370.0>],
    all}]],
    [[<0.252.0>,
    {amqqueue,
    {resource,<<"ActivityLogging">>,queue,

    <<"vsw.activitylogging.conversion.pendingconversion">>},
    true,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>}],
    <3067.253.0>,
    [<3065.257.0>],
    all}]],
    [[<0.276.0>,
    {amqqueue,

    {resource,<<"/">>,queue,<<"Search-JROSE6420">>},
    false,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>},
    {<<"x-message-ttl">>,signedint,1200000}],
    <3067.26259.1>,
    [<3065.4161.0>],
    all}]],
    [[<0.261.0>,
    {amqqueue,
    {resource,<<"/">>,queue,
    <<"RelatedLinks-ONYX">>},
    false,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>},
    {<<"x-message-ttl">>,signedint,1200000}],
    <3067.277.0>,
    [<3065.322.0>],
    all}]],
    []}}},
    10,10,[],rabbit_mirror_queue_slave_sup,[]}
    ** Reason for termination ==
    ** {badarg,[{erlang,unlink,[undefined]},
    {supervisor2,'-terminate_simple_children/3-fun-0-',4},
    {dict,fold_bucket,3},
    {dict,fold_seg,4},
    {dict,fold_segs,4},
    {supervisor2,terminate_simple_children,3},
    {supervisor2,terminate,2},
    {gen_server,terminate,6}]}

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    Mirrored-queue (queue 'vsw.activitylogging.conversion.pendingconversion' in
    vhost 'ActivityLogging'): Slave <'rabbit at rabbitmq-dev-2'.1.252.0> saw
    deaths of mirrors <'rabbit at rabbitmq-dev-3'.2.253.0>

    =ERROR REPORT==== 26-Apr-2012::16:51:08 ===
    ** Generic server <0.265.0> terminating
    ** Last message in was {'EXIT',<0.249.0>,
    {badarg,
    [{erlang,unlink,[undefined]},
    {supervisor2,

    '-terminate_simple_children/3-fun-0-',4},
    {dict,fold_bucket,3},
    {dict,fold_seg,4},
    {dict,fold_segs,4},
    {supervisor2,terminate_simple_children,3},
    {supervisor2,terminate,2},
    {gen_server,terminate,6}]}}
    ** When Server state == {state,
    {amqqueue,
    {resource,<<"/">>,queue,<<"Advertiser-ONYX">>},
    false,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>},
    {<<"x-message-ttl">>,signedint,1200000}],
    <3067.289.0>,
    [<3065.334.0>],
    all},
    <0.267.0>,<3065.334.0>,rabbit_variable_queue,
    {vqstate,
    {0,{[],[]}},
    {0,{[],[]}},
    {delta,undefined,0,undefined},
    {0,{[],[]}},
    {0,{[],[]}},
    0,
    {0,nil},
    undefined,
    {0,nil},
    {qistate,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /queues/1ON03RTBZAE64B67H41L4ICDE",
    {{dict,0,16,16,8,80,48,

    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
    []}}},
    []},
    undefined,0,262144,
    #Fun<rabbit_variable_queue.2.91259159>,
    {0,nil}},
    {undefined,
    {client_msstate,msg_store_transient,

    <<228,99,153,84,177,114,138,215,76,30,0,48,128,77,
    68,245>>,
    {dict,0,16,16,8,80,48,

    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
    []}}},
    {state,368709,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /msg_store_transient"},
    rabbit_msg_store_ets_index,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /msg_store_transient",
    <0.239.0>,372806,364585,376903,381000}},

    false,0,#Fun<rabbit_mirror_queue_slave.3.29941254>,
    0,0,infinity,0,0,0,0,0,
    {rates,
    {{1335,483942,985021},0},
    {{1335,483942,985021},0},
    0.0,0.0,
    {1335,484240,226498}},
    {0,nil},
    {0,nil},
    {0,nil},
    {0,nil},
    0,0,
    {rates,
    {{1335,483942,985021},0},
    {{1335,483942,985021},0},
    0.0,0.0,
    {1335,484240,226498}}},
    undefined,#Ref<0.0.9.158766>,
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    0,
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    true}
    ** Reason for termination ==
    ** {badarg,[{erlang,unlink,[undefined]},
    {supervisor2,'-terminate_simple_children/3-fun-0-',4},
    {dict,fold_bucket,3},
    {dict,fold_seg,4},
    {dict,fold_segs,4},
    {supervisor2,terminate_simple_children,3},
    {supervisor2,terminate,2},
    {gen_server,terminate,6}]}

    =ERROR REPORT==== 26-Apr-2012::16:51:08 ===
    ** Generic server <0.6730.0> terminating
    ** Last message in was {'EXIT',<0.249.0>,
    {badarg,
    [{erlang,unlink,[undefined]},
    {supervisor2,

    '-terminate_simple_children/3-fun-0-',4},
    {dict,fold_bucket,3},
    {dict,fold_seg,4},
    {dict,fold_segs,4},
    {supervisor2,terminate_simple_children,3},
    {supervisor2,terminate,2},
    {gen_server,terminate,6}]}}
    ** When Server state == {state,
    {amqqueue,
    {resource,<<"/">>,queue,
    <<"Advertiser-heartbeat-DHOHMEYERGX280">>},
    false,true,<3065.14349.1>,
    [{<<"x-ha-policy">>,longstr,<<"all">>}],
    <3065.14370.1>,[],all},
    <0.6731.0>,<3065.14370.1>,rabbit_variable_queue,
    {vqstate,
    {0,{[],[]}},
    {0,{[],[]}},
    {delta,undefined,0,undefined},
    {0,{[],[]}},
    {0,{[],[]}},
    0,
    {0,nil},
    undefined,
    {0,nil},
    {qistate,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /queues/5R0GE9UEB8S0EMSJORBUKHBTE",
    {{dict,0,16,16,8,80,48,

    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
    []}}},
    []},
    undefined,0,262144,
    #Fun<rabbit_variable_queue.2.91259159>,
    {0,nil}},
    {undefined,
    {client_msstate,msg_store_transient,

    <<182,86,2,135,69,87,14,108,163,188,15,159,127,119,
    55,241>>,
    {dict,0,16,16,8,80,48,

    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
    []}}},
    {state,368709,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /msg_store_transient"},
    rabbit_msg_store_ets_index,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /msg_store_transient",
    <0.239.0>,372806,364585,376903,381000}},

    false,0,#Fun<rabbit_mirror_queue_slave.3.29941254>,
    0,0,infinity,0,0,0,0,0,
    {rates,
    {{1335,484239,805553},0},
    {{1335,484239,805553},0},
    0.0,0.0,
    {1335,484255,991021}},
    {0,nil},
    {0,nil},
    {0,nil},
    {0,nil},
    0,0,
    {rates,
    {{1335,484239,805553},0},
    {{1335,484239,805553},0},
    0.0,0.0,
    {1335,484255,991021}}},
    undefined,#Ref<0.0.9.158602>,
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    0,
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    true}
    ** Reason for termination ==
    ** {badarg,[{erlang,unlink,[undefined]},
    {supervisor2,'-terminate_simple_children/3-fun-0-',4},
    {dict,fold_bucket,3},
    {dict,fold_seg,4},
    {dict,fold_segs,4},
    {supervisor2,terminate_simple_children,3},
    {supervisor2,terminate,2},
    {gen_server,terminate,6}]}

    =ERROR REPORT==== 26-Apr-2012::16:51:08 ===
    ** Generic server <0.252.0> terminating
    ** Last message in was {'EXIT',<0.249.0>,shutdown}
    ** When Server state == {state,
    {amqqueue,
    {resource,<<"ActivityLogging">>,queue,

    <<"vsw.activitylogging.conversion.pendingconversion">>},
    true,false,none,
    [{<<"x-ha-policy">>,longstr,<<"all">>}],
    <3067.253.0>,
    [<3065.257.0>],
    all},
    <0.253.0>,<3065.257.0>,rabbit_variable_queue,
    {vqstate,
    {0,{[],[]}},
    {0,{[],[]}},
    {delta,undefined,0,undefined},
    {0,{[],[]}},
    {0,{[],[]}},
    0,
    {0,nil},
    undefined,
    {0,nil},
    {qistate,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /queues/9KDJJKR11HTT40SVOPWG0AWDM",
    {{dict,0,16,16,8,80,48,

    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
    []}}},
    []},
    undefined,0,262144,
    #Fun<rabbit_variable_queue.2.91259159>,
    {0,nil}},
    {{client_msstate,msg_store_persistent,

    <<58,135,182,192,228,241,29,217,255,38,4,53,94,213,
    95,102>>,
    {dict,0,16,16,8,80,48,

    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
    []}}},
    {state,389194,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /msg_store_persistent"},
    rabbit_msg_store_ets_index,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /msg_store_persistent",
    <0.244.0>,393291,385097,397388,401485},
    {client_msstate,msg_store_transient,

    <<63,170,201,248,36,125,208,20,133,142,139,71,219,
    91,212,33>>,
    {dict,0,16,16,8,80,48,

    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
    []}}},
    {state,368709,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /msg_store_transient"},
    rabbit_msg_store_ets_index,
    "/var/lib/rabbitmq/mnesia/rabbit at rabbitmq-dev-2
    /msg_store_transient",
    <0.239.0>,372806,364585,376903,381000}},

    true,0,#Fun<rabbit_mirror_queue_slave.3.29941254>,0,
    0,infinity,0,0,0,0,0,
    {rates,
    {{1335,483943,551021},0},
    {{1335,483943,551021},0},
    0.0,0.0,
    {1335,484240,51118}},
    {0,nil},
    {0,nil},
    {0,nil},
    {0,nil},
    0,0,
    {rates,
    {{1335,483943,551021},0},
    {{1335,483943,551021},0},
    0.0,0.0,
    {1335,484240,51118}}},
    undefined,#Ref<0.0.9.158618>,
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    0,
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    {dict,0,16,16,8,80,48,
    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
    true}
    ** Reason for termination ==
    ** {shutdown,
    {gen_server2,call,
    [msg_store_persistent,
    {client_terminate,

    <<58,135,182,192,228,241,29,217,255,38,4,53,94,213,95,102>>},
    infinity]}}
    ** In 'terminate' callback with reason ==
    ** shutdown

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    application: rabbit
    exited: stopped
    type: permanent

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    application: os_mon
    exited: stopped
    type: permanent

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    application: mnesia
    exited: stopped
    type: permanent

    =INFO REPORT==== 26-Apr-2012::16:51:08 ===
    Halting Erlang VM

    ===============================================

    dev-2-sasl.log
    ===============================================
    =CRASH REPORT==== 26-Apr-2012::16:51:08 ===
    crasher:
    initial call: supervisor2:init/1
    pid: <0.249.0>
    registered_name: rabbit_mirror_queue_slave_sup
    exception exit: {badarg,
    [{erlang,unlink,[undefined]},

    {supervisor2,'-terminate_simple_children/3-fun-0-',4},
    {dict,fold_bucket,3},
    {dict,fold_seg,4},
    {dict,fold_segs,4},
    {supervisor2,terminate_simple_children,3},
    {supervisor2,terminate,2},
    {gen_server,terminate,6}]}
    in function gen_server:terminate/6
    ancestors: [rabbit_sup,<0.126.0>]
    messages: [{'DOWN',#Ref<0.0.9.159703>,process,
    {undefined,'rabbit at rabbitmq-dev-2'},
    noproc}]
    links: [<0.265.0>,<0.6730.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 5064
    neighbours:

    =SUPERVISOR REPORT==== 26-Apr-2012::16:51:08 ===
    Supervisor: {local,rabbit_sup}
    Context: shutdown_error
    Reason: {badarg,
    [{erlang,unlink,[undefined]},
    {supervisor2,'-terminate_simple_children/3-fun-0-',4},
    {dict,fold_bucket,3},
    {dict,fold_seg,4},
    {dict,fold_segs,4},
    {supervisor2,terminate_simple_children,3},
    {supervisor2,terminate,2},
    {gen_server,terminate,6}]}
    Offender: [{pid,<0.249.0>},
    {name,rabbit_mirror_queue_slave_sup},
    {mfargs,{rabbit_mirror_queue_slave_sup,start_link,[]}},
    {restart_type,transient},
    {shutdown,infinity},
    {child_type,supervisor}]


    =CRASH REPORT==== 26-Apr-2012::16:51:08 ===
    crasher:
    initial call: gen:init_it/6
    pid: <0.265.0>
    registered_name: []
    exception exit: {badarg,
    [{erlang,unlink,[undefined]},

    {supervisor2,'-terminate_simple_children/3-fun-0-',4},
    {dict,fold_bucket,3},
    {dict,fold_seg,4},
    {dict,fold_segs,4},
    {supervisor2,terminate_simple_children,3},
    {supervisor2,terminate,2},
    {gen_server,terminate,6}]}
    in function gen_server2:terminate/3
    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.126.0>]
    messages: [{'EXIT',<0.267.0>,normal},
    {'DOWN',#Ref<0.0.3.44831>,process,<3065.334.0>,normal}]
    links: []
    dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct},
    {{xtype_to_module,topic},rabbit_exchange_type_topic},
    {guid,{{3831732564,2977073879,1277034544,2152547573},0}}]
    trap_exit: true
    status: running
    heap_size: 2584
    stack_size: 24
    reductions: 4916
    neighbours:

    =CRASH REPORT==== 26-Apr-2012::16:51:08 ===
    crasher:
    initial call: gen:init_it/6
    pid: <0.6730.0>
    registered_name: []
    exception exit: {badarg,
    [{erlang,unlink,[undefined]},

    {supervisor2,'-terminate_simple_children/3-fun-0-',4},
    {dict,fold_bucket,3},
    {dict,fold_seg,4},
    {dict,fold_segs,4},
    {supervisor2,terminate_simple_children,3},
    {supervisor2,terminate,2},
    {gen_server,terminate,6}]}
    in function gen_server2:terminate/3
    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.126.0>]
    messages: [{'EXIT',<0.6731.0>,normal},
    {'DOWN',#Ref<0.0.4.191216>,process,<3065.14370.1>,normal}]
    links: []
    dictionary: [{credit_blocked,[]},
    {{xtype_to_module,direct},rabbit_exchange_type_direct},
    {{xtype_to_module,fanout},rabbit_exchange_type_fanout},
    {guid,{{3059090055,1163333228,2747010975,2138519537},0}}]
    trap_exit: true
    status: running
    heap_size: 1597
    stack_size: 24
    reductions: 8290
    neighbours:
    ===============================================
  • Matthias Radestock at Apr 27, 2012 at 5:42 am
    Vadim,
    On 27/04/12 01:31, Vadim Chekan wrote:
    Ok, I got it again.
    This does look like a different bug, and one that, for a change, is
    easily fixed.
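
    For readers decoding the trace above: erlang:unlink/1 accepts only a
    pid or port, so calling it with 'undefined' (a child whose pid is
    already gone) raises badarg during supervisor shutdown. A minimal
    sketch of the failing pattern and its guard; this is illustrative
    only, not the actual supervisor2 source:

    ```erlang
    %% Illustrative sketch, not the real supervisor2 code.
    %% Unlinking a child that no longer has a pid must be guarded;
    %% otherwise erlang:unlink(undefined) raises badarg and takes the
    %% terminating supervisor down with it.
    terminate_child(Pid) when is_pid(Pid) ->
        erlang:unlink(Pid),
        exit(Pid, shutdown);
    terminate_child(undefined) ->
        ok.  %% child already gone; nothing to unlink
    ```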

    It is possible that this may either be triggered by or be the trigger of
    the previous bug. Is there nothing in the logs this time about
    "Discarding message" or "gm_deaths"?

    Matthias.
  • Matthias Radestock at Apr 28, 2012 at 7:13 am
    Vadim,

    (putting the list back on cc)
    On 27/04/12 23:28, Vadim Chekan wrote:
    I've spent some time today playing with different client settings. It
    seems TTL does not affect the failures at all.
    That's good to know. The more factors we can eliminate as possible
    causes the better.
    I managed to reproduce the crash many times today. The basic idea: an
    application with 40 threads creates pub/sub exchanges and publishes a
    message every second. Under this load I bring down the master node
    (all queues are usually created on the same node), and this often
    causes another node to fail. Here is my load simulator (in C#):
    http://www.heypasteit.com/clip/0B5W
    That code connects to "rabbitmq-dev". Is that a load balancer sitting in
    front of the three nodes?

    How do you deal with the disconnects resulting from the shutting down of
    nodes? There doesn't seem to be any code to handle that.

    Mind you, I suspect that the failure should still be reproducible
    without any subscriptions or publishes. It would be good to try that
    and just watch the broker logs for errors.
    Since the mailing list does not allow zip attachments, I'm mailing
    them to you guys directly.
    Thanks for posting these. There is an error in the logs that we haven't
    seen before:

    {{badmatch,[]},
    [{rabbit_mirror_queue_misc,'-remove_from_queue/2-fun-0-',2},
    {mnesia_tm,apply_fun,3},
    {mnesia_tm,execute_transaction,5},
    {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1},
    {worker_pool_worker,handle_call,3},
    {gen_server2,handle_msg,2},
    {proc_lib,wake_up,3}]}

    Looking at the code, this appears to indicate that there are no
    master/mirror processes left for the queue. Which is...unexpected. That
    should give us something to go on.
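
    The {{badmatch,[]}} above is what the VM raises when a pattern
    expecting at least one element is matched against an empty list,
    which is consistent with there being no master/mirror pids left for
    the queue. A shell illustration of the error shape (not the RabbitMQ
    code itself):

    ```erlang
    1> [QPid | _Rest] = [].
    ** exception error: no match of right hand side value []
    ```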

    Regards,

    Matthias.
  • Vadim Chekan at Apr 30, 2012 at 4:43 pm
    Thanks for looking into it Matthias,

    "rabbitmq-dev": yes, it is a load balancer.
    Disconnects: I was too lazy to write code for that and just restarted
    the client manually.
    I made a short attempt to reproduce the failure without any load, but
    everything worked just fine. I'll try more today.

    Vadim.

Discussion Overview
group: rabbitmq-discuss
categories: rabbitmq
posted: Apr 26, '12 at 8:01p
active: Apr 30, '12 at 4:43p
posts: 7
users: 3
website: rabbitmq.com
irc: #rabbitmq
