Hello,


I am running RabbitMQ 2.8.2 as a cluster node. Erlang info:
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8]
[async-threads:30] [kernel-poll:true]


The system is Ubuntu Server 9.10, 64-bit.


This RabbitMQ node had been running for many days without problems.
Today I restarted it, but it failed to start. The info:




root@rabbitmq2:/var/log/rabbitmq# cat startup_err


Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 2850821240 bytes of memory (of type "heap").
Aborted


root@rabbitmq2:/var/log/rabbitmq# free -m
             total       used       free     shared    buffers     cached
Mem:          8004       1242       6761          0         13        870
-/+ buffers/cache:        359       7645
Swap:            0          0          0


root@rabbitmq2:/var/log/rabbitmq# rabbitmqctl status
Status of node 'testmq-slave@rabbitmq2' ...
Error: unable to connect to node 'testmq-slave@rabbitmq2': nodedown


DIAGNOSTICS
===========


nodes in question: ['testmq-slave@rabbitmq2']


hosts, their running nodes and ports:
- rabbitmq2: [{rabbitmqctl4487,63660}]


current node details:
- node name: rabbitmqctl4487@rabbitmq2
- home dir: /var/lib/rabbitmq
- cookie hash: ****






As you can see, my system has plenty of free memory, but RabbitMQ aborted
while starting.
Please help, thanks in advance.


  • Emile Joubert at Oct 9, 2012 at 9:17 am
    Hi,

    On 09/10/12 08:10, Geocast wrote:
    This RabbitMQ node had been running for many days without problems.
    Today I restarted it, but it failed to start. The info:

    root@rabbitmq2:/var/log/rabbitmq# cat startup_err

    The reason for the broker crashing in the first place is more likely to
    be found in the main broker logfile or the broker SASL logfile. You
    should look at those as well.

    Crash dump was written to: erl_crash.dump
    eheap_alloc: Cannot allocate 2850821240 bytes of memory (of type "heap").
    Aborted

    I assume the quoted error was generated upon attempted startup after the
    crash. The crash would have caused the broker to stop in such a way that
    an expensive and lengthy recovery process is required at startup, in
    order to recover messages.

    root@rabbitmq2:/var/log/rabbitmq# free -m
                 total       used       free     shared    buffers     cached
    Mem:          8004       1242       6761          0         13        870
    As you can see, my system has plenty of free memory, but RabbitMQ aborted
    while starting.

    It does look like there is enough free memory and that the allocation
    should succeed, but this depends on when the "free" command was run.
    There may have been less free memory while the broker was trying to
    start up. Is there any other reason why the OS might refuse? Are there
    any ulimits in effect that might prevent it? Does the OS syslog say
    anything? The root user might not be subject to the same limit, so it
    might be worth running the broker as root, just to start up.
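
    For example, something along these lines (run as the user the broker
    runs as; file names and limits will differ on your system) should show
    whether a limit or overcommit setting could be getting in the way:

    ulimit -a                              # per-process limits, e.g. "max memory size" / "virtual memory"
    cat /proc/sys/vm/overcommit_memory     # 2 means strict overcommit accounting
    cat /proc/sys/vm/overcommit_ratio
    grep -i -e oom -e "out of memory" /var/log/syslog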


    Can you get Erlang to allocate that much memory independently of running
    the broker? Try this test - start an Erlang VM


    erl


    and enter this:


    size(<<0:2850821240/unit:8>>).


    The result should be 2850821240. If you receive an error instead then
    the broker is unlikely to have enough RAM to recover messages at startup.




    -Emile
  • Geocast at Oct 9, 2012 at 9:31 am
    Hello Emile,


    Thanks for your kind answer. This is my test:


    root@rabbitmq2:~# free -m
                 total       used       free     shared    buffers     cached
    Mem:          8004        452       7551          0          9         88
    -/+ buffers/cache:        354       7649
    Swap:            0          0          0
    root@rabbitmq2:~#
    root@rabbitmq2:~# erl
    Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8]
    [async-threads:0] [kernel-poll:false]


    Eshell V5.9.1 (abort with ^G)
    1> size(<<0:2850821240/unit:8>>).


    Crash dump was written to: erl_crash.dump
    binary_alloc: Cannot allocate 5701642511 bytes of memory (of type "binary").
    Aborted


    It seems so strange - why does the memory allocation fail even though
    there is enough free memory?


    Thanks again.






  • Emile Joubert at Oct 9, 2012 at 10:04 am

    On 09/10/12 10:31, Geocast wrote:
    It seems so strange - why does the memory allocation fail even though
    there is enough free memory?

    Memory allocation failures can have many causes and are difficult to
    diagnose without access to the machine. You should check the syslog,
    kernel parameters and hardening configuration (if any) and ulimits. You
    should also consider adding more swap and tweaking the overcommit ratio.
    A forum dedicated to Ubuntu 9.10 or Linux will be able to offer more
    comprehensive help.
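
    As a rough sketch (the swap file name and size below are only examples;
    pick a size that fits your disk):

    dd if=/dev/zero of=/swapfile bs=1M count=32768   # create a 32GB swap file
    mkswap /swapfile
    swapon /swapfile
    sysctl -w vm.overcommit_memory=0                 # heuristic overcommit (the Linux default)
    sysctl -w vm.overcommit_ratio=80                 # only used when overcommit_memory=2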


    Other more drastic options you might consider:


    Temporarily install more RAM.


    Transfer the database directory to a server with more RAM, while keeping
    the node name the same as on the original host. Then copy the database
    directory back after a clean shutdown (a rough sketch follows below).


    If you don't need the contents of the queues then the recovery process
    can be avoided by discarding the database directory.
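
    A very rough sketch of the transfer approach (the directory below is the
    usual default database location on Debian/Ubuntu packages - adjust it and
    the hostname for your setup; node name and Erlang cookie handling will
    also need care):

    # on the original host, with the broker stopped:
    rsync -a /var/lib/rabbitmq/mnesia/ bigbox:/var/lib/rabbitmq/mnesia/

    # on the machine with more RAM, start the broker under the original
    # node name (e.g. via RABBITMQ_NODENAME), let recovery complete, then
    # stop it cleanly:
    rabbitmqctl stop

    # copy the recovered database directory back to the original host:
    rsync -a bigbox:/var/lib/rabbitmq/mnesia/ /var/lib/rabbitmq/mnesia/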






    -Emile
  • Matthias Radestock at Oct 9, 2012 at 10:32 am

    On 09/10/12 11:04, Emile Joubert wrote:
    On 09/10/12 10:31, Geocast wrote:
    It seems so strange - why does the memory allocation fail even though
    there is enough free memory?
    Memory allocation failures can have many causes and are difficult to
    diagnose without access to the machine.

    How much memory did *rabbit* think it had available when it last started
    successfully? Check the rabbit.log for an entry like this:


    =INFO REPORT==== 3-Oct-2012::20:31:08 ===
    Memory limit set to 4814MB of 12036MB total.
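
    Something like this should locate it (assuming the default log location;
    adjust the file name to match your node name):

    grep "Memory limit" /var/log/rabbitmq/testmq-slave@rabbitmq2.log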




    Regards,


    Matthias.
  • Geocast at Oct 9, 2012 at 10:51 am
    I have increased the swap to 32 GB and restarted the broker; the top command shows:


    (screenshot attached)


    It's still in progress; wish me luck.
    Thank you.




    [Attachment: top.png (image/png, 20073 bytes) - <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20121009/f3c09bd5/attachment.png>]
  • Matthias Radestock at Oct 9, 2012 at 11:23 am

    On 09/10/12 11:51, Geocast wrote:
    I have increased the swap to 32 GB and restarted the broker

    it's still in progress

    Please let us know whether that worked, and a rough idea of how much
    memory it needed.


    Matthias.
  • Geocast at Oct 9, 2012 at 11:48 am
    The current memory usage:
    pyh@rabbitmq2:~$ free -m
                 total       used       free     shared    buffers     cached
    Mem:          8004       7951         52          0          1         11
    -/+ buffers/cache:       7938         65
    Swap:        35812      31715       4096


    The size of the queues:


    pyh@rabbitmq2:/data/rabbitmq/server/db/testmq-slave$ du -h .
    736M ./queues/A5A29UQ5I7EMQFZFI8I7QNGZY
    677M ./queues/7ITLKQ8F63PVVN8GUAPWS4152
    359M ./queues/8W2IBTOABRYWVGC3IQM6WG9Z3
    966M ./queues/EGCZR6H1JUA1NDZ6NOBCXIN9Y
    1.1G ./queues/A6IFRZMWMUSGJ50ZG2V2N6ORZ
    962M ./queues/6JGALLR6VOJVXWPWPPG2TDLXL
    1.1M ./queues/4ILKGJ2S87IEDT0JQ4AUM04XU
    67M ./queues/AYN8KGZ9G6OYJ7XKJYNMFCGNV
    1.1G ./queues/4F8M7PYY08B7HZB2PD9SVIUJ6
    346M ./queues/8PPAJNJZKUWQBGPHO9LI45FD3
    938M ./queues/AEKXQ145TG2FBUFHHTNBMX2VD
    215M ./queues/8M5J2LAMRKP5QAXDFN1N1GDJS
    1.1G ./queues/BFJ2UY9VY8Y33JCYZ0OQ62K8Q
    450M ./queues/83G424V3FBCJ6UA2RPA1KQ33S
    215M ./queues/5TG6T0I85IXL62ZH7GH0OU9JM
    883M ./queues/DMQY72Q00Q2N3V8551BYVOSY3
    655M ./queues/61PRRPH54E4209RIDJ0O8A5GQ
    11G ./queues
    17G ./msg_store_persistent
    4.0K ./msg_store_transient
    28G .


    Thanks.


  • Geocast at Oct 9, 2012 at 1:33 pm
    The current status: all 32 GB of swap has been used up. The system died;
    I am calling IDC support to restart the server now.


  • Matthias Radestock at Oct 9, 2012 at 5:56 pm

    On 09/10/12 14:33, Geocast wrote:
    The current status: all 32 GB of swap has been used up. The system died;
    I am calling IDC support to restart the server now.

    It will probably need a lot more than 32GB to recover. I suggest you
    provide it with as much swap as you can.


    I've managed to reproduce this "need lots of memory when recovering from
    unclean shutdown" behaviour. We will fix that in a future release.


    Regards,


    Matthias.
  • Geocast at Oct 10, 2012 at 2:24 am
    I have increased the swap to 128 GB and am trying to recover it now.




  • Geocast at Oct 10, 2012 at 8:02 am
    It crashed with 128 GB of swap as well, :(
    root@rabbitmq2:/var/log/rabbitmq# cat startup_err


    Crash dump was written to: erl_crash.dump
    eheap_alloc: Cannot allocate 16992218280 bytes of memory (of type "heap").
    Aborted




  • Matthias Radestock at Oct 10, 2012 at 8:08 am

    On 10/10/12 09:02, Geocast wrote:
    It crashed with 128 GB of swap as well, :(

    When I wrote "I suggest you provide it with as much swap as you can." I
    meant it.


    Also, keep an eye on how much memory/swap it is using, just in case
    there is something else going on here and for some reason rabbit cannot
    allocate memory when it should be able to.
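
    For example (assuming the Erlang VM process shows up as "beam.smp" on
    your system):

    watch -n 10 'free -m; ps -o pid,rss,vsz,cmd -C beam.smp'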


    Matthias.
  • Geocast at Oct 11, 2012 at 1:18 am
    Even 160 GB of swap is not enough to repair it; the error info:


    Crash dump was written to: erl_crash.dump
    eheap_alloc: Cannot allocate 13593774640 bytes of memory (of type "heap").
    Aborted


    I have given up on it, :)


  • Matthias Radestock at Oct 11, 2012 at 6:16 am

    On 11/10/12 02:18, Geocast wrote:
    Even 160 GB of swap is not enough

    Did you keep an eye on how much memory/swap it was actually using?




    Matthias.
  • Geocast at Oct 11, 2012 at 9:48 am
    top shows that all memory is being used by the rabbit process.



Discussion Overview
group: rabbitmq-discuss
categories: rabbitmq
posted: Oct 9, '12 at 7:10a
active: Oct 11, '12 at 9:48a
posts: 16
users: 3
website: rabbitmq.com
irc: #rabbitmq
