I have an EC2 node running rabbitmq 1.7.2. Take a look at its memory over the last month <http://i.imgur.com/LMf02.png>. Looks like a possible memory leak?

It only has 13 queues, which generally sit at 0 (a change from the last time we restarted it, obvious from that graph: we used to have one queue that sat at ~70k because it was largely duplicates, and we no longer do that). They're medium-traffic, maybe a few tens of new items per second at most (its CPU over the same period: <http://i.imgur.com/BFhMK.png>), so it's not particularly loaded, and it's not clear where all of that memory is going (although it does belong to beam.smp, so it's not some other process). rabbitmq is the only thing this node does.

We can restart once a month, but this might be a bigger deal for higher-traffic users than us


  • Marek Majkowski at Jul 1, 2010 at 11:12 am
    David,
    On Thu, Jul 1, 2010 at 06:31, David King wrote:
    I have an EC2 node running rabbitmq 1.7.2. Take a look at its memory over the last month <http://i.imgur.com/LMf02.png>. Looks like a possible memory leak?
    In this graph, have you restarted RabbitMQ around Saturday? Or did
    this sharp edge happen without your intervention?
    It only has 13 queues, which generally sit at 0 (a change from the last time we restarted it, obvious from that graph: we used to have one queue that sat at ~70k because it was largely duplicates, and we no longer do that). They're medium-traffic, maybe a few tens of new items per second at most (its CPU over the same period: <http://i.imgur.com/BFhMK.png>), so it's not particularly loaded, and it's not clear where all of that memory is going (although it does belong to beam.smp, so it's not some other process). rabbitmq is the only thing this node does.

    We can restart once a month, but this might be a bigger deal for higher-traffic users than us
    It won't hurt if you upgraded to the newest RabbitMQ release - 1.8.0.

    There might be several causes of the behaviour you described. First,
    your specific usage patterns might not play well with garbage
    collection. For example, we force a full garbage collection run when
    the queue is unused for more than 10 seconds. If your queue is
    constantly pinged, you're left with Erlang's default garbage
    collection. That works well most of the time, but sometimes it's not
    optimal.

    Second, the broker currently keeps all of the messages it is handling
    in memory. If your RabbitMQ instance handles 70K messages, they are
    all stored in RAM. If that's the case, you could consider running our
    experimental code branch, which tries to page messages out to disk and
    save RAM.
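
    For what it's worth, a quick way to see whether any individual queue
    process is holding messages or memory is something along these lines
    (just a sketch; it assumes the 'messages' and 'memory' info items are
    available to rabbitmqctl in your version):

    # per-queue backlog and queue-process memory (in bytes)
    rabbitmqctl list_queues name messages memory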

    Cheers,
    Marek Majkowski
  • Matthias Radestock at Jul 1, 2010 at 12:55 pm

    On 01/07/10 12:12, Marek Majkowski wrote:
    It won't hurt if you upgraded to the newest RabbitMQ release - 1.8.0.
    ...and, more importantly, to Erlang/OTP R13B03 - unless you are running
    that or a more recent Erlang release already.


    Regards,

    Matthias.
  • David King at Jul 1, 2010 at 7:01 pm

    I have an EC2 node running rabbitmq 1.7.2. Take a look at its memory over the last month <http://i.imgur.com/LMf02.png>. Looks like a possible memory leak?
    In this graph, have you restarted RabbitMQ around Saturday? Or did
    this sharp edge happen without your intervention?
    Yes, that's when I last restarted rabbit. We have to take our site down to do so, and that's not desirable.
    It won't hurt if you upgraded to the newest RabbitMQ release - 1.8.0.
    ...and, more importantly, to Erlang/OTP R13B03 - unless you are running that or a more recent Erlang release already.
    According to that graph I'm going to have to restart the node soon anyway, so I'll try that, but if it doesn't fix a known leak that's unlikely to help, right? As for Erlang, we're running R13B-4.
    There might be several causes of the behaviour you described. First,
    your specific usage patterns might not play well with garbage
    collection. For example, we force a full garbage collection run when
    the queue is unused for more than 10 seconds. If your queue is
    constantly pinged, you're left with Erlang's default garbage
    collection. That works well most of the time, but sometimes it's not
    optimal.
    That's possible, but Erlang's default garbage collector is a stop-the-world (per-process) collector, so it shouldn't be *that* dependent on load; it would just introduce pauses per-queue (which would actually be fine for our use-case)
    Second, the broker currently keeps all of the messages it is handling
    in memory. If your RabbitMQ instance handles 70K messages, they are
    all stored in RAM. If that's the case, you could consider running our
    experimental code branch, which tries to page messages out to disk and
    save RAM.
    Right, that's why I said that we *don't* do that anymore. I thought someone might remember me from before (when we were doing that) and say "well you have this big queue so don't do that", and I wanted to pre-empt that by saying that we don't do it anymore so that I could avoid writing this paragraph right here. So to be clear, we have 13 queues that all sit at 0 backlog basically all of the time. Certainly no queues that grow at the same rate our memory does.
  • Matthias Radestock at Jul 1, 2010 at 8:23 pm
    David,

    David King wrote:
    this sharp edge happened without your intervention?
    Yes, that's when I last restarted rabbit. We have to take our site
    down to do so, and that's not desirable.
    If you just let it run, what happens?
    According to that graph I'm going to have to restart the node soon
    anyway, so I'll try that, but if it doesn't fix a known leak that's
    unlikely to help, right? As for Erlang, we're running R13B-4.
    We have seen a few instances of peculiar interaction between the rabbit
    code and the Erlang VM when it comes to memory usage. Even minor and
    unrelated changes can alter the memory consumption pattern
    significantly. It is for that reason, and in order to make it easier for
    us to investigate the problem further, that we recommend running the
    latest release of Rabbit. As for Erlang/OTP, R13B-4 is recent enough.
    your specific usage patterns might not play well with
    garbage collection. For example, we force a full garbage collection
    run when the queue is unused for more than 10 seconds. If your
    queue is constantly pinged, you're left with Erlang's default
    garbage collection. That works well most of the time, but sometimes
    it's not optimal.
    That's possible, but Erlang's default garbage collector is a
    stop-the-world (per-process) collector, so it shouldn't be *that*
    dependent on load; it would just introduce pauses per-queue (which
    would actually be fine for our use-case)
    The default Erlang gc performs garbage collections after a certain
    number of reductions, and that number is so high that a queue process
    which does work infrequently may not perform a gc for a very long time
    indeed. That's particularly noticeable when the message sizes are large.
    What's the average and max message size in your set up?
    So to be clear, we have 13
    queues that all sit at 0 backlog basically all of the time. Certainly
    no queues that grow at the same rate our memory does.
    How are you measuring the queue length? With 'rabbitmqctl list_queues'?

    Are any of the other items which can be listed with rabbitmqctl -
    exchanges, bindings, connections, channels, consumers - growing?
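
    If it helps, a crude way to snapshot those counts over time is
    something along these lines (just a sketch; which list_* commands
    exist depends on the rabbitmqctl version):

    for obj in queues exchanges bindings connections channels consumers; do
        printf '%s: ' "$obj"
        # counts include the "Listing ..." header and "...done." trailer lines
        rabbitmqctl "list_$obj" | wc -l
    done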

    Also, have you installed any plug-ins?


    Regards,

    Matthias.
  • David King at Jul 3, 2010 at 6:13 pm

    If you just let it run, what happens?
    I don't know. I guess we can see, but my guess is that it will wake me up at two in the morning to fix the site. That's what usually happens when rabbit crashes, anyway.
    The default Erlang gc performs garbage collections after a certain number of reductions, and that number is so high that a queue process which does work infrequently may not perform a gc for a very long time indeed. That's particularly noticeable when the message sizes are large. What's the average and max message size in your set up?
    They're just IDs, so they'll be between 5 and 10 bytes each.
    So to be clear, we have 13
    queues that all sit at 0 backlog basically all of the time. Certainly
    no queues that grow at the same rate our memory does.
    How are you measuring the queue length? With 'rabbitmqctl list_queues'?
    Yes
    Are any of the other items which can be listed with rabbitmqctl - exchanges, bindings, connections, channels, consumers - growing?
    That doesn't appear to be true, no
    Also, have you installed any plug-ins?
    No
  • Matthias Radestock at Jul 3, 2010 at 8:52 pm
    David,

    David King wrote:
    If you just let it run, what happens?
    I don't know. I guess we can see, but my guess is that it will wake
    me up at two in the morning to fix the site. That's what usually
    happens when rabbit crashes, anyway.
    As long as you are running rabbit 1.8.0, and given the usage pattern you
    describe, it is unlikely that rabbit would actually crash, though it may
    refuse to accept messages from producers (which, I guess, would still be
    a problem). It would be informative to find out whether it actually
    reaches that point.

    What memory limit does rabbit think it has? Check the rabbit log for an
    info report along the lines of

    =INFO REPORT==== 29-Oct-2009::15:43:27 ===
    Memory limit set to 2048MB.

    Also, are there any memory alarms in the logs? Look for entries like

    =INFO REPORT==== 3-Jul-2010::21:28:31 ===
    alarm_handler: {set,{vm_memory_high_watermark,[]}}
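
    A quick way to check for both (assuming the default log location;
    adjust the path to wherever your rabbit log actually lives):

    grep -E 'Memory limit set to|vm_memory_high_watermark' /var/log/rabbitmq/rabbit*.log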


    As Marek mentioned, your graphs would seem to indicate that rabbit is
    consuming a few gigs of memory right after startup. That doesn't look
    right. A healthy rabbit should eat much less memory than that. For
    example, the rabbit on my development machine starts with an 80MB
    virtual memory footprint, 25MB of which is resident.

    I am assuming the graph shows the memory usage of the entire machine.
    You said that the memory "does belong to beam.smp", but do you have a
    graph, or even just some data points for only that process?
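
    Something as simple as the following, sampled periodically (e.g. from
    cron), would give a picture of just the Erlang VM's footprint:

    # resident and virtual size (in KB) of the beam.smp process
    ps -C beam.smp -o pid,rss,vsz,etime,comm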


    If the above doesn't yield any clues, perhaps you could grant one of the
    rabbit engineers temporary access to the machine so they can delve into
    the innards of rabbit to figure out where the memory is going? This
    shouldn't affect the operation of your service.
    What's the average and max message size in your set up?
    They're just IDs, so they'll be between 5 and 10 bytes each.
    Are any of the messages published as persistent? Another reason to
    upgrade to 1.8.0 is that in 1.8.0 only messages published as persistent
    that go to durable queues actually hit the persister - a change we made
    partially as a result of the investigation into your previous problems.
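
    To see which of your queues are durable (and hence the only ones whose
    persistent messages would hit the persister in 1.8.0), something like
    this should do, assuming the 'durable' info item is available to your
    rabbitmqctl:

    rabbitmqctl list_queues name durable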
    How are you measuring the queue length? With 'rabbitmqctl
    list_queues'?
    How frequently do you run rabbitmqctl?


    Regards,

    Matthias.
  • Marek Majkowski at Jul 2, 2010 at 1:22 pm

    On Thu, Jul 1, 2010 at 20:01, David King wrote:
    I have an EC2 node running rabbitmq 1.7.2. Take a look at its memory over the last month <http://i.imgur.com/LMf02.png>. Looks like a possible memory leak?
    In this graph, have you restarted RabbitMQ around Saturday? Or did
    this sharp edge happen without your intervention?
    Yes, that's when I last restarted rabbit. We have to take our site down to do so, and that's not desirable.
    Hold on, if you don't have any messages in the queues, why does rabbit
    eat 3 gigs from the start?
    It won't hurt if you upgraded to the newest RabbitMQ release - 1.8.0.
    ...and, more importantly, to Erlang/OTP R13B03 - unless you are running that or a more recent Erlang release already.
    According to that graph I'm going to have to restart the node soon anyway, so I'll try that, but if it doesn't fix a known leak that's unlikely to help, right? As for Erlang, we're running R13B-4.
    R13B03 introduced a major redesign of garbage collection for binaries. See here:
    http://www.lshift.net/blog/2009/12/01/garbage-collection-in-erlang


    Cheers,
    Marek
