Hi all,

today we've encountered quite strange issues with memory allocation on
one of our VPS running CentOS 6.2. So far I've been unable to determine
what's causing it - hopefully someone here will know what's up.

The VPS is a "small" machine - just 512MB of RAM, 1 CPU, running CentOS 6.2
with the current kernel (2.6.32-220.4.2.el6.x86_64, but I've tried
2.6.32-71.el6.x86_64 too). There's a fairly standard stack installed -
apache, php, postgresql, postfix, dovecot, memcached and ssh. Basically
nothing exotic, everything from the official repos (except for postfix and
postgresql). The machine is not heavily used.

More detailed logs (than posted here) are available here:

http://pastebin.com/vYxRUyUX

We've been hitting some I/O utilization issues (caused by other VPS
instances on the same hw) so we migrated to different physical hw.
After the migration, the VPS started failing because of memory allocation
issues - the services fail either at startup or when processing
requests - although there's enough free memory:

[root@vps audit]# free
             total       used       free     shared    buffers     cached
Mem:        502728     294224     208504          0      18604     163608
-/+ buffers/cache:     112012     390716
Swap:            0          0          0

i.e. about 200 MB of free memory, but apache fails because of segfaults
when forking a child process:

[16:49:51 2012] [error] (12)Cannot allocate memory: fork: Unable to
fork new process
[16:51:17 2012] [notice] child pid 2577 exit signal Segmentation
fault (11)

or when processing requests:

[26 16:30:16 2012] [error] [client 66.249.72.1] PHP Fatal error: Out
of memory (allocated 262144) (tried to allocate 523800 bytes) in
Unknown on line 0

The memory_limit in PHP is set to 32MB, so that's not the cause. Similar
issues happen to PostgreSQL:

16:42:01 CET pid=2504 db=xxxxxx-drupal user=xxxxxx FATAL: out of
memory
16:42:01 CET pid=2504 db=xxxxxx-drupal user=xxxxxx DETAIL: Failed on
request of size 2488.
16:42:01 CET pid=2438 db= user= LOG: could not fork new process for
connection: Cannot allocate memory
16:42:01 CET pid=2438 db= user= 4f4a5247.986:21 LOG: could not fork
new process for connection: cannot allocate memory

I have absolutely no clue what's causing this / how to fix it. According
to free/vmstat there's about 200MB of free RAM all the time, so I have
no idea why the alloc calls fail.

What makes it even more puzzling is that after adding a swapfile, all
the issues suddenly disappear, although the swapfile is not used at all
... and then it's not possible to disable it again, because swapoff itself
fails with a memory allocation error.


# dd if=/dev/zero of=swap.img bs=1024 count=409600
# mkswap swap.img
# swapon swap.img

... now the services are starting fine ...

# swapon -s

Filename                                Type            Size    Used    Priority
/root/swap.img                          file            399992  0       -1

# free
             total       used       free     shared    buffers     cached
Mem:        503412     294192     209220          0      11740      99980
-/+ buffers/cache:     182472     320940
Swap:       399992          0     399992

# swapoff swap.img
swapoff: swap.img: swapoff failed: Cannot allocate memory

Any ideas what might cause this?

The fact that I hadn't noticed these issues before the migration is
probably due to a swap file - I added it manually during
maintenance and forgot to remove it afterwards, but it disappeared when
the machine was rebooted during the migration.

SELinux is enabled, but I doubt it's causing the issues - there's
nothing in the audit logs except a record that there was a
segfault. Nothing suspicious.

Otherwise it's just a standard CentOS install, the only thing I had to
tune a bit were kernel limits (in sysctl.conf) related to shared memory
(because of the database). Currently there's

kernel.shmmax = 68719476736
kernel.shmall = 134217728
vm.swappiness = 0
vm.overcommit_memory = 2

which should be fine IMHO ... any ideas?

regards
Tomáš


  • Peter Kjellström at Feb 27, 2012 at 5:26 am
    On Sunday 26 February 2012 19.59.07 Tomas Vondra wrote:
    > ...
    > i.e. about 200 MB of free memory, but apache fails because of segfaults
    > when forking a child process:
    >
    > [16:49:51 2012] [error] (12)Cannot allocate memory: fork: Unable to
    > fork new process
    > [16:51:17 2012] [notice] child pid 2577 exit signal Segmentation
    > fault (11)

    In general things can get quite bad with relatively high memory pressure and
    no swap.

    That said, one thing that comes to mind is stacksize. When forking the linux
    kernel needs whatever the current stacksize is to be available as (free + free
    swap).

    Also, just because you see Y bytes free doesn't mean you can successfully
    malloc that much (fragmentation, memory zones, etc.).

    /Peter
  • Tomas Vondra at Feb 27, 2012 at 6:57 am

    On 27 February 2012, 11:26, Peter Kjellström wrote:
    > On Sunday 26 February 2012 19.59.07 Tomas Vondra wrote:
    >> ...
    >> i.e. about 200 MB of free memory, but apache fails because of segfaults
    >> when forking a child process:
    >>
    >> [16:49:51 2012] [error] (12)Cannot allocate memory: fork: Unable to
    >> fork new process
    >> [16:51:17 2012] [notice] child pid 2577 exit signal Segmentation
    >> fault (11)
    >
    > In general things can get quite bad with relatively high memory pressure
    > and no swap.

    Sure, but there's no such pressure. There was almost 200MB of "free"
    memory (used for page cache, not dirty thus easy to drop).

    > That said, one thing that comes to mind is stacksize. When forking the
    > linux kernel needs whatever the current stacksize is to be available as
    > (free + free swap).
    >
    > Also, just because you see Y bytes free doesn't mean you can successfully
    > malloc that much (fragmentation, memory zones, etc.).

    Yup, I'm aware of that. But it's rather improbable, especially given the
    other symptoms.

    Update: After submitting the original post, I noticed that these issues
    probably started about a week ago, after upgrading the kernel and several
    related packages. I had swap there and the issues were not as severe,
    so I hadn't noticed them before. I do remember I got an OOM error during
    that upgrade and I thought I'd dealt with it properly, but maybe not. So
    I've reinstalled (remove+install) all those packages, rebooted and the
    problems disappeared. I will check that in the evening, but hopefully it's
    fixed.

    kind regards
  • Tomas Vondra at Feb 28, 2012 at 5:01 pm

    Well, I've found the actual issue. It clearly was my stupidity as I was
    messing with overcommit_memory without fully understanding it.

    What I did was that I set (as mentioned in the original post)

    vm.overcommit_memory = 2

    which limits the amount of memory that can be allocated to

    swap + (vm.overcommit_ratio / 100) * RAM

    where vm.overcommit_ratio = 50 by default, so you can allocate swap + 1/2
    the physical memory. This is just fine if you have swap - for example
    if you have swap size equal to RAM, this means 150% of RAM is available
    for processes.
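
    A minimal sketch of that arithmetic, using the totals reported by free
    earlier in the thread. The function name commit_limit_kb is made up for
    illustration, and the formula is the simplified one above (it ignores
    huge pages and any per-user reserve the kernel may apply):

```shell
# Hypothetical helper: approximate commit limit in kB when
# vm.overcommit_memory = 2, i.e. swap + overcommit_ratio percent of RAM.
# Arguments: RAM in kB, swap in kB, vm.overcommit_ratio (percent).
commit_limit_kb() {
    ram_kb=$1
    swap_kb=$2
    ratio=$3
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# RAM total from the first "free" output, no swap, default ratio of 50
commit_limit_kb 502728 0 50

# Same machine, no swap, ratio raised to 100
commit_limit_kb 502728 0 100

# With the 399992 kB swap file active and the default ratio
commit_limit_kb 503412 399992 50
```

    With no swap and the default ratio this comes out at 251364 kB, i.e.
    roughly 245MB on a 512MB machine.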

    The issues start when you disable swap (as I did) - then it effectively
    limits the available memory to 50% of physical RAM (and you receive
    out-of-memory errors if you try to allocate more). This is exactly what
    happened to me :-(
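
    One way to see how close a machine is to that limit is to compare the
    CommitLimit and Committed_AS fields of /proc/meminfo. The limit applies
    to committed address space rather than to pages actually in use, which
    is why free can still show plenty of free memory. The sketch below is
    only an illustration: the helper name commit_headroom_kb is hypothetical
    and the sample numbers are made up, not taken from the machine in
    question.

```shell
# Hypothetical helper: read meminfo-style text on stdin and print
# CommitLimit minus Committed_AS, i.e. how many kB can still be committed.
commit_headroom_kb() {
    awk '$1 == "CommitLimit:"  { limit = $2 }
         $1 == "Committed_AS:" { used = $2 }
         END { print limit - used }'
}

# Made-up sample input for illustration only
printf 'CommitLimit:  251364 kB\nCommitted_AS: 240000 kB\n' | commit_headroom_kb
```

    On a live system the same function could be fed with
    commit_headroom_kb < /proc/meminfo; a small or negative result would
    suggest that further allocations and forks are likely to fail.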

    So what I did was that I set

    vm.overcommit_ratio = 100

    which gives me 100% of RAM. I know this will give me an OOM if I use all
    the physical RAM, but that's expected - I don't want to use swap on a
    virtual machine with poor I/O (and the services are set accordingly).

    So the moral is don't mess with something you don't fully understand.

    kind regards
    Tomas
