Hardware:
48 core AMD Magny Cours (4x12)
128G 1333MHz memory
34 15k6 drives, 2 hot spares, rest in RAID-1 pairs, 1 set for OS, 4
for pg_xlog, rest for /data/base
LSI 8888 RAID controller
OS:
Ubuntu 10.04

uname -a
Linux bigassdbserver 2.6.32-24-generic #38-Ubuntu SMP Mon Jul 5
09:20:59 UTC 2010 x86_64 GNU/Linux

scheduler = noop for all drive sets.
Settings for sysctl.conf:
vm.zone_reclaim_mode = 0
kernel.shmmax = 33554432000
kernel.shmall = 2097152000
kernel.shmmni = 4096
vm.swappiness = 0
vm.dirty_ratio = 2
vm.dirty_background_ratio = 1
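
For readers who want the shared-memory limits above in familiar units, here is a minimal Python sketch of the conversion. It assumes 4096-byte pages, the usual x86_64 default, which the post does not state (check with getconf PAGE_SIZE). kernel.shmmax is expressed in bytes and kernel.shmall in pages.

```python
# Sketch: convert the shared-memory sysctl values from the post into GiB.
# Assumption: 4096-byte pages (not stated in the post; verify with
# `getconf PAGE_SIZE` on the machine in question).

PAGE_SIZE = 4096          # bytes per page, assumed
GIB = 1024 ** 3           # bytes per GiB

shmmax_bytes = 33554432000   # kernel.shmmax: largest single segment, in bytes
shmall_pages = 2097152000    # kernel.shmall: system-wide total, in pages

shmmax_gib = shmmax_bytes / GIB
shmall_gib = shmall_pages * PAGE_SIZE / GIB

print(f"kernel.shmmax = {shmmax_gib:.2f} GiB per segment")
print(f"kernel.shmall = {shmall_gib:.2f} GiB system-wide")
```

Under that page-size assumption, the shmall ceiling works out far larger than the 128G of physical memory, so only shmmax acts as a practical limit here.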

$ free
             total       used       free     shared    buffers     cached
Mem:     131651412  104986524   26664888          0     910804   91170764
-/+ buffers/cache:   12904956  118746456
Swap:            0          0          0

(Swap is now off, via sudo swapoff -a, which fixed the problem.)

Its twin, the read slave, looks like this:

$ free
             total       used       free     shared    buffers     cached
Mem:     131651412  110364700   21286712          0     702144   96771656
-/+ buffers/cache:   12890900  118760512
Swap:     25388024        940   25387084

So, this morning, the machine goes into 100% swap usage. Four kswapd
processes are running at 100% CPU, mostly in D state. Load climbs to 300.
Server gets a little slow. Running swapoff -a fixes it.

This makes no sense to me. The machine had 90G+ in kernel cache, and
was NOT running out of memory in any way. Swappiness is 0.
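
The claim can be checked against the master's free output above. The following is only a sketch that redoes the arithmetic behind the "-/+ buffers/cache" row, with the values (in KiB) copied from that output. The variable names are illustrative.

```python
# Sketch: redo the arithmetic behind the "-/+ buffers/cache" row of `free`.
# All values are in KiB and copied from the master's output in the post.

total, used, free = 131651412, 104986524, 26664888
buffers, cached = 910804, 91170764

KIB_PER_GIB = 1024 ** 2

# Memory held by applications: "used" minus buffers and page cache.
app_used = used - buffers - cached

# Memory obtainable without swapping: free plus buffers and page cache.
# This is an approximation, because "cached" also counts shared memory,
# which the kernel cannot simply drop.
available = free + buffers + cached

print(f"page cache:        {cached / KIB_PER_GIB:6.1f} GiB")
print(f"used by processes: {app_used / KIB_PER_GIB:6.1f} GiB")
print(f"free + reclaimable:{available / KIB_PER_GIB:6.1f} GiB")
```

The two derived values reproduce the 12904956 and 118746456 figures that free reports, which supports the point that the machine was not short of memory in aggregate.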

Any advice on this, including whether to report it to the kernel guys, is welcome.

--
To understand recursion, one must first understand recursion.

  • Allan Kamau at Oct 7, 2010 at 7:46 pm

    My wild guess is that Ubuntu may be to blame. Try restarting PG;
    chances are that it will not solve the problem, which would mean it is
    most likely an OS issue. I had similar experiences with a PostgreSQL
    server hosted on Ubuntu. After the computer had been running for a
    couple of days, "free -g" would display no (or very few) free GBs of RAM.
    With Fedora I have not noticed this problem. For some reason I seem to
    have issues with Ubuntu/Kubuntu but not Fedora.


    Allan.
  • Scott Marlowe at Oct 7, 2010 at 7:49 pm

    I definitely would tend to agree, but I'm more suspicious of a
    late-model kernel than the specific distro. Note that this machine has
    60 days of uptime with no behaviour like this before. For now I'm just
    running it with swap turned off. It's got 128 GB of RAM; if it runs
    out of that, I've got other problems. :)
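
To confirm that swap really is off, or to watch swap use on the read slave, one option is to read the SwapTotal and SwapFree fields from /proc/meminfo. The sketch below is not taken from the thread, and the helper name swap_usage_kib is hypothetical. It takes the file contents as a string, so it can be tried on any saved copy.

```python
from pathlib import Path


def swap_usage_kib(meminfo_text):
    """Return (total, used) swap in KiB from /proc/meminfo-style text.

    Hypothetical helper for illustration; it relies only on the
    SwapTotal and SwapFree fields.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])  # first token is the KiB value
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return total, total - free


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")  # present on Linux only
    if meminfo.exists():
        total, used = swap_usage_kib(meminfo.read_text())
        if total == 0:
            print("swap is off")
        else:
            print(f"swap: {used} of {total} KiB in use")
```

After swapoff -a, SwapTotal should read 0, matching the Swap row in the master's free output.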

Discussion Overview
group: pgsql-general
categories: postgresql
posted: Oct 7, '10 at 6:11p
active: Oct 7, '10 at 7:49p
posts: 3
users: 2
website: postgresql.org
irc: #postgresql

2 users in discussion

Scott Marlowe: 2 posts
Allan Kamau: 1 post
