Ok,

This may be the wrong place to look for answers to this, but I figured it
couldn't hurt...so here goes:

On Friday we upgraded a critical backend server to PostgreSQL 8.2
running on Fedora Core 4. Since then we have received three kernel
panics during periods of moderate to high load (twice during the
pg_dump backup run).

Platform is IBM x360 series running SCSI, software raid on the backplane.

After the first crash we yum updated the system, which obviously did
not fix the problem. I was leaning toward a hardware problem until this
last time, when I was able to catch the following off the terminal:

BUG: spinlock recursion CPU0 postmaster...not tainted.
bunch of other stuff ending in:
Kernel Panic: not syncing: Bad locking

One of the other developers snapped a picture of the kernel panic with
his digital camera and is going to send over the pictures when he gets
home this evening.

Has anybody seen any problem like this or have any suggestions about
possible resolution...should I be posting to the LKML? Any
suggestions are welcome and appreciated.

At this juncture we are going to downgrade the postmaster back to 8.1
and see if that fixes the panics. If it doesn't, this discussion is
over, but if it does, we are very interested in finding a fix
for this issue...we have about 8 weeks of development that is on hold
until we can put an 8.2 server in production. Management has already
authorized a new server but they want a 100% guarantee this is going
to fix the problem.

thanks in advance,
merlin


  • Devrim GÜNDÜZ at Feb 23, 2007 at 10:33 pm

    On Fri, 2007-02-23 at 17:14 -0500, Merlin Moncure wrote:

    > BUG: spinlock recursion CPU0 postmaster...not tainted. <snip>
    > Has anybody seen any problem like this or have any suggestions about
    > possible resolution...should I be posting to the LKML?

    AFAIR (+ some quick Googling), this is related to a problem in the kernel.
    You may need to update to a newer Fedora release since FC4 is not
    supported anymore :(.

    Even if you report to LKML, they will probably suggest using a newer
    kernel. However, I think the system will not let you compile a new kernel
    and will panic again under high load... So...

    If you have free space, install a newer Fedora release on this system,
    mount the existing $PGDATA and see if this fixes the problem...
    --
    Devrim GÜNDÜZ
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
    Managed Services, Shared and Dedicated Hosting
    Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
  • Tom Lane at Feb 23, 2007 at 11:14 pm

    "Merlin Moncure" <mmoncure@gmail.com> writes:
    > On Friday we upgraded a critical backend server to PostgreSQL 8.2
    > running on Fedora Core 4.

    Umm ... why that particular choice of OS? Red Hat dropped update
    support for FC4 some time ago, and AFAIK the Fedora Legacy project
    is not getting things done. How old is the kernel you're using?

    > At this juncture we are going to downgrade the postmaster back to 8.1
    > and see if that fixes the panics.

    Even assuming that Postgres is related to the panics, I don't think you
    will find anyone maintaining that a kernel panic is not the kernel's
    problem. If an application *is* able to provoke a kernel panic, the
    standard description of the problem would be "critical kernel security
    flaw".

    regards, tom lane
  • CAJ CAJ at Feb 24, 2007 at 1:41 am

    On 2/23/07, Tom Lane wrote:
    > "Merlin Moncure" <mmoncure@gmail.com> writes:
    >> On Friday we upgraded a critical backend server to PostgreSQL 8.2
    >> running on Fedora Core 4.
    > Umm ... why that particular choice of OS? Red Hat dropped update
    > support for FC4 some time ago, and AFAIK the Fedora Legacy project
    > is not getting things done. How old is the kernel you're using?
    >> At this juncture we are going to downgrade the postmaster back to 8.1
    >> and see if that fixes the panics.
    > Even assuming that Postgres is related to the panics, I don't think you
    > will find anyone maintaining that a kernel panic is not the kernel's
    > problem. If an application *is* able to provoke a kernel panic, the
    > standard description of the problem would be "critical kernel security
    > flaw".

    I vaguely remember running into spinlock problems with FC4 and it wasn't due
    to PostgreSQL. We didn't have a database running on FC4.

    If you are running a critical server you should switch to at least CentOS.
  • Merlin Moncure at Feb 26, 2007 at 1:24 pm

    On 2/23/07, Tom Lane wrote:
    > "Merlin Moncure" <mmoncure@gmail.com> writes:
    >> On Friday we upgraded a critical backend server to PostgreSQL 8.2
    >> running on Fedora Core 4.
    > Umm ... why that particular choice of OS? Red Hat dropped update
    > support for FC4 some time ago, and AFAIK the Fedora Legacy project
    > is not getting things done. How old is the kernel you're using?

    Linux mojo 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006
    i686 i686 i386 GNU/Linux


    Unfortunately, the decision about which kernel to run is more or less
    out of my hands. I personally really dislike Fedora and would
    much prefer to be running CentOS/Red Hat AS. That said, your comments
    and those of others are very helpful in regards to fixing that.

    We tried updating to the latest via yum update, which did not help.

    as promised, here is the best photo of the panic we could get:
    http://img144.imageshack.us/my.php?image=dumpic6.jpg

    We did an emergency downgrade to 8.1 and will monitor the
    situation...the decision to get a new server has already been made and
    hopefully it will be on a more stable platform.

    big thanks to all who took a few minutes out of their day to lend a hand.

    merlin
  • Devrim GÜNDÜZ at Feb 26, 2007 at 1:56 pm
    Hi,
    On Mon, 2007-02-26 at 08:24 -0500, Merlin Moncure wrote:
    > We tried updating to the latest via yum update, which did not help.

    As Tom stated, FC4 is no longer supported; therefore you won't be able to
    get a newer kernel via yum.

    > as promised, here is the best photo of the panic we could get:
    > http://img144.imageshack.us/my.php?image=dumpic6.jpg

    ...bad locking...

    The picture reminded me of a SCSI driver bug in older kernels -- I googled
    again just now and saw a post that says "native drivers are being used in
    FC5+ kernels". If this is really the case, you may hit the problem
    sometime later.

    Upgrading the OS will probably solve your problem, since there is no way to
    upgrade the FC4 kernel unless you want to compile the kernel source on your
    system.

    Regards,

    --
    Devrim GÜNDÜZ
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
    Managed Services, Shared and Dedicated Hosting
    Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Feb 23, '07 at 10:14p
active: Feb 26, '07 at 1:56p
posts: 6
users: 4
website: postgresql.org...
irc: #postgresql
