Hi All,

I am looking for advice on how to cure an NFS server that keeps
crashing, anywhere from every few hours to every few days. The kernel
log file (below) points toward NFS as a likely cause.

The system disk is a 3ware 8000 series RAID1 mirror. The data disk
uses a 3ware 9000 controller to produce two RAID1 devices; these are
then striped (RAID0) in software to form a RAID 10 device. We're using
a 2.6 kernel, xfs filesystem, and NFS3/UDP.

We're running CentOS 4.2 with a 2.6.9-22.0.1.106 kernel. This kernel
has xfs extensions, and we're running the xfs filesystem for /home
(obtained from CentOS website).

In "lsmod" I see both 3w_xxxx and 3w_9xxx modules.

NFS is over UDP, jumbo frames (9000), 32k rsize/wsize, async server,
async clients, noac.

This system has been serving /home in this configuration since October
2005; we've seen it crash rarely, but uptimes were usually on the order
of months. This past week, it can't seem to remain up for much longer
than about a day.
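
One way to pin down when the short uptimes began, and whether the running kernel changed around then, is to read the boot history out of the syslog: each boot logs one "Linux version" line. The following is only a minimal sketch under stated assumptions. The function names `boot_history` and `kernel_changes` are hypothetical, the log is assumed to be in the classic syslog format, and syslog timestamps carry no year.

```python
import re

# One "Linux version" line is logged per boot; capture its timestamp and
# the kernel release that follows. \s+ also spans a wrapped line break.
BOOT_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"Linux version\s+(\S+)",
    re.MULTILINE,
)


def boot_history(log_text):
    """Return (timestamp, kernel_release) for every boot found in log_text."""
    return BOOT_RE.findall(log_text)


def kernel_changes(history):
    """Return the boots whose kernel release differs from the previous boot."""
    return [cur for prev, cur in zip(history, history[1:]) if prev[1] != cur[1]]
```

Feeding it the contents of the system log (for example `/var/log/messages`, which is an assumption about where syslog writes on this host) gives one entry per boot. Closely spaced entries show the crash cadence, and `kernel_changes` flags any boot that came up on a different kernel.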

Kernel log file containing the crash:

Feb 12 05:52:03 tier2-home kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000000
Feb 12 05:52:03 tier2-home kernel: printing eip:
Feb 12 05:52:03 tier2-home kernel: 00000000
Feb 12 05:52:03 tier2-home kernel: *pde = f561f067
Feb 12 05:52:03 tier2-home kernel: Oops: 0000 [#1]
Feb 12 05:52:03 tier2-home kernel: PREEMPT SMP
Feb 12 05:52:03 tier2-home kernel: Modules linked in: nfs nfsd exportfs
lockd md5 ipv6 autofs4 i2c_dev i2c_core sunrpc xfs dm_mirror dm_mod
button battery ac uhci_hcd shpchp e1000
floppy ext3 jbd 3w_9xxx 3w_xxxx sd_mod scsi_mod
Feb 12 05:52:03 tier2-home kernel: CPU: 0
Feb 12 05:52:03 tier2-home kernel: EIP: 0060:[<00000000>] Not
tainted VLI
Feb 12 05:52:03 tier2-home kernel: EFLAGS: 00010286
(2.6.9-22.0.1.106.unsupportedsmp)
Feb 12 05:52:03 tier2-home kernel: EIP is at 0x0
Feb 12 05:52:03 tier2-home kernel: eax: e48ec050 ebx: c040a260 ecx:
00000000 edx: ecf89344
Feb 12 05:52:03 tier2-home kernel: esi: ecf89344 edi: f4ad4f00 ebp:
00000000 esp: f4ad4ee4
Feb 12 05:52:03 tier2-home kernel: ds: 007b es: 007b ss: 0068
Feb 12 05:52:03 tier2-home kernel: Process nfsd (pid: 2839,
threadinfo=f4ad4000 task=f4afd6b0)
Feb 12 05:52:03 tier2-home kernel: Stack: c01649d7 e48ec050 ffffffff
c3e300b3 0028fd9d ecf89c2c c0164a4b 0028fd9d
Feb 12 05:52:03 tier2-home kernel: 00000003 c3e300b0 ecf89c2c
e48ec0c0 e48ec050 ecf89c2c f4ac6804 f8c8bdf3
Feb 12 05:52:03 tier2-home kernel: c3e300b0 c31d6a00 f7dc1700
c31d6a00 f89902e7 f4ab6000 f4ac6800 f4ac69d4
Feb 12 05:52:03 tier2-home kernel: Call Trace:
Feb 12 05:52:03 tier2-home kernel: [<c01649d7>] __lookup_hash+0x70/0x89
Feb 12 05:52:03 tier2-home kernel: [<c0164a4b>] lookup_one_len+0x54/0x63
Feb 12 05:52:03 tier2-home kernel: [<f8c8bdf3>] nfsd_lookup+0x31c/0x3a8
[nfsd]
Feb 12 05:52:03 tier2-home kernel: [<f89902e7>]
svcauth_unix_set_client+0xa7/0xb5 [sunrpc]
Feb 12 05:52:03 tier2-home kernel: [<f8c93351>]
nfsd3_proc_lookup+0xa9/0xb3 [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<f8c952aa>]
nfs3svc_decode_diropargs+0x0/0xfa [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<f8c896a2>]
nfsd_dispatch+0xba/0x170 [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<f898d459>] svc_process+0x41b/0x6ce
[sunrpc]
Feb 12 05:52:03 tier2-home kernel: [<f8c89482>] nfsd+0x1cc/0x332 [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<f8c892b6>] nfsd+0x0/0x332 [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<c0104205>]
kernel_thread_helper+0x5/0xb
Feb 12 05:52:03 tier2-home kernel: Code: Bad EIP value.
Feb 12 05:52:03 tier2-home kernel: <0>Fatal exception: panic in 5 seconds
Feb 13 08:50:06 tier2-home kernel: klogd 1.4.1, log source = /proc/kmsg
started.
Feb 13 08:50:06 tier2-home kernel: Linux version
2.6.9-22.0.1.106.unsupportedsmp (buildcentos@louisa.home.local) (gcc
version 3.4.4 20050721 (Red Hat 3.4.4-2)) #1 SMP Sun Nov 6
13:58:14 CST 2005
Feb 13 08:50:06 tier2-home kernel: BIOS-provided physical RAM map:

...and the reboot continues normally.
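
When crashes recur, it can help to check whether every oops goes through the same code path. The following is a minimal sketch, assuming the oops text is in the syslog format shown above, possibly line-wrapped by a mail client. The function name `extract_call_trace` is hypothetical, and the text passed in should contain a single oops.

```python
import re

# Matches one frame such as "[<f8c8bdf3>] nfsd_lookup+0x31c/0x3a8 [nfsd]".
# \s+ also spans newlines, so frames wrapped across lines still match.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+"          # return address
    r"([A-Za-z_]\w*)"                # symbol name
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"     # offset within symbol / symbol size
    r"(?:\s+\[(\w+)\])?"             # optional module name
)


def extract_call_trace(text):
    """Return (symbol, module) pairs in trace order.

    module is None for symbols that belong to the core kernel.
    """
    return [(m.group(2), m.group(3)) for m in FRAME_RE.finditer(text)]
```

Applied to the log above, the result starts with `__lookup_hash`, `lookup_one_len` and `nfsd_lookup` (module `nfsd`). Comparing these lists across several crashes shows whether they all fail on the same lookup path.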


  • Chris Mauritz at Feb 13, 2006 at 10:08 pm

    Andrew Zahn wrote:
    <snip>

    Hmmm...does it also crash if you run a non-SMP kernel? Did you update
    the kernel around the same time as the instability began?

    Cheers,
  • Peter Kjellström at Feb 14, 2006 at 12:54 pm

    On Monday 13 February 2006 22:44, Andrew Zahn wrote:
    We're running CentOS 4.2 with a 2.6.9-22.0.1.106 kernel. This kernel
    has xfs extensions, and we're running the xfs filesystem for /home
    (obtained from CentOS website).

    One thing to consider is that the xfs module in current centosplus kernels is
    the same as kernel.org 2.6.9, that is, ancient. I never got 2.6.9 xfs stable
    for non-trivial loads and configurations.

    /Peter

    --
    ------------------------------------------------------------
    Peter Kjellström |
    National Supercomputer Centre |
    Sweden | http://www.nsc.liu.se
  • Johnny Hughes at Feb 14, 2006 at 1:10 pm

    On Tue, 2006-02-14 at 13:54 +0100, Peter Kjellström wrote:
    One thing to consider is that the xfs module in current centosplus kernels is
    the same as kernel.org 2.6.9, that is, ancient. I never got 2.6.9 xfs stable
    for non-trivial loads and configurations.

    /Peter

    You might consider trying the main line 2.6.9-22.0.2.EL (or
    2.6.9-22.0.1.EL) kernel and the kernel-module for xfs in our testing
    repo (if the only reason you are using the centosplus kernel is xfs).

    This xfs code is from SGI and is newer than the code in the 2.6.9
    kernel.

    When CentOS-4.3 is released, the new modules will be released for the
    centosplus kernel, and there will be modules to run xfs on the main line
    kernels as well.

    Thanks,
    Johnny Hughes



Discussion Overview
group: centos @
categories: centos
posted: Feb 13, '06 at 9:44p
active: Feb 14, '06 at 1:10p
posts: 4
users: 4
website: centos.org
irc: #centos
