Hi all, we're trying to install CentOS 5.1 on a Sun Ultra 40. This is
an AMD-powered machine and we're using the x86_64 version of CentOS
5.1. The machine is using the NVidia CK804 chipset and has SATA disks.
It also has 16GB of memory, which prompted us to upgrade the BIOS on
the machine from 1.1 to 1.6 per this (we also have the two Quadro
cards):

http://docs.sun.com/source/819-3954-18/index.html#0_37092

The install goes fine up until the installer is trying to format the
disks. Part of the way through it simply dies and pops up an error
saying that the installer couldn't format the LVM volume and we have to
reboot.

An examination of dmesg output shows that there are many SATA errors
occurring at this point. Timeouts and such.

After a reboot, the SATA drive no longer shows up -- not even in BIOS.
It's as if the formatting has instructed the drive to deactivate
itself. :) A hard reset and reseat of the drive in the SATA enclosure
brings it back again.

First thought was that the slot or SATA port was bad, so we have moved
to others with the same result.

Solaris 10 x86 installs perfectly on this machine, so I'm starting to
think that the sata_nv driver is to blame here.

We're in the process of trying 32-bit CentOS 5.1 on the system just for
giggles, and may try Fedora 8 as well or RHEL 5.1 and use our paid
support to track this issue down, but thought I'd run it by everyone
here.

Didn't see any existing issues in bugzilla.redhat.com or
bugs.centos.org.

Any insights on this?

I will get the exact error messages posted up here soon (output from
dmesg, etc).

TIA,
Ray


  • James A. Peltier at Dec 10, 2007 at 6:30 pm

    _______________________________________________
    CentOS mailing list
    CentOS@centos.org
    http://lists.centos.org/mailman/listinfo/centos
    Try booting while disabling ACPI and APIC.

    linux noapic noacpi

    at the boot menu.

    --
    James A. Peltier
    Technical Director, RHCE
    SCIRF | GrUVi @ Simon Fraser University - Burnaby Campus
    Phone : 778-782-3610
    Fax : 778-782-3045
    Mobile : 778-840-6434
    E-Mail : jpeltier@cs.sfu.ca
    Website : http://gruvi.cs.sfu.ca | http://scirf.cs.sfu.ca
    MSN : subatomic_spam@hotmail.com
  • Ray Van Dolson at Dec 10, 2007 at 6:36 pm

    We'll give that a try.

    In addition, error message output:

    An error occurred trying to format VolGroup00/LogVol00. This problem is
    serious, and the install cannot continue.

    Press <Enter> to reboot your system.

    dmesg output
    <6>sd 0:0:0:0 SCSI error: return code = 0x00040000
    <4>end_request: I/O error, dev sda, sector 504029
    <4>printk: 646939 messages suppressed.
    <3>Buffer I/O error on device dm-0, logical block 19596153
    <4>lost page write due to I/O error on dm-0

    Again, it sure looks like a hardware problem, but using x86 Solaris 10
    on the same disk works perfectly.
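
    For readers trying to interpret the "return code = 0x00040000" line, here
    is a minimal Python sketch that splits the value into its fields. It
    assumes the 2.6-era Linux SCSI layout (status in the low byte, then
    message, host, and driver bytes) and the host-byte names from the kernel's
    include/scsi/scsi.h; the function names are hypothetical, chosen for
    illustration only.

```python
# Assumed layout of the 32-bit SCSI result word in Linux 2.6 kernels:
#   bits 0-7 status, 8-15 message, 16-23 host, 24-31 driver.
# Host-byte names below are taken from include/scsi/scsi.h (assumption).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_scsi_result(code):
    """Split a SCSI result word into its four raw byte fields."""
    return {
        "status": code & 0xFF,
        "msg": (code >> 8) & 0xFF,
        "host": (code >> 16) & 0xFF,
        "driver": (code >> 24) & 0xFF,
    }


def host_byte_name(code):
    """Return the symbolic name of the host byte, if it is one we know."""
    host = decode_scsi_result(code)["host"]
    return HOST_BYTE_NAMES.get(host, "unknown (0x%02x)" % host)


if __name__ == "__main__":
    result = 0x00040000  # value from the dmesg output in this thread
    print(decode_scsi_result(result))
    print(host_byte_name(result))
```

    Under those assumptions the host byte is 0x04 (DID_BAD_TARGET), meaning
    the host adapter could no longer reach the target. That would fit a drive
    that drops off the bus, though it does not say why.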

    Ray
  • James A. Peltier at Dec 10, 2007 at 6:41 pm

    I have seen errors such as this too with some cheap no name software
    RAID cards. I've also seen it on high end SCSI drives as they do the
    bad block remapping. I'm not saying it's not hardware, especially given
    that it specifically mentions sda sector 504029, but it's worth a shot.
    Solaris might not be reporting the disk remappings or some such thing,
    who knows. Perhaps it's only doing a quick format and not a full format
    therefore not even displaying the messages until content begins to be
    written.

  • Ray Van Dolson at Dec 10, 2007 at 7:03 pm
    FYI, no difference with noacpi and noapic. Errors this time indicated
    a different sector, but we'll try with a different SATA disk just to
    eliminate that as a possibility.

    Ray
  • Ross S. W. Walker at Dec 10, 2007 at 7:22 pm

    Try "acpi=noirq" as a kernel argument. Some AMD chipsets have problems
    letting the OS know what IRQ the 8250 timer is on. The NVIDIA one is
    definitely a problem; I have the same chipset in a couple of Dell
    Dimension e521 desktops :-(


    -Ross



    ______________________________________________________________________
    This e-mail, and any attachments thereto, is intended only for use by
    the addressee(s) named herein and may contain legally privileged
    and/or confidential information. If you are not the intended recipient
    of this e-mail, you are hereby notified that any dissemination,
    distribution or copying of this e-mail, and any attachments thereto,
    is strictly prohibited. If you have received this e-mail in error,
    please immediately notify the sender and permanently delete the
    original and any copy or printout thereof.
  • Ross S. W. Walker at Dec 10, 2007 at 8:09 pm

    My explanation wasn't totally accurate. The acpi=noirq option disables the
    ACPI IRQ routing table lookup for IRQ redirects and reprogramming. Some
    AMD chipsets had a bug in the way this table was built that caused 2.6
    kernels to fail in getting a hook into the table, which caused all kinds
    of intermittent problems. By disabling this feature you run the risk
    of IRQ conflicts, which you will need to resolve using the IRQ management
    in the BIOS. Updating the BIOS of the system sometimes fixes the problem.

    It just turns out that the system timer irq was my "symptom" that I
    experienced, but it is different for different systems/configurations.

    -Ross

  • Ray Van Dolson at Dec 10, 2007 at 8:19 pm

    Thanks Ross. This does indeed make some sense. Currently we're trying
    with another known-good SATA disk just to rule that out then we'll give
    this a shot as well.

    Ray
  • Ray Van Dolson at Dec 10, 2007 at 11:17 pm

    Alright, this didn't solve the issue either. So far:

    - noacpi noapic (No effect)
    - New SATA drive (No effect)
    - acpi=noirq (No effect)
    - BIOS update to 1.6 (No effect)
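
    When working through a list like this, it can help to confirm that an
    option typed at the boot prompt actually reached the kernel. A rough
    Python sketch follows, assuming the installed or rescue environment
    exposes /proc/cmdline; the function names and the fallback sample line
    are hypothetical, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def active_workarounds(cmdline):
    """List which of the options tried in this thread appear on the line."""
    params = parse_cmdline(cmdline)
    found = []
    if "noapic" in params:
        found.append("noapic")
    if "noacpi" in params:
        found.append("noacpi")
    if params.get("acpi") == "noirq":
        found.append("acpi=noirq")
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            line = handle.read()
    except OSError:
        # Hypothetical sample used when /proc/cmdline is unavailable.
        line = "ro root=/dev/VolGroup00/LogVol00 acpi=noirq"
    print(active_workarounds(line))
```

    This only reports what was passed on the command line; whether the
    running kernel honors a given option is a separate question.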

    Grabbing a copy of RHEL 5.1 now and we'll throw this over to RH support
    I guess and file something in bugzilla over there.

    Gotta be something funky in sata_nv.

    Thanks everyone for your responses.

    Ray
  • Ray Van Dolson at Dec 11, 2007 at 7:37 pm
    Hi all, for anyone interested in following this -- ended up having the
    same issue with RHEL 5.1 and opened a bug (and support request):

    https://bugzilla.redhat.com/show_bug.cgi?id=420361

    Ray
  • James A. Peltier at Dec 13, 2007 at 11:05 pm

    I was able to use the CentOS 5.0 images to install on the 6 SUN Ultra 40
    workstations that I have here in my labs. This allowed me to get around
    the 5.1 issues, and I performed a yum upgrade to get them to 5.1.

    I would be interested in further details regarding the fix though if you
    could please. :)

  • Ray Van Dolson at Dec 14, 2007 at 1:41 am

    James, are your machines Ultra 40 M2s or just plain ol' Ultra 40s?
    Ours is the non-M2 model and we had problems with 5.0 as well as 5.1.

    However, it appears to be something related to LVM -- specifically
    something the GUI installer is telling LVM to do. If we manually set
    up the partitions and LVM stuff (from the console while still at the
    welcome screen), everything works fine. It's only when anaconda
    handles the partitioning and LVM setup that we run into issues.
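
    The post does not list the exact commands that were typed at the console.
    As a hedged illustration only, this Python sketch prints one plausible
    sequence for the LVM part and runs nothing. The partition /dev/sda2 is an
    assumption (the device is sda in the dmesg output, but the layout is not
    given), the volume names come from the installer error message, and the
    "-l 100%FREE" form assumes an LVM2 version that supports it. Creating the
    partition itself (for example with fdisk) is interactive and left out.

```python
def manual_lvm_commands(partition="/dev/sda2", vg="VolGroup00", lv="LogVol00"):
    """Return the command lines for a manual LVM setup, without running them."""
    lv_path = "/dev/%s/%s" % (vg, lv)
    return [
        ["pvcreate", partition],                       # mark partition as a PV
        ["vgcreate", vg, partition],                   # create the volume group
        ["lvcreate", "-n", lv, "-l", "100%FREE", vg],  # one LV using all space
        ["mkfs.ext3", lv_path],                        # format the new LV
    ]


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before use.
    for command in manual_lvm_commands():
        print(" ".join(command))
```

    Printing instead of executing keeps the sketch safe to run anywhere; the
    commands are destructive on a real disk and should be checked against the
    actual partition layout first.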

    Interesting.

    Ray
  • James A. Peltier at Dec 14, 2007 at 4:32 am

    They're regular ol' SUN Ultra 40s, running BIOS version 1.400 (I
    believe). Nothing special: 2 x Dual Core Opteron 240s running at 2.4GHz
    w/8GB of RAM.

    When trying to install, my experience nearly mirrors yours perfectly.
    Everything is all fine and dandy until it gets about 50% of the way through
    formatting and then throws the ATA errors. When booting rescue with 5.1
    I am able to format the already created LVM volume without issue;
    however, it fails with the same errors mentioned when kickstarted.

    I just performed an install today on a SUN Ultra 40 and the 5.0 trick
    worked without issue. Perhaps something in your BIOS? Try loading the
    default settings and trying again?
  • Ray Van Dolson at Dec 28, 2007 at 1:01 am
    So it initially appeared that manually setting up the partitions and
    LVM resolved the problem. However, at the time we had swapped
    out to a different (smaller) hard disk.

    When we returned to the original 250GB Sun provided disk, even manually
    configuring the partitions, LVM and running mkfs.ext3 caused the port
    to "shut off" and we could not complete the installation.

    On a whim, we went into BIOS and disabled LBA for the drive. Lo and
    behold, everything worked perfectly during the installation -- fully
    automated from anaconda.

    However -- the system wouldn't boot after the install completed
    successfully.

    To resolve this we had to go into BIOS and re-add the hard disk to the
    list of bootable devices (weird that it had been removed) and then
    RE-enable LBA (it will not boot with LBA disabled, doesn't find the
    boot loader at all).

    Now the system boots.

    So the trick appears to be:

    1. Disable LBA
    2. Install
    3. Re-enable LBA
    4. Ensure drive is in boot device list

    Ray

Posted: Dec 10, '07 at 6:26p
Active: Dec 28, '07 at 1:01a