Hello All.

I've just started looking into Xen and have a test environment in place. I'm seeing an
annoying problem that I thought worthy of a post.

Config:

I have 2 x HP DL585 servers, each with 4 dual-core Opterons (non-vmx) and 16GB RAM,
configured as Xen servers. These run CentOS 5.1 with the latest updates applied. These
systems both attach to an iSCSI target, which is an HP DL385 running ietd and serving
SAN-based storage.

I have a test VM running CentOS 5.1 also updated.

Problem:

If I run the VM on a single server everything is OK. If I do a migrate of the VM to the
other server I start getting random "BUG: soft lockup detected on CPU#?" messages on the
VM console. The messages seem to happen with IO but not every time. A reboot of the VM
on the new server will stop these messages.

I've also left the VM running overnight a couple of times and when I do I find that any
external sessions (ssh) are hung in the morning but the console session is not. New ssh
sessions can be started and seem to work.

After much googling it looks like the kernel messages can occur if dom0 is very busy but
mine is not.

Any suggestions?

Regards

Brett Worth


  • Eli Stair at Jan 21, 2008 at 7:11 pm
    My un-authoritative answer: I've been tracking this bug (or several with the
    same symptoms) for going on a couple of years. It's ridiculously common and
    apparently well known to the Xen/XenSource guys, judging by the number of
    reports/bugs posted, but I haven't seen mention of it actually being addressed
    and resolved. Unfortunately I see the same issue cropping up after VM
    moves, though for me it occurs /every/ time there is a VM migration, once per
    processor in the VM; it doesn't matter if there is any IO on the Dom0 or DomU.
    Occasionally VMs die during a migration and have to be manually
    destroyed/restarted.

    I do see evidence of significant instability (not implying it is related to the
    softlockup issues above), however, both in VM migrations from a Xeon
    (5345) Dom0 to an Opteron Dom0, and in high-utilization DomUs which are just
    plain flaky and reboot/die semi-frequently even when never moved from their
    starting Dom0.

    For me, it currently means running only low-priority non-production services in
    a VM, and not shelling out for RHEL5 support for the project (contrary to what
    I planned) since it's not being addressed. I'd be curious if this is being
    addressed in the Xen 3.2 release for RHEL5*...

    Cheers,

    /eli



    Brett Worth wrote:
    [snip - original post quoted in full]
  • Ross S. W. Walker at Jan 22, 2008 at 5:58 pm

    Brett Worth wrote:
    [snip - original post quoted in full]
    The soft lockup is technically not a BUG.

    You will see these errors if an IRQ takes more than 10 seconds
    to respond.

    In your case I would take a look at your iSCSI setup and the
    time it takes to migrate the VM from one node to another along
    with SCSI reserve/release setup on the iSCSI target.
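
    While checking the migration timing, it can also help to see whether the
    lockups hit every vCPU once (as in a migration stall) or keep recurring on
    particular CPUs (suggesting ongoing IO trouble). A minimal shell sketch that
    counts the messages per vCPU; the sample log lines here are made up, and on a
    live guest you would pipe `dmesg` in instead:

```shell
# Hypothetical sample of the guest console messages seen after a migration:
log='BUG: soft lockup detected on CPU#0!
BUG: soft lockup detected on CPU#1!
BUG: soft lockup detected on CPU#1!'

# Extract the CPU number from each message, then count reports per vCPU
# (replace `echo "$log"` with `dmesg` on a real guest):
echo "$log" | sed -n 's/.*CPU#\([0-9]\{1,\}\).*/\1/p' | sort | uniq -c \
  | awk '{print "CPU#" $2 ": " $1}'
```

    A burst of exactly one message per vCPU right after the migration matches the
    pattern described elsewhere in this thread; repeats on the same CPU point at
    something still wedging after the move.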

    I have also been using the Xen 3.2 RPMs from xen.org on CentOS
    5.1 with good results. The VM migration may run smoother and
    quicker in Xen 3.2, but in doing so you take Xen off the
    reservation; if you're OK with that, it may fix your issues.

    -Ross

    ______________________________________________________________________
    This e-mail, and any attachments thereto, is intended only for use by
    the addressee(s) named herein and may contain legally privileged
    and/or confidential information. If you are not the intended recipient
    of this e-mail, you are hereby notified that any dissemination,
    distribution or copying of this e-mail, and any attachments thereto,
    is strictly prohibited. If you have received this e-mail in error,
    please immediately notify the sender and permanently delete the
    original and any copy or printout thereof.
  • Ross S. W. Walker at Jan 22, 2008 at 7:50 pm

    Ross S. W. Walker wrote:
    [snip - earlier reply and original post quoted in full]
    After seeing this same issue on my Xen 3.2 install, but with NO
    migration or iSCSI happening, I decided it is probably NOT iSCSI's
    fault, so I researched it a little more, and this is what
    I found:

    http://docs.xensource.com/XenServer/4.0.1/guest/ch04s08.html#rhel5_limitations

    XenSource does provide a repo of CentOS 5 kernels that have been
    patched to fix this though:

    http://updates.xensource.com/XenServer/4.0.1/centos5x/

    But these seem to be woefully out of date.
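
    For anyone who wants to test those kernels without replacing the base
    system, a yum repo stanza along these lines should work. This is a sketch,
    not an official file: the baseurl is simply the link above, the repo is
    left disabled by default, and the gpgcheck setting is an assumption.

```ini
# /etc/yum.repos.d/xensource-kernels.repo -- sketch, not an official file
[xensource-kernels]
name=XenSource patched CentOS 5 kernels
baseurl=http://updates.xensource.com/XenServer/4.0.1/centos5x/
enabled=0
gpgcheck=0
```

    Then something like `yum --enablerepo=xensource-kernels list available`
    would show what the repo actually offers before committing to an install.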

    I wonder if a kind soul would add XenSource's patch to the
    centosplus kernel so those rogue Xen users could benefit from
    this fix until upstream decides to include it.

    I suppose the centosplus patch would need to be flagged interim
    in case it needs to be removed once upstream has their own fix.


    -Ross

  • Brett Worth at Jan 22, 2008 at 10:21 pm

    Ross S. W. Walker wrote:
    After seeing this same issue on my Xen 3.2 install, but with NO
    migration or iSCSI happening I decided it is probably NOT iSCSI's
    fault, so I decided to research it a little more and this is what
    I found:

    http://docs.xensource.com/XenServer/4.0.1/guest/ch04s08.html#rhel5_limitations
    Thanks Ross. I'll do more research into this when I get back to work.

    The Release Notes say:

    "Soft lockup messages after suspend/resume or live migration (Red Hat
    Bugzilla 250994
    <https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250994>). These
    messages are harmless, but there may be a period of inactivity in the
    guest during live migration as a result of the lockup."

    This seems to imply that the delay during the live migration is causing
    the messages. In my case, though, the messages persist forever, long
    after the migration. I might take a look at the patched kernel and see
    if there's a specific fix that could be applied to a more recent kernel.

    Brett

Discussion Overview
group: centos-virt @ centos
posted: Jan 20, '08 at 12:27a
active: Jan 22, '08 at 10:21p
posts: 5
users: 3
website: centos.org
irc: #centos
