Failover testing with 10g RAC
Solaris 10, RAC 10.2.0.3. Using IPMP groups for NIC redundancy.

We've been conducting failover testing: disabling an HBA port, powering off a
switch, yanking an interconnect (IC) link, etc.

In every single case, CRS rebooted the server where the dire deed was
performed, and when the server came back up, the repair was successful, e.g.
it had failed over to the secondary HBA port, or the physical IP for the IPMP
group had floated to the standby NIC, and so forth.

The other server stayed up and all Oracle components remained available.
During the switch power-off test, the physical IP for the IC on that server
actually floated over to the standby NIC with no outage.

Is this what is to be expected? Will CRS always reboot a server to repair
itself when an underlying hardware failure is detected?

Thanks,
Jeff
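
One way to check an observation like "no outage" during a test like the switch power-off is to probe the address under test at a fixed interval while the fault is injected, then report the longest gap in reachability. The following is a minimal Python sketch, not an Oracle or Solaris tool. It assumes some TCP port is listening on the target address (for example sshd on port 22), and the host name in the usage comment is hypothetical.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False


def run_probe(host, port, duration=60.0, interval=0.5):
    """Probe host:port every `interval` seconds for `duration` seconds.

    Returns a list of (monotonic_timestamp, reachable) samples in time order.
    """
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples


def longest_outage(samples):
    """Return the longest unreachable window in seconds.

    An outage runs from the first failed sample to the next successful one,
    or to the last sample if reachability never came back.
    """
    longest = 0.0
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            longest = max(longest, ts - start)
            start = None
    if start is not None:
        longest = max(longest, samples[-1][0] - start)
    return longest


# Example (hypothetical host name; run from the surviving node during the test):
#   samples = run_probe("rac2-priv", 22, duration=120.0)
#   print(f"longest outage: {longest_outage(samples):.1f} s")
```

A TCP probe only shows whether the address answered. It does not show what CRS decided, so it complements, rather than replaces, checking what the clusterware did on each node.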


  • Bradd Piontek at May 30, 2008 at 3:48 pm
    Jeff,
    Are the pieces you are failing redundant in nature? For example, multiple
    HBAs, switches, etc.? We had some issues in our failover testing that had to
    do with Service Processor failover, and they were due to a Linux kernel issue
    and NMI watchdog processes (again, this was on Linux). Without redundancy in
    the components you mentioned, I would expect CRS to reboot the node. What
    are you using for OCR and Voting Disk?

    --
    Bradd Piontek
    Twitter: http://www.twitter.com/piontekdd
    Oracle Blog: http://piontekdd.blogspot.com
    Linked In: http://www.linkedin.com/in/piontekdd
    Last.fm: http://www.last.fm/user/piontekdd/

    --
    http://www.freelists.org/webpage/oracle-l

  • William Wagman at May 30, 2008 at 3:57 pm
    Greetings,

    I don't know how or when CRS decides to reboot the node, but if you
    kill the crsd.bin process, the node will reboot. That is part of its
    job, I think.

    Bill Wagman
    Univ. of California at Davis
    IET Campus Data Center
    wjwagman_at_ucdavis.edu
    (530) 754-6208

  • Jeffery Thomas at May 30, 2008 at 3:58 pm
    Bradd --

    All components are redundant. For the OCR and voting disk (VD), we are using raw devices.

    After the server reboot, the redundant component is picked up and all is fine.
    I wasn't sure if this was expected behavior or if it should have happened more
    transparently.

    Thanks,
    Jeff

Discussion Overview
group: oracle-l
category: oracle
posted: May 30, 2008 at 3:21 pm
last active: May 30, 2008 at 3:58 pm
posts: 4
users: 3
website: oracle.com
