I don't know exactly how or when CRS decides it is going to reboot the node,
but if you kill the crsd.bin process, the node will reboot. That is part
of its job, I think.
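As a quick sanity check before and after a failure test, the CRS stack and the crsd.bin daemon can be inspected from the command line. A sketch using the stock 10.2 clusterware tools (crsctl lives under the CRS home's bin directory; the grep pattern is just one common idiom, not anything from this thread):

```shell
# Check the overall health of the CRS stack (CRS, CSS, EVM daemons)
crsctl check crs

# Confirm crsd.bin is running -- if this daemon dies, CRS is expected
# to fence the node, i.e. reboot it to protect the rest of the cluster
ps -ef | grep '[c]rsd.bin'
```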
Univ. of California at Davis
IET Campus Data Center
On Behalf Of Bradd Piontek
Sent: Friday, May 30, 2008 8:49 AM
Subject: Re: Failover testing with 10g RAC
Are the pieces you are failing redundant in nature? For example,
multiple HBAs, switches, etc.? We had some issues in our failover testing
with Service Processor failover; they turned out to be caused by a
Linux kernel issue and the NMI watchdog processes (again, this was on
Linux). Without redundancy in the components you mentioned, I would
expect CRS to reboot the node. What are you using for OCR and Voting Disks?
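If it helps, both locations can be listed with the standard 10.2 clusterware utilities (run as root or the clusterware owner; exact output varies by patch level):

```shell
# Show the OCR location(s) and run an integrity check
ocrcheck

# List the voting disk(s) that CSS is using
crsctl query css votedisk
```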
Oracle Blog: http://piontekdd.blogspot.com
Linked In: http://www.linkedin.com/in/piontekdd
On Fri, May 30, 2008 at 10:21 AM, Jeffery Thomas wrote:
Solaris 10, RAC 10.2.0.3. Using IPMP groups for NIC redundancy.
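For context, a typical Solaris 10 active/standby IPMP pairing looks something like the following (interface names, addresses, and the group name are hypothetical, not taken from this thread; the "#" lines are annotations rather than file content):

```shell
# /etc/hostname.ce0 -- active NIC in hypothetical IPMP group "ipmp0":
192.168.10.11 netmask + broadcast + group ipmp0 up

# /etc/hostname.ce1 -- standby NIC; its test address is marked
# deprecated and -failover so data IPs can float onto it on failure:
192.168.10.12 netmask + broadcast + deprecated -failover group ipmp0 standby up
```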
We've been conducting failover testing -- disabling an HBA port, powering
off a switch, yanking an interconnect link, etc.

In every single case, CRS rebooted the server where the dire deed was done,
and when the server came back up, the repair was successful -- e.g., failover
to the secondary HBA port, or the physical IP for the IPMP group floated
to the standby NIC, and so forth.
The other server stayed up and all Oracle components remained available. In
the switch power-off test, the physical IP for the interconnect actually
floated over to the standby NIC with no outage on this server.
Is this what is to be expected? Will CRS always reboot a server when it
detects an underlying hardware failure?