Is there a way to limit multipath failover times in order to avoid Oracle RAC cluster evictions?
Issue
- Multipath takes too long to react during SAN failures, exceeding Oracle RAC cluster timeouts and triggering evictions.
- The voting disk timeout (disktimeout) is 200s.
- The network heartbeat timeout (css_misscount) is 30s.
- The short disk timeout (SDTO) is 27s.
- The SDTO is not publicly known.
Network Ping                           | Disk Ping                                                                   | Reboot
Completes within misscount seconds     | Completes within misscount seconds                                          | N
Completes within misscount seconds     | Takes more than misscount seconds but less than disktimeout seconds         | N
Completes within misscount seconds     | Takes more than disktimeout seconds                                         | Y
Takes more than misscount seconds      | Completes within misscount seconds                                          | Y
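The decision table above can be sketched as a small Python function. This is only an illustration of the table, not Oracle's implementation: the function name `reboot_expected` and its parameters are hypothetical, the default values are the 30s misscount and 200s disktimeout quoted in this article, and the combination the table does not list (a late network ping together with a late disk ping) is assumed to result in a reboot. The sketch does not model the SDTO.

```python
def reboot_expected(network_ping_s, disk_ping_s,
                    misscount_s=30.0, disktimeout_s=200.0):
    """Return True if the table above predicts a node reboot.

    Hypothetical helper: the name and defaults are illustrative and
    take the 30s misscount and 200s disktimeout from this article.
    """
    # A network heartbeat that takes longer than misscount leads to a reboot.
    network_late = network_ping_s > misscount_s
    # A disk ping only leads to a reboot once it exceeds disktimeout;
    # a disk ping between misscount and disktimeout is tolerated.
    disk_late = disk_ping_s > disktimeout_s
    # Assumption: rows not listed in the table (both pings late) also reboot.
    return network_late or disk_late


# The four rows of the table, with example timings in seconds.
rows = [(5, 5), (5, 60), (5, 250), (45, 5)]
for network_s, disk_s in rows:
    print(network_s, disk_s, "Y" if reboot_expected(network_s, disk_s) else "N")
```

The four example rows print N, N, Y, Y, matching the Reboot column of the table.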
The following CSSD log messages show the SDTO:
[ CSSD][xxxxxxx]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
More than disk timeout of 27000 after the last NHB (network heartbeat)
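To check which SDTO a node is actually using, the value can be pulled out of the `clssnmPollingThread` message shown above. The following is a minimal sketch that assumes the message format is exactly as in that sample line; the helper name `parse_disk_timeouts` and the regular expression are illustrative and may need adjusting for other Clusterware versions.

```python
import re

# Matches the local and remote disk timeout values (in milliseconds)
# in the clssnmPollingThread message format shown in this article.
_SDTO_RE = re.compile(
    r"local diskTimeout set to (\d+) ms, remote disk timeout set to (\d+)"
)


def parse_disk_timeouts(line):
    """Return (local_ms, remote_ms) from a CSSD log line, or None if absent."""
    match = _SDTO_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = ("[ CSSD][xxxxxxx]clssnmPollingThread: local diskTimeout set to "
          "27000 ms, remote disk timeout set to 27000, "
          "impending reconfig status(1)")

local_ms, remote_ms = parse_disk_timeouts(sample)
print(local_ms / 1000, remote_ms / 1000)  # prints: 27.0 27.0
```

A value of 27000 ms corresponds to the 27s SDTO, which is the window multipath failover would need to fit within to avoid an eviction.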
Environment
- Red Hat Enterprise Linux 6
- Red Hat Enterprise Linux 7
- Red Hat Enterprise Linux 8
- Oracle RAC
- Fibre Channel SAN storage
- Exceptions: