Cluster doesn't fence when quorum is lost
Issue
- We configured the quorum disk according to the best practices in the tech brief "How to Optimally Configure a Quorum Disk". When the Fibre Channel cables are disconnected from one of the servers, the cluster detects that quorum is lost, but no fencing occurs.
- After the following messages are logged, the cluster node hangs, but qdisk on the other node does not evict the affected node, which has lost its connection to the Fibre Channel devices:
Jul 14 12:00:51 node1 kernel: sd 2:0:0:0: SCSI error: return code = 0x00010000
Jul 14 12:00:51 node1 multipathd: sdw: directio checker reports path is down
Jul 14 12:00:51 node1 multipathd: sdx: directio checker reports path is down
Jul 14 12:00:51 node1 kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
[...]
Environment
- Red Hat Enterprise Linux 5 or 6 with the High Availability Add-On
- Quorum disk (qdisk)
- DM-Multipath configured with the queue_if_no_path option
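For orientation, the queue_if_no_path option is commonly set in /etc/multipath.conf in a form similar to the fragment below. This is an illustrative sketch only: the configuration of the affected nodes is not shown in this article, so the placement in the defaults section is an assumption (the same setting can also appear in a devices or multipaths section).

```
# /etc/multipath.conf (illustrative fragment, not taken from the affected system)
defaults {
    # Queue I/O instead of failing it when every path to a device is down
    features "1 queue_if_no_path"
}
```

With this feature enabled, I/O issued to a multipath device that has no remaining paths is queued rather than returned as an error, so processes accessing that device block until a path comes back.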
