Cluster doesn't fence when quorum is lost
Issue
- We configured the quorum disk following the best practices in the tech brief "How to Optimally Configure a Quorum Disk". However, when the Fibre Channel cables are disconnected from one of the servers, the cluster detects that quorum is lost, but no fencing occurs.
- After the following messages, the cluster node hangs, but qdisk on the other node does not evict the affected node, which has lost its connection to the Fibre Channel devices:
Jul 14 12:00:51 node1 kernel: sd 2:0:0:0: SCSI error: return code = 0x00010000
Jul 14 12:00:51 node1 multipathd: sdw: directio checker reports path is down
Jul 14 12:00:51 node1 multipathd: sdx: directio checker reports path is down
Jul 14 12:00:51 node1 kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
[...]
Environment
- Red Hat Enterprise Linux 5 and 6 with the High Availability Add-On
- Quorum disk.
- DM-Multipath configured with the queue_if_no_path option.
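For orientation, the fragment below sketches how the queue_if_no_path option typically appears in /etc/multipath.conf. The actual configuration from the affected cluster is not shown in this article, so the placement in the defaults section is an assumption; the same feature string can also be set per device or per multipath. With this option, I/O to a device that has lost all of its paths is queued rather than failed.

```
# /etc/multipath.conf (illustrative fragment, not taken from the affected system)
defaults {
    # Queue I/O indefinitely when no paths to the device remain
    features "1 queue_if_no_path"
}
```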