Cluster doesn't fence when quorum is lost

Solution Verified - Updated -

Issue

  • We configured the quorum disk according to the best practices in the tech brief "How to Optimally Configure a Quorum Disk". When the Fibre Channel cables are disconnected from one of the servers, the cluster detects that quorum is lost, but no fencing occurs.

  • After the following messages are logged, the cluster node hangs, but qdisk on another node does not evict the affected node, which has lost its connection to the Fibre Channel devices:

    Jul 14 12:00:51 node1 kernel: sd 2:0:0:0: SCSI error: return code = 0x00010000
    Jul 14 12:00:51 node1 multipathd: sdw: directio checker reports path is down 
    Jul 14 12:00:51 node1 multipathd: sdx: directio checker reports path is down 
    Jul 14 12:00:51 node1 kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
    [...]
    

Environment

  • Red Hat Enterprise Linux 5 or 6 with the High Availability Add-On
  • Quorum disk (qdisk) configured.
  • DM-Multipath configured with queue_if_no_path option.
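
For reference, the queue_if_no_path option is usually enabled in /etc/multipath.conf along the lines of the fragment below. This is an illustrative sketch only, not the configuration from the affected system: placing the option in the defaults section is an assumption, and it can equally be set in a devices or multipaths section. With this option, I/O to a device that has no remaining paths is queued instead of being failed back to the caller.

    defaults {
        # Assumed placement: applies to all multipath devices unless
        # overridden in a devices or multipaths section.
        # Queue I/O while all paths are down rather than returning an error.
        features "1 queue_if_no_path"
    }

The setting "no_path_retry queue" has the same effect and may appear in place of the features line.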
