Cluster doesn't fence when quorum is lost

Solution Verified - Updated -

Issue

  • We configured the quorum disk according to the best practices in the tech brief "How to Optimally Configure a Quorum Disk". When the Fibre Channel cables are disconnected from one of the servers, the cluster detects that quorum is lost, but no fencing occurs.

  • After the following messages are logged, the cluster node hangs, but qdisk on another node does not evict the affected node, which has lost its connection to the Fibre Channel devices:

    Jul 14 12:00:51 node1 kernel: sd 2:0:0:0: SCSI error: return code = 0x00010000
    Jul 14 12:00:51 node1 multipathd: sdw: directio checker reports path is down 
    Jul 14 12:00:51 node1 multipathd: sdx: directio checker reports path is down 
    Jul 14 12:00:51 node1 kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
    [...]
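
To confirm which paths multipathd marked as down in a log like the one above, the messages can be filtered with a short script. This is a minimal sketch, not a supported tool: the function name `failed_paths` and the regular expression are assumptions for illustration, and the pattern only covers the message format shown in this excerpt.

```python
import re

# Matches multipathd lines of the form shown in the excerpt, e.g.
#   "Jul 14 12:00:51 node1 multipathd: sdw: directio checker reports path is down"
# The pattern is an assumption based only on that excerpt.
PATH_DOWN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+multipathd:\s+"
    r"(?P<dev>\w+):\s+\w+ checker reports path is down"
)

def failed_paths(lines):
    """Return (timestamp, host, device) for each path-down message."""
    hits = []
    for line in lines:
        match = PATH_DOWN.match(line.strip())
        if match:
            hits.append((match.group("stamp"), match.group("host"), match.group("dev")))
    return hits

# Sample input copied from the log excerpt in this article.
sample = [
    "Jul 14 12:00:51 node1 kernel: sd 2:0:0:0: SCSI error: return code = 0x00010000",
    "Jul 14 12:00:51 node1 multipathd: sdw: directio checker reports path is down ",
    "Jul 14 12:00:51 node1 multipathd: sdx: directio checker reports path is down ",
    "Jul 14 12:00:51 node1 kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK",
]

for stamp, host, dev in failed_paths(sample):
    print(f"{stamp} {host}: path {dev} is down")
```

Kernel lines are ignored; only the multipathd checker messages are reported.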
    

Environment

  • Red Hat Enterprise Linux 5 or 6 with the High Availability Add-On
  • Quorum disk configured
  • DM-Multipath configured with the queue_if_no_path option
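
With queue_if_no_path, DM-Multipath queues I/O instead of failing it when every path to a device is down. In multipath.conf this is usually set either by a `features "1 queue_if_no_path"` line or by `no_path_retry queue`. The following is a minimal sketch for checking whether a configuration matches this environment; the function name `queueing_settings` and the sample configuration are hypothetical, and the check is a plain text scan rather than a full parser.

```python
import re

def queueing_settings(conf_text):
    """Return the config lines that enable queueing when no path is left."""
    found = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if "queue_if_no_path" in line or re.fullmatch(r'no_path_retry\s+"?queue"?', line):
            found.append(line)
    return found

# Hypothetical multipath.conf fragment, for illustration only.
sample_conf = '''
defaults {
    # hypothetical example settings
    features "1 queue_if_no_path"
    no_path_retry 5
}
'''

for setting in queueing_settings(sample_conf):
    print("queueing enabled by:", setting)
```

On a real system the text would come from the multipath configuration file; a numeric `no_path_retry` value, as in the sample, is not reported because it limits retries rather than queueing indefinitely.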
