
RHEL 5 or 6 cluster node "lost contact with quorum device"

Updated 2014-04-10T17:41:22+00:00

Issue

  • A cluster node loses quorum after rebooting, removing, or fencing another node from the cluster
  • What should the cman setting quorum_dev_poll be set to?
  • My cluster lost contact with the quorum device
        openais[11663]: [CMAN ] lost contact with quorum device
        openais[11664]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
  • After one path in the multipath map for the quorum device is disconnected, the node reports "lost contact with quorum device" and loses quorum. Cluster services then stop, when they would be expected to keep running or relocate to another node.
        Aug  1 03:32:23 node1 kernel: qla2xxx 0000:04:00.0: LOOP DOWN detected (4 3 0 0).
        Aug  1 03:32:40 node1 openais[12015]: [logging.c:0042] lost contact with quorum device
        Aug  1 03:32:40 node1 openais[12015]: [logging.c:0042] quorum lost, blocking activity
        Aug  1 03:32:40 node1 clurgmgrd[12095]: <emerg> #1: Quorum Dissolved
        Aug  1 03:32:40 node1 clurgmgrd[12095]: <debug> Emergency stop of service:myService
  • After a node is fenced, the standby node attempts to recover the service, but the operation fails after openais reports "lost contact with quorum device" and quorum is lost:
        Feb 19 20:56:25 node2 fenced[8126]: fence "node1" success
        Feb 19 20:56:27 node2 clurgmgrd[12213]: <notice> Taking over service service:myService from down member node1
        Feb 19 20:56:29 node2 qdiskd[8108]: <info> Assuming master role
        Feb 19 20:56:30 node2 openais[8075]: [CMAN ] lost contact with quorum device
        Feb 19 20:56:30 node2 openais[8075]: [CMAN ] quorum lost, blocking activity
        Feb 19 20:56:30 node2 clurgmgrd[12213]: <emerg> #1: Quorum Dissolved
        Feb 19 20:56:30 node2 ccsd[8069]: Cluster is not quorate.  Refusing connection.
        Feb 19 20:56:30 node2 ccsd[8069]: Error while processing connect: Connection refused
        Feb 19 20:56:30 node2 ccsd[8069]: Invalid descriptor specified (-111).
        Feb 19 20:56:30 node2 ccsd[8069]: Someone may be attempting something evil.
        Feb 19 20:56:30 node2 ccsd[8069]: Error while processing get: Invalid request descriptor
        Feb 19 20:56:30 node2 ccsd[8069]: Invalid descriptor specified (-21).
        Feb 19 20:56:30 node2 ccsd[8069]: Someone may be attempting something evil.
        Feb 19 20:56:30 node2 ccsd[8069]: Error while processing disconnect: Invalid request descriptor
        Feb 19 20:56:32 node2 qdiskd[8108]: <notice> Writing eviction notice for node 1
        Feb 19 20:56:32 node2 openais[8075]: [CMAN ] quorum regained, resuming activity
        Feb 19 20:56:33 node2 clurgmgrd[12213]: <err> #75: Failed changing service status
  • The quorum disk lost connectivity and a node was fenced
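
To check whether a node's logs show the symptom described above, the syslog output can be searched for the "lost contact with quorum device" message. The following is a minimal sketch, not a supported tool. It assumes the standard syslog prefix seen in the excerpts above; the function name `find_quorum_device_losses` is hypothetical, and the sample lines are copied from the excerpts in this article.

```python
import re

# Matches the syslog prefix used in the excerpts above: "Mon DD HH:MM:SS host message".
SYSLOG_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")

# Message emitted by openais/cman when the quorum device stops responding.
MARKER = "lost contact with quorum device"


def find_quorum_device_losses(lines):
    """Return (timestamp, host) pairs for lines that report loss of the quorum device."""
    hits = []
    for line in lines:
        if MARKER not in line:
            continue
        match = SYSLOG_RE.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


# Sample lines taken from the log excerpts in this article.
sample = [
    "Feb 19 20:56:29 node2 qdiskd[8108]: <info> Assuming master role",
    "Feb 19 20:56:30 node2 openais[8075]: [CMAN ] lost contact with quorum device",
    "Feb 19 20:56:30 node2 openais[8075]: [CMAN ] quorum lost, blocking activity",
]
print(find_quorum_device_losses(sample))
```

In practice the lines would come from the node's system log (for example, by iterating over an open file handle for /var/log/messages) rather than from a hard-coded list.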

Environment

  • Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add-On
  • Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
  • A configuration utilizing a quorum device (<quorumd> in /etc/cluster/cluster.conf)
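
Whether a cluster matches this environment can be checked by looking for a <quorumd> element in cluster.conf. The sketch below reads the <quorumd> attributes and the quorum_dev_poll attribute of <cman>, if one is set. It only reports what is configured and does not suggest values. The configuration fragment, its attribute values, and the function name `quorum_device_settings` are all hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf fragment; the cluster name, label, and all values are illustrative only.
SAMPLE_CONF = """<cluster name="example" config_version="1">
  <cman quorum_dev_poll="50000"/>
  <quorumd interval="1" tko="10" label="example_qdisk"/>
</cluster>
"""


def quorum_device_settings(xml_text):
    """Return the quorum device settings found in cluster.conf text, or None if no <quorumd> exists."""
    root = ET.fromstring(xml_text)
    quorumd = root.find("quorumd")
    if quorumd is None:
        # No quorum device is configured, so this article's environment does not apply.
        return None
    cman = root.find("cman")
    return {
        "quorumd": dict(quorumd.attrib),
        # None when <cman> is absent or does not set quorum_dev_poll explicitly.
        "quorum_dev_poll": cman.get("quorum_dev_poll") if cman is not None else None,
    }


print(quorum_device_settings(SAMPLE_CONF))
```

On a real node the text would be read from /etc/cluster/cluster.conf instead of the embedded sample.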
