qdiskd takes a long time to initialize and cluster is inquorate while waiting in RHEL 5 or 6
Issue
- When I start one node in my cluster by itself, I expect it to gain quorum with the quorum device votes. But it looks like it takes a long time for qdiskd to finish initializing, and so the node stays inquorate for a long time:
Nov 11 15:42:09 hostname qdiskd[13063]: <info> Quorum Partition: /dev/disk/by-id/scsi-1IET_00020002 Label: rummyqdisk
Nov 11 15:42:09 hostname qdiskd[13064]: <info> Quorum Daemon Initializing
Nov 11 15:42:09 hostname qdiskd[13064]: <debug> I/O Size: 512 Page Size: 4096
Nov 11 15:42:10 hostname qdiskd[13064]: <info> Heuristic: '/bin/ping -c1 -w1 192.168.143.1' UP
Nov 11 15:42:10 hostname ccsd[13038]: Cluster is not quorate. Refusing connection.
Nov 11 15:42:10 hostname ccsd[13038]: Error while processing connect: Connection refused
[...]
Nov 11 15:42:49 hostname qdiskd[13064]: <info> Initial score 1/1
Nov 11 15:42:49 hostname qdiskd[13064]: <info> Initialization complete
Nov 11 15:42:49 hostname openais[13045]: [CMAN ] quorum device registered
Nov 11 15:42:49 hostname qdiskd[13064]: <notice> Score sufficient for master operation (1/1; required=1); upgrading
Nov 11 15:42:49 hostname ccsd[13038]: Cluster is not quorate. Refusing connection.
Nov 11 15:42:49 hostname ccsd[13038]: Error while processing connect: Connection refused
[...]
Nov 11 15:43:09 hostname qdiskd[13064]: <debug> Making bid for master
Nov 11 15:43:09 hostname ccsd[13038]: Cluster is not quorate. Refusing connection.
Nov 11 15:43:09 hostname ccsd[13038]: Error while processing connect: Connection refused
[...]
Nov 11 15:43:29 hostname qdiskd[13064]: <info> Assuming master role
Nov 11 15:43:29 hostname ccsd[13038]: Cluster is not quorate. Refusing connection.
Nov 11 15:43:29 hostname ccsd[13038]: Error while processing connect: Connection refused
[...]
Nov 11 15:43:39 hostname openais[13045]: [CMAN ] quorum regained, resuming activity
- When I start some nodes in the cluster (but not all), I see a lot of "Cluster is not quorate" messages and then other services like clvmd, gfs2, and rgmanager fail to start.
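To see how long each qdiskd startup phase takes, the timestamps in the log excerpt above can be compared directly. The following is a minimal Python sketch, not a Red Hat tool: the function name milestone_offsets and the MILESTONES list are hypothetical, the sample lines are copied from the excerpt above, and the year is an arbitrary placeholder because syslog timestamps omit it.

```python
from datetime import datetime

# Sample lines copied from the log excerpt in this article.
LOG = """\
Nov 11 15:42:09 hostname qdiskd[13064]: <info> Quorum Daemon Initializing
Nov 11 15:42:49 hostname qdiskd[13064]: <info> Initialization complete
Nov 11 15:43:09 hostname qdiskd[13064]: <debug> Making bid for master
Nov 11 15:43:29 hostname qdiskd[13064]: <info> Assuming master role
Nov 11 15:43:39 hostname openais[13045]: [CMAN ] quorum regained, resuming activity
"""

# Message fragments that mark each startup phase (hypothetical helper list).
MILESTONES = [
    "Quorum Daemon Initializing",
    "Initialization complete",
    "Making bid for master",
    "Assuming master role",
    "quorum regained",
]


def milestone_offsets(log_text):
    """Return seconds elapsed from the first matched milestone to each one."""
    start = None
    offsets = {}
    for line in log_text.splitlines():
        for marker in MILESTONES:
            if marker in line and marker not in offsets:
                # The syslog timestamp is the first 15 characters of the line.
                # Assumption: the year is not logged, so a placeholder is used.
                ts = datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")
                if start is None:
                    start = ts
                offsets[marker] = int((ts - start).total_seconds())
    return offsets


if __name__ == "__main__":
    for marker, seconds in milestone_offsets(LOG).items():
        print(f"{seconds:>4}s  {marker}")
```

For the excerpt above, this shows initialization completing 40 seconds after qdiskd starts, the master role being assumed at 80 seconds, and quorum being regained at 90 seconds.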
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add On
- Cluster configured with a quorum device (<quorumd> in /etc/cluster/cluster.conf)
- Cluster nodes starting up asynchronously (not at the same time)
- cman releases prior to 3.0.12.1-68.el6