QDisk heuristic using ping is timing out when there are no other noticeable issues with the network in a RHEL cluster
Issue
- We have a cluster using QDisk with a heuristic pinging the default gateway. This heuristic is timing out intermittently, but there are no other signs of issues on that network (such as token losses) at the time of the problem.
- What timeout value does a heuristic use? The amount of time reported in the logs when it times out does not match the heuristic's tko*interval value.
- I have a ping heuristic of the following form, and occasionally I see a heuristic timeout in /var/log/messages, followed by the cluster node being evicted and fenced:
<heuristic interval="2" program="ping -c1 -t1 192.168.2.1" score="1" tko="3"/>
Oct 4 00:15:12 node1 qdiskd[6854]: <info> Heuristic: 'ping -c1 -t1 192.168.2.1' DOWN - Exceeded timeout of 9 seconds
Oct 4 00:15:12 node1 qdiskd[6854]: <notice> Score insufficient for master operation (0/1; required=1); downgrading
- Cluster services fail over and a node gets rebooted unexpectedly in a two-node cluster with a qdisk that has a heuristic configured. Some qdiskd messages were logged; what is causing the GFS2 cluster to crash?
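The mismatch raised in the second question can be made concrete with a small calculation. The sketch below only parses the heuristic line quoted above and multiplies its tko and interval attributes; it does not query qdiskd, and the variable names are illustrative assumptions rather than anything the cluster software defines.

```shell
# Heuristic line copied from the cluster.conf excerpt quoted in this article.
heuristic='<heuristic interval="2" program="ping -c1 -t1 192.168.2.1" score="1" tko="3"/>'

# Extract the numeric values of the interval and tko attributes.
interval=$(printf '%s\n' "$heuristic" | sed -n 's/.* interval="\([0-9]*\)".*/\1/p')
tko=$(printf '%s\n' "$heuristic" | sed -n 's/.* tko="\([0-9]*\)".*/\1/p')

# The product the question refers to. The qdiskd log line quoted above
# reported a 9-second timeout instead.
expected=$((tko * interval))
echo "tko*interval = ${expected}s"
```

For the configuration shown, the product is 6 seconds, which differs from the 9 seconds that qdiskd reported in the log excerpt.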
Environment
- Red Hat Cluster Suite 4+
- Red Hat Enterprise Linux Server 5 (with the High Availability Add-On)
- Red Hat Enterprise Linux Server 6 (with the High Availability Add-On)
- A cluster configuration using QDisk and a ping heuristic
- Heuristic does not use the -w option on ping