QDisk heuristic using ping is timing out when there are no other noticeable issues with the network in a RHEL cluster
Issue
- We have a cluster using QDisk with a heuristic pinging the default gateway. This heuristic is timing out intermittently, but there are no other signs of issues on that network (such as token losses) at the time of the problem.
- What timeout value does a heuristic use? The amount of time reported in the logs when it times out does not match the heuristic's tko*interval value.
- I have a ping heuristic of the following form, and occasionally I see a heuristic timeout in /var/log/messages, followed by the cluster node being evicted and fenced:
<heuristic interval="2" program="ping -c1 -t1 192.168.2.1" score="1" tko="3"/>
Oct 4 00:15:12 node1 qdiskd[6854]: <info> Heuristic: 'ping -c1 -t1 192.168.2.1' DOWN - Exceeded timeout of 9 seconds
Oct 4 00:15:12 node1 qdiskd[6854]: <notice> Score insufficient for master operation (0/1; required=1); downgrading
- Cluster services fail over and a node is rebooted unexpectedly in a two-node cluster that uses QDisk with a heuristic configured. Some qdiskd messages were logged. What is causing the GFS2 cluster to crash?
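The mismatch described above can be checked mechanically. The following is a minimal sketch that uses only the heuristic definition and log line quoted in this report; the helper names expected_timeout and logged_timeout are hypothetical and are not part of qdiskd or any Red Hat tool. It shows only that the logged timeout differs from tko*interval. It does not explain how qdiskd derives the value it logs.

```python
import re
import xml.etree.ElementTree as ET

# Values copied verbatim from the report above.
HEURISTIC_XML = ('<heuristic interval="2" program="ping -c1 -t1 192.168.2.1" '
                 'score="1" tko="3"/>')
LOG_LINE = ("Oct 4 00:15:12 node1 qdiskd[6854]: <info> Heuristic: "
            "'ping -c1 -t1 192.168.2.1' DOWN - Exceeded timeout of 9 seconds")


def expected_timeout(heuristic_xml: str) -> int:
    """Return tko * interval for a <heuristic> element (the naive expectation)."""
    elem = ET.fromstring(heuristic_xml)
    return int(elem.get("tko")) * int(elem.get("interval"))


def logged_timeout(log_line: str) -> int:
    """Extract the timeout, in seconds, that qdiskd reported in a log line."""
    match = re.search(r"Exceeded timeout of (\d+) seconds", log_line)
    if match is None:
        raise ValueError("no heuristic timeout found in log line")
    return int(match.group(1))


expected = expected_timeout(HEURISTIC_XML)
logged = logged_timeout(LOG_LINE)
print(f"tko*interval = {expected}s, logged timeout = {logged}s, "
      f"difference = {logged - expected}s")
```

For the values in this report, tko*interval is 6 seconds while the log reports 9 seconds, which is the discrepancy the question refers to.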
Environment
- Red Hat Cluster Suite 4+
- Red Hat Enterprise Linux Server 5 (with the High Availability Add-On)
- Red Hat Enterprise Linux Server 6 (with the High Availability Add-On)
- A cluster configuration using QDisk and a ping heuristic
- Heuristic does not use the -w option on ping
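To see whether an existing heuristic matches the last condition above, its program string can be inspected for the -w option. The sketch below is an illustration only: has_ping_deadline is a hypothetical helper, it handles only the simple forms "-w 1" and "-w1", and it assumes Linux iputils ping, where -w sets an overall deadline in seconds and -t sets the IP TTL rather than a timeout.

```python
import shlex


def has_ping_deadline(program: str) -> bool:
    """Return True if a ping command line passes -w, as '-w 1' or '-w1'."""
    args = shlex.split(program)
    # Only inspect commands whose executable is ping (for example /bin/ping).
    if not args or not args[0].endswith("ping"):
        return False
    # Match the lowercase -w only; uppercase -W is a different iputils option.
    return any(arg.startswith("-w") for arg in args[1:])


# The heuristic from this report uses -t1 and no -w.
print(has_ping_deadline("ping -c1 -t1 192.168.2.1"))
```

The heuristic quoted in this report returns False here, so it falls under the environment described above.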
