Why does my Red Hat Cluster Suite node get fenced when my qdisk heuristic fails once and my quorum disk has a tko greater than one?
Issue
- Cluster nodes are intermittently and unexpectedly fenced at seemingly random intervals.
- Node was fenced and the following messages are seen in the logs:
- Node 1:
  fenced[15406]: node2 not a cluster member after 0 sec post_fail_delay
  fenced[15406]: fencing node "node2"
  fenced[15406]: fence "node2" success
  clurgmgrd[15444]: <notice> Taking over service service1 from down member node2
- Node 2:
  qdiskd[12327]: <info> Heuristic: 'ping 10.0.0.1 -c1 -w1' DOWN (1/1)
  qdiskd[1232: <notice> Score insufficient for master operation (0/1; required=1); downgrading
  qdiskd[12327]: <info> Heuristic: 'ping 10.0.0.1 -c1 -w1' UP
  <node rebooted>
  syslogd 1.4.1: restart.
- The heuristic ping seems to be evaluated only once even though tko=9. I think qdiskd is using the default value of 21 seconds to leave the cluster. Are there other values that I need to change, or is the cluster not supposed to function the way I observed?
- Assuming interval=5 and tko=9, when all heuristics fail, qdiskd is supposed to cycle for about 45 seconds (interval * tko) before the node leaves the cluster. What actually happens is that it takes only 21 seconds before the system reboots. Before the reboot, I saw this in the log file:
  qdiskd[28072]: <info> Heuristic: '/bin/ping 10.0.0.1 -c1 -w1' DOWN (1/1)
  qdiskd[28072]: <notice> Score insufficient for master operation (0/1; required=1); downgrading
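The expectation described above comes down to simple arithmetic. As a minimal sketch, assuming that qdiskd waits interval * tko seconds before giving up (this is the poster's assumption, not confirmed qdiskd behavior), the gap between the expected and the observed delay can be worked out as follows. The function name `expected_eviction_delay` is hypothetical, and the values are taken from the question rather than from a real cluster.conf.

```python
# Illustrative arithmetic only; this does not model qdiskd's real logic.
# Assumption (from the question): the node should leave the cluster only
# after `tko` consecutive failed cycles of `interval` seconds each.

def expected_eviction_delay(interval_s: int, tko: int) -> int:
    """Return the delay in seconds that the poster expected before eviction."""
    return interval_s * tko

interval_s = 5    # quorum disk interval, as stated in the question
tko = 9           # quorum disk tko, as stated in the question
observed_s = 21   # delay the poster actually observed before the reboot

expected_s = expected_eviction_delay(interval_s, tko)
gap_s = expected_s - observed_s

print(f"expected: {expected_s} s, observed: {observed_s} s, gap: {gap_s} s")
# prints "expected: 45 s, observed: 21 s, gap: 24 s"
```

The 24-second gap, together with the "DOWN (1/1)" log entry, suggests that the heuristic was counted as failed after a single miss and was not governed by the tko=9 setting.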
Environment
- Red Hat Cluster Suite 4+
- Red Hat Enterprise Linux 5 Advanced Platform (Clustering)
- Quorum disk used as a network tiebreaker.
