Why does my Red Hat Cluster Suite node get fenced when my qdisk heuristic fails once and my quorum disk has a tko greater than one?
Issue
- Cluster nodes are intermittently and unexpectedly fenced at seemingly random intervals.
- Node was fenced and the following messages are seen in the logs:
- Node 1:
  fenced[15406]: node2 not a cluster member after 0 sec post_fail_delay
  fenced[15406]: fencing node "node2"
  fenced[15406]: fence "node2" success
  clurgmgrd[15444]: <notice> Taking over service service1 from down member node2
- Node 2:
  qdiskd[12327]: <info> Heuristic: 'ping 10.0.0.1 -c1 -w1' DOWN (1/1)
  qdiskd[12327]: <notice> Score insufficient for master operation (0/1; required=1); downgrading
  qdiskd[12327]: <info> Heuristic: 'ping 10.0.0.1 -c1 -w1' UP
  <node rebooted>
  syslogd 1.4.1: restart.
- The heuristic ping seems to be evaluated only once even though tko=9. I think qdiskd is using the default value of 21 seconds before the node leaves the cluster. Are there other values that I need to change, or is the cluster not supposed to behave the way I observed?
- Assuming interval=5 and tko=9, when all heuristics fail, qdiskd is supposed to cycle for about 45 seconds (interval * tko) before the node leaves the cluster. What actually happens is that it takes only 21 seconds before the system reboots. Before the reboot, I saw this in the log file:
  qdiskd[28072]: <info> Heuristic: '/bin/ping 10.0.0.1 -c1 -w1' DOWN (1/1)
  qdiskd[28072]: <notice> Score insufficient for master operation (0/1; required=1); downgrading
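For illustration, the setup described in this report (interval=5, tko=9, and a single ping heuristic worth one point) might correspond to a cluster.conf fragment along the following lines. This is only a sketch: the `votes` and `label` values are assumptions and do not come from the report, and `min_score="1"` is inferred from the `required=1` field in the log.

```xml
<!-- Sketch only: votes and label are assumed values, not taken from the report. -->
<!-- interval and tko are the values the reporter says were configured. -->
<quorumd interval="5" tko="9" votes="1" min_score="1" label="example_qdisk">
    <!-- Heuristic command as it appears in the qdiskd log; -->
    <!-- score="1" matches the "(0/1; required=1)" score line. -->
    <heuristic program="/bin/ping 10.0.0.1 -c1 -w1" score="1"/>
</quorumd>
```

With these values, interval * tko = 5 * 9 = 45 seconds is the window the reporter expected to elapse before the node left the cluster. The `DOWN (1/1)` counter in the log suggests that the heuristic itself was declared down after a single failed ping, which is the discrepancy this article asks about.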
Environment
- Red Hat Cluster Suite 4+
- Red Hat Enterprise Linux 5 Advanced Platform (Clustering)
- Quorum disk used as a tiebreaker, with a network (ping) heuristic.