Cluster resource management is delayed after a node is fenced due to a resource failure in a RHEL 7 High Availability cluster
Issue
- We have a high `token` setting in our cluster, and whenever a node gets fenced due to a resource failure, the cluster blocks for a long time before resuming activity. The cluster seems to be waiting out that `token` timeout; activity resumes only once `corosync` eventually processes a membership change.
- Cluster doesn't recover any resources for a period of time after a node gets fenced due to failing a resource op.
- I see my cluster resource sitting failed/stopped when a node gets fenced after a stop failure, and it doesn't get recovered for a few minutes.
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
- `pacemaker`
- Some resource configured with an `op` setting of `on-fail=fence`
  - NOTE: This is the default for `op stop` on all resources unless otherwise specified in the configuration by administrators
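To check whether a cluster matches this environment, it can help to see what `token` value is actually configured. The following is a minimal sketch, not an official tool: the helper name `get_token_timeout` is hypothetical, and it assumes the configuration lives at the usual `/etc/corosync/corosync.conf` path with `token` set on its own line in the `totem` section.

```shell
# Hypothetical helper: print the totem "token" timeout (milliseconds)
# from a corosync.conf-style file. Prints nothing when no explicit
# token line exists, in which case corosync uses its built-in default.
get_token_timeout() {
    # Assumed default location of the corosync configuration file.
    conf="${1:-/etc/corosync/corosync.conf}"
    # Match only the "token:" key itself; related keys such as
    # token_retransmit or token_coefficient do not match this pattern.
    awk -F: '/^[ \t]*token[ \t]*:/ {
        gsub(/[ \t]/, "", $2)   # strip whitespace around the value
        print $2
        exit
    }' "$conf"
}
```

Calling `get_token_timeout` with no argument reads the assumed default path; passing a file name reads that file instead. This reports only the value written in the file, which may differ from what the running cluster is using if the file was edited without a reload.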