Cluster resource management is delayed after a node is fenced due to a resource failure in a RHEL 7 High Availability cluster
Issue
- We have a high token setting in our cluster, and whenever a node gets fenced due to a resource failure, the cluster blocks for a long time before resuming activity. The delay seems to be waiting for that token timeout, when eventually corosync processes a membership change and activity resumes.
- Cluster doesn't recover any resources for a period of time after a node gets fenced due to failing a resource op
- I see my cluster resource sitting failed/stopped when a node gets fenced after a stop failure, and it doesn't get recovered for a few minutes.
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
- pacemaker
- Some resource configured with an op setting of on-fail=fence
  - NOTE: This is the default for op stop on all resources unless otherwise specified in the configuration by administrators
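To check whether a cluster has the high token setting described in this article, one option is to read the token value from the totem section of /etc/corosync/corosync.conf. The following is a minimal sketch, not a supported tool: the function name token_timeout_ms and the sample configuration values (including the 30000 ms token) are hypothetical and chosen only for illustration.

```python
def token_timeout_ms(conf_text):
    """Return the totem token timeout (ms) from corosync.conf text, or None if unset."""
    stack = []  # names of the sections we are currently inside
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())  # entering a section such as "totem"
        elif line == "}":
            if stack:
                stack.pop()  # leaving the current section
        elif stack == ["totem"] and ":" in line:
            key, value = line.split(":", 1)
            if key.strip() == "token":
                return int(value.strip())
    return None


# Hypothetical sample configuration, for illustration only
sample = """\
totem {
    version: 2
    cluster_name: example
    token: 30000
}
"""

print(token_timeout_ms(sample))
```

On a cluster node, the same function could be applied to the contents of /etc/corosync/corosync.conf. A return value of None means the token option is not set explicitly in that file.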
