Resource or resource group is moved to a new node instead of being recovered when two related resources fail together, but the dependent resource failure is detected first, in a RHEL 6 or 7 High Availability cluster

Solution Unverified - Updated 2024-08-05T06:12:39+00:00

Issue

I have a Filesystem resource and another resource that depends on it. If the file system fails, both resources will fail their next monitor operation. If the dependent resource's failure is detected first, pacemaker tries to recover it, and when that recovery fails (because the Filesystem resource hasn't been recovered yet), it moves the whole resource group to a new node.
If two resources are related and one must be recovered before the other, pacemaker does not guarantee that the failure of the first will be detected before the failure of the second. As a result, one resource can fail and then be impossible to restart because its "parent" is still in a bad state.
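
The detection race described above can be illustrated with a small toy model. This is only a sketch and not pacemaker's actual scheduling logic: it assumes each resource's monitor fires on a fixed grid (`offset`, `offset + interval`, ...), and the resource names `fs` and `app`, the intervals, and the offsets are all hypothetical values chosen for illustration.

```python
import math


def next_monitor_time(failure_time, interval, offset=0):
    """Return the first monitor run at or after failure_time.

    Toy assumption: monitors fire at offset, offset + interval, ...
    """
    if failure_time <= offset:
        return offset
    runs_elapsed = math.ceil((failure_time - offset) / interval)
    return offset + runs_elapsed * interval


def first_detected(failure_time, monitors):
    """Return (name of resource whose failure is seen first, all detection times).

    monitors maps a resource name to (interval, offset) in seconds.
    """
    detection = {
        name: next_monitor_time(failure_time, interval, offset)
        for name, (interval, offset) in monitors.items()
    }
    return min(detection, key=detection.get), detection


# Hypothetical setup: "fs" is the parent Filesystem, "app" depends on it.
monitors = {"fs": (20, 0), "app": (30, 5)}

# The file system breaks at t=21s, so both resources are unhealthy from then on.
winner, times = first_detected(failure_time=21, monitors=monitors)
print(winner, times)
```

In this model, a file system failure at t=21 is noticed by the `app` monitor at t=35, five seconds before the `fs` monitor runs at t=40, so the dependent resource is the first to be reported as failed. Moving the failure to t=6 reverses the order, which is the point: which failure is seen first depends only on where the failure lands relative to the two monitor schedules.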

Environment

- Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add On
- pacemaker
- One or more resources that depend on another resource, such that if the dependency ("parent") fails, the dependent ("child") fails as well, and both need to be recovered
  - This often takes one of two forms:
    - Two resources ordered in a specific way within a resource group, such as a Filesystem resource and another resource for the software that uses the file system
    - Two resources associated with each other through an ordering constraint
