A resource in a negatively colocated group can remain stopped if it fails to start or hits its migration threshold in a Pacemaker cluster
Issue
Assume the following resource configuration with two resource groups.
[root@fastvm-rhel-8-0-23 pacemaker]# pcs config | egrep '(Group|Resource|Meta Attrs):'
Group: dummya
Resource: dummya_1 (class=ocf provider=heartbeat type=Dummy)
Resource: dummya_2 (class=ocf provider=heartbeat type=Dummy)
Group: dummyb
Resource: dummyb_1 (class=ocf provider=heartbeat type=Dummy)
Resource: dummyb_2 (class=ocf provider=heartbeat type=Dummy)
Assume that one group (dummyb) is colocated with another (dummya) with a negative, non-INFINITY colocation score.
[root@fastvm-rhel-8-0-23 pacemaker]# pcs constraint colocation
Colocation Constraints:
dummyb with dummya (score:-5000)
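For reference, a constraint like the one listed above could be created roughly as follows. This is a sketch rather than the exact command used for this cluster: it assumes the positional score syntax accepted by pcs on RHEL 7 and 8. Newer pcs releases may prefer the score=-5000 form, so check the pcs(8) man page for the installed version.

```shell
# Assumption: the groups dummya and dummyb already exist.
# Tell the cluster to prefer running dummyb on a different node than dummya.
# The score -5000 is negative but finite, so it is a preference, not a ban.
pcs constraint colocation add dummyb with dummya -5000

# List the colocation constraints to confirm the result.
pcs constraint colocation
```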
If resource dummyb_2 fails to start, the start-failure-is-fatal=true cluster property prevents it from running on that node again, so it remains stopped there. The expected behavior is that the dummyb group fails over so that dummyb_2 can start on the other node, but the group does not fail over.
The stopped resource has messages in logs like the following:
Sep 5 20:20:12 fastvm-rhel-8-0-24 pacemaker-schedulerd[342774]: warning: Forcing dummyb_2 away from node2 after 1000000 failures (max=1000000)
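To find affected resources quickly, the warning above can be filtered out of the logs and reduced to its key fields. The following is a minimal sketch: forced_away is a hypothetical helper name, and the pattern is derived only from the single log line shown above, so other Pacemaker versions may word the message differently.

```shell
# Hypothetical helper: read log lines on standard input and, for each
# "Forcing <resource> away from <node>" warning, print four fields:
#   resource  node  failure-count  threshold
forced_away() {
    sed -n -E 's/.*warning: Forcing ([^ ]+) away from ([^ ]+) after ([0-9]+) failures \(max=([0-9]+)\).*/\1 \2 \3 \4/p'
}

# Example using the log line from this article.
echo 'Sep 5 20:20:12 fastvm-rhel-8-0-24 pacemaker-schedulerd[342774]: warning: Forcing dummyb_2 away from node2 after 1000000 failures (max=1000000)' | forced_away
```

On a cluster node, the helper would be fed from whichever file or journal holds the pacemaker-schedulerd messages; that location varies by release and logging configuration.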
Note: This can also happen if the resource fails its monitor operation enough times to reach its migration-threshold. The migration-threshold meta attribute defaults to INFINITY (defined as 1000000), but it can be configured explicitly to a lower value.
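As an illustration of the note above, the commands below set an explicit migration-threshold and display the current fail count. This is a sketch only: the value 3 is an arbitrary example and not a recommendation from this article, and the exact failcount syntax can differ slightly between pcs versions.

```shell
# Assumption: dummyb_2 is the resource from the configuration shown earlier.
# Set an explicit migration threshold (3 is an arbitrary example value).
pcs resource meta dummyb_2 migration-threshold=3

# Show how many failures the cluster has recorded for the resource.
pcs resource failcount show dummyb_2
```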
Environment
- Red Hat Enterprise Linux 7, 8, 9 (with the High Availability Add-on)