A resource in a negatively colocated group can remain stopped if it fails to start or hits its migration threshold in a Pacemaker cluster

Solution In Progress

Issue

Assume the following resource configuration with two resource groups.

[root@fastvm-rhel-8-0-23 pacemaker]# pcs config | egrep '(Group|Resource|Meta Attrs):'
 Group: dummya
  Resource: dummya_1 (class=ocf provider=heartbeat type=Dummy)
  Resource: dummya_2 (class=ocf provider=heartbeat type=Dummy)
 Group: dummyb
  Resource: dummyb_1 (class=ocf provider=heartbeat type=Dummy)
  Resource: dummyb_2 (class=ocf provider=heartbeat type=Dummy)

Assume that one group (dummyb) is colocated with another (dummya) with a negative, non-INFINITY colocation score.

[root@fastvm-rhel-8-0-23 pacemaker]# pcs constraint colocation
Colocation Constraints:
  dummyb with dummya (score:-5000)
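A constraint of this shape can be created with pcs; the score value -5000 here matches the example above, but any negative, non-INFINITY score produces the same class of behavior:

```shell
# Colocate group dummyb away from group dummya with a negative,
# non-INFINITY score (the cluster *prefers* to place them on
# different nodes, but may still colocate them if necessary).
pcs constraint colocation add dummyb with dummya -5000

# Verify the resulting constraint.
pcs constraint colocation
```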

If resource dummyb_2 fails to start, the start-failure-is-fatal=true cluster property (the default) prevents it from running again on its current node, so it remains stopped there. Contrary to expectation, the dummyb group does not fail over to the other node to allow dummyb_2 to start.
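Until the underlying scheduling issue is addressed, the immediate symptom can be inspected and cleared manually. This is a sketch of the usual workflow, assuming the resource names from the example above:

```shell
# Show the recorded failure counts for the resource.
pcs resource failcount show dummyb_2

# Clear the failure history so the cluster is free to start the
# resource again (this does not fix the root cause of the start
# failure).
pcs resource cleanup dummyb_2
```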

The logs contain messages like the following for the stopped resource:

Sep  5 20:20:12 fastvm-rhel-8-0-24 pacemaker-schedulerd[342774]: warning: Forcing dummyb_2 away from node2 after 1000000 failures (max=1000000)

Note: This can also happen if the resource fails its monitor operation enough times to reach its migration-threshold. The migration-threshold meta attribute defaults to INFINITY (defined as 1000000), but it can be configured explicitly to a lower value.
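To reproduce the monitor-failure variant, or to make a resource move away after fewer failures, migration-threshold can be set explicitly. The threshold value 3 below is only an illustrative choice:

```shell
# Set a lower migration-threshold on the individual resource
# (3 is an arbitrary example value; the default is INFINITY).
pcs resource meta dummyb_2 migration-threshold=3

# Confirm the meta attribute is now set.
pcs resource config dummyb_2
```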

Environment

  • Red Hat Enterprise Linux 7 (with the High Availability Add-on)
  • Red Hat Enterprise Linux 8 (with the High Availability Add-on)
