No resources in a cloned group instance can run if a single resource in the instance fails to start in a Pacemaker cluster

Solution In Progress - Updated -

Issue

  • In a Pacemaker cluster with a cloned resource group, if a resource later in the group fails to start on a particular node, all resources earlier in the group are stopped on that node and no restart is attempted. This does not happen with non-cloned resource groups. In the example below, dummy1 is in the Stopped state on both nodes because dummy2 failed to start on both, even though dummy1 has no ordering dependency on dummy2.
# pcs status --full
...
Node List:
  * Online: [ node1 (1) node2 (2) ]

Full List of Resources:
  * xvm (stonith:fence_xvm):     Started node1
  * Clone Set: test_grp-clone [test_grp]:
    * Resource Group: test_grp:0:
      * dummy1  (ocf:heartbeat:Dummy):   Stopped
      * dummy2  (ocf:heartbeat:Dummy):   Stopped
    * Resource Group: test_grp:1:
      * dummy1  (ocf:heartbeat:Dummy):   Stopped
      * dummy2  (ocf:heartbeat:Dummy):   Stopped

Migration Summary:
  * Node: node1 (1):
    * dummy2: migration-threshold=1000000 fail-count=1000000 last-failure='Thu Mar  3 15:44:34 2022'
  * Node: node2 (2):
    * dummy2: migration-threshold=1000000 fail-count=1000000 last-failure='Thu Mar  3 15:44:34 2022'

Failed Resource Actions:
  * dummy2_start_0 on node1 'error' (1): call=69, status='complete', last-rc-change='Thu Mar  3 15:44:34 2022', queued=0ms, exec=19ms
  * dummy2_start_0 on node2 'error' (1): call=56, status='complete', last-rc-change='Thu Mar  3 15:44:34 2022', queued=0ms, exec=16ms
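
To see quickly which resource is holding a clone instance down, the "Failed Resource Actions" section of the status output can be filtered for failed start operations. The following is a minimal sketch, not part of the original article: the function name failed_starts is hypothetical, and it assumes the line layout shown in the listing above, which may differ in other Pacemaker versions.

```shell
# Hypothetical helper: read `pcs status --full` text on stdin and print
# "<resource> <node>" for each failed start listed under
# "Failed Resource Actions:". Assumes the line layout shown above.
failed_starts() {
    awk '
        # Enter the section at its header line.
        /^Failed Resource Actions:/ { in_section = 1; next }
        # A blank line ends the section.
        in_section && /^[[:space:]]*$/ { in_section = 0 }
        # Lines look like: "* dummy2_start_0 on node1 ..."
        in_section && $1 == "*" && $2 ~ /_start_[0-9]+$/ && $3 == "on" {
            res = $2
            sub(/_start_[0-9]+$/, "", res)   # strip the operation suffix
            print res, $4
        }
    '
}
```

It could be used as `pcs status --full | failed_starts`. Applied to the listing above, it would report dummy2 on node1 and on node2, which are the failures that keep dummy1 stopped.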

Environment

  • Red Hat Enterprise Linux 8 (with the High Availability Add-on)
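
On a system matching this environment, a configuration like the one shown under Issue could be built roughly as follows. This is a sketch under assumptions rather than the exact commands behind the example: the function name create_cloned_group and the PCS variable are hypothetical conveniences, and the article does not say how the dummy2 start failure was triggered, so that step is not reproduced here.

```shell
# Hypothetical helper: create the two Dummy resources in one group and
# clone the group. PCS defaults to the real pcs binary; set PCS=echo to
# print the pcs arguments as a dry run instead of changing the cluster.
create_cloned_group() {
    "${PCS:-pcs}" resource create dummy1 ocf:heartbeat:Dummy --group test_grp &&
    "${PCS:-pcs}" resource create dummy2 ocf:heartbeat:Dummy --group test_grp &&
    "${PCS:-pcs}" resource clone test_grp
}
```

Running `PCS=echo create_cloned_group` prints the three sets of pcs arguments without touching the cluster; running `create_cloned_group` on a cluster node applies them. By default pcs names the resulting clone test_grp-clone, matching the clone set in the status output.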
