Pacemaker resources in a group stopped after a failure even though on-fail=block is set

Solution In Progress - Updated -

Issue

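  • All resources in a resource group have on-fail=block configured for their monitor operations. However, when the monitor operation of one resource failed, Pacemaker marked that resource as blocked and stopped the resources in the group that depend on it. In the output below, dummy2 fails and dummy3 is stopped.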
# pcs resource config | egrep 'Group:|Resource:|monitor'
 Group: dummy_grp
  Resource: dummy1 (class=ocf provider=heartbeat type=Dummy)
               monitor interval=10s on-fail=block timeout=20s (dummy1-monitor-interval-10s)
  Resource: dummy2 (class=ocf provider=heartbeat type=Dummy_mod)
               monitor interval=10s on-fail=block timeout=20s (dummy2-monitor-interval-10s)
  Resource: dummy3 (class=ocf provider=heartbeat type=Dummy)
               monitor interval=10s on-fail=block timeout=20s (dummy3-monitor-interval-10s)

# pcs status | grep dummy
  * Resource Group: dummy_grp:
    * dummy1    (ocf:heartbeat:Dummy):   Started node1
    * dummy2    (ocf:heartbeat:Dummy_mod):   Started node1
    * dummy3    (ocf:heartbeat:Dummy):   Started node1

# pcs status | grep dummy
  * Resource Group: dummy_grp:
    * dummy1    (ocf:heartbeat:Dummy):   Started node1
    * dummy2    (ocf:heartbeat:Dummy_mod):   FAILED node1 (blocked)
    * dummy3    (ocf:heartbeat:Dummy):   Stopped
  * dummy2_monitor_10000 on node1 'error' (1): call=128, status='complete', exitreason='', last-rc-change='2021-02-18 18:04:37 -08:00', queued=0ms, exec=0ms
  • What is the expected behavior of on-fail=block for resources in a group?
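
To see quickly which resources are left in the blocked state after a failure like the one above, the status output can be filtered. The following is a minimal sketch and not part of the original solution. The function name list_blocked is hypothetical, and it assumes the `pcs status` line format shown above, in which a blocked resource line ends in "(blocked)".

```shell
#!/bin/sh
# list_blocked: hypothetical helper, not a pcs subcommand.
# Reads `pcs status` output on stdin and prints the name of every
# resource whose status line ends in "(blocked)".
# Assumed line format (taken from the output above):
#   * <name> (<agent>): FAILED <node> (blocked)
list_blocked() {
    awk '$1 == "*" && /\(blocked\)[[:space:]]*$/ { print $2 }'
}

# Usage on a cluster node:
#   pcs status | list_blocked
```

With the second status output shown above as input, only the dummy2 line ends in "(blocked)", so that is the only name the filter selects. dummy3 is reported as Stopped, not blocked, and is therefore not listed.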

Environment

  • Red Hat Enterprise Linux 8 (with the High Availability Add-on)
