Pacemaker resources in a group stopped after a failure even though on-fail=block is set

Solution In Progress - Updated -

Issue

  • Every resource in a resource group has on-fail=block configured for its monitor operation. However, when the monitor operation of one resource failed, Pacemaker blocked that resource but still stopped the group members that depend on it (in the output below, dummy2 failed and dummy3 was stopped).
# pcs resource config | egrep 'Group:|Resource:|monitor'
 Group: dummy_grp
  Resource: dummy1 (class=ocf provider=heartbeat type=Dummy)
               monitor interval=10s on-fail=block timeout=20s (dummy1-monitor-interval-10s)
  Resource: dummy2 (class=ocf provider=heartbeat type=Dummy_mod)
               monitor interval=10s on-fail=block timeout=20s (dummy2-monitor-interval-10s)
  Resource: dummy3 (class=ocf provider=heartbeat type=Dummy)
               monitor interval=10s on-fail=block timeout=20s (dummy3-monitor-interval-10s)

# pcs status | grep dummy
  * Resource Group: dummy_grp:
    * dummy1    (ocf:heartbeat:Dummy):   Started node1
    * dummy2    (ocf:heartbeat:Dummy_mod):   Started node1
    * dummy3    (ocf:heartbeat:Dummy):   Started node1

# pcs status | grep dummy
  * Resource Group: dummy_grp:
    * dummy1    (ocf:heartbeat:Dummy):   Started node1
    * dummy2    (ocf:heartbeat:Dummy_mod):   FAILED node1 (blocked)
    * dummy3    (ocf:heartbeat:Dummy):   Stopped
  * dummy2_monitor_10000 on node1 'error' (1): call=128, status='complete', exitreason='', last-rc-change='2021-02-18 18:04:37 -08:00', queued=0ms, exec=0ms

  • What is the expected behavior of on-fail=block for resources in a group?
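
To spot this condition quickly, the status output above can be filtered for resources that are not plainly Started. The following is a minimal sketch, not part of pcs: the function name summarize_unhealthy is hypothetical, and it assumes the `pcs status` line layout shown in this article (`* name (class:provider:type): State [node] [(blocked)]`), which may differ between pcs versions.

```shell
# Hypothetical helper (not part of pcs): read `pcs status` text on stdin and
# print "<resource> <state>" for every resource that is not plainly Started.
# Assumes the line layout shown in this article's output.
summarize_unhealthy() {
    awk '
        # Resource lines: "* name (class:provider:type): State [node] [(blocked)]"
        # Group headers and failed-action lines do not have "(...):" as field 3.
        $1 == "*" && $3 ~ /^\(.*\):$/ {
            name = $2
            state = $4
            # Mark resources that Pacemaker left in place because of on-fail=block
            if ($NF == "(blocked)") state = state "-blocked"
            if (state != "Started") print name, state
        }
    '
}
```

It would be used as `pcs status | summarize_unhealthy`. Against the second status output above, it reports dummy2 as FAILED-blocked and dummy3 as Stopped, and omits dummy1.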

Environment

  • Red Hat Enterprise Linux 8 (with the High Availability Add-on)
