Pacemaker resources in a group stopped after a failure even though on-fail=block is set
Issue
- All resources in a resource group have on-fail=block configured for their monitor operations. But when one resource encountered a monitor failure, Pacemaker stopped all the resources that depend on it.
# pcs resource config | egrep 'Group:|Resource:|monitor'
Group: dummy_grp
Resource: dummy1 (class=ocf provider=heartbeat type=Dummy)
monitor interval=10s on-fail=block timeout=20s (dummy1-monitor-interval-10s)
Resource: dummy2 (class=ocf provider=heartbeat type=Dummy_mod)
monitor interval=10s on-fail=block timeout=20s (dummy2-monitor-interval-10s)
Resource: dummy3 (class=ocf provider=heartbeat type=Dummy)
monitor interval=10s on-fail=block timeout=20s (dummy3-monitor-interval-10s)
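For reference, a monitor configuration like the one listed above could be applied with `pcs resource update`. The following is a minimal sketch, not a command taken from this article: the helper name `set_monitor_block` and the `PCS` override variable are hypothetical, and the interval and timeout values are copied from the listing.

```shell
#!/bin/sh
# Hypothetical helper: set on-fail=block on the monitor operation of each
# resource named on the command line.
# PCS can be overridden (for example PCS=echo) to preview the commands
# without changing the cluster configuration.
PCS="${PCS:-pcs}"

set_monitor_block() {
    for rsc in "$@"; do
        # Restate interval and timeout explicitly: operation options that are
        # left out of an update are reset to their default values.
        $PCS resource update "$rsc" op monitor interval=10s timeout=20s on-fail=block
    done
}

# Usage (on a cluster node, as root):
#   set_monitor_block dummy1 dummy2 dummy3
```

Running it with `PCS=echo` first prints the commands that would be issued, which may be a safer way to review them before touching a live cluster.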
# pcs status | grep dummy
* Resource Group: dummy_grp:
* dummy1 (ocf:heartbeat:Dummy): Started node1
* dummy2 (ocf:heartbeat:Dummy_mod): Started node1
* dummy3 (ocf:heartbeat:Dummy): Started node1
# pcs status | grep dummy
* Resource Group: dummy_grp:
* dummy1 (ocf:heartbeat:Dummy): Started node1
* dummy2 (ocf:heartbeat:Dummy_mod): FAILED node1 (blocked)
* dummy3 (ocf:heartbeat:Dummy): Stopped
* dummy2_monitor_10000 on node1 'error' (1): call=128, status='complete', exitreason='', last-rc-change='2021-02-18 18:04:37 -08:00', queued=0ms, exec=0ms
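The state change shown above can be picked out of `pcs status` output with a small filter. This is a rough sketch that assumes the line layout seen in this article (`* name (agent): STATE [node] [(blocked)]`); the function name `report_unhealthy` is hypothetical, and other pcs versions may format these lines differently.

```shell
#!/bin/sh
# Hypothetical helper: read pcs status text on stdin and print one line per
# resource that is not cleanly Started, in the form "<resource> <state>".
report_unhealthy() {
    awk '
        # Resource lines look like: "* name (agent): STATE [node] [(blocked)]"
        # The third field is the agent in parentheses followed by a colon,
        # which excludes group headers and failed-action lines.
        $1 == "*" && $3 ~ /^\(.*\):$/ {
            state = $4
            if ($NF == "(blocked)") state = state "(blocked)"
            if (state != "Started") print $2, state
        }
    '
}

# Usage (on a cluster node):
#   pcs status | report_unhealthy
```

Fed the second status output above, this would report `dummy2 FAILED(blocked)` and `dummy3 Stopped`, while `dummy1` is omitted because it is still Started.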
- What is the expected behavior of on-fail=block for resources in a group?
Environment
- Red Hat Enterprise Linux 8 (with the High Availability Add-on)