Cancelled pacemaker resource monitor treated as a resource failure
Issue
Scheduling another action (such as moving a resource) while a resource monitor is still running can lead to a resource failure:
$ cat /var/log/pacemaker/pacemaker.log
------------------------------------>8-------------------------------------------
Nov 01 22:16:05 SAPHana(SAPHana_RH2_00)[2931187]: INFO: RA ==== begin action monitor_clone (0.154.0) ==== Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
Nov 01 22:16:05 SAPHana(SAPHana_RH2_00)[2931187]: INFO: RA: SRHOOK1= Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
Nov 01 22:16:05 SAPHana(SAPHana_RH2_00)[2931187]: INFO: RA: SRHOOK2=SOK
Nov 01 22:16:05 SAPHana(SAPHana_RH2_00)[2931187]: INFO: RA: SRHOOK3=SOK
------------------------------------>8-------------------------------------------
Nov 01 22:16:07 rhel8-node2 pacemaker-execd [10531] (cancel_recurring_action) info: Cancelling ocf operation SAPHana_RH2_00_monitor_120000
Nov 01 22:16:07 rhel8-node2 pacemaker-execd [10531] (services_action_cancel) info: Terminating in-flight op SAPHana_RH2_00_monitor_120000[2931187] early because it was cancelled
Nov 01 22:16:07 rhel8-node2 pacemaker-execd [10531] (operation_finished) info: SAPHana_RH2_00_monitor_120000[2931187] terminated with signal: Killed | (9)
Nov 01 22:16:07 rhel8-node2 pacemaker-execd [10531] (cancel_recurring_action) info: Cancelling ocf operation SAPHana_RH2_00_monitor_120000
Nov 01 22:16:07 rhel8-node2 pacemaker-controld [10534] (process_lrm_event) info: Result of monitor operation for SAPHana_RH2_00 on rhel8-node2: Cancelled | call=121 key=SAPHana_RH2_00_monitor_120000 confirmed=true
$ crm_simulate -Ssx pe-input-2062
------------------------------------->8--------------------------------------------
st_redfish_node1 (stonith:fence_redfish): Started rhel8-node1
st_redfish_node2 (stonith:fence_redfish): Started rhel8-node1
Started: [ rhel8-node1 rhel8-node2 ]
Masters: [ rhel8-node1 ]
Slaves: [ rhel8-node2 ]
vip_G3Q_70 (ocf::heartbeat:IPaddr2): Started rhel8-node1
Started: [ rhel8-node1 rhel8-node2 ]
Masters: [ rhel8-node1 ]
Slaves: [ rhel8-node2 ]
vip_P3Q_20 (ocf::heartbeat:IPaddr2): Started rhel8-node1
Started: [ rhel8-node1 rhel8-node2 ]
SAPHana_RH1_10 (ocf::heartbeat:SAPHana): Stopped rhel8-node2 (Monitoring)
Stopped: [ rhel8-node1 rhel8-node2 ]
vip_RH1_10 (ocf::heartbeat:IPaddr2): Started rhel8-node1
Started: [ rhel8-node1 rhel8-node2 ]
SAPHana_RH2_00 (ocf::heartbeat:SAPHana): FAILED rhel8-node2 <----
Slaves: [ rhel8-node1 ]
vip_RH2_00 (ocf::heartbeat:IPaddr2): Started rhel8-node2
For environments running promotable resources such as SAPHana, this can additionally lead to failed promotions after migration commands.
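To check whether a node's log contains the same sequence as the excerpt above (an in-flight monitor cancelled, then killed), a minimal Python sketch follows. It assumes the pacemaker-execd message wording matches the excerpt exactly; the wording may differ between Pacemaker versions. The function name find_killed_cancelled_ops is hypothetical and not part of any Pacemaker tooling.

```python
import re

# Message patterns copied from the pacemaker-execd lines in the excerpt.
# Assumption: other Pacemaker versions may word these messages differently.
CANCELLED_IN_FLIGHT = re.compile(
    r"Terminating in-flight op (?P<op>\S+)\[(?P<pid>\d+)\] "
    r"early because it was cancelled"
)
KILLED = re.compile(
    r"\(operation_finished\).*?(?P<op>\S+)\[(?P<pid>\d+)\] "
    r"terminated with signal"
)


def find_killed_cancelled_ops(lines):
    """Return (operation, pid) pairs that were cancelled in flight and then killed."""
    cancelled = set()
    hits = []
    for line in lines:
        # Both patterns are checked on every line, so a line carrying
        # both messages is still handled.
        m = CANCELLED_IN_FLIGHT.search(line)
        if m:
            cancelled.add((m["op"], m["pid"]))
        m = KILLED.search(line)
        if m and (m["op"], m["pid"]) in cancelled:
            hits.append((m["op"], m["pid"]))
    return hits


# Demo using two lines from the excerpt.
sample = [
    "Nov 01 22:16:07 rhel8-node2 pacemaker-execd [10531] (services_action_cancel) "
    "info: Terminating in-flight op SAPHana_RH2_00_monitor_120000[2931187] "
    "early because it was cancelled",
    "Nov 01 22:16:07 rhel8-node2 pacemaker-execd [10531] (operation_finished) "
    "info: SAPHana_RH2_00_monitor_120000[2931187] terminated with signal: Killed | (9)",
]
print(find_killed_cancelled_ops(sample))
# prints [('SAPHana_RH2_00_monitor_120000', '2931187')]
```

To scan a real log, pass an open file object instead of the sample list, for example open("/var/log/pacemaker/pacemaker.log"). An empty result only means that this exact wording was not found in the log.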
Environment
- Red Hat Enterprise Linux 6, 7, 8 or 9 (with the High Availability Add-on)
- Pacemaker