Cancelled pacemaker resource monitor treated as a resource failure

Solution In Progress

Issue

Scheduling another action (such as moving a resource) while a recurring resource monitor is still running can cause the cancelled monitor to be treated as a resource failure:

$ cat /var/log/pacemaker/pacemaker.log
------------------------------------>8-------------------------------------------
Nov 01 22:16:05 SAPHana(SAPHana_RH2_00)[2931187]: INFO: RA ==== begin action monitor_clone (0.154.0) ==== Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log 
Nov 01 22:16:05 SAPHana(SAPHana_RH2_00)[2931187]: INFO: RA: SRHOOK1= Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
Nov 01 22:16:05 SAPHana(SAPHana_RH2_00)[2931187]: INFO: RA: SRHOOK2=SOK 
Nov 01 22:16:05 SAPHana(SAPHana_RH2_00)[2931187]: INFO: RA: SRHOOK3=SOK
------------------------------------>8-------------------------------------------
Nov 01 22:16:07 rhel8-node2 pacemaker-execd [10531] (cancel_recurring_action) info: Cancelling ocf operation SAPHana_RH2_00_monitor_120000
Nov 01 22:16:07 rhel8-node2 pacemaker-execd [10531] (services_action_cancel) info: Terminating in-flight op SAPHana_RH2_00_monitor_120000[2931187] early because it was cancelled
Nov 01 22:16:07 rhel8-node2 pacemaker-execd [10531] (operation_finished) info: SAPHana_RH2_00_monitor_120000[2931187] terminated with signal: Killed | (9)
Nov 01 22:16:07 rhel8-node2 pacemaker-execd [10531] (cancel_recurring_action) info: Cancelling ocf operation SAPHana_RH2_00_monitor_120000
Nov 01 22:16:07 rhel8-node2 pacemaker-controld [10534] (process_lrm_event) info: Result of monitor operation for SAPHana_RH2_00 on rhel8-node2: Cancelled | call=121 key=SAPHana_RH2_00_monitor_120000 confirmed=true

$ crm_simulate -Ssx pe-input-2062
------------------------------------->8-------------------------------------------- 
st_redfish_node1 (stonith:fence_redfish): Started rhel8-node1
st_redfish_node2 (stonith:fence_redfish): Started rhel8-node1 
Started: [ rhel8-node1 rhel8-node2 ] 
Masters: [ rhel8-node1 ] 
Slaves: [ rhel8-node2 ] 
vip_G3Q_70 (ocf::heartbeat:IPaddr2): Started rhel8-node1 
Started: [ rhel8-node1 rhel8-node2 ] 
Masters: [ rhel8-node1 ] 
Slaves: [ rhel8-node2 ] 
vip_P3Q_20 (ocf::heartbeat:IPaddr2): Started rhel8-node1 
Started: [ rhel8-node1 rhel8-node2 ] 
SAPHana_RH1_10 (ocf::heartbeat:SAPHana): Stopped rhel8-node2 (Monitoring) 
Stopped: [ rhel8-node1 rhel8-node2 ] 
vip_RH1_10 (ocf::heartbeat:IPaddr2): Started rhel8-node1 
Started: [ rhel8-node1 rhel8-node2 ] 
SAPHana_RH2_00 (ocf::heartbeat:SAPHana): FAILED rhel8-node2 <---- 
Slaves: [ rhel8-node1 ] 
vip_RH2_00 (ocf::heartbeat:IPaddr2): Started rhel8-node2

For environments running promotable resources such as SAPHana, this can additionally lead to failed promotions after migration commands.
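
To check whether a node has hit this pattern, the log can be searched for the two messages shown in the excerpt above: the in-flight monitor being terminated because it was cancelled, and the monitor process ending with a signal. The following is a minimal sketch only. The helper name find_cancelled_monitors is hypothetical, and the log path and message wording are assumed to match the excerpt; they may differ between Pacemaker versions.

```shell
# Hypothetical helper (not part of Pacemaker): print log lines that show an
# operation being killed mid-run after it was cancelled.
# Assumption: message wording matches the log excerpt in this article.
# Usage: find_cancelled_monitors /var/log/pacemaker/pacemaker.log
find_cancelled_monitors() {
    # Fall back to the log path used in this article when no argument is given.
    log_file="${1:-/var/log/pacemaker/pacemaker.log}"
    grep -E 'Terminating in-flight op .* early because it was cancelled|terminated with signal' "$log_file"
}
```

Matches only indicate that an operation was cancelled while it was running. Whether the resource was then marked FAILED still has to be confirmed from the cluster status or the crm_simulate output.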

Environment

  • Red Hat Enterprise Linux 6, 7, 8 or 9 (with the High Availability Add-on)
  • Pacemaker
