Pacemaker resource monitor results are ignored if a failure expires and the recurring operation gets rescheduled before recovery
Issue
- A resource in a group failed and hit its migration threshold. It was scheduled to recover (stop and restart in place or move). Other resources in the same group had to stop first. But the failure expired before the other resources finished stopping, so the failure was deleted and the failed resource never restarted. Now, the resource keeps failing but Pacemaker doesn't try to recover it. The resource remains in the Started state.
- An SAPInstance resource keeps logging "SAP instance service enq_server is not running with status GRAY" but Pacemaker doesn't recover it.
Jul 13 12:25:51 node2 SAPInstance(rsc_sap_RH2_ASCS21)[903731]: ERROR: SAP instance service enq_server is not running with status GRAY !
Jul 13 12:25:51 node2 pacemaker-controld[95195]: notice: Result of monitor operation for rsc_sap_RH2_ASCS21 on node2: not running
...
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: warning: Forcing rsc_sap_RH2_ASCS21 away from node2 after 1 failures (max=1)
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: * Move fs_RH2_ASCS ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: * Move vip_RH2_ASCS ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: * Move nc_RH2_ASCS ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: * Recover rsc_sap_RH2_ASCS21 ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: * Move fs_TWS_RH2 ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: * Move rsc_TWS_RH2 ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: * Move fs_RH2_ERS ( node1 -> node2 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: * Move vip_RH2_ERS ( node1 -> node2 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: * Move pc_RH2_ERS ( node1 -> node2 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: * Move rsc_sap_RH2_ERS31 ( node1 -> node2 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: Calculated transition 28, saving inputs in /var/lib/pacemaker/pengine/pe-input-524.bz2
...
Jul 13 12:26:56 node2 pacemaker-controld[95195]: notice: Result of stop operation for rsc_TWS_RH2 on node2: ok
Jul 13 12:26:56 node2 pacemaker-schedulerd[95194]: notice: Clearing failure of rsc_sap_RH2_ASCS21 on node2 because it expired
Jul 13 12:26:56 node2 pacemaker-schedulerd[95194]: notice: Rescheduling rsc_sap_RH2_ASCS21_monitor_20000 after failure expired on node2
Jul 13 12:26:56 node2 pacemaker-attrd[95193]: notice: Setting last-failure-rsc_sap_RH2_ASCS21#monitor_20000[node2]: 1626179151 -> (unset)
Jul 13 12:26:56 node2 pacemaker-attrd[95193]: notice: Setting fail-count-rsc_sap_RH2_ASCS21#monitor_20000[node2]: 1 -> (unset)
...
Jul 13 12:27:14 node2 SAPInstance(rsc_sap_RH2_ASCS21)[906673]: ERROR: SAP instance service enq_server is not running with status GRAY !
Jul 13 12:27:34 node2 SAPInstance(rsc_sap_RH2_ASCS21)[907343]: ERROR: SAP instance service enq_server is not running with status GRAY !
Jul 13 12:27:55 node2 SAPInstance(rsc_sap_RH2_ASCS21)[907946]: ERROR: SAP instance service enq_server is not running with status GRAY !
Jul 13 12:28:16 node2 SAPInstance(rsc_sap_RH2_ASCS21)[908777]: ERROR: SAP instance service enq_server is not running with status GRAY !
...
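The sequence in the excerpt above (recovery scheduled, then the failure cleared because it expired) can be searched for in a longer log. The following is a minimal Python sketch, not a supported tool. The function names, the regular expressions, and the ASSUMED_YEAR constant are illustrative assumptions; only the log messages themselves come from the excerpt. The year is inferred from the last-failure epoch value in the excerpt, assuming the log timestamps are in UTC.

```python
import re
from datetime import datetime, timezone

# Assumption: syslog timestamps carry no year. 2021 is inferred from the
# last-failure epoch value in the excerpt, assuming the logs are in UTC.
ASSUMED_YEAR = 2021

# Patterns modeled on the scheduler messages shown in the excerpt.
FORCED = re.compile(r"Forcing (\S+) away from (\S+) after \d+ failures")
EXPIRED = re.compile(r"Clearing failure of (\S+) on (\S+) because it expired")

def stamp(line):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp."""
    return datetime.strptime(f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")

def expired_after_recovery_scheduled(lines):
    """Return (resource, node, seconds) for each failure that expired after
    the scheduler had already decided to force the resource away."""
    forced = {}  # (resource, node) -> time recovery was first scheduled
    hits = []
    for line in lines:
        m = FORCED.search(line)
        if m:
            forced.setdefault(m.groups(), stamp(line))
            continue
        m = EXPIRED.search(line)
        if m and m.groups() in forced:
            elapsed = stamp(line) - forced[m.groups()]
            hits.append((*m.groups(), elapsed.total_seconds()))
    return hits

# Lines copied from the excerpt above.
LOG = """\
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: warning: Forcing rsc_sap_RH2_ASCS21 away from node2 after 1 failures (max=1)
Jul 13 12:26:56 node2 pacemaker-schedulerd[95194]: notice: Clearing failure of rsc_sap_RH2_ASCS21 on node2 because it expired
Jul 13 12:27:14 node2 SAPInstance(rsc_sap_RH2_ASCS21)[906673]: ERROR: SAP instance service enq_server is not running with status GRAY !
"""

print(expired_after_recovery_scheduled(LOG.splitlines()))
# [('rsc_sap_RH2_ASCS21', 'node2', 65.0)]

# The last-failure value that pacemaker-attrd unset, decoded as UTC.
print(datetime.fromtimestamp(1626179151, tz=timezone.utc))
# 2021-07-13 12:25:51+00:00
```

In this excerpt the failure was cleared 65 seconds after recovery was scheduled, and the decoded last-failure value matches the time of the first monitor failure. A hit from this sketch is only a candidate for this issue: confirm in the surrounding log that the resource was not actually stopped and restarted before the failure was cleared.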
Environment
- Red Hat Enterprise Linux 8 (with the High Availability Add-on)
- Red Hat Enterprise Linux 8 for SAP Solutions