Pacemaker resource monitor results are ignored if a failure expires and the recurring operation gets rescheduled before recovery

Solution In Progress

Issue

  • A resource in a group failed and reached its migration threshold. Pacemaker scheduled it for recovery (a stop and restart in place, or a move to another node), which required the other resources in the same group to stop first. The failure expired before those resources finished stopping, so the failure record was cleared and the failed resource was never restarted. The resource now keeps failing its monitor operation, but Pacemaker does not attempt to recover it and continues to report it as Started.
  • An SAPInstance resource keeps logging "SAP instance service enq_server is not running with status GRAY" but Pacemaker doesn't recover it.
Jul 13 12:25:51 node2 SAPInstance(rsc_sap_RH2_ASCS21)[903731]: ERROR: SAP instance service enq_server is not running with status GRAY !
Jul 13 12:25:51 node2 pacemaker-controld[95195]: notice: Result of monitor operation for rsc_sap_RH2_ASCS21 on node2: not running
...
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: warning: Forcing rsc_sap_RH2_ASCS21 away from node2 after 1 failures (max=1)
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice:  * Move       fs_RH2_ASCS            ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice:  * Move       vip_RH2_ASCS           ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice:  * Move       nc_RH2_ASCS            ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice:  * Recover    rsc_sap_RH2_ASCS21     ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice:  * Move       fs_TWS_RH2             ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice:  * Move       rsc_TWS_RH2            ( node2 -> node1 )
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice:  * Move       fs_RH2_ERS             ( node1 -> node2 ) 
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice:  * Move       vip_RH2_ERS            ( node1 -> node2 ) 
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice:  * Move       pc_RH2_ERS             ( node1 -> node2 ) 
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice:  * Move       rsc_sap_RH2_ERS31      ( node1 -> node2 ) 
Jul 13 12:25:51 node2 pacemaker-schedulerd[95194]: notice: Calculated transition 28, saving inputs in /var/lib/pacemaker/pengine/pe-input-524.bz2
...
Jul 13 12:26:56 node2 pacemaker-controld[95195]: notice: Result of stop operation for rsc_TWS_RH2 on node2: ok
Jul 13 12:26:56 node2 pacemaker-schedulerd[95194]: notice: Clearing failure of rsc_sap_RH2_ASCS21 on node2 because it expired
Jul 13 12:26:56 node2 pacemaker-schedulerd[95194]: notice: Rescheduling rsc_sap_RH2_ASCS21_monitor_20000 after failure expired on node2
Jul 13 12:26:56 node2 pacemaker-attrd[95193]: notice: Setting last-failure-rsc_sap_RH2_ASCS21#monitor_20000[node2]: 1626179151 -> (unset)
Jul 13 12:26:56 node2 pacemaker-attrd[95193]: notice: Setting fail-count-rsc_sap_RH2_ASCS21#monitor_20000[node2]: 1 -> (unset)
...
Jul 13 12:27:14 node2 SAPInstance(rsc_sap_RH2_ASCS21)[906673]: ERROR: SAP instance service enq_server is not running with status GRAY !
Jul 13 12:27:34 node2 SAPInstance(rsc_sap_RH2_ASCS21)[907343]: ERROR: SAP instance service enq_server is not running with status GRAY !
Jul 13 12:27:55 node2 SAPInstance(rsc_sap_RH2_ASCS21)[907946]: ERROR: SAP instance service enq_server is not running with status GRAY !
Jul 13 12:28:16 node2 SAPInstance(rsc_sap_RH2_ASCS21)[908777]: ERROR: SAP instance service enq_server is not running with status GRAY !
...
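
The timing in the log excerpt above can be checked with a short calculation. The sketch below is only an illustration: it takes the last-failure epoch value from the pacemaker-attrd line and the time of the "Clearing failure ... because it expired" message, and computes how long the failure record existed before it was cleared. It assumes the node logs in UTC (the epoch value converts to 12:25:51 UTC, which matches the logged failure time) and that the year is 2021, as implied by the epoch. The configured failure-timeout is not shown in this article, so the result is only an upper bound on it, not the configured value.

```python
from datetime import datetime, timezone

# Value taken from the pacemaker-attrd log line:
#   Setting last-failure-rsc_sap_RH2_ASCS21#monitor_20000[node2]: 1626179151 -> (unset)
LAST_FAILURE_EPOCH = 1626179151

# Time the monitor failure was recorded, converted from the epoch value.
failure_recorded = datetime.fromtimestamp(LAST_FAILURE_EPOCH, tz=timezone.utc)

# Time of the "Clearing failure ... because it expired" message.
# Assumption: the node's syslog timestamps are in UTC and the year is 2021.
failure_cleared = datetime(2021, 7, 13, 12, 26, 56, tzinfo=timezone.utc)

# Seconds between the failure being recorded and the failure being cleared.
elapsed_seconds = int((failure_cleared - failure_recorded).total_seconds())

print(f"failure recorded: {failure_recorded:%H:%M:%S}")
print(f"failure cleared:  {failure_cleared:%H:%M:%S}")
print(f"elapsed seconds:  {elapsed_seconds}")
```

In this excerpt the failure was cleared 65 seconds after it was recorded, at the same second that the stop of rsc_TWS_RH2 completed, while the recovery of rsc_sap_RH2_ASCS21 was still pending.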

Environment

  • Red Hat Enterprise Linux 8 (with the High Availability Add-on)
  • Red Hat Enterprise Linux 8 for SAP Solutions
