compute-unfence-trigger becomes FAILED and a Compute node reboots repeatedly after the Compute node is fenced in Red Hat OpenStack Platform 13
Issue
- Configured STONITH on the Compute nodes by following the solution "How do I configure my fence device to panic my cluster node instead of rebooting in a RHEL High Availability cluster?". The resulting fencing levels are:

    # pcs config
    Fencing Levels:
     Target: compute-0
       Level 1 - stonith-fence_kdump-0000000000aa,stonith-fence_compute-fence-nova
       Level 2 - stonith-fence_ipmi-diag-0000000000aa,stonith-fence_kdump-0000000000aa,stonith-fence_compute-fence-nova
       Level 3 - stonith-fence_ipmilan-0000000000aa,stonith-fence_compute-fence-nova

- Tested fencing with the following command:

    # pcs stonith fence compute-0

- Fencing succeeded:

    # pcs status --full
    Fencing History:
    * unfencing of compute-0 successful: delegate=controller-0, client=crmd.761778, origin=controller-0, completed='Wed May 4 08:55:18 2022'
    * reboot of compute-0 successful: delegate=controller-1, client=stonith_admin.615233, origin=controller-1, completed='Wed May 4 08:53:13 2022'

- However, the compute-unfence-trigger resource status became FAILED (blocked), and the Compute node started to reboot repeatedly:

    # pcs status --full
     Clone Set: compute-unfence-trigger-clone [compute-unfence-trigger]
         compute-unfence-trigger (ocf::pacemaker:Dummy): FAILED Hostname6 (blocked)
    :
    Failed Resource Actions:
    * compute-unfence-trigger_stop_0 on stg-jp3-1-az1-6 'unknown' (189): call=16, status=Error, exitreason='', last-rc-change='Mon Jul 4 16:15:19 2022', queued=0ms, exec=0ms
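Pacemaker walks a fencing topology in ascending level order: the devices in a level run in the listed order, a failing device ends that level, and the node counts as fenced by the first level whose devices all succeed. The following Python sketch models that rule for the compute-0 levels configured above. `resolve_topology` and the device-success callback are hypothetical illustrations, not part of pcs or Pacemaker, and the scenario in which fence_kdump never reports is an assumption used only to exercise the model.

```python
from typing import Callable, Dict, List, Optional

# Fencing levels for target compute-0, copied from the `pcs config` output above.
LEVELS: Dict[int, List[str]] = {
    1: ["stonith-fence_kdump-0000000000aa", "stonith-fence_compute-fence-nova"],
    2: ["stonith-fence_ipmi-diag-0000000000aa", "stonith-fence_kdump-0000000000aa",
        "stonith-fence_compute-fence-nova"],
    3: ["stonith-fence_ipmilan-0000000000aa", "stonith-fence_compute-fence-nova"],
}


def resolve_topology(levels: Dict[int, List[str]],
                     device_ok: Callable[[str], bool]) -> Optional[int]:
    """Return the first level whose devices all succeed, or None if every level fails.

    Levels are tried in ascending order. Inside a level the devices run in the
    listed order; all() stops at the first failing device, which ends that
    level so the next one is attempted.
    """
    for level in sorted(levels):
        if all(device_ok(device) for device in levels[level]):
            return level
    return None


# Hypothetical scenario: compute-0 never sends a kdump notification, so every
# level containing fence_kdump fails and fencing falls through to level 3.
not_reporting = {"stonith-fence_kdump-0000000000aa"}
print(resolve_topology(LEVELS, lambda device: device not in not_reporting))  # → 3
```

In this model, because fence_kdump appears in both level 1 and level 2, a node that never reports through kdump is fenced only when level 3 (fence_ipmilan plus fence_compute) succeeds.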
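Sorted oldest first, the fencing history shown in the Issue above records the reboot of compute-0 completing at 08:53:13 and its unfencing completing at 08:55:18, about two minutes later. The Python sketch below extracts such entries so the sequence can be checked. `parse_history`, `FenceEvent`, and the regular expression are hypothetical helpers fitted to this excerpt (an assumption); real pcs output may wrap lines or differ between versions, and the date parsing assumes the default C/English locale.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List

# One "Fencing History:" entry as printed in the excerpt; ",\s*" also
# tolerates a line wrap between fields.
HISTORY_RE = re.compile(
    r"\*\s*(?P<action>\w+) of (?P<target>\S+) (?P<result>\w+):\s*"
    r"delegate=(?P<delegate>[^,]+),\s*client=(?P<client>[^,]+),\s*"
    r"origin=(?P<origin>[^,]+),\s*completed='(?P<completed>[^']+)'"
)


@dataclass
class FenceEvent:
    action: str
    target: str
    result: str
    delegate: str
    client: str
    completed: datetime


def parse_history(text: str) -> List[FenceEvent]:
    """Extract fencing-history entries from pcs output, oldest first."""
    events = []
    for m in HISTORY_RE.finditer(text):
        # strptime treats spaces in the format as "one or more", so both
        # "May 4" and "May  4" parse (%a/%b assume the C/English locale).
        completed = datetime.strptime(m["completed"], "%a %b %d %H:%M:%S %Y")
        events.append(FenceEvent(m["action"], m["target"], m["result"],
                                 m["delegate"], m["client"], completed))
    return sorted(events, key=lambda event: event.completed)


HISTORY = """\
* unfencing of compute-0 successful: delegate=controller-0, client=crmd.761778, origin=controller-0, completed='Wed May 4 08:55:18 2022'
* reboot of compute-0 successful: delegate=controller-1, client=stonith_admin.615233, origin=controller-1, completed='Wed May 4 08:53:13 2022'
"""

for event in parse_history(HISTORY):
    print(event.completed, event.action, event.target, event.result, event.delegate)
```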
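The failed resource action in the Issue above is a stop operation (interval 0) of compute-unfence-trigger on stg-jp3-1-az1-6, returned as 'unknown' (189) with status Error. The Python sketch below splits such an entry into those fields. `parse_failed_action`, `FailedAction`, and the regular expression are hypothetical helpers fitted to this excerpt, not a pcs interface, and they do not interpret what return code 189 means.

```python
import re
from dataclasses import dataclass

# One "Failed Resource Actions:" entry in the format shown in the excerpt
# (an assumption fitted to this output).
FAILED_RE = re.compile(
    r"\*\s*(?P<key>\S+) on (?P<node>\S+) '(?P<rc_text>[^']*)' \((?P<rc>\d+)\):\s*"
    r"call=(?P<call>\d+),\s*status=(?P<status>[^,]+)"
)


@dataclass
class FailedAction:
    resource: str
    operation: str
    interval_ms: int
    node: str
    rc: int
    rc_text: str
    call_id: int
    status: str


def parse_failed_action(line: str) -> FailedAction:
    """Split a failed-action entry into resource, operation, node and result."""
    m = FAILED_RE.search(line)
    if m is None:
        raise ValueError(f"not a failed-action entry: {line!r}")
    # The operation key has the form <resource>_<operation>_<interval>. Resource
    # names may themselves contain underscores, so split from the right.
    resource, operation, interval = m["key"].rsplit("_", 2)
    return FailedAction(resource, operation, int(interval), m["node"],
                        int(m["rc"]), m["rc_text"], int(m["call"]), m["status"])


LINE = ("* compute-unfence-trigger_stop_0 on stg-jp3-1-az1-6 'unknown' (189): "
        "call=16, status=Error, exitreason='', "
        "last-rc-change='Mon Jul 4 16:15:19 2022', queued=0ms, exec=0ms")
print(parse_failed_action(LINE))
```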
Environment
- Red Hat OpenStack Platform 13
- Red Hat Enterprise Linux 7.9