Why does a fencing operation complete with success but the node is not rebooted and instead pacemaker is shut down on that node?

Solution Verified - Updated 2024-08-09T04:03:32+00:00

Issue

When a manual fence operation is run with pcs stonith fence node2, the command completes without error, but node2 is not rebooted. It remains up and running, and only the Pacemaker stack is shut down on node2:

```
[root@node1 ~]# pcs stonith fence node2
Node: node2 fenced
```

Logs from node1:

```
stonith-ng[1485]:  notice: Client stonith_admin.6884.651b3028 wants to fence (reboot) 'node2' with device '(any)'
stonith-ng[1485]:  notice: Requesting peer fencing (reboot) of node2
stonith-ng[1485]:  notice: fence_node1 can not fence (reboot) node2: static-list
stonith-ng[1485]:  notice: fence_node2 can fence (reboot) node2 (aka. 'vm-node2'): static-list
stonith-ng[1485]:  notice: fence_node1 can not fence (reboot) node2: static-list
stonith-ng[1485]:  notice: fence_node2 can fence (reboot) node2 (aka. 'vm-node2'): static-list
stonith-ng[1485]:  notice: Operation 'reboot' [6885] (call 2 from stonith_admin.6884) for host 'node2' with device 'fence_node2' returned: 0 (OK)
stonith-ng[1485]:  notice: Operation reboot of node2 by node1 for stonith_admin.6884@node1.671eb0a4: OK
crmd[1490]:  notice: Peer node2 was terminated (reboot) by node1 on behalf of stonith_admin.6884: OK
```
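The node1 excerpt can be reduced to a short summary of which device stonith-ng selected and what that device reported. The following is a minimal triage sketch, assuming the log lines use exactly the wording shown above (other Pacemaker versions may phrase them differently); summarize_fencing and FenceSummary are hypothetical helper names, not part of Pacemaker. Applied to this excerpt, it shows that only fence_node2 claimed node2 through its static list and that it returned 0 (OK), which is the result node1 then reports as a successful reboot.

```python
import re
from dataclasses import dataclass, field

# Patterns assume the stonith-ng wording from the node1 log excerpt.
CANNOT_RE = re.compile(r"(\S+) can not fence \((\w+)\) ([^\s:]+)")
CAN_RE = re.compile(r"(\S+) can fence \((\w+)\) ([^\s:]+)")
RESULT_RE = re.compile(
    r"Operation '(\w+)' \[\d+\] \(call \d+ from [^)]+\) "
    r"for host '([^']+)' with device '([^']+)' returned: (-?\d+) \((\w+)\)"
)


@dataclass
class FenceSummary:
    """Hypothetical summary of one fencing request for a single target node."""
    capable: set = field(default_factory=set)    # devices that said they can fence the target
    incapable: set = field(default_factory=set)  # devices that said they cannot
    results: list = field(default_factory=list)  # (action, device, return code, status text)


def summarize_fencing(lines, target):
    summary = FenceSummary()
    for line in lines:
        # Each line is matched against the three message shapes above;
        # lines about other nodes or other messages are ignored.
        if (m := CANNOT_RE.search(line)) and m.group(3) == target:
            summary.incapable.add(m.group(1))
        elif (m := CAN_RE.search(line)) and m.group(3) == target:
            summary.capable.add(m.group(1))
        elif (m := RESULT_RE.search(line)) and m.group(2) == target:
            action, _host, device, rc, status = m.groups()
            summary.results.append((action, device, int(rc), status))
    return summary


node1_log = [
    "stonith-ng[1485]:  notice: fence_node1 can not fence (reboot) node2: static-list",
    "stonith-ng[1485]:  notice: fence_node2 can fence (reboot) node2 (aka. 'vm-node2'): static-list",
    "stonith-ng[1485]:  notice: Operation 'reboot' [6885] (call 2 from stonith_admin.6884) "
    "for host 'node2' with device 'fence_node2' returned: 0 (OK)",
]
s = summarize_fencing(node1_log, "node2")
print(s.capable, s.incapable, s.results)
# {'fence_node2'} {'fence_node1'} [('reboot', 'fence_node2', 0, 'OK')]
```

A return code of 0 here only says the fence agent exited successfully; on its own it does not show that node2 actually lost power, which is exactly the discrepancy described in this article.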

Logs from node2:

```
stonith-ng[3038]:  notice: fence_node1 can not fence (reboot) node2: static-list
stonith-ng[3038]:  notice: fence_node2 can fence (reboot) node2 (aka. 'vm-node2'): static-list
stonith-ng[3038]:  notice: Operation reboot of node2 by node1 for stonith_admin.6884@node1.671eb0a4: OK
crmd[3042]:    crit: We were allegedly just fenced by node1 for node1!
cib[3037]: warning: new_event_notification (3037-3042-13): Broken pipe (32)
cib[3037]: warning: Notification of client crmd/0fff530d-0e73-42f0-bbac-c0100a4ab62b failed
pacemakerd[3036]: warning: The crmd process (3042) can no longer be respawned, shutting the cluster down.
lrmd[3039]: warning: new_event_notification (3039-3042-8): Bad file descriptor (9)
lrmd[3039]: warning: Could not notify client crmd/e36d493d-4cd9-4e96-946d-cf1165dbfe2c: Bad file descriptor
pacemakerd[3036]:  notice: Shutting down Pacemaker
pacemakerd[3036]:  notice: Stopping pengine
pengine[3041]:  notice: Caught 'Terminated' signal
pacemakerd[3036]:  notice: Stopping attrd
attrd[3040]:  notice: Caught 'Terminated' signal
lrmd[3039]:  notice: Caught 'Terminated' signal
pacemakerd[3036]:  notice: Stopping lrmd
pacemakerd[3036]:  notice: Stopping stonith-ng
stonith-ng[3038]:  notice: Caught 'Terminated' signal
cib[3037]: warning: new_event_notification (3037-3038-11): Broken pipe (32)
cib[3037]: warning: Notification of client stonithd/7c8647e4-b100-4d2b-b9bf-b4c94fdc6e80 failed
cib[3037]: warning: new_event_notification (3037-3040-12): Broken pipe (32)
cib[3037]: warning: Notification of client attrd/0bcd95ee-116f-4423-8b89-666d97583822 failed
pacemakerd[3036]:  notice: Stopping cib
cib[3037]:  notice: Caught 'Terminated' signal
cib[3037]:  notice: Disconnected from Corosync
cib[3037]:  notice: Disconnected from Corosync
pacemakerd[3036]:  notice: Shutdown complete
pacemakerd[3036]:  notice: Attempting to inhibit respawning after fatal error
```
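The telling sequence on node2 is that it logs its own fencing as OK and is then still alive to log crmd's "We were allegedly just fenced" message, after which crmd exits and pacemakerd, unable to respawn it, stops the whole stack. A minimal sketch that flags this pattern in a node's own log, assuming the message wording shown above; find_surviving_fence is a hypothetical helper name, and matching on log text is only a quick triage aid, not a reliable health check.

```python
import re

# Wording assumed from the node2 log excerpt; other Pacemaker versions may differ.
FENCE_OK_RE = re.compile(r"Operation (\w+) of (\S+) by (\S+) for \S+: OK")
SELF_FENCED_RE = re.compile(r"We were allegedly just fenced by \S+ for \S+!")


def find_surviving_fence(lines, local_node):
    """Return the name of the node that reported fencing local_node as OK,
    if local_node was still running afterwards to log crmd's crit message;
    otherwise return None."""
    fenced_by = None
    for line in lines:
        m = FENCE_OK_RE.search(line)
        if m and m.group(2) == local_node:
            fenced_by = m.group(3)  # peer that reported our fencing as successful
        elif fenced_by is not None and SELF_FENCED_RE.search(line):
            return fenced_by  # we are still alive after a "successful" fence
    return None


node2_log = [
    "stonith-ng[3038]:  notice: Operation reboot of node2 by node1 "
    "for stonith_admin.6884@node1.671eb0a4: OK",
    "crmd[3042]:    crit: We were allegedly just fenced by node1 for node1!",
    "pacemakerd[3036]: warning: The crmd process (3042) can no longer be respawned, "
    "shutting the cluster down.",
]
print(find_surviving_fence(node2_log, "node2"))  # node1
print(find_surviving_fence(node2_log, "node1"))  # None
```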

The same behavior occurs when the cluster itself initiates fencing, but it cannot be reproduced when the fence agent is run manually with the fence_vmware_soap or fence_vmware_rest command.
The hypervisor may report either of the following tasks in the logs of the fenced virtual machine:
```
Task: Reset virtual machine
```
or:
```
Task: Reconfigure virtual machine
```
From the point of view of the remaining cluster nodes, the fenced node appears as Pending.

Environment

Red Hat Enterprise Linux 7 with High-Availability or Resilient Storage Add-Ons
Pacemaker cluster
VMware hypervisor
