Why does a fencing operation report success even though the node is not rebooted, and Pacemaker is instead shut down on that node?

Issue

  • When performing a manual fence operation with pcs stonith fence node2, the command completes without error, but node2 is not actually rebooted: it remains up and running, and only the Pacemaker stack is shut down on it:

    [root@node1 ~]# pcs stonith fence node2
    Node: node2 fenced
    

    Logs from node1:

    stonith-ng[1485]:  notice: Client stonith_admin.6884.651b3028 wants to fence (reboot) 'node2' with device '(any)'
    stonith-ng[1485]:  notice: Requesting peer fencing (reboot) of node2
    stonith-ng[1485]:  notice: fence_node1 can not fence (reboot) node2: static-list
    stonith-ng[1485]:  notice: fence_node2 can fence (reboot) node2 (aka. 'vm-node2'): static-list
    stonith-ng[1485]:  notice: fence_node1 can not fence (reboot) node2: static-list
    stonith-ng[1485]:  notice: fence_node2 can fence (reboot) node2 (aka. 'vm-node2'): static-list
    stonith-ng[1485]:  notice: Operation 'reboot' [6885] (call 2 from stonith_admin.6884) for host 'node2' with device 'fence_node2' returned: 0 (OK)
    stonith-ng[1485]:  notice: Operation reboot of node2 by node1 for stonith_admin.6884@node1.671eb0a4: OK
    crmd[1490]:  notice: Peer node2 was terminated (reboot) by node1 on behalf of stonith_admin.6884: OK
    

    Logs from node2:

    stonith-ng[3038]:  notice: fence_node1 can not fence (reboot) node2: static-list
    stonith-ng[3038]:  notice: fence_node2 can fence (reboot) node2 (aka. 'vm-node2'): static-list
    stonith-ng[3038]:  notice: Operation reboot of node2 by node1 for stonith_admin.6884@node1.671eb0a4: OK
    crmd[3042]:    crit: We were allegedly just fenced by node1 for node1!
    cib[3037]: warning: new_event_notification (3037-3042-13): Broken pipe (32)
    cib[3037]: warning: Notification of client crmd/0fff530d-0e73-42f0-bbac-c0100a4ab62b failed
    pacemakerd[3036]: warning: The crmd process (3042) can no longer be respawned, shutting the cluster down.
    lrmd[3039]: warning: new_event_notification (3039-3042-8): Bad file descriptor (9)
    lrmd[3039]: warning: Could not notify client crmd/e36d493d-4cd9-4e96-946d-cf1165dbfe2c: Bad file descriptor
    pacemakerd[3036]:  notice: Shutting down Pacemaker
    pacemakerd[3036]:  notice: Stopping pengine
    pengine[3041]:  notice: Caught 'Terminated' signal
    pacemakerd[3036]:  notice: Stopping attrd
    attrd[3040]:  notice: Caught 'Terminated' signal
    lrmd[3039]:  notice: Caught 'Terminated' signal
    pacemakerd[3036]:  notice: Stopping lrmd
    pacemakerd[3036]:  notice: Stopping stonith-ng
    stonith-ng[3038]:  notice: Caught 'Terminated' signal
    cib[3037]: warning: new_event_notification (3037-3038-11): Broken pipe (32)
    cib[3037]: warning: Notification of client stonithd/7c8647e4-b100-4d2b-b9bf-b4c94fdc6e80 failed
    cib[3037]: warning: new_event_notification (3037-3040-12): Broken pipe (32)
    cib[3037]: warning: Notification of client attrd/0bcd95ee-116f-4423-8b89-666d97583822 failed
    pacemakerd[3036]:  notice: Stopping cib
    cib[3037]:  notice: Caught 'Terminated' signal
    cib[3037]:  notice: Disconnected from Corosync
    cib[3037]:  notice: Disconnected from Corosync
    pacemakerd[3036]:  notice: Shutdown complete
    pacemakerd[3036]:  notice: Attempting to inhibit respawning after fatal error
    
  • The same behavior occurs during stonith operations initiated by the cluster itself, but it cannot be reproduced by running the fence agent manually with fence_vmware_soap or fence_vmware_rest.

  • The hypervisor may report either of the following messages in the logs of the virtual machine that was fenced:

    Task: Reset virtual machine
    

    or:

    Task: Reconfigure virtual machine
    
  • From the point of view of the other nodes in the cluster, the fenced node appears as Pending.
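To narrow down where the behavior diverges, it can help to compare the fence device configuration with a manual run of the fence agent outside Pacemaker. The sketch below assumes a device named fence_node2, a VM named vm-node2, and placeholder vCenter credentials; substitute the values from your own configuration.

```shell
# 1. Review the fence device configuration on a surviving node
#    (RHEL 7 / pcs 0.9 syntax), paying attention to the port/plug
#    and pcmk_host_map attributes:
pcs stonith show fence_node2 --full

# 2. Run the fence agent manually, outside of Pacemaker, and verify on
#    the hypervisor that a "Reset virtual machine" task is created
#    (credentials and VM name below are placeholders):
fence_vmware_rest --ip vcenter.example.com --ssl --ssl-insecure \
    --username 'fence-user' --password 'secret' \
    --plug 'vm-node2' --action reboot

# 3. Check how the rest of the cluster sees the fenced node:
pcs status nodes
```

If the manual agent run resets the VM but a cluster-initiated fence only produces a "Reconfigure virtual machine" task, compare the parameters pcs passes to the agent against the ones used on the command line.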

Environment

Red Hat Enterprise Linux 7 with High-Availability or Resilient Storage Add-Ons
Pacemaker cluster
VMware hypervisor
