- Red Hat Enterprise Linux (RHEL) 5, 6, 7, 8 with the High Availability Add On
- One or more stonith or fence devices configured to use an IPMI-based agent such as fence_ipmilan or fence_idrac
- Or a fencing device configured with method="cycle", whether in /etc/cluster/cluster.conf for cman-based clusters or in the CIB for pacemaker-based clusters
- A node had trouble communicating, so the cluster decided to fence it and take over its resources, but another node mounted file system resources before the fenced node actually powered off, and data was corrupted.
- fence_ipmilan returns success before a node actually gets powered off
- A node failed to stop a resource and so needed to be fenced, and somehow that node was still alive to log the completion of that fence action from another node. How can this be possible if the node should have powered off before fencing completed?
Aug 17 08:33:08 node2 stonith-ng: notice: remote_op_done: Operation reboot of node2 by node1 for email@example.com: OK
- When a node is fenced in my pacemaker cluster due to a resource stop timeout, the rest of the cluster logs "telling cman to remove nodeid 9 from cluster" and the membership changes, but GFS2 access stays blocked. All nodes log "Trying to acquire journal lock" but nothing else happens. We only see this behavior with method=cycle.
IMPORTANT: Configure all IPMI-based fencing agents, such as fence_idrac devices, to use method=onoff (the default in most cases) instead of cycle, and make sure that cluster nodes are configured to power off immediately (the procedure for this differs between RHEL 5/6 cluster nodes and RHEL 7 cluster nodes).
If you have declared the method attribute with a value of cycle for any fence-agent, modify it so that the method attribute has a value of onoff.
There are multiple fence-agents whose default for the method attribute is cycle. If you are using one of those fence-agents, explicitly add method=onoff to its configuration.
Update the stonith device configurations to not specify a method, or to use method=onoff instead. Note that leaving the value off of an attribute when updating (for example, method=) un-sets it and falls back to the agent's default, which is not what we want here when that default is cycle.
# pcs stonith update node1-ipmi method=onoff
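Before updating, it can help to audit the existing configuration for devices that still carry method="cycle". The snippet below is only an illustrative sketch, not a supported tool: it runs against an inline sample CIB fragment, and on a live cluster node you would instead feed it a dump of the CIB (for example, from cibadmin -Q). The device name node1-ipmi is the same placeholder used above.

```shell
# Sketch: find stonith devices whose CIB entry still sets method="cycle".
# On a live cluster node, generate the input with:  cibadmin -Q > cib.xml
CIB=$(mktemp)
cat > "$CIB" <<'EOF'
<primitive id="node1-ipmi" class="stonith" type="fence_ipmilan">
  <instance_attributes id="node1-ipmi-attrs">
    <nvpair id="node1-ipmi-method" name="method" value="cycle"/>
  </instance_attributes>
</primitive>
EOF
# Print the primitive id for each cycle-method nvpair found
grep -B2 'name="method" value="cycle"' "$CIB" | grep -o 'primitive id="[^"]*"'
rm -f "$CIB"
```

Any device reported by this check is a candidate for the pcs stonith update command above.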
Purely cman-based clusters
Update the fencedevice definitions in /etc/cluster/cluster.conf to use method="onoff":
<fencedevice name="node1-ipmi" agent="fence_ipmilan" ipaddr="node1-ipmi.example.com" userid="myuser" password="StrongPassword" lanplus="1" method="onoff"/>
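A similar quick check for cman clusters is to scan cluster.conf for fencedevice entries that still request the cycle method. This is a sketch only; it uses an inline sample, and on a real node you would point it at /etc/cluster/cluster.conf instead.

```shell
# Sketch: list fencedevice definitions that use method="cycle".
# On a real node:  CONF=/etc/cluster/cluster.conf
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<fencedevices>
  <fencedevice name="node1-ipmi" agent="fence_ipmilan" method="cycle"/>
  <fencedevice name="node2-ipmi" agent="fence_ipmilan" method="onoff"/>
</fencedevices>
EOF
# Print the name of each device still using the cycle method
grep 'fencedevice' "$CONF" | grep 'method="cycle"' | grep -o 'name="[^"]*"'
rm -f "$CONF"
```

After editing cluster.conf, remember to propagate the updated configuration to all cluster nodes per the usual procedure for your release.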
RHEL 7 or later
The only fencing agent that defaults to method=cycle on RHEL 7 is fence_ilo3. There are two ways to change this:
- Update the fence-agents packages with the errata RHBA-2018:0758 or later. The errata changes the default of the method attribute to onoff for the fence_ilo3 fencing agent.
- WORKAROUND: Update the stonith device configurations to not specify a method (unless the agent is fence_ilo3 and the errata above has not yet been installed), or use method=onoff instead. Leaving the value off of an attribute when updating causes it to be un-set, so the agent's default applies.
# pcs stonith update node1-ipmi method=onoff
NOTE: RHEL 8 or later defaults to onoff for the method attribute for all fence-agents that use the fence_ipmilan code base.
fence_ipmilan offers a special method attribute that controls how a
reboot operation is carried out. If using the default value of
onoff, then the agent sends a power-off command to the device, then sends a power-on, and evaluates the results of those and reports that back as the exit status. This ensures that no successful return code can be sent back to the cluster stack until a node is successfully powered off.
However, the alternate value of
cycle results in the agent issuing a single command to the hardware device telling it to cycle the node itself. This relies on the device firmware carrying out the action properly and reporting its status accurately: the server's status reads "on" both before and after the request, so there is no way to confirm that it actually powered off. Some server makes/models might actually return a successful status from this
cycle request before proceeding to power off the server. The end result is that the fence agent may believe the operation was a success several seconds or more before a node actually powers off.
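The difference between the two methods can be illustrated with a small shell model. This is a toy simulation, not the real fence_ipmilan code: the device_* functions stand in for IPMI responses, and the simulated firmware acknowledges a cycle request while the server is still powered on.

```shell
# Toy model of the two reboot methods (not the real fence_ipmilan implementation).
state=on

device_cycle()  { echo "cycle request acknowledged"; }  # firmware acks before acting
device_off()    { state=off; }                          # explicit power-off command
device_on()     { state=on; }
device_status() { echo "$state"; }

reboot_cycle() {
    device_cycle
    # Status reads "on" both before and after, so a success exit proves nothing:
    echo "cycle exit: success, node still reports: $(device_status)"
}

reboot_onoff() {
    device_off
    until [ "$(device_status)" = "off" ]; do sleep 1; done  # confirm power-off
    echo "onoff exit: success, node confirmed: off"
    device_on
}

reboot_cycle
reboot_onoff
```

The onoff path cannot report success until the status poll actually returns "off", which is exactly the guarantee the cycle path lacks.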
This can cause problems for the cluster stack in a few ways, the most significant of which being that the successful completion of fencing signals to the resource manager on other nodes to start recovering resources that were running on fenced nodes, meaning those resources have the potential to run on two nodes simultaneously. If one node thinks the other has powered off and, for example, takes over a file system resource, mounts it, and submits I/O to it, all while the other node is still issuing I/O to it itself, data corruption could ensue.
While this ultimately would be a problem on the IPMI-device firmware side, Red Hat is considering whether a change is necessary to prevent usage of the
cycle method within High Availability clusters, or whether there is some alternative solution that could prevent issues like this. This investigation is occurring in Red Hat Bugzilla #1271780.
This applies to all IPMI-based fencing agents, such as fence_ipmilan and fence_idrac.
- To demonstrate the nature of this problem, simply execute fence_ipmilan -o reboot -m cycle [...] from one node against another node's fence device, then interact with a console on that fenced node constantly while waiting for the fence_ipmilan operation to complete. If the node is still responsive on its console or ssh session after the fence_ipmilan command has exited with a success status, then the cluster is susceptible to unexpected behavior when using the cycle method, and it should be avoided.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.