Resources run on two nodes simultaneously, data on shared storage is corrupted, and/or other unexpected behavior occurs in a RHEL High Availability cluster using fence_ipmilan with method=cycle
Environment
- Red Hat Enterprise Linux (RHEL) 5, 6, 7, 8 with the High Availability Add On
- One or more stonith or fence devices configured to use one of the following agents: fence_ipmilan, fence_ilo3, fence_ilo4, fence_imm, or fence_idrac
- Or a fencing device configured with method=cycle, either in /etc/cluster/cluster.conf in cman-based clusters or in the CIB for pacemaker-based clusters
Issue
- A node had trouble communicating and the cluster decided to fence it and take over its resources, but it seems that another node mounted file system resources before the node got powered off, and data was corrupted.
- fence_ipmilan returns success before a node actually gets powered off
- A node failed to stop a resource and so needed to be fenced, and somehow that node was still alive to log the completion of that fence action by another node. How can this be possible if the node should have powered off before fencing completed?
Aug 17 08:33:08 node2 stonith-ng[17738]: notice: remote_op_done: Operation reboot of node2 by node1 for stonith_admin.cman.120215@sapha014hb0.ee6744ed: OK
- When a node is fenced in my pacemaker cluster due to a resource stop timeout, the rest of the cluster logs "telling cman to remove nodeid 9 from cluster", the membership changes, but GFS2 access stays blocked. All nodes log "Trying to acquire journal lock" but nothing else happens. We only see this behavior with method="cycle" in our stonith device.
Resolution
IMPORTANT: Configure all IPMI-based fencing agents such as fence_ipmilan, fence_ilo3, fence_ilo4, fence_imm, and fence_idrac to use method=onoff (the default in most cases) instead of cycle, and make sure that each cluster node is configured to power off immediately when fenced; separate procedures for this exist for RHEL 5 and 6 cluster nodes and for RHEL 7 cluster nodes.
If you have declared the attribute method to have a value of cycle for any fence agent, then you should modify it so that the method attribute has a value of onoff.
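To locate any existing method=cycle settings before changing them, you can search the current configuration. A quick sketch (the first command applies to pacemaker-based clusters, the second to cman-based clusters):

# pcs stonith show --full | grep -i method
# grep -i 'method=' /etc/cluster/cluster.conf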
RHEL 6
There are multiple fence-agents that have a default of cycle for the method attribute. If you are using one of the fence-agents listed below, add the attribute method=onoff to those configured fence-agents. (A way to confirm an agent's default is sketched after the list.)
- fence_ipmilan
- fence_ilo3
- fence_ilo4
- fence_imm
- fence_idrac
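To confirm the default value of method for an installed agent, you can inspect its metadata. A sketch for fence_ipmilan (the exact metadata layout may vary between agent versions):

# fence_ipmilan -o metadata | grep -A 3 'name="method"'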
pacemaker-based clusters
Update the stonith device configurations to use method=onoff. Note that leaving the value off of an attribute when updating causes it to be unset, reverting to the agent's default, which for these agents on RHEL 6 is cycle and therefore not what we want.
# pcs stonith update node1-ipmi method=onoff
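To verify that the device now carries the attribute (output formatting differs between pcs versions):

# pcs stonith show node1-ipmi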
Purely cman-based clusters
Update any fencedevice definitions in /etc/cluster/cluster.conf to use method="onoff" instead.
<fencedevice name="node1-ipmi" agent="fence_ipmilan" ipaddr="node1-ipmi.example.com" userid="myuser" password="StrongPassword" lanplus="1" method="onoff"/>
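After editing /etc/cluster/cluster.conf, remember to increment config_version in the file, then validate and propagate the change to the rest of the cluster. A typical RHEL 6 sequence:

# ccs_config_validate
# cman_tool version -r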
RHEL 7 or later
The only fencing agent that defaults to method=cycle on RHEL 7 is fence_ilo3, and only before RHBA-2018:0758 or later is installed. There are two ways to change this:
pacemaker-based clusters
- Update the fence-agents packages with the errata RHBA-2018:0758. The errata changes the default of method to onoff for the fencing agent fence_ilo3.
- WORKAROUND: Update the stonith device configurations to not specify a method (unless the device uses fence_ilo3 and the errata above has not yet been installed), or to use method=onoff instead. Leaving the value off of an attribute when updating causes it to be unset, so the agent's default is used.
# pcs stonith update node1-ipmi method=onoff
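Alternatively, to pick up the errata's changed default rather than setting the attribute, update the package that ships fence_ilo3 (on RHEL 7 this is typically fence-agents-ipmilan; verify the package name on your system):

# yum update fence-agents-ipmilan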
NOTE: RHEL 8 or later defaults to onoff for the method attribute in all fence-agents that use it.
Root Cause
fence_ipmilan offers a special method attribute that controls how a reboot operation is carried out. With the default value of onoff, the agent sends a power-off command to the device, then sends a power-on command, evaluates the results of both, and reports that back as the exit status. This ensures that no successful return code can be sent back to the cluster stack until the node has successfully powered off.
However, the alternate value of cycle results in the agent issuing a single command to the hardware device telling it to cycle the node itself. This relies on the device firmware carrying out the action properly and reporting the status accurately: since the chassis status reads "on" both before and after, there is no way to confirm that the node actually powered off. The firmware of some server makes and models may return a successful status for this cycle request before proceeding to power off the server. The end result is that the fence agent may believe the operation was a success several seconds or more before the node actually powers off.
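At the IPMI level, the difference roughly corresponds to the following ipmitool commands (the hostname and credentials below are the same placeholders used elsewhere in this article). With method=onoff, the agent can insist that the chassis reports "off" before it returns success:

# ipmitool -I lanplus -H node1-ipmi.example.com -U myuser -P StrongPassword chassis power off
# ipmitool -I lanplus -H node1-ipmi.example.com -U myuser -P StrongPassword chassis power status
# ipmitool -I lanplus -H node1-ipmi.example.com -U myuser -P StrongPassword chassis power on

With method=cycle, a single request is issued and the agent must trust the firmware, since the chassis reports "on" both before and after:

# ipmitool -I lanplus -H node1-ipmi.example.com -U myuser -P StrongPassword chassis power cycle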
This can cause problems for the cluster stack in a few ways, the most significant of which being that the successful completion of fencing signals to the resource manager on other nodes to start recovering resources that were running on fenced nodes, meaning those resources have the potential to run on two nodes simultaneously. If one node thinks the other has powered off and, for example, takes over a file system resource, mounts it, and submits I/O to it, all while the other node is still issuing I/O to it itself, data corruption could ensue.
While this ultimately would be a problem on the IPMI-device firmware side, Red Hat is considering whether a change is necessary to prevent usage of the cycle method within High Availability clusters, or whether there is some alternative solution that could prevent issues like this. This investigation is occurring in Red Hat Bugzilla #1271780.
This applies to all IPMI-based fencing agents, such as fence_ipmilan, fence_ilo3, fence_ilo4, fence_ilo5, fence_imm, and fence_idrac.
Diagnostic Steps
- To demonstrate the nature of this problem, simply execute fence_ipmilan -o reboot -m cycle [...] from one node against another node's fence device, then interact with a console on that fenced node continuously while waiting for the fence_ipmilan operation to complete. If the node is still responsive on its console or ssh session after the fence_ipmilan command has exited with a success status, then the cluster is susceptible to unexpected behavior when using the cycle method, and it should be avoided. A concrete sketch of this check follows.
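For example (keeping the article's elided device arguments as-is; start the loop on the target node's console before initiating the fence action):

# time fence_ipmilan -o reboot -m cycle [...]

If a loop such as the following, running on the fenced node, keeps printing timestamps after fence_ipmilan has already exited successfully, the firmware is reporting success before the power-off:

# while true; do date; sleep 1; done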
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.