Chapter 5. Fencing Controller nodes with STONITH
Fencing is the process of isolating a failed node to protect the cluster and the cluster resources. Without fencing, a failed node might result in data corruption in a cluster.
Director uses Pacemaker to provide a highly available cluster of Controller nodes. Pacemaker uses a process called STONITH to fence failed nodes. STONITH is an acronym for "Shoot the other node in the head".
If a Controller node fails a health check, the Controller node that acts as the Pacemaker designated coordinator (DC) uses the Pacemaker stonith service to fence the impacted Controller node.
STONITH is disabled by default and requires manual configuration so that Pacemaker can control the power management of each node in the cluster.
Deploying a highly available overcloud without STONITH is not supported. You must configure a STONITH device for each node that is a part of the Pacemaker cluster in a highly available overcloud. For more information on STONITH and Pacemaker, see Fencing in a Red Hat High Availability Cluster and Support Policies for RHEL High Availability Clusters.
For more information on fencing with Pacemaker in Red Hat Enterprise Linux, see "Configuring fencing in a Red Hat High Availability cluster" in the Red Hat Enterprise Linux 8 Configuring and Managing High Availability Clusters guide.
5.1. Supported fencing agents
When you deploy a high availability environment with fencing, you can choose one of the following fencing agents based on your environment needs. To change the fencing agent, you must configure additional parameters in the fencing.yaml file, as described in Section 5.2, “Deploying and testing fencing on the overcloud”.
- Intelligent Platform Management Interface (IPMI)
- Default fencing mechanism that RHOSP uses to manage fencing.
- Storage Block Device (SBD)
- Use in deployments with Watchdog devices. The deployment must not use shared storage.
- fence_kdump for the kdump crash recovery service
- Use in deployments with the kdump crash recovery service. If you choose this agent, make sure you have enough disk space to store the dump files.
  You can configure this agent as a secondary mechanism in addition to the IPMI, fence_rhevm, or Redfish fencing agents. If you configure multiple fencing agents, make sure that you allocate enough time for the first agent to complete the task before the second agent starts the next task.
- Redfish
- Use in deployments with servers that support the DMTF Redfish APIs. To specify this agent, change the value of the agent parameter in the fencing.yaml file. For more information about Redfish, see the DMTF Documentation.
- fence_rhevm for oVirt and Red Hat Virtualization (RHV)
- Use to configure fencing for Controller nodes that run in oVirt or RHV environments. You can generate the fencing.yaml file in the same way as you do for IPMI fencing, but you must define the pm_type parameter in the nodes.json file to use oVirt or RHV.
  By default, the ssl_insecure parameter is set to accept self-signed certificates. You can change the parameter value based on your security requirements.
5.2. Deploying and testing fencing on the overcloud
The fencing configuration process includes the following stages:
- Reviewing the state of STONITH and Pacemaker.
- Redeploying the overcloud and testing the configuration.
Make sure that you can access the nodes.json file that you created when you registered your Controller nodes in director. This file is a required input for the fencing.yaml file that you generate during deployment.
Review the state of STONITH and Pacemaker
- Log in to each Controller node as the heat-admin user.
Verify that the cluster is running:
$ sudo pcs status
Cluster name: openstackHA
Last updated: Wed Jun 24 12:40:27 2015
Last change: Wed Jun 24 11:36:18 2015
Stack: corosync
Current DC: lb-c1a2 (2) - partition with quorum
Version: 1.1.12-a14efad
3 Nodes configured
141 Resources configured
Verify that STONITH is disabled:
$ sudo pcs property show
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: openstackHA
 dc-version: 1.1.12-a14efad
 have-watchdog: false
 stonith-enabled: false
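If you want to check the property on several nodes without reading the output by eye, the following minimal Python sketch parses text in the shape of the output above and reports whether STONITH is enabled. It assumes that you captured the output of pcs property show into a string yourself; the function name parse_cluster_properties is hypothetical and is not part of pcs or director.

```python
def parse_cluster_properties(text):
    """Parse 'key: value' lines from captured `pcs property show` output."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the section header, which carries no value.
        if not line or line == "Cluster Properties:":
            continue
        key, sep, value = line.partition(":")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Sample text copied from the example output in this section.
SAMPLE = """\
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: openstackHA
 dc-version: 1.1.12-a14efad
 have-watchdog: false
 stonith-enabled: false
"""

props = parse_cluster_properties(SAMPLE)
stonith_enabled = props.get("stonith-enabled", "").lower() == "true"
print("STONITH enabled:", stonith_enabled)
```

A missing stonith-enabled line is treated as "not enabled" in this sketch, which is a simplification chosen for the example and not a statement about Pacemaker defaults.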
Generate the fencing.yaml environment file
Choose one of the following options:
If you use the IPMI or Red Hat Virtualization fencing agent, run the following command to generate the fencing.yaml file:
$ openstack overcloud generate fencing --output fencing.yaml nodes.json
Note
- This command converts drac power management details to IPMI equivalents.
- Make sure that the nodes.json file contains the MAC address of one of the network interfaces (NICs) on the node. For more information, see Registering Nodes for the Overcloud.
If you use a different fencing agent, such as Storage Block Device (SBD), fence_kdump, or Redfish, generate the fencing.yaml file manually.
If you use pre-provisioned nodes, you also must create the fencing.yaml file manually.
For more information about supported fencing agents, see Section 5.1, “Supported fencing agents”.
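Because the generate command depends on a MAC address for each node in nodes.json, it can help to check the file before you run it. The following Python sketch is one possible pre-check. It assumes a top-level nodes list in which each node carries either a mac list or a ports list with address entries; those field names are assumptions based on common node definition layouts, so adjust them to match your own file. The function name nodes_without_mac is hypothetical.

```python
import json


def nodes_without_mac(nodes_json_text):
    """Return the names of nodes that list no MAC address (assumed layout)."""
    data = json.loads(nodes_json_text)
    missing = []
    for index, node in enumerate(data.get("nodes", [])):
        # Assumed layout 1: "mac": ["aa:bb:cc:dd:ee:ff", ...]
        macs = list(node.get("mac", []))
        # Assumed layout 2: "ports": [{"address": "aa:bb:cc:dd:ee:ff"}, ...]
        macs += [p.get("address") for p in node.get("ports", []) if p.get("address")]
        if not macs:
            missing.append(node.get("name", f"node[{index}]"))
    return missing


# Illustrative input only; these node names and values are made up.
SAMPLE_NODES = """
{
  "nodes": [
    {"name": "controller-0", "pm_type": "ipmi", "mac": ["aa:bb:cc:dd:ee:01"]},
    {"name": "controller-1", "pm_type": "ipmi"},
    {"name": "controller-2", "pm_type": "ipmi",
     "ports": [{"address": "aa:bb:cc:dd:ee:03"}]}
  ]
}
"""

missing = nodes_without_mac(SAMPLE_NODES)
print("Nodes without a MAC address:", missing)
```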
(Optional) Configure additional parameters for SBD fencing
If you are deploying fencing with the Storage Block Device (SBD) agent, you must add the following parameter to the fencing.yaml file:
parameter_defaults:
  ExtraConfig:
    pacemaker::corosync::enable_sbd: true
By default, the watchdog_timeout value is 10 seconds to prevent fencing from starting before the deployment ends. You can increase this value by adding the following parameter:
(Optional) Configure multi-layered fencing
You can configure multiple fencing agents to support complex fencing use cases. For example, you can configure IPMI fencing together with fence_kdump. The order of the fencing agents determines the order in which Pacemaker triggers each mechanism.
To define multiple fencing agents, add the level-specific parameters to the generated fencing.yaml file:
parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
      level1:
      - agent: [VALUE]
        host_mac: aa:bb:cc:dd:ee:ff
        params:
          [PARAMETER]: [VALUE]
      level2:
      - agent: fence_agent2
        host_mac: aa:bb:cc:dd:ee:ff
        params:
          [PARAMETER]: [VALUE]
Replace [PARAMETER] and [VALUE] with the actual parameters and values that the fencing agent requires.
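To double-check the order in which agents would be tried for a given host, you can load the devices section into a Python dictionary and sort the levels numerically. The sketch below assumes level keys of the form levelN, as in the example above, and that lower-numbered levels are tried first. The helper names and the agent-to-level assignment are illustrative only; which agent belongs on which level depends on your environment.

```python
import re


def level_number(name):
    """Extract N from a 'levelN' key; reject anything else."""
    match = re.fullmatch(r"level(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected level key: {name!r}")
    return int(match.group(1))


def fencing_order(devices, host_mac):
    """Return (level, agent) pairs for one host, lowest level first."""
    ordered = []
    # Sort numerically so that level10 comes after level2, not before it.
    for name in sorted(devices, key=level_number):
        for device in devices[name]:
            if device["host_mac"].lower() == host_mac.lower():
                ordered.append((name, device["agent"]))
    return ordered


# Illustrative devices mapping; the params are left empty on purpose.
devices = {
    "level2": [{"agent": "fence_ipmilan", "host_mac": "aa:bb:cc:dd:ee:ff", "params": {}}],
    "level1": [{"agent": "fence_kdump", "host_mac": "aa:bb:cc:dd:ee:ff", "params": {}}],
}

order = fencing_order(devices, "AA:BB:CC:DD:EE:FF")
print(order)
```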
Redeploy the overcloud and test the configuration
Run the overcloud deploy command and include the fencing.yaml file that you generated to configure fencing on the Controller nodes:
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e ~/templates/network-environment.yaml \
  -e ~/templates/storage-environment.yaml \
  --control-scale 3 --compute-scale 3 --ceph-storage-scale 3 \
  --control-flavor control --compute-flavor Compute --ceph-storage-flavor ceph-storage \
  --ntp-server pool.ntp.org \
  --neutron-network-type vxlan --neutron-tunnel-types vxlan \
  -e fencing.yaml
Log in to the overcloud and verify that fencing is configured for each of the Controller nodes:
Check that Pacemaker is configured as the resource manager:
$ source stackrc
$ nova list | grep controller
$ ssh heat-admin@<controller-x_ip>
$ sudo pcs status | grep fence
stonith-overcloud-controller-x (stonith:fence_ipmilan): Started overcloud-controller-y
In this example, Pacemaker is configured to use a STONITH resource for each of the Controller nodes that are specified in the fencing.yaml file.
You must not configure the fence-resource process on the same node that it controls.
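The placement rule can be checked mechanically. The following Python sketch scans captured pcs status lines in the format of the example above and lists any fencing resource that is started on the node it controls. It assumes that the resource name is the target host name with a stonith- prefix, as in the example; real resource names can follow a different scheme, so treat the prefix and the function name misplaced_fence_resources as hypothetical.

```python
import re

# Matches lines such as:
# stonith-overcloud-controller-x (stonith:fence_ipmilan): Started overcloud-controller-y
FENCE_LINE = re.compile(
    r"^\s*(?P<resource>\S+)\s+\(stonith:(?P<agent>[^)]+)\):\s+Started\s+(?P<node>\S+)\s*$"
)


def misplaced_fence_resources(status_text, prefix="stonith-"):
    """Return fencing resources that run on the node they are meant to fence."""
    problems = []
    for line in status_text.splitlines():
        match = FENCE_LINE.match(line)
        if not match:
            continue
        resource = match.group("resource")
        # Assumption: the target host name follows the prefix in the resource name.
        target = resource[len(prefix):] if resource.startswith(prefix) else resource
        if target == match.group("node"):
            problems.append(resource)
    return problems


# Illustrative status lines; the second one violates the placement rule.
SAMPLE_STATUS = """\
stonith-overcloud-controller-0 (stonith:fence_ipmilan): Started overcloud-controller-1
stonith-overcloud-controller-1 (stonith:fence_ipmilan): Started overcloud-controller-1
stonith-overcloud-controller-2 (stonith:fence_ipmilan): Started overcloud-controller-0
"""

problems = misplaced_fence_resources(SAMPLE_STATUS)
print("Resources running on their own target:", problems)
```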
Run the pcs stonith show command to check the fencing resource attributes:
$ sudo pcs stonith show <stonith-resource-controller-x>
The STONITH attribute values must match the values in the fencing.yaml file.
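One way to make this comparison less error-prone is to diff the two sets of values in a short script. The Python sketch below compares a params mapping, as you would write it in fencing.yaml, against attribute values that you copied from the pcs stonith show output. It assumes that a YAML boolean such as lanplus: true appears as 1 in the pcs output, which matches the examples in this chapter but might not hold for every agent. All names and values here are illustrative.

```python
def normalize(value):
    """Render a YAML-style value as the string expected in pcs attributes."""
    # Assumption: booleans show up as 1 or 0 in the pcs output.
    if isinstance(value, bool):
        return "1" if value else "0"
    return str(value)


def attribute_mismatches(expected, actual):
    """Map each differing key to a (wanted, found) pair; found is None if absent."""
    mismatches = {}
    for key, value in expected.items():
        wanted = normalize(value)
        found = actual.get(key)
        if found != wanted:
            mismatches[key] = (wanted, found)
    return mismatches


# Values as they might appear under params in fencing.yaml (illustrative).
expected_params = {
    "ipaddr": "10.100.0.51",
    "login": "admin",
    "passwd": "abc",
    "lanplus": True,
    "pcmk_host_list": "overcloud-controller-0",
}

# Values copied by hand from a pcs stonith show listing (illustrative).
actual_attributes = {
    "pcmk_host_list": "overcloud-controller-0",
    "ipaddr": "10.100.0.52",
    "login": "admin",
    "passwd": "abc",
    "lanplus": "1",
    "cipher": "3",
}

mismatches = attribute_mismatches(expected_params, actual_attributes)
print(mismatches)
```

Extra attributes that appear only in the pcs output, such as cipher in this sample, are ignored on purpose, because the check is about whether the values you configured were applied.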
Verify fencing on the Controller nodes
To test whether fencing works correctly, you trigger fencing by closing all ports on a Controller node and rebooting the server.
Log in to a Controller node:
$ source stackrc
$ nova list | grep controller
$ ssh heat-admin@<controller-x_ip>
Change to the root user and run the iptables command on each port:
$ sudo -i
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT &&
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT &&
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 5016 -j ACCEPT &&
iptables -A INPUT -p udp -m state --state NEW -m udp --dport 5016 -j ACCEPT &&
iptables -A INPUT ! -i lo -j REJECT --reject-with icmp-host-prohibited &&
iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT &&
iptables -A OUTPUT -p tcp --sport 5016 -j ACCEPT &&
iptables -A OUTPUT -p udp --sport 5016 -j ACCEPT &&
iptables -A OUTPUT ! -o lo -j REJECT --reject-with icmp-host-prohibited
Important
This step drops all connections to the Controller node, which causes the server to reboot.
From a different Controller node, locate the fencing event in the Pacemaker log file:
$ ssh heat-admin@<controller-x_ip>
$ less /var/log/cluster/corosync.log
(less): /fenc*
If the STONITH service performed the fencing action on the Controller, the log file will show a fencing event.
Wait a few minutes and then verify that the rebooted Controller node is running in the cluster again by running the pcs status command.
5.3. Viewing STONITH information
To see how STONITH configures your fencing devices, run the pcs stonith show --full command from the overcloud:
$ sudo pcs stonith show --full
 Resource: my-ipmilan-for-controller-0 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-0 ipaddr=10.100.0.51 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-0-monitor-interval-60s)
 Resource: my-ipmilan-for-controller-1 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-1 ipaddr=10.100.0.52 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-1-monitor-interval-60s)
 Resource: my-ipmilan-for-controller-2 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-2 ipaddr=10.100.0.53 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-2-monitor-interval-60s)
The --full option returns fencing details about the three Controller nodes.
This output shows the following information for each resource:
- IPMI power management service that the fencing device uses to turn the machines on and off as needed, such as fence_ipmilan.
- IP address of the IPMI interface, such as 10.100.0.51.
- User name to log in with, such as admin.
- Password to use to log in to the node, such as abc.
- Interval in seconds at which each host is monitored, such as 60s.
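If you need these fields in a structured form, for example to compare several clusters, the listing can be parsed with a few lines of Python. The sketch below assumes the three-line Resource, Attributes, and Operations layout shown in the example output; other pcs versions may print the listing differently. The function name parse_stonith_show is hypothetical.

```python
import re

RESOURCE = re.compile(r"Resource:\s+(\S+)\s+\(class=stonith type=(\S+)\)")
INTERVAL = re.compile(r"monitor interval=(\S+)")


def parse_stonith_show(text):
    """Parse captured `pcs stonith show --full` output into a dict per resource."""
    resources = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = RESOURCE.match(line)
        if match:
            current = {"type": match.group(2), "attributes": {}, "monitor_interval": None}
            resources[match.group(1)] = current
        elif current is not None and line.startswith("Attributes:"):
            # Attributes are space-separated key=value pairs.
            for item in line[len("Attributes:"):].split():
                key, sep, value = item.partition("=")
                if sep:
                    current["attributes"][key] = value
        elif current is not None and line.startswith("Operations:"):
            found = INTERVAL.search(line)
            if found:
                current["monitor_interval"] = found.group(1)
    return resources


# Two resources copied from the example output in this section.
SAMPLE = """\
 Resource: my-ipmilan-for-controller-0 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-0 ipaddr=10.100.0.51 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-0-monitor-interval-60s)
 Resource: my-ipmilan-for-controller-1 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-1 ipaddr=10.100.0.52 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-1-monitor-interval-60s)
"""

parsed = parse_stonith_show(SAMPLE)
for name, info in parsed.items():
    # The password is deliberately left out of the summary line.
    print(name, info["type"], info["attributes"]["ipaddr"], info["monitor_interval"])
```

Attribute values that contain spaces would not survive the simple split used here, so this sketch is only suitable for listings like the one above.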
5.4. Fencing parameters
When you deploy fencing on the overcloud, you generate a fencing.yaml file with the required parameters to configure fencing. For more information about deploying and testing fencing, see Section 5.2, “Deploying and testing fencing on the overcloud”.
The following example shows the structure of the fencing.yaml environment file:
parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
    - agent: fence_ipmilan
      host_mac: 11:11:11:11:11:11
      params:
        ipaddr: 10.0.0.101
        lanplus: true
        login: admin
        passwd: InsertComplexPasswordHere
        pcmk_host_list: host04
        privlvl: administrator
This file contains the following parameters:
- EnableFencing
- Enables the fencing functionality for Pacemaker-managed nodes.
- FencingConfig
- Lists the fencing devices and the parameters for each device:
agent: Fencing agent name. Red Hat OpenStack Platform only supports
host_mac: Unique identifier for the fencing device.
params: List of fencing device parameters.
- Fencing device parameters
auth: IPMI authentication type (md5, password, or none).
ipaddr: IPMI IP address.
ipport: IPMI port.
login: Username for the IPMI device.
passwd: Password for the IPMI device.
lanplus: Use lanplus to improve the security of the connection.
privlvl: Privilege level on the IPMI device.
pcmk_host_list: List of Pacemaker hosts.
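Before you redeploy, a quick structural check of the file can catch missing keys. The following Python sketch validates a dictionary that mirrors the single-level example above: EnableFencing must be true, and each device needs agent, host_mac, and params. It assumes that you have already loaded the YAML into a dictionary by some means of your choice, and it does not handle the multi-level layout. The function name validate_fencing_config is hypothetical and is not part of director.

```python
import re

MAC = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def validate_fencing_config(config):
    """Return a list of problems found in a single-level fencing config dict."""
    errors = []
    defaults = config.get("parameter_defaults", {})
    if defaults.get("EnableFencing") is not True:
        errors.append("EnableFencing is not set to true")
    devices = defaults.get("FencingConfig", {}).get("devices")
    if not isinstance(devices, list) or not devices:
        errors.append("FencingConfig.devices must be a non-empty list")
        return errors
    for index, device in enumerate(devices):
        for key in ("agent", "host_mac", "params"):
            if key not in device:
                errors.append(f"device {index}: missing '{key}'")
        if "host_mac" in device and not MAC.match(str(device["host_mac"])):
            errors.append(f"device {index}: host_mac is not a MAC address")
        if "params" in device and not isinstance(device["params"], dict):
            errors.append(f"device {index}: params must be a mapping")
    return errors


# Dictionary form of the example fencing.yaml file in this section.
config = {
    "parameter_defaults": {
        "EnableFencing": True,
        "FencingConfig": {
            "devices": [
                {
                    "agent": "fence_ipmilan",
                    "host_mac": "11:11:11:11:11:11",
                    "params": {
                        "ipaddr": "10.0.0.101",
                        "lanplus": True,
                        "login": "admin",
                        "passwd": "InsertComplexPasswordHere",
                        "pcmk_host_list": "host04",
                        "privlvl": "administrator",
                    },
                }
            ]
        },
    }
}

errors = validate_fencing_config(config)
print(errors or "no structural problems found")
```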