9.9. Configuring Resources to Remain Stopped on Clean Node Shutdown (Red Hat Enterprise Linux 7.8 and later)
When a cluster node shuts down, Pacemaker’s default response is to stop all resources running on that node and recover them elsewhere, even if the shutdown is a clean shutdown. As of Red Hat Enterprise Linux 7.8, you can configure Pacemaker so that when a node shuts down cleanly, the resources running on that node are locked to the node and cannot start elsewhere until they start again on that node after it rejoins the cluster. This allows you to power down nodes during maintenance windows when service outages are acceptable without causing those nodes’ resources to fail over to other nodes in the cluster.
9.9.1. Cluster Properties to Configure Resources to Remain Stopped on Clean Node Shutdown
The ability to prevent resources from failing over on a clean node shutdown is implemented by means of the following cluster properties.
- shutdown-lock
  When this cluster property is set to the default value of false, the cluster will recover resources that are active on nodes being cleanly shut down. When this property is set to true, resources that are active on the nodes being cleanly shut down are unable to start elsewhere until they start on the node again after it rejoins the cluster.

  The shutdown-lock property will work for either cluster nodes or remote nodes, but not guest nodes.

  If shutdown-lock is set to true, you can remove the lock on one cluster resource when a node is down so that the resource can start elsewhere by performing a manual refresh on the node with the following command.

      pcs resource refresh resource --node node

  Note that once the resources are unlocked, the cluster is free to move the resources elsewhere. You can control the likelihood of this occurring by using stickiness values or location preferences for the resource.

  Note
  A manual refresh will work with remote nodes only if you first run the following commands:
  - Run the systemctl stop pacemaker_remote command on the remote node to stop the node.
  - Run the pcs resource disable remote-connection-resource command.
  You can then perform a manual refresh on the remote node.
- shutdown-lock-limit
  When this cluster property is set to a time other than the default value of 0, resources will be available for recovery on other nodes if the node does not rejoin within the specified time since the shutdown was initiated. Note, however, that the time interval will not be checked any more often than the value of the cluster-recheck-interval cluster property.

  Note
  The shutdown-lock-limit property will work with remote nodes only if you first run the following commands:
  - Run the systemctl stop pacemaker_remote command on the remote node to stop the node.
  - Run the pcs resource disable remote-connection-resource command.
  After you run these commands, the resources that had been running on the remote node will be available for recovery on other nodes when the amount of time specified as the shutdown-lock-limit has passed.
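As a rough illustration of how the two properties might be set together, the following shell sketch wraps the pcs property set command. The helper names run and set_shutdown_lock, the DRY_RUN variable, and the 30min value are assumptions made for this example and are not part of pcs. By default the sketch only prints the command it would run, so it can be tried safely away from a cluster.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; they are not part of pcs.

# Print a command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Enable shutdown-lock and set shutdown-lock-limit in one command.
# $1 is the time limit; it defaults to 0, the property's default value.
set_shutdown_lock() {
    local limit="${1:-0}"
    run pcs property set shutdown-lock=true shutdown-lock-limit="$limit"
}
```

Calling set_shutdown_lock 30min prints the line "+ pcs property set shutdown-lock=true shutdown-lock-limit=30min". With DRY_RUN=0 the command is executed instead, which requires root privileges on a cluster node. Because the limit is not checked more often than cluster-recheck-interval, a limit shorter than that interval may expire later than the configured time.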
9.9.2. Setting the shutdown-lock Cluster Property
The following example sets the shutdown-lock cluster property to true in an example cluster and shows the effect this has when a node is shut down and started again. This example cluster consists of three nodes: z1.example.com, z2.example.com, and z3.example.com.
- Set the shutdown-lock property to true and verify its value. In this example the shutdown-lock-limit property maintains its default value of 0.

      [root@z3.example.com ~]# pcs property set shutdown-lock=true
      [root@z3.example.com ~]# pcs property list --all | grep shutdown-lock
       shutdown-lock: true
       shutdown-lock-limit: 0

- Check the status of the cluster. In this example, resources third and fifth are running on z1.example.com.

      [root@z3.example.com ~]# pcs status
      ...
      Full List of Resources:
      ...
       * first (ocf::pacemaker:Dummy): Started z3.example.com
       * second (ocf::pacemaker:Dummy): Started z2.example.com
       * third (ocf::pacemaker:Dummy): Started z1.example.com
       * fourth (ocf::pacemaker:Dummy): Started z2.example.com
       * fifth (ocf::pacemaker:Dummy): Started z1.example.com
      ...

- Shut down z1.example.com, which will stop the resources that are running on that node.

      [root@z3.example.com ~]# pcs cluster stop z1.example.com
      Stopping Cluster (pacemaker)...
      Stopping Cluster (corosync)...

  Running the pcs status command shows that node z1.example.com is offline and that the resources that had been running on z1.example.com are LOCKED while the node is down.

      [root@z3.example.com ~]# pcs status
      ...
      Node List:
       * Online: [ z2.example.com z3.example.com ]
       * OFFLINE: [ z1.example.com ]
      Full List of Resources:
      ...
       * first (ocf::pacemaker:Dummy): Started z3.example.com
       * second (ocf::pacemaker:Dummy): Started z2.example.com
       * third (ocf::pacemaker:Dummy): Stopped z1.example.com (LOCKED)
       * fourth (ocf::pacemaker:Dummy): Started z3.example.com
       * fifth (ocf::pacemaker:Dummy): Stopped z1.example.com (LOCKED)
      ...

- Start cluster services again on z1.example.com so that it rejoins the cluster. Locked resources should get started on that node, although once they start they will not necessarily remain on the same node.

      [root@z3.example.com ~]# pcs cluster start z1.example.com
      Starting Cluster...

  In this example, resources third and fifth are recovered on node z1.example.com.

      [root@z3.example.com ~]# pcs status
      ...
      Node List:
       * Online: [ z1.example.com z2.example.com z3.example.com ]
      Full List of Resources:
      ...
       * first (ocf::pacemaker:Dummy): Started z3.example.com
       * second (ocf::pacemaker:Dummy): Started z2.example.com
       * third (ocf::pacemaker:Dummy): Started z1.example.com
       * fourth (ocf::pacemaker:Dummy): Started z3.example.com
       * fifth (ocf::pacemaker:Dummy): Started z1.example.com
      ...
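While a node is down, it can be useful to see at a glance which resources are locked to it. The following shell sketch filters pcs status text for the LOCKED marker shown in the example output and prints the manual refresh command that would unlock each resource. The function names locked_resources and unlock_commands are hypothetical, and the parsing assumes the one-line-per-resource format of the Dummy resources in this example; resources in groups or clones may be displayed differently.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; they are not part of pcs.

# Read `pcs status` text on stdin and print "<resource> <node>" for each
# resource line that ends with the (LOCKED) marker.
# Assumes lines of the form:
#   * third (ocf::pacemaker:Dummy): Stopped z1.example.com (LOCKED)
locked_resources() {
    awk '/\(LOCKED\)/ { print $2, $(NF-1) }'
}

# Read `pcs status` text on stdin and print, without running it, the
# manual refresh command that would remove the lock on each resource.
unlock_commands() {
    locked_resources | while read -r resource node; do
        echo "pcs resource refresh $resource --node $node"
    done
}
```

On a cluster node, the output of pcs status can be piped into either function, for example pcs status | locked_resources. The sketch prints the refresh commands rather than running them because, once a resource is unlocked, the cluster is free to move it elsewhere.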