9.9. Configuring Resources to Remain Stopped on Clean Node Shutdown (Red Hat Enterprise Linux 7.8 and later)

When a cluster node shuts down, Pacemaker’s default response is to stop all resources running on that node and recover them elsewhere, even if the shutdown is a clean shutdown. As of Red Hat Enterprise Linux 7.8, you can configure Pacemaker so that when a node shuts down cleanly, the resources that were running on that node are locked to it: they cannot start elsewhere, and they start again only when the node rejoins the cluster. This allows you to power down nodes during maintenance windows when service outages are acceptable without causing that node’s resources to fail over to other nodes in the cluster.

9.9.1. Cluster Properties to Configure Resources to Remain Stopped on Clean Node Shutdown

The ability to prevent resources from failing over on a clean node shutdown is implemented by means of the following cluster properties.
shutdown-lock
When this cluster property is set to the default value of false, the cluster will recover resources that are active on nodes being cleanly shut down. When this property is set to true, resources that are active on the nodes being cleanly shut down are unable to start elsewhere until they start on the node again after it rejoins the cluster.
The shutdown-lock property will work for either cluster nodes or remote nodes, but not guest nodes.
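If shutdown-lock is set to true, you can remove the lock on a single cluster resource while a node is down, so that the resource can start elsewhere, by performing a manual refresh on the node with the following command. In this command, resource is the name of the locked resource and node is the name of the node that is down.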
pcs resource refresh resource --node node
Note that once the resources are unlocked, the cluster is free to move the resources elsewhere. You can control the likelihood of this occurring by using stickiness values or location preferences for the resource.

Note

A manual refresh will work with remote nodes only if you first run the following commands:
  1. Run the systemctl stop pacemaker_remote command on the remote node to stop the node.
  2. Run the pcs resource disable remote-connection-resource command.
You can then perform a manual refresh on the remote node.
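As an illustration of the refresh command and the remote-node note above, the following sketch only prints the commands involved instead of running them, so the sequence can be reviewed before anything is changed on a cluster. The helper names unlock_cmd and remote_unlock_cmds, the resource name third, and the remote node name remote1 are hypothetical. The sketch also assumes that the remote connection resource has the same name as the remote node, which you should check against your own configuration.

```shell
#!/bin/sh
# Dry-run sketch: these helpers only print commands; nothing is executed.

# Print the command that clears the shutdown lock for one resource.
# $1 = resource name, $2 = name of the node that is down.
unlock_cmd() {
    printf 'pcs resource refresh %s --node %s\n' "$1" "$2"
}

# Print the sequence for a remote node, in the order given in the note:
# stop pacemaker_remote on the remote node, disable its connection
# resource, then refresh.
# $1 = resource, $2 = remote node, $3 = remote connection resource
# (assumed here to be named like the node; verify on your cluster).
remote_unlock_cmds() {
    printf 'systemctl stop pacemaker_remote   # run on %s\n' "$2"
    printf 'pcs resource disable %s\n' "$3"
    unlock_cmd "$1" "$2"
}

unlock_cmd third z1.example.com
remote_unlock_cmds third remote1 remote1
```

Printing the commands first is a deliberate choice here: once a resource is unlocked the cluster is free to place it on any eligible node, so it is worth confirming the resource and node names before running the refresh.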
shutdown-lock-limit
When this cluster property is set to a time other than the default value of 0, resources will be available for recovery on other nodes if the node does not rejoin within the specified time since the shutdown was initiated. Note, however, that the time interval will not be checked any more often than the value of the cluster-recheck-interval cluster property.

Note

The shutdown-lock-limit property will work with remote nodes only if you first run the following commands:
  1. Run the systemctl stop pacemaker_remote command on the remote node to stop the node.
  2. Run the pcs resource disable remote-connection-resource command.
After you run these commands, the resources that had been running on the remote node will be available for recovery on other nodes when the amount of time specified as the shutdown-lock-limit has passed.
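The interaction between shutdown-lock-limit and cluster-recheck-interval can be sketched with a little arithmetic. The helper below is hypothetical and rests on a simplifying assumption: that lock expiry is noticed only when the cluster performs its periodic recheck, so the lock is released somewhere between the limit and the limit plus one recheck interval after the shutdown began. Treat the result as a rough estimate rather than a guarantee.

```shell
#!/bin/sh
# Sketch: estimate the window in which a shutdown lock is released.
# Assumption: expiry is evaluated only at cluster-recheck-interval ticks.
# $1 = shutdown-lock-limit in seconds
# $2 = cluster-recheck-interval in seconds
lock_release_window() {
    limit_s=$1
    recheck_s=$2
    if [ "$limit_s" -eq 0 ]; then
        # 0 is the default: no time limit applies to the lock.
        echo "no limit: resources stay locked until the node rejoins or is refreshed"
        return 0
    fi
    echo "earliest=${limit_s}s latest=$((limit_s + recheck_s))s"
}

# Example values chosen for illustration: a 30-minute limit with a
# 15-minute recheck interval.
lock_release_window 1800 900
```

With these example values the function prints earliest=1800s latest=2700s. If recovery has to begin close to the limit itself, a shorter cluster-recheck-interval narrows the window under the same assumption.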

9.9.2. Setting the shutdown-lock Cluster Property

The following example sets the shutdown-lock cluster property to true in an example cluster and shows the effect this has when the node is shut down and started again. This example cluster consists of three nodes: z1.example.com, z2.example.com, and z3.example.com.
  1. Set the shutdown-lock property to true and verify its value. In this example, the shutdown-lock-limit property maintains its default value of 0.
    [root@z3.example.com ~]# pcs property set shutdown-lock=true
    [root@z3.example.com ~]# pcs property list --all | grep shutdown-lock
    shutdown-lock: true
    shutdown-lock-limit: 0
    
  2. Check the status of the cluster. In this example, resources third and fifth are running on z1.example.com.
    [root@z3.example.com ~]# pcs status
    ...
    Full List of Resources:
    ...
    * first	(ocf::pacemaker:Dummy):	Started z3.example.com
    * second	(ocf::pacemaker:Dummy):	Started z2.example.com
    * third	(ocf::pacemaker:Dummy):	Started z1.example.com
    * fourth	(ocf::pacemaker:Dummy):	Started z2.example.com
    * fifth	(ocf::pacemaker:Dummy):	Started z1.example.com
    ...
    
  3. Shut down cluster services on z1.example.com, which will stop the resources that are running on that node.
    [root@z3.example.com ~]# pcs cluster stop z1.example.com
    Stopping Cluster (pacemaker)...
    Stopping Cluster (corosync)...
    
    Running the pcs status command shows that node z1.example.com is offline and that the resources that had been running on z1.example.com are LOCKED while the node is down.
    [root@z3.example.com ~]# pcs status
    ...
    
    Node List:
    * Online: [ z2.example.com z3.example.com ]
    * OFFLINE: [ z1.example.com ]
    
    Full List of Resources:
    ...
    * first	(ocf::pacemaker:Dummy):	Started z3.example.com
    * second	(ocf::pacemaker:Dummy):	Started z2.example.com
    * third	(ocf::pacemaker:Dummy):	Stopped z1.example.com (LOCKED)
    * fourth	(ocf::pacemaker:Dummy):	Started z3.example.com
    * fifth	(ocf::pacemaker:Dummy):	Stopped z1.example.com (LOCKED)
    ...
    
  4. Start cluster services again on z1.example.com so that it rejoins the cluster. Locked resources should get started on that node, although once they start they will not necessarily remain on the same node.
    [root@z3.example.com ~]# pcs cluster start z1.example.com
    Starting Cluster...
    
    In this example, resources third and fifth are recovered on node z1.example.com.
    [root@z3.example.com ~]# pcs status
    ...
    
    Node List:
    * Online: [ z1.example.com z2.example.com z3.example.com ]
    
    Full List of Resources:
    ...
    * first	(ocf::pacemaker:Dummy):	Started z3.example.com
    * second	(ocf::pacemaker:Dummy):	Started z2.example.com
    * third	(ocf::pacemaker:Dummy):	Started z1.example.com
    * fourth	(ocf::pacemaker:Dummy):	Started z3.example.com
    * fifth	(ocf::pacemaker:Dummy):	Started z1.example.com
    
    ...
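When a cluster has many resources, it can help to filter the status output down to the locked ones. The following is a minimal sketch, not part of the procedure itself: the function name locked_resources is hypothetical, and it assumes the line layout shown in the example output above, in which each resource line ends with the node name followed by (LOCKED).

```shell
#!/bin/sh
# Sketch: list resources reported as LOCKED in pcs status output.
# Reads status text on stdin and prints "resource node" pairs.
# Assumes lines of the form:
#   * name (agent): Stopped node (LOCKED)
locked_resources() {
    # Field 2 is the resource name; the second-to-last field is the node.
    awk '/\(LOCKED\)/ { print $2, $(NF-1) }'
}

# Sample input copied from the example output; on a live cluster the
# equivalent would be: pcs status | locked_resources
locked_resources <<'EOF'
    * first (ocf::pacemaker:Dummy): Started z3.example.com
    * third (ocf::pacemaker:Dummy): Stopped z1.example.com (LOCKED)
    * fifth (ocf::pacemaker:Dummy): Stopped z1.example.com (LOCKED)
EOF
```

For the sample input this prints one line for third and one for fifth, each followed by z1.example.com. Because the filter depends on the text layout of pcs status, it may need adjusting if the output format differs on your release.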