Chapter 17. Managing cluster resources

There are a variety of commands you can use to display, modify, and administer cluster resources.

17.1. Displaying configured resources

To display a list of all configured resources, use the following command.

pcs resource status

For example, if your system is configured with a resource named VirtualIP and a resource named WebSite, the pcs resource status command yields the following output.

# pcs resource status
 VirtualIP	(ocf::heartbeat:IPaddr2):	Started
 WebSite	(ocf::heartbeat:apache):	Started

To display the configured parameters for a resource, use the following command.

pcs resource config resource_id

For example, the following command displays the currently configured parameters for resource VirtualIP.

# pcs resource config VirtualIP
 Resource: VirtualIP (type=IPaddr2 class=ocf provider=heartbeat)
  Attributes: ip=192.168.0.120 cidr_netmask=24
  Operations: monitor interval=30s

To display the status of an individual resource, use the following command.

pcs resource status resource_id

For example, if your system is configured with a resource named VirtualIP the pcs resource status VirtualIP command yields the following output.

# pcs resource status VirtualIP
 VirtualIP      (ocf::heartbeat:IPaddr2):       Started

To display the status of the resources running on a specific node, use the following command. You can use this command to display the status of resources on both cluster and remote nodes.

pcs resource status node=node_id

For example, if node-01 is running resources named VirtualIP and WebSite the pcs resource status node=node-01 command might yield the following output.

# pcs resource status node=node-01
 VirtualIP      (ocf::heartbeat:IPaddr2):       Started
 WebSite        (ocf::heartbeat:apache):        Started

17.2. Exporting cluster resources as pcs commands

As of Red Hat Enterprise Linux 9.1, you can display the pcs commands that can be used to re-create configured cluster resources on a different system using the --output-format=cmd option of the pcs resource config command.

The following commands create four resources created for an active/passive Apache HTTP server in a Red Hat high availability cluster: an LVM-activate resource, a Filesystem resource, an IPaddr2 resource, and an Apache resource.

# pcs resource create my_lvm ocf:heartbeat:LVM-activate vgname=my_vg vg_access_mode=system_id --group apachegroup
# pcs resource create my_fs Filesystem device="/dev/my_vg/my_lv" directory="/var/www" fstype="xfs" --group apachegroup
# pcs resource create VirtualIP IPaddr2 ip=198.51.100.3 cidr_netmask=24 --group apachegroup
# pcs resource create Website apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" --group apachegroup

After you create the resources, the following command displays the pcs commands you can use to re-create those resources on a different system.

# pcs resource config --output-format=cmd
pcs resource create --no-default-ops --force -- my_lvm ocf:heartbeat:LVM-activate \
  vg_access_mode=system_id vgname=my_vg \
  op \
    monitor interval=30s id=my_lvm-monitor-interval-30s timeout=90s \
    start interval=0s id=my_lvm-start-interval-0s timeout=90s \
    stop interval=0s id=my_lvm-stop-interval-0s timeout=90s;
pcs resource create --no-default-ops --force -- my_fs ocf:heartbeat:Filesystem \
  device=/dev/my_vg/my_lv directory=/var/www fstype=xfs \
  op \
    monitor interval=20s id=my_fs-monitor-interval-20s timeout=40s \
    start interval=0s id=my_fs-start-interval-0s timeout=60s \
    stop interval=0s id=my_fs-stop-interval-0s timeout=60s;
pcs resource create --no-default-ops --force -- VirtualIP ocf:heartbeat:IPaddr2 \
  cidr_netmask=24 ip=198.51.100.3 \
  op \
    monitor interval=10s id=VirtualIP-monitor-interval-10s timeout=20s \
    start interval=0s id=VirtualIP-start-interval-0s timeout=20s \
    stop interval=0s id=VirtualIP-stop-interval-0s timeout=20s;
pcs resource create --no-default-ops --force -- Website ocf:heartbeat:apache \
  configfile=/etc/httpd/conf/httpd.conf statusurl=http://127.0.0.1/server-status \
  op \
    monitor interval=10s id=Website-monitor-interval-10s timeout=20s \
    start interval=0s id=Website-start-interval-0s timeout=40s \
    stop interval=0s id=Website-stop-interval-0s timeout=60s;
pcs resource group add apachegroup \
  my_lvm my_fs VirtualIP Website

To display the pcs command or commands you can use to re-create only one configured resource, specify the resource ID for that resource.

# pcs resource config VirtualIP --output-format=cmd
pcs resource create --no-default-ops --force -- VirtualIP ocf:heartbeat:IPaddr2 \
  cidr_netmask=24 ip=198.51.100.3 \
  op \
    monitor interval=10s id=VirtualIP-monitor-interval-10s timeout=20s \
    start interval=0s id=VirtualIP-start-interval-0s timeout=20s \
    stop interval=0s id=VirtualIP-stop-interval-0s timeout=20s

17.3. Modifying resource parameters

To modify the parameters of a configured resource, use the following command.

pcs resource update resource_id [resource_options]

The following sequence of commands show the initial values of the configured parameters for resource VirtualIP, the command to change the value of the ip parameter, and the values following the update command.

# pcs resource config VirtualIP
 Resource: VirtualIP (type=IPaddr2 class=ocf provider=heartbeat)
  Attributes: ip=192.168.0.120 cidr_netmask=24
  Operations: monitor interval=30s
# pcs resource update VirtualIP ip=192.169.0.120
# pcs resource config VirtualIP
 Resource: VirtualIP (type=IPaddr2 class=ocf provider=heartbeat)
  Attributes: ip=192.169.0.120 cidr_netmask=24
  Operations: monitor interval=30s
Note

When you update a resource’s operation with the pcs resource update command, any options you do not specifically call out are reset to their default values.

17.4. Clearing failure status of cluster resources

If a resource has failed, a failure message appears when you display the cluster status with the pcs status command. After attempting to resolve the cause of the failure, you can check the updated status of the resource by running the pcs status command again, and you can check the failure count for the cluster resources with the pcs resource failcount show --full command.

You can clear that failure status of a resource with the pcs resource cleanup command. The pcs resource cleanup command resets the resource status and failcount value for the resource. This command also removes the operation history for the resource and re-detects its current state.

The following command resets the resource status and failcount value for the resource specified by resource_id.

pcs resource cleanup resource_id

If you do not specify resource_id, the pcs resource cleanup command resets the resource status and failcount value for all resources with a failure count.

In addition to the pcs resource cleanup resource_id command, you can also reset the resource status and clear the operation history of a resource with the pcs resource refresh resource_id command. As with the pcs resource cleanup command, you can run the pcs resource refresh command with no options specified to reset the resource status and failcount value for all resources.

Both the pcs resource cleanup and the pcs resource refresh commands clear the operation history for a resource and re-detect the current state of the resource. The pcs resource cleanup command operates only on resources with failed actions as shown in the cluster status, while the pcs resource refresh command operates on resources regardless of their current state.

17.5. Moving resources in a cluster

Pacemaker provides a variety of mechanisms for configuring a resource to move from one node to another and to manually move a resource when needed.

You can manually move resources in a cluster with the pcs resource move and pcs resource relocate commands, as described in Manually moving cluster resources. In addition to these commands, you can also control the behavior of cluster resources by enabling, disabling, and banning resources, as described in Disabling, enabling, and banning cluster resources.

You can configure a resource so that it will move to a new node after a defined number of failures, and you can configure a cluster to move resources when external connectivity is lost.

17.5.1. Moving resources due to failure

When you create a resource, you can configure the resource so that it will move to a new node after a defined number of failures by setting the migration-threshold option for that resource. Once the threshold has been reached, this node will no longer be allowed to run the failed resource until:

  • The resource’s failure-timeout value is reached.
  • The administrator manually resets the resource’s failure count by using the pcs resource cleanup command.

The value of migration-threshold is set to INFINITY by default. INFINITY is defined internally as a very large but finite number. A value of 0 disables the migration-threshold feature.

Note

Setting a migration-threshold for a resource is not the same as configuring a resource for migration, in which the resource moves to another location without loss of state.

The following example adds a migration threshold of 10 to the resource named dummy_resource, which indicates that the resource will move to a new node after 10 failures.

# pcs resource meta dummy_resource migration-threshold=10

You can add a migration threshold to the defaults for the whole cluster with the following command.

# pcs resource defaults update migration-threshold=10

To determine the resource’s current failure status and limits, use the pcs resource failcount show command.

There are two exceptions to the migration threshold concept; they occur when a resource either fails to start or fails to stop. If the cluster property start-failure-is-fatal is set to true (which is the default), start failures cause the failcount to be set to INFINITY and always cause the resource to move immediately.

Stop failures are slightly different and crucial. If a resource fails to stop and STONITH is enabled, then the cluster will fence the node to be able to start the resource elsewhere. If STONITH is not enabled, then the cluster has no way to continue and will not try to start the resource elsewhere, but will try to stop it again after the failure timeout.

17.5.2. Moving resources due to connectivity changes

Setting up the cluster to move resources when external connectivity is lost is a two step process.

  1. Add a ping resource to the cluster. The ping resource uses the system utility of the same name to test if a list of machines (specified by DNS host name or IPv4/IPv6 address) are reachable and uses the results to maintain a node attribute called pingd.
  2. Configure a location constraint for the resource that will move the resource to a different node when connectivity is lost.

The following table describes the properties you can set for a ping resource.

Table 17.1. Properties of a ping resources

FieldDescription

dampen

The time to wait (dampening) for further changes to occur. This prevents a resource from bouncing around the cluster when cluster nodes notice the loss of connectivity at slightly different times.

multiplier

The number of connected ping nodes gets multiplied by this value to get a score. Useful when there are multiple ping nodes configured.

host_list

The machines to contact to determine the current connectivity status. Allowed values include resolvable DNS host names, IPv4 and IPv6 addresses. The entries in the host list are space separated.

The following example command creates a ping resource that verifies connectivity to gateway.example.com. In practice, you would verify connectivity to your network gateway/router. You configure the ping resource as a clone so that the resource will run on all cluster nodes.

# pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=gateway.example.com clone

The following example configures a location constraint rule for the existing resource named Webserver. This will cause the Webserver resource to move to a host that is able to ping gateway.example.com if the host that it is currently running on cannot ping gateway.example.com.

# pcs constraint location Webserver rule score=-INFINITY pingd lt 1 or not_defined pingd

17.6. Disabling a monitor operation

The easiest way to stop a recurring monitor is to delete it. However, there can be times when you only want to disable it temporarily. In such cases, add enabled="false" to the operation’s definition. When you want to reinstate the monitoring operation, set enabled="true" to the operation’s definition.

When you update a resource’s operation with the pcs resource update command, any options you do not specifically call out are reset to their default values. For example, if you have configured a monitoring operation with a custom timeout value of 600, running the following commands will reset the timeout value to the default value of 20 (or whatever you have set the default value to with the pcs resource op defaults command).

# pcs resource update resourceXZY op monitor enabled=false
# pcs resource update resourceXZY op monitor enabled=true

In order to maintain the original value of 600 for this option, when you reinstate the monitoring operation you must specify that value, as in the following example.

# pcs resource update resourceXZY op monitor timeout=600 enabled=true

17.7. Configuring and managing cluster resource tags

You can use the pcs command to tag cluster resources. This allows you to enable, disable, manage, or unmanage a specified set of resources with a single command.

17.7.1. Tagging cluster resources for administration by category

The following procedure tags two resources with a resource tag and disables the tagged resources. In this example, the existing resources to be tagged are named d-01 and d-02.

Procedure

  1. Create a tag named special-resources for resources d-01 and d-02.

    [root@node-01]# pcs tag create special-resources d-01 d-02
  2. Display the resource tag configuration.

    [root@node-01]# pcs tag config
    special-resources
      d-01
      d-02
  3. Disable all resources that are tagged with the special-resources tag.

    [root@node-01]# pcs resource disable special-resources
  4. Display the status of the resources to confirm that resources d-01 and d-02 are disabled.

    [root@node-01]# pcs resource
      * d-01        (ocf::pacemaker:Dummy): Stopped (disabled)
      * d-02        (ocf::pacemaker:Dummy): Stopped (disabled)

In addition to the pcs resource disable command, the pcs resource enable, pcs resource manage, and pcs resource unmanage commands support the administration of tagged resources.

After you have created a resource tag:

  • You can delete a resource tag with the pcs tag delete command.
  • You can modify resource tag configuration for an existing resource tag with the pcs tag update command.

17.7.2. Deleting a tagged cluster resource

You cannot delete a tagged cluster resource with the pcs command. To delete a tagged resource, use the following procedure.

Procedure

  1. Remove the resource tag.

    1. The following command removes the resource tag special-resources from all resources with that tag,

      [root@node-01]# pcs tag remove special-resources
      [root@node-01]# pcs tag
       No tags defined
    2. The following command removes the resource tag special-resources from the resource d-01 only.

      [root@node-01]# pcs tag update special-resources remove d-01
  2. Delete the resource.

    [root@node-01]# pcs resource delete d-01
    Attempting to stop: d-01... Stopped