Chapter 20. Resource monitoring operations

To ensure that resources remain healthy, you can add a monitoring operation to a resource’s definition. If you do not specify a monitoring operation for a resource, by default the pcs command will create a monitoring operation, with an interval that is determined by the resource agent. If the resource agent does not provide a default monitoring interval, the pcs command will create a monitoring operation with an interval of 60 seconds.

The following table summarizes the properties of a resource monitoring operation.

Table 20.1. Properties of an Operation

FieldDescription

id

Unique name for the action. The system assigns this when you configure an operation.

name

The action to perform. Common values: monitor, start, stop

interval

If set to a nonzero value, a recurring operation is created that repeats at this frequency, in seconds. A nonzero value makes sense only when the action name is set to monitor. A recurring monitor action will be executed immediately after a resource start completes, and subsequent monitor actions are scheduled starting at the time the previous monitor action completed. For example, if a monitor action with interval=20s is executed at 01:00:00, the next monitor action does not occur at 01:00:20, but at 20 seconds after the first monitor action completes.

If set to zero, which is the default value, this parameter allows you to provide values to be used for operations created by the cluster. For example, if the interval is set to zero, the name of the operation is set to start, and the timeout value is set to 40, then Pacemaker will use a timeout of 40 seconds when starting this resource. A monitor operation with a zero interval allows you to set the timeout/on-fail/enabled values for the probes that Pacemaker does at startup to get the current status of all resources when the defaults are not desirable.

timeout

If the operation does not complete in the amount of time set by this parameter, abort the operation and consider it failed. The default value is the value of timeout if set with the pcs resource op defaults command, or 20 seconds if it is not set. If you find that your system includes a resource that requires more time than the system allows to perform an operation (such as start, stop, or monitor), investigate the cause and if the lengthy execution time is expected you can increase this value.

The timeout value is not a delay of any kind, nor does the cluster wait the entire timeout period if the operation returns before the timeout period has completed.

on-fail

The action to take if this action ever fails. Allowed values:

* ignore - Pretend the resource did not fail

* block - Do not perform any further operations on the resource

* stop - Stop the resource and do not start it elsewhere

* restart - Stop the resource and start it again (possibly on a different node)

* fence - STONITH the node on which the resource failed

* standby - Move all resources away from the node on which the resource failed

* migrate - Migrate resource to another node, if possible. This is the equivalent of setting the migration-threshold resource meta option to 1.

* demote - When a promote action fails for the resource, the resource will be demoted but will not be fully stopped. When a monitor action fails for a resource, if interval is set to a nonzero value and role is set to Promoted the resource will be demoted but will not be fully stopped.

The default for the stop operation is fence when STONITH is enabled and block otherwise. All other operations default to restart.

enabled

If false, the operation is treated as if it does not exist. Allowed values: true, false

20.1. Configuring resource monitoring operations

You can configure monitoring operations when you create a resource, using the following command.

pcs resource create resource_id standard:provider:type|type [resource_options] [op operation_action operation_options [operation_type operation_options]...]

For example, the following command creates an IPaddr2 resource with a monitoring operation. The new resource is called VirtualIP with an IP address of 192.168.0.99 and a netmask of 24 on eth2. A monitoring operation will be performed every 30 seconds.

# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 cidr_netmask=24 nic=eth2 op monitor interval=30s

Alternately, you can add a monitoring operation to an existing resource with the following command.

pcs resource op add resource_id operation_action [operation_properties]

Use the following command to delete a configured resource operation.

pcs resource op remove resource_id operation_name operation_properties
Note

You must specify the exact operation properties to properly remove an existing operation.

To change the values of a monitoring option, you can update the resource. For example, you can create a VirtualIP with the following command.

# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 cidr_netmask=24 nic=eth2

By default, this command creates these operations.

Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
            stop interval=0s timeout=20s (VirtualIP-stop-timeout-20s)
            monitor interval=10s timeout=20s (VirtualIP-monitor-interval-10s)

To change the stop timeout operation, execute the following command.

# pcs resource update VirtualIP op stop interval=0s timeout=40s

# pcs resource config VirtualIP
 Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.99 cidr_netmask=24 nic=eth2
  Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
              monitor interval=10s timeout=20s (VirtualIP-monitor-interval-10s)
              stop interval=0s timeout=40s (VirtualIP-name-stop-interval-0s-timeout-40s)

20.2. Configuring global resource operation defaults

You can change the default value of a resource operation for all resources with the pcs resource op defaults update command.

The following command sets a global default of a timeout value of 240 seconds for all monitoring operations.

# pcs resource op defaults update timeout=240s

The original pcs resource op defaults name=value command, which set resource operation defaults for all resources in previous releases, remains supported unless there is more than one set of defaults configured. However, pcs resource op defaults update is now the preferred version of the command.

20.2.1. Overriding resource-specific operation values

Note that a cluster resource will use the global default only when the option is not specified in the cluster resource definition. By default, resource agents define the timeout option for all operations. For the global operation timeout value to be honored, you must create the cluster resource without the timeout option explicitly or you must remove the timeout option by updating the cluster resource, as in the following command.

# pcs resource update VirtualIP op monitor interval=10s

For example, after setting a global default of a timeout value of 240 seconds for all monitoring operations and updating the cluster resource VirtualIP to remove the timeout value for the monitor operation, the resource VirtualIP will then have timeout values for start, stop, and monitor operations of 20s, 40s and 240s, respectively. The global default value for timeout operations is applied here only on the monitor operation, where the default timeout option was removed by the previous command.

# pcs resource config VirtualIP
 Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=192.168.0.99 cidr_netmask=24 nic=eth2
   Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
               monitor interval=10s (VirtualIP-monitor-interval-10s)
               stop interval=0s timeout=40s (VirtualIP-name-stop-interval-0s-timeout-40s)

20.2.2. Changing the default value of a resource operation for sets of resources

You can create multiple sets of resource operation defaults with the pcs resource op defaults set create command, which allows you to specify a rule that contains resource and operation expressions. All of the rule expressions supported by Pacemaker are allowed.

With this comand, you can configure a default resource operation value for all resources of a particular type. For example, it is now possible to configure implicit podman resources created by Pacemaker when bundles are in use.

The following command sets a default timeout value of 90s for all operations for all podman resources. In this example, ::podman means a resource of any class, any provider, of type podman.

The id option, which names the set of resource operation defaults, is not mandatory. If you do not set this option, pcs will generate an ID automatically. Setting this value allows you to provide a more descriptive name.

# pcs resource op defaults set create id=podman-timeout meta timeout=90s rule resource ::podman

The following command sets a default timeout value of 120s for the stop operation for all resources.

# pcs resource op defaults set create id=stop-timeout meta timeout=120s rule op stop

It is possible to set the default timeout value for a specific operation for all resources of a particular type. The following example sets a default timeout value of 120s for the stop operation for all podman resources.

# pcs resource op defaults set create id=podman-stop-timeout meta timeout=120s rule resource ::podman and op stop

20.2.3. Displaying currently configured resource operation default values

The pcs resource op defaults command displays a list of currently configured default values for resource operations, including any rules you specified.

The following command displays the default operation values for a cluster which has been configured with a default timeout value of 90s for all operations for all podman resources, and for which an ID for the set of resource operation defaults has been set as podman-timeout.

# pcs resource op defaults
Meta Attrs: podman-timeout
  timeout=90s
  Rule: boolean-op=and score=INFINITY
    Expression: resource ::podman

The following command displays the default operation values for a cluster which has been configured with a default timeout value of 120s for the stop operation for all podman resources, and for which an ID for the set of resource operation defaults has been set as podman-stop-timeout.

# pcs resource op defaults
Meta Attrs: podman-stop-timeout
  timeout=120s
  Rule: boolean-op=and score=INFINITY
    Expression: resource ::podman
    Expression: op stop

20.3. Configuring multiple monitoring operations

You can configure a single resource with as many monitor operations as a resource agent supports. In this way you can do a superficial health check every minute and progressively more intense ones at higher intervals.

Note

When configuring multiple monitor operations, you must ensure that no two operations are performed at the same interval.

To configure additional monitoring operations for a resource that supports more in-depth checks at different levels, you add an OCF_CHECK_LEVEL=n option.

For example, if you configure the following IPaddr2 resource, by default this creates a monitoring operation with an interval of 10 seconds and a timeout value of 20 seconds.

# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 cidr_netmask=24 nic=eth2

If the Virtual IP supports a different check with a depth of 10, the following command causes Pacemaker to perform the more advanced monitoring check every 60 seconds in addition to the normal Virtual IP check every 10 seconds. (As noted, you should not configure the additional monitoring operation with a 10-second interval as well.)

# pcs resource op add VirtualIP monitor interval=60s OCF_CHECK_LEVEL=10