Chapter 33. Interpreting resource agent OCF return codes

Pacemaker resource agents conform to the Open Cluster Framework (OCF) Resource Agent API. The following tables describe the OCF return codes and how they are interpreted by Pacemaker.

When an agent returns a code, the cluster first checks that code against the expected result. If the code does not match the expected value, the operation is considered to have failed, and recovery action is initiated.

For any invocation, resource agents must exit with a defined return code that informs the caller of the outcome of the invoked action.
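As an illustration of this requirement, the following is a minimal sketch of an agent's action dispatch, written in Python. The `ToyService` class and the `run_action` function are hypothetical names used only for this example; real agents manage an actual service and are commonly shell scripts. The numeric values are the ones listed in Table 33.2.

```python
# Numeric values as listed in the OCF return code table of this chapter.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


class ToyService:
    """Hypothetical stand-in for a managed service (illustration only)."""

    def __init__(self):
        self.running = False


def run_action(service, action):
    """Return a defined OCF code for every action, including unknown ones."""
    if action == "start":
        service.running = True
        return OCF_SUCCESS
    if action == "stop":
        service.running = False
        return OCF_SUCCESS
    if action == "monitor":
        # A stopped service is reported as "not running", not as an error.
        return OCF_SUCCESS if service.running else OCF_NOT_RUNNING
    # Any other action is reported explicitly rather than left undefined.
    return OCF_ERR_UNIMPLEMENTED
```

In a script-style agent, the value returned by `run_action` would be passed to `sys.exit()` so that the caller receives it as the process exit status.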

There are three types of failure recovery, as described in the following table.

Table 33.1. Types of Recovery Performed by the Cluster

| Type | Description | Action Taken by the Cluster |
|------|-------------|-----------------------------|
| soft | A transient error occurred. | Restart the resource or move it to a new location. |
| hard | A non-transient error that may be specific to the current node occurred. | Move the resource elsewhere and prevent it from being retried on the current node. |
| fatal | A non-transient error that will be common to all cluster nodes occurred (for example, a bad configuration was specified). | Stop the resource and prevent it from being started on any cluster node. |
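The effect of each recovery type on resource placement can be sketched as follows. This is a simplified model rather than Pacemaker's scheduler logic: the function name `nodes_eligible_after_failure` and the node names in the usage example are assumptions made for illustration.

```python
def nodes_eligible_after_failure(recovery_type, all_nodes, failed_node):
    """Return the nodes where the resource may still run after a failure.

    Simplified model of the three recovery types (not Pacemaker's code):
      soft  -> the resource may restart in place or move, so every node stays eligible
      hard  -> the node where the failure occurred is excluded
      fatal -> no node is eligible
    """
    nodes = set(all_nodes)
    if recovery_type == "soft":
        return nodes
    if recovery_type == "hard":
        return nodes - {failed_node}
    if recovery_type == "fatal":
        return set()
    raise ValueError(f"unknown recovery type: {recovery_type!r}")
```

For example, with hypothetical nodes `node1`, `node2`, and `node3`, a hard failure on `node1` leaves only `node2` and `node3` eligible, while a fatal failure leaves none.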

The following table provides the OCF return codes and the type of recovery the cluster will initiate when a failure code is received. Note that even actions that return 0 (OCF alias OCF_SUCCESS) can be considered to have failed, if 0 was not the expected return value.

Table 33.2. OCF Return Codes

Return Code | OCF Label | Description

0

OCF_SUCCESS

* The action completed successfully. This is the expected return code for any successful start, stop, promote, or demote command.

* Type if unexpected: soft

1

OCF_ERR_GENERIC

* The action returned a generic error.

* Type: soft

* The resource manager will attempt to recover the resource or move it to a new location.

2

OCF_ERR_ARGS

* The resource’s configuration is not valid on this machine. For example, it refers to a location not found on the node.

* Type: hard

* The resource manager will move the resource elsewhere and prevent it from being retried on the current node.

3

OCF_ERR_UNIMPLEMENTED

* The requested action is not implemented.

* Type: hard

4

OCF_ERR_PERM

* The resource agent does not have sufficient privileges to complete the task. This may be due, for example, to the agent not being able to open a certain file, to listen on a specific socket, or to write to a directory.

* Type: hard

* Unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with this error by restarting the resource on a different node (where the permission problem may not exist).

5

OCF_ERR_INSTALLED

* A required component is missing on the node where the action was executed. This may be due to a required binary not being executable, or a vital configuration file being unreadable.

* Type: hard

* Unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with this error by restarting the resource on a different node (where the required files or binaries may be present).

6

OCF_ERR_CONFIGURED

* The resource’s configuration on the local node is invalid.

* Type: fatal

* When this code is returned, Pacemaker will prevent the resource from running on any node in the cluster, even if the service configuration is valid on some other node.

7

OCF_NOT_RUNNING

* The resource is safely stopped. This implies that the resource has either gracefully shut down, or has never been started.

* Type if unexpected: soft

* The cluster will not attempt to stop a resource that returns this for any action.

8

OCF_RUNNING_PROMOTED

* The resource is running in the promoted role.

* Type if unexpected: soft

9

OCF_FAILED_PROMOTED

* The resource is (or might be) in the promoted role but has failed.

* Type: soft

* The resource will be demoted, stopped and then started (and possibly promoted) again.

190

 

* (RHEL 8.4 and later) The service is found to be properly active, but in such a condition that future failures are more likely.

191

 

* (RHEL 8.4 and later) The resource agent supports roles and the service is found to be properly active in the promoted role, but in such a condition that future failures are more likely.

other

N/A

Custom error code.
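The table can be condensed into a small lookup that mirrors the check described at the start of this chapter: a result that matches the expected code is not a failure, and any other result is mapped to a recovery type. This is a hedged sketch, not Pacemaker's implementation. The names `OCF_CODES` and `recovery_for` are hypothetical, and the `"unlisted"` value is an assumption used here for codes for which the table gives no recovery type (190, 191, and custom codes).

```python
# Return code -> (OCF label, recovery type), transcribed from the table above.
OCF_CODES = {
    0: ("OCF_SUCCESS", "soft"),            # soft only if 0 was unexpected
    1: ("OCF_ERR_GENERIC", "soft"),
    2: ("OCF_ERR_ARGS", "hard"),
    3: ("OCF_ERR_UNIMPLEMENTED", "hard"),
    4: ("OCF_ERR_PERM", "hard"),
    5: ("OCF_ERR_INSTALLED", "hard"),
    6: ("OCF_ERR_CONFIGURED", "fatal"),
    7: ("OCF_NOT_RUNNING", "soft"),        # soft only if 7 was unexpected
    8: ("OCF_RUNNING_PROMOTED", "soft"),   # soft only if 8 was unexpected
    9: ("OCF_FAILED_PROMOTED", "soft"),
}


def recovery_for(actual, expected):
    """Return None if the result was expected, else the recovery type.

    Codes without a recovery type in the table map to "unlisted",
    a placeholder chosen for this sketch.
    """
    if actual == expected:
        return None
    label_and_type = OCF_CODES.get(actual)
    if label_and_type is None:
        return "unlisted"
    return label_and_type[1]
```

For instance, `recovery_for(0, 8)` yields `"soft"`: the action returned OCF_SUCCESS, but because the caller expected 8, the result still counts as a failure.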