C.4. Failure Recovery and Independent Subtrees
In most enterprise environments, the normal course of action for failure recovery of a service is to restart the entire service if any component in the service fails. For example, in Example C.6, “Service foo Normal Failure Recovery”, if any of the scripts defined in this service fails, the normal course of action is to restart (or relocate or disable, according to the service recovery policy) the service. However, in some circumstances certain parts of a service may be considered non-critical; it may be necessary to restart only part of the service in place before attempting normal recovery. To accomplish that, you can use the __independent_subtree attribute. For example, in Example C.7, “Service foo Failure Recovery with __independent_subtree Attribute”, the __independent_subtree attribute is used to accomplish the following actions:
- If script:script_one fails, restart script:script_one, script:script_two, and script:script_three.
- If script:script_two fails, restart just script:script_two.
- If script:script_three fails, restart script:script_one, script:script_two, and script:script_three.
- If script:script_four fails, restart the whole service.
Example C.6. Service foo Normal Failure Recovery
<service name="foo">
    <script name="script_one" ...>
        <script name="script_two" .../>
    </script>
    <script name="script_three" .../>
</service>
Example C.7. Service foo Failure Recovery with __independent_subtree Attribute
<service name="foo">
    <script name="script_one" __independent_subtree="1" ...>
        <script name="script_two" __independent_subtree="1" .../>
        <script name="script_three" .../>
    </script>
    <script name="script_four" .../>
</service>
In some circumstances, if a component of a service fails you may want to disable only that component without disabling the entire service, to avoid affecting other services that use other components of that service. As of the Red Hat Enterprise Linux 6.1 release, you can accomplish that by using the __independent_subtree="2" attribute, which designates the independent subtree as non-critical.
You may only use the non-critical flag on singly-referenced resources. The non-critical flag works with all resources at all levels of the resource tree, but should not be used at the top level when defining services or virtual machines.
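As a rough illustration of the non-critical flag, the following sketch marks a nested script as a non-critical independent subtree. The service layout, the resource names, and the file paths are assumptions made for this example and are not taken from a real configuration.

```xml
<service name="foo">
    <!-- script_one carries no flag, so its failure leads to normal service recovery -->
    <script name="script_one" file="/usr/local/bin/script_one.sh">
        <!-- script_two is flagged non-critical: if it fails, it can be disabled
             on its own without disabling the rest of service foo -->
        <script name="script_two" file="/usr/local/bin/script_two.sh"
                __independent_subtree="2"/>
    </script>
</service>
```

In this sketch script_two is defined inline and referenced only once, and the flag is set on a child resource rather than on the service itself, in keeping with the restrictions stated above.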
As of the Red Hat Enterprise Linux 6.1 release, you can set a maximum restart count and a restart expiration time for each node of the resource tree that is an independent subtree. To set these thresholds, you can use the following attributes:
- __max_restarts configures the maximum number of tolerated restarts prior to giving up.
- __restart_expire_time configures the amount of time, in seconds, after which a restart is no longer attempted.
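The two threshold attributes might be combined with an independent subtree as in the following sketch. The resource names, the file paths, and the values 3 and 300 are illustrative assumptions, not recommended settings.

```xml
<service name="foo">
    <!-- Independent subtree with thresholds (values are hypothetical):
         __max_restarts="3"          tolerate at most 3 restarts before giving up
         __restart_expire_time="300" restart expiration time, in seconds -->
    <script name="script_one" file="/usr/local/bin/script_one.sh"
            __independent_subtree="1"
            __max_restarts="3"
            __restart_expire_time="300"/>
    <script name="script_two" file="/usr/local/bin/script_two.sh"/>
</service>
```

Because the thresholds are set on script_one only, they apply to that subtree and leave the recovery behavior of script_two unchanged.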