3.2. Service Policies

RGManager has three service recovery policies, which the administrator can customize on a per-service basis.

Note

These policies also apply to virtual machine resources.

3.2.1. Start Policy

By default, RGManager starts all services when it boots and quorum is present. Administrators can change this behavior with the following option:
  • autostart (enabled by default) - Start the service when RGManager boots and quorum forms. If set to '0', the cluster does not start the service and instead places it into the disabled state.
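As a minimal sketch, assuming the service is declared in cluster.conf with the same <service> element used in the restart example later in this section (myservice is a placeholder name), automatic startup can be turned off like this:

```xml
<!-- Hypothetical service "myservice": RGManager does not start it at boot
     or when quorum forms; it is placed in the disabled state instead. -->
<service name="myservice" autostart="0">
  <!-- child resources go here -->
</service>
```

Leaving the attribute out keeps the default behavior, in which the service starts automatically.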

3.2.2. Recovery Policy

The recovery policy defines the action RGManager takes when a service fails on a particular node. Four options are available:
  • restart (default) - Restart the service on the same node. This policy is used if no other recovery policy is specified. If restarting fails, RGManager falls back to relocating the service.
  • relocate - Try to start the service on another node in the cluster. If no other node successfully starts the service, the service is placed in the stopped state.
  • disable - Do nothing. Place the service into the disabled state.
  • restart-disable - Attempt to restart the service in place. If restarting fails, place the service into the disabled state.
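The policy is chosen per service. A minimal sketch, assuming the policy is selected through a recovery attribute on the <service> element in cluster.conf (myservice is a placeholder name):

```xml
<!-- Hypothetical service "myservice": on failure, try to start it on
     other nodes rather than restarting it in place. -->
<service name="myservice" recovery="relocate">
  <!-- child resources go here -->
</service>
```

Under the same assumption, the other policies in the list are selected by setting the attribute to restart, disable, or restart-disable; omitting it leaves the default, restart.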

3.2.3. Restart Policy Extensions

When the restart recovery policy is used, you can also set a maximum number of restarts allowed on the same node within a given period. Two service parameters control this: max_restarts and restart_expire_time.
The max_restarts parameter is an integer that specifies the maximum number of restarts tolerated before RGManager gives up and relocates the service to another host in the cluster.
The restart_expire_time parameter specifies, in seconds, how long RGManager remembers a restart event.
Used together, the two parameters create a sliding window that limits the number of restarts tolerated within a given amount of time. For example:
<service name="myservice" max_restarts="3" restart_expire_time="300" ...>
  ...
</service>
This service tolerates 3 restarts in 5 minutes (300 seconds). On the fourth failure within 300 seconds, RGManager does not restart the service and instead relocates it to another available host in the cluster.

Note

You must specify both parameters together; the behavior when only one of them is set is undefined.
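A second sketch with different, purely illustrative values, assuming the restart policy can be stated explicitly through a recovery attribute on <service> (myservice is a placeholder name). Both parameters are given together, as the note above requires:

```xml
<!-- Hypothetical service "myservice": tolerate 2 restarts within
     600 seconds (10 minutes). A third failure inside that window makes
     RGManager relocate the service instead of restarting it again. -->
<service name="myservice" recovery="restart" max_restarts="2" restart_expire_time="600">
  <!-- child resources go here -->
</service>
```

Each restart is forgotten 600 seconds after it happens, so the count of tolerated restarts applies to any 10-minute window rather than to fixed intervals.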