Cluster Failover Scripts
We have just created a two-node cluster to host our main web applications using Apache + PHP + DRBD. We have configured the cluster and everything seems to be working fine. Now we need to create scripts to fail over automatically when one of the resources on a node fails or comes under heavy load. For instance, we need a script that calls a web page which in turn checks database connectivity and access time; if this call fails or takes longer than expected, the cluster service should relocate to the other node.
So has anyone done something similar before?
Responses
I suggest you create a Service Group and put your custom script, and possibly the Floating IP, together in it. If one fails, they will both migrate to the other node. If the relevant services are not already started on the other node, they should be part of this group as well. Optionally, this Service Group could be bound to a Failover Domain comprising your two nodes.
Your custom script could simply use wget to call a URL with a specific timeout value. If you style it as an init script, i.e. one that responds appropriately to the start, status and stop commands, it will behave nicely with the Script Resource Type.
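A minimal sketch of such an init-style script, assuming a hypothetical health-check URL and timeout (the path, URL and values below are illustrative, not from this thread):
[snip]
#!/bin/sh
# /etc/init.d/webcheck -- illustrative health-check resource script
CHECK_URL="http://localhost/check.php"   # hypothetical page that tests DB connectivity and response time
TIMEOUT=10                               # seconds before the check counts as failed

case "$1" in
    start)
        # nothing to start here; the real web services are separate resources
        exit 0
        ;;
    stop)
        exit 0
        ;;
    status)
        # a non-zero exit tells the cluster manager the resource is unhealthy
        wget -q -O /dev/null --timeout=$TIMEOUT --tries=1 "$CHECK_URL"
        exit $?
        ;;
    *)
        echo "Usage: $0 {start|stop|status}"
        exit 2
        ;;
esac
[/snip]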
Hi,
We could define a custom script resource in the cluster.conf file to accomplish this requirement.
The format of a script resource is the same as that of an "init" script; it needs to implement the start, stop, restart, force-reload, and status actions. The cluster resource group manager (rgmanager) periodically performs status checks on the script resource (i.e. rgmanager calls the status action of the script). If this status check returns a non-zero value, it results in a generic error and the script resource fails. When the status check on the script resource fails, any cluster service that contains this script resource is stopped by rgmanager, which then initiates the recovery action on it. Depending on the recovery policy defined for the service, rgmanager will either relocate the service to another node (recovery="relocate") or try to restart the service on the same node (recovery="restart").
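As a quick way to see what rgmanager will act on, you can run the status action by hand and inspect the exit code (the script name below is just an example):
[snip]
# invoke the status action the way rgmanager would; the script path is illustrative
/etc/init.d/webcheck status
echo "status exit code: $?"   # 0 = healthy; non-zero makes rgmanager stop the service
                              # and apply its recovery policy (restart or relocate)
[/snip]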
For more information on how to create and use a script resource in the cluster, please refer to the following links:
"What are the requirements of a "script" resource in Red Hat Enterprise Linux Clusters?"
https://access.redhat.com/knowledge/solutions/271703
http://refspecs.linux-foundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
Kind regards,
Milan.
Hello Essa Dalloul,
I'll expand on Dhruv's comment.
The wget command will need to be in the "status" section of the initscript. That way the cluster manager will use wget to check whether the service is performing acceptably.
In addition, I /think/ you will probably need to create a compound service. Let's call the services foo, bar and baz. By this I mean that service bar depends on service foo, and that the compound service is called baz.
So the baz initscript first has to call "start" on the foo service. If this is successful, the baz script then calls "start" on the bar service. An example snippet from the baz initscript:
[snip]
case "$1" in
    start)
        service foo start
        if [ $? -eq 0 ]
        then
            service bar start
        fi
        ;;
[/snip]
Similarly the "stop" section from the baz initscript. Here we need to stop bar before we stop foo
[snip]
case "$1" in
    ...
    stop)
        service bar stop
        if [ $? -eq 0 ]
        then
            service foo stop
        fi
        ;;
[/snip]
Lastly, an example of what the "status" section of the initscript needs to be:
[snip]
case "$1" in
    ...
    status)
        # -q -O /dev/null discards the downloaded page; only the exit code matters
        wget -q -O /dev/null --timeout=XXX http://script/to/check/the/service
        RETVAL=$?
        exit $RETVAL
        ;;
[/snip]
As you can see, the wget command uses a timeout, which is what you probably want.
In addition, you probably need to use logger (please see man logger) in the foo, bar and baz initscripts so that there will be additional log messages in case there is a failure at some stage.
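For example (the tag and message here are just placeholders), the status section could log the result before handing the exit code back to the cluster manager:
[snip]
if [ $RETVAL -ne 0 ]
then
    # -t sets the syslog tag so the failure is easy to find later
    logger -t baz-init "status check failed, wget returned $RETVAL"
fi
exit $RETVAL
[/snip]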
Hope this helps
-- Pai
Hello Essa Dalloul,
If a service fails on both nodes, the cluster manager marks it as "failed". So it will not keep bouncing from one node to the other.
But please do use logger in your compound initscript. That way you will find the reasons for the failure in your syslog.
Regards,
-- Pai
I know this thread is three and a half years old, but I am doing something almost identical and having issues. The issue isn't with the script, because failover works when my web call fails, or when I manually relocate the service. What doesn't work is when the active node goes down completely, so the script can't be executed at all. I would assume the cluster would fail over automatically if the node itself is offline, but this doesn't happen.
