Why did the service fail to relocate to another cluster node when the status check on one of its resources failed?
Environment
- Red Hat Enterprise Linux Server 5 (with the High Availability and Resilient Storage Add Ons)
- Red Hat Enterprise Linux Server 6 (with the High Availability and Resilient Storage Add Ons)
Issue
- The status check on one of the script resources in a service failed, but the service did not fail over (relocate) to another node. Why?
Aug 9 09:33:01 node1 clurgmgrd: [7060]: <err> script:/etc/init.d/ndD3-1.sh: status of /etc/init.d/ndD3-1.sh failed (returned 1) <-------- [1]
Aug 9 09:33:01 node1 clurgmgrd[7060]: <notice> status on script "/etc/init.d/ndD3-1.sh" returned 1 (generic error)
Aug 9 09:33:01 node1 clurgmgrd[7060]: <warning> Some independent resources in service:nd1 failed; Attempting inline recovery
Aug 9 09:33:01 node1 clurgmgrd: [7060]: <err> script:/etc/init.d/ndD3-1.sh: stop of /etc/init.d/ndD3-1.sh failed (returned 1) <-------- [2]
Aug 9 09:33:01 node1 clurgmgrd[7060]: <notice> stop on script "/etc/init.d/ndD3-1.sh" returned 1 (generic error)
Aug 9 09:33:02 node1 nsca[4827]: Caught SIGTERM - shutting down...
Aug 9 09:33:02 node1 nsca[4827]: Cannot remove pidfile '/var/run/nsca1.pid' - check your privileges.
Aug 9 09:33:02 node1 nsca[4827]: Daemon shutdown
Aug 9 09:33:07 node1 multipathd: dm-16: umount map (uevent)
Aug 9 09:33:15 node1 clurgmgrd: [7060]: <notice> Deactivating vg13/nd13
Aug 9 09:33:15 node1 clurgmgrd: [7060]: <notice> Making resilient : lvchange -an vg13/nd13
Aug 9 09:33:15 node1 clurgmgrd: [7060]: <notice> Resilient command: lvchange -an vg13/nd13 --config devices{filter=["a|/dev/mapper/mpath0|","a|/dev/mapper/mpath1|","a|/dev/mapper/mpath3|","a|/dev/sda2|","r|.*|"]}
Aug 9 09:33:15 node1 multipathd: dm-16: remove map (uevent)
Aug 9 09:33:15 node1 multipathd: dm-16: devmap not registered, can't remove
Aug 9 09:33:15 node1 clurgmgrd: [7060]: <notice> Removing ownership tag (node1.example.com) from vg13/nd13
Aug 9 09:33:26 node1 clurgmgrd[7060]: <warning> Inline recovery of service:nd1 failed
Aug 9 09:33:26 node1 clurgmgrd[7060]: <notice> Stopping service service:nd1
Aug 9 09:33:26 node1 clurgmgrd: [7060]: <err> script:/etc/init.d/ndD3-1.sh: stop of /etc/init.d/ndD3-1.sh failed (returned 1) <---------- [2]
Aug 9 09:33:26 node1 clurgmgrd[7060]: <notice> stop on script "/etc/init.d/ndD3-1.sh" returned 1 (generic error)
Aug 9 09:33:26 node1 clurgmgrd: [7060]: <notice> Deactivating vg13/nd13
Aug 9 09:33:26 node1 clurgmgrd: [7060]: <notice> Making resilient : lvchange -an vg13/nd13
Aug 9 09:33:26 node1 clurgmgrd: [7060]: <notice> Resilient command: lvchange -an vg13/nd13 --config devices{filter=["a|/dev/mapper/mpath0|","a|/dev/mapper/mpath1|","a|/dev/mapper/mpath3|","a|/dev/sda2|","r|.*|"]}
Aug 9 09:33:26 node1 clurgmgrd: [7060]: <notice> Removing ownership tag (node1.example.com) from vg13/nd13
Aug 9 09:33:26 node1 clurgmgrd[7060]: <crit> #12: RG service:nd1 failed to stop; intervention required
Aug 9 09:33:26 node1 clurgmgrd[7060]: <notice> Service service:nd1 is failed <--------- [3]
Resolution
When the status check on the script resource failed [1], rgmanager attempted inline recovery, but the stop function of the script resource also failed [2], so the service was marked as failed [3]. A service in the failed state requires manual intervention to un-fail it: the service must first be disabled and then enabled again. A failed service will not relocate, because relocation requires the service to stop successfully on the source node first.
To start the failed service again, disable it first and then enable it:
clusvcadm -d service-name <-- to disable the service
clusvcadm -e service-name <-- to enable the service
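The two commands above can be wrapped in a small helper so that the enable step runs only if the disable step succeeded. This is a minimal sketch, not a supported tool: the function name recover_service and the RUN variable are hypothetical and introduced here for illustration only. It uses only the clusvcadm -d and -e options shown above. Before enabling the service, it is worth confirming that the resources really are stopped on the node where the service failed.

```shell
#!/bin/bash
# Hypothetical helper: disable, then enable, a failed rgmanager service.
# RUN is an assumption of this sketch: set RUN=echo to print the commands
# instead of executing them (dry run).
RUN="${RUN:-}"

recover_service() {
    local svc="$1"
    if [ -z "$svc" ]; then
        echo "usage: recover_service <service-name>" >&2
        return 2
    fi
    # Disable first; a failed service cannot be enabled directly.
    $RUN clusvcadm -d "$svc" || return 1
    # Enable only after the disable step succeeded.
    $RUN clusvcadm -e "$svc"
}
```

For the service in the log above, a dry run would be: RUN=echo recover_service nd1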
For guidance on writing a script that rgmanager can use as a script resource, see:
How to write a service script to enable my service in Red Hat Clustering?
What are the requirements of a "script" resource in Red Hat Enterprise Linux Clusters?
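As an illustration of the failure shown in the log, where stop returned 1, the following sketch shows a script whose stop function returns 0 when the application is already stopped, which is what rgmanager expects from a script resource. Everything specific here is a placeholder: the pid file path, the name mydaemon, and the sleep command standing in for a real daemon are assumptions, not taken from the ndD3-1.sh script in the log.

```shell
#!/bin/bash
# Sketch of a script resource with idempotent start/stop.
# PIDFILE and DAEMON_CMD are hypothetical placeholders.
PIDFILE="${PIDFILE:-/var/run/mydaemon.pid}"
DAEMON_CMD="${DAEMON_CMD:-sleep 3600}"   # stand-in for the real daemon

is_running() {
    # True only if the pid file exists and names a live process.
    local pid
    [ -f "$PIDFILE" ] || return 1
    pid="$(cat "$PIDFILE" 2>/dev/null)"
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

do_start() {
    # Starting an already running application counts as success.
    is_running && return 0
    $DAEMON_CMD >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
    is_running
}

do_stop() {
    # Stopping an already stopped application must return 0; a non-zero
    # return here is what leads to the service being marked as failed.
    local i
    if is_running; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
        # Wait up to 10 seconds for the process to exit.
        for i in 1 2 3 4 5 6 7 8 9 10; do
            is_running || break
            sleep 1
        done
        is_running && return 1
    fi
    rm -f "$PIDFILE"
    return 0
}

do_status() {
    # 0 = running; 3 = not running (LSB convention).
    is_running && return 0
    return 3
}

# Dispatch only when an action is given (start, stop or status).
if [ "$#" -gt 0 ]; then
    case "$1" in
        start)  do_start ;;
        stop)   do_stop ;;
        status) do_status ;;
        *)      echo "Usage: $0 {start|stop|status}" >&2; exit 2 ;;
    esac
    exit $?
fi
```

The point of the design is that a stale or missing pid file is treated as "already stopped" rather than as an error, so a stop that follows a failed status check can still succeed and allow the service to be recovered or relocated.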
Also see the article below to avoid failures of the status, start, and stop functions in the script resource.
