High availability cluster does not fail over when a node goes offline
I have a 2-node cluster, which has a pretty minimal configuration:
- 2 nodes, each of which is a RHEL 6.7 VM running on a different ESX host server in a BladeCenter.
- A failover domain which is Prioritized and Restricted.
- 3 virtual IP address resources.
- One service group, which consists of:
  - A parent virtual IP address (primary).
  - Its children: the 2 other virtual IP addresses and one RHEL service. The service simply calls a script from /etc/init.d, which starts a custom Tomcat instance.
The cluster is online and healthy. If I manually fail the service over from one node to the other, everything works fine.
If I try to simulate a hardware failure by doing a hard power off on the (VM) node which is currently in control of the service, nothing happens.
On the second node, running clustat shows that node1 is offline, but the service is still listed as "started" and its owner (Last) is still the first node. It will not even let me force the service to relocate to the second node; that command just hangs.
If I power the first node back on, it comes back online, and only THEN does the service actually try to fail over to the second node, at which point it is no longer necessary.
What I am wondering is how to make sure that the service fails over to the second node when the first node loses power or suffers any other hardware failure. I am unable to upload my cluster.conf because the files are on a different network and the server does not have internet connectivity.
Some additional notes:
I am not currently using fencing, though I have tested the VMware fencing option and that did not resolve the issue.
I also tried setting two_node=1 and expected_votes=1 on the cman element in my cluster.conf to rule out quorum issues, but that did not resolve the issue either.
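For reference, here is a minimal sketch of what that quorum setting looks like in cluster.conf. The cluster name and config_version are placeholders, not values from my actual file, and everything else (nodes, fencing, resources) is left out:

```xml
<!-- Sketch only: name and config_version are placeholder values -->
<cluster name="example_cluster" config_version="1">
  <!-- Two-node mode: lets a single surviving node keep quorum -->
  <cman two_node="1" expected_votes="1"/>
</cluster>
```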
Thanks!