'pcs resource cleanup' of a remote-node resource results in both the remote-node and its container resource being reported as FAILED and recovered in a RHEL 6 or 7 High Availability cluster with pacemaker
Issue
- Often, when a Failed action is recorded against the remote-node name (not against the VM resource itself), it is impossible to get rid of it, even after the VM resource has been restarted successfully and the remote node has reconnected. The 'pcs resource cleanup' command is ineffective on such a Failed action, and we have to stop and start pacemaker to remove it.
- Cleaning up a remote-node resource results in both it and its container resource (the VirtualDomain resource) going to a FAILED state in 'pcs status' output, and pacemaker restarts them as a result:
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: notice: Recover remote7-1#011(Started cs-rh7-3-clust.examplerh.com)
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: notice: Recover testIP#011(Started clusterha-remote7-1-clust.examplerh.com)
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: notice: Start clusterha-remote7-1-clust.examplerh.com#011(cs-rh7-3-clust.examplerh.com)
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: notice: Calculated Transition 1842: /var/lib/pacemaker/pengine/pe-input-3038.bz2
Feb 1 14:06:41 cs-rh7-3 crmd[18389]: notice: Initiating action 100: stop remote7-1_stop_0 on cs-rh7-3-clust.examplerh.com (local)
Feb 1 14:06:41 cs-rh7-3 VirtualDomain(remote7-1)[4230]: INFO: Issuing graceful shutdown request for domain remote7-1.
Feb 1 14:06:46 cs-rh7-3 journal: Guest agent is not responding: Guest agent not available for now
Feb 1 14:06:46 cs-rh7-3 crmd[18389]: error: Unexpected disconnect on remote-node clusterha-remote7-1-clust.examplerh.com
Feb 1 14:06:46 cs-rh7-3 crmd[18389]: error: Operation clusterha-remote7-1-clust.examplerh.com_monitor_30000 (node=cs-rh7-3-clust.examplerh.com, call=8, status=4, cib-update=2088, confirmed=false) Error
Feb 1 14:06:46 cs-rh7-3 crmd[18389]: notice: Transition aborted by clusterha-remote7-1-clust.examplerh.com_monitor_30000 'create' on cs-rh7-3-clust.examplerh.com: Old event (magic=4:1;105:1839:0:cbbb7862-6043-4a5a-bfea-b989a8d6e0ee, cib=0.331.1, source=process_graph_event:593, 0)
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add On (pacemaker)
- One or more guest remote nodes: VirtualDomain resources with remote-node specified as an attribute