'pcs resource cleanup' of remote-node results in both remote-node and container resource being reported as FAILED and being recovered in a RHEL 6 or 7 High Availability cluster with pacemaker
Issue
- Often, when a Failed action is recorded against the remote-node name (not against the VM resource itself), it cannot be cleared, even after the VM resource has been restarted successfully and the remote-node has reconnected. The 'pcs resource cleanup' command has no effect on such a Failed action, and pacemaker has to be stopped and started to remove it.
- Cleaning up a remote-node resource results in both it and the container resource (the VirtualDomain) going to a FAILED state in 'pcs status' output, and pacemaker restarts them as a result.
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: notice: Recover remote7-1#011(Started cs-rh7-3-clust.examplerh.com)
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: notice: Recover testIP#011(Started clusterha-remote7-1-clust.examplerh.com)
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: notice: Start clusterha-remote7-1-clust.examplerh.com#011(cs-rh7-3-clust.examplerh.com)
Feb 1 14:06:41 cs-rh7-3 pengine[18388]: notice: Calculated Transition 1842: /var/lib/pacemaker/pengine/pe-input-3038.bz2
Feb 1 14:06:41 cs-rh7-3 crmd[18389]: notice: Initiating action 100: stop remote7-1_stop_0 on cs-rh7-3-clust.examplerh.com (local)
Feb 1 14:06:41 cs-rh7-3 VirtualDomain(remote7-1)[4230]: INFO: Issuing graceful shutdown request for domain remote7-1.
Feb 1 14:06:46 cs-rh7-3 journal: Guest agent is not responding: Guest agent not available for now
Feb 1 14:06:46 cs-rh7-3 crmd[18389]: error: Unexpected disconnect on remote-node clusterha-remote7-1-clust.examplerh.com
Feb 1 14:06:46 cs-rh7-3 crmd[18389]: error: Operation clusterha-remote7-1-clust.examplerh.com_monitor_30000 (node=cs-rh7-3-clust.examplerh.com, call=8, status=4, cib-update=2088, confirmed=false) Error
Feb 1 14:06:46 cs-rh7-3 crmd[18389]: notice: Transition aborted by clusterha-remote7-1-clust.examplerh.com_monitor_30000 'create' on cs-rh7-3-clust.examplerh.com: Old event (magic=4:1;105:1839:0:cbbb7862-6043-4a5a-bfea-b989a8d6e0ee, cib=0.331.1, source=process_graph_event:593, 0)
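To check whether a node has hit this condition, the pengine warning shown in the log above can be searched for. The following is a minimal sketch rather than part of the original solution: the function name count_container_recoveries is hypothetical, and /var/log/messages is assumed to be where syslog writes on the affected node.

```shell
#!/bin/sh
# Hypothetical helper: count pengine warnings showing that pacemaker is
# recovering a container resource that involves a remote-node.
# Usage: count_container_recoveries LOGFILE
count_container_recoveries() {
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c 'Recovering container resource' "$1"
}
```

For example, `count_container_recoveries /var/log/messages` prints how many times the warning was logged. A non-zero count shortly after a cleanup of the remote-node is consistent with the behavior described in this issue.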
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add On
- pacemaker
- One or more guest remote nodes
- VirtualDomain resources with remote-node specified as an attribute
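For reference, a VirtualDomain resource becomes a guest remote node when the remote-node meta attribute is set on it. The sketch below shows one way such a resource is typically created with pcs. The wrapper function create_guest_vm is hypothetical, and the hypervisor URI and the domain XML path passed to it are assumptions, not values taken from the affected cluster.

```shell
#!/bin/sh
# Hypothetical wrapper around 'pcs resource create' for a VirtualDomain
# resource that is also a guest remote node.
# Usage: create_guest_vm RESOURCE_ID DOMAIN_XML REMOTE_NODE_NAME
create_guest_vm() {
    # hypervisor and config are VirtualDomain agent parameters;
    # remote-node is the meta attribute that makes the guest a remote node.
    pcs resource create "$1" VirtualDomain \
        hypervisor="qemu:///system" \
        config="$2" \
        meta remote-node="$3"
}
```

With the names from the log excerpt, a call might look like `create_guest_vm remote7-1 /etc/libvirt/qemu/remote7-1.xml clusterha-remote7-1-clust.examplerh.com`, where the XML path is only an illustrative placeholder.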
