'pcs resource cleanup' of remote-node results in both remote-node and container resource being reported as FAILED and being recovered in a RHEL 6 or 7 High Availability cluster with pacemaker

Solution Unverified - Updated -

Issue

  • Often, when a Failed action is recorded against the remote-node name (not against the VM resource itself), it cannot be cleared, even after the VM resource has been restarted successfully and the remote-node has reconnected. The pcs resource cleanup command has no effect on such a Failed action, and pacemaker has to be stopped and started to remove it.
  • Cleaning up a remote-node resource causes both it and its container resource (the VirtualDomain) to be reported as FAILED in pcs status output, and pacemaker restarts them as a result.
Feb  1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb  1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb  1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb  1 14:06:41 cs-rh7-3 pengine[18388]:  notice: Recover remote7-1#011(Started cs-rh7-3-clust.examplerh.com)
Feb  1 14:06:41 cs-rh7-3 pengine[18388]:  notice: Recover testIP#011(Started clusterha-remote7-1-clust.examplerh.com)
Feb  1 14:06:41 cs-rh7-3 pengine[18388]:  notice: Start   clusterha-remote7-1-clust.examplerh.com#011(cs-rh7-3-clust.examplerh.com)
Feb  1 14:06:41 cs-rh7-3 pengine[18388]:  notice: Calculated Transition 1842: /var/lib/pacemaker/pengine/pe-input-3038.bz2
Feb  1 14:06:41 cs-rh7-3 crmd[18389]:  notice: Initiating action 100: stop remote7-1_stop_0 on cs-rh7-3-clust.examplerh.com (local)
Feb  1 14:06:41 cs-rh7-3 VirtualDomain(remote7-1)[4230]: INFO: Issuing graceful shutdown request for domain remote7-1.
Feb  1 14:06:46 cs-rh7-3 journal: Guest agent is not responding: Guest agent not available for now
Feb  1 14:06:46 cs-rh7-3 crmd[18389]:   error: Unexpected disconnect on remote-node clusterha-remote7-1-clust.examplerh.com
Feb  1 14:06:46 cs-rh7-3 crmd[18389]:   error: Operation clusterha-remote7-1-clust.examplerh.com_monitor_30000 (node=cs-rh7-3-clust.examplerh.com, call=8, status=4, cib-update=2088, confirmed=false) Error
Feb  1 14:06:46 cs-rh7-3 crmd[18389]:  notice: Transition aborted by clusterha-remote7-1-clust.examplerh.com_monitor_30000 'create' on cs-rh7-3-clust.examplerh.com: Old event (magic=4:1;105:1839:0:cbbb7862-6043-4a5a-bfea-b989a8d6e0ee, cib=0.331.1, source=process_graph_event:593, 0)
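
To check whether a cluster is hitting this behavior, the log can be searched for the pengine warning shown above. The following is a minimal sketch, not a supported tool: the function name recovered_containers is hypothetical, and the default log path /var/log/messages is an assumption that may not match your syslog configuration. The commented commands use the resource names from the log excerpt above.

```shell
# Reproduction, using the names from the log excerpt above (run on a cluster node):
#   pcs resource cleanup clusterha-remote7-1-clust.examplerh.com
#   pcs status

# Hypothetical helper: print the unique container resources that pengine
# reported as "Recovering container resource <name>." in a log file.
# The default path is an assumption; pass another file as the first argument.
recovered_containers() {
    log_file="${1:-/var/log/messages}"
    grep 'Recovering container resource' "$log_file" \
        | sed -E 's/.*Recovering container resource ([^ ]+)\. .*/\1/' \
        | sort -u
}
```

Running recovered_containers right after a cleanup lists the VirtualDomain resources that pacemaker decided to recover. An empty result suggests the cleanup did not trigger a recovery, at least not one logged to that file.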

Environment

  • Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add On
  • pacemaker
  • One or more guest remote nodes - VirtualDomain resources with remote-node specified as an attribute
