'pcs resource cleanup' of remote-node results in both remote-node and container resource being reported as FAILED and being recovered in a RHEL 6 or 7 High Availability cluster with pacemaker

Solution Unverified - Updated 2024-08-02T05:55:44+00:00

Issue

When a Failed action is recorded against the remote-node name (not against the VM resource itself), it often cannot be cleared, even after the VM resource has been restarted successfully and the remote-node has reconnected. The pcs resource cleanup command has no effect on such a Failed action, and we have to stop and start pacemaker to remove it.
Cleaning up a remote-node resource causes both it and its container resource (the VirtualDomain) to be reported as FAILED in pcs status output, and pacemaker then recovers (restarts) both of them.

Feb  1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb  1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb  1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering container resource remote7-1. Resource is unexpectedly running and involves a remote-node.
Feb  1 14:06:41 cs-rh7-3 pengine[18388]:  notice: Recover remote7-1#011(Started cs-rh7-3-clust.examplerh.com)
Feb  1 14:06:41 cs-rh7-3 pengine[18388]:  notice: Recover testIP#011(Started clusterha-remote7-1-clust.examplerh.com)
Feb  1 14:06:41 cs-rh7-3 pengine[18388]:  notice: Start   clusterha-remote7-1-clust.examplerh.com#011(cs-rh7-3-clust.examplerh.com)
Feb  1 14:06:41 cs-rh7-3 pengine[18388]:  notice: Calculated Transition 1842: /var/lib/pacemaker/pengine/pe-input-3038.bz2
Feb  1 14:06:41 cs-rh7-3 crmd[18389]:  notice: Initiating action 100: stop remote7-1_stop_0 on cs-rh7-3-clust.examplerh.com (local)
Feb  1 14:06:41 cs-rh7-3 VirtualDomain(remote7-1)[4230]: INFO: Issuing graceful shutdown request for domain remote7-1.
Feb  1 14:06:46 cs-rh7-3 journal: Guest agent is not responding: Guest agent not available for now
Feb  1 14:06:46 cs-rh7-3 crmd[18389]:   error: Unexpected disconnect on remote-node clusterha-remote7-1-clust.examplerh.com
Feb  1 14:06:46 cs-rh7-3 crmd[18389]:   error: Operation clusterha-remote7-1-clust.examplerh.com_monitor_30000 (node=cs-rh7-3-clust.examplerh.com, call=8, status=4, cib-update=2088, confirmed=false) Error
Feb  1 14:06:46 cs-rh7-3 crmd[18389]:  notice: Transition aborted by clusterha-remote7-1-clust.examplerh.com_monitor_30000 'create' on cs-rh7-3-clust.examplerh.com: Old event (magic=4:1;105:1839:0:cbbb7862-6043-4a5a-bfea-b989a8d6e0ee, cib=0.331.1, source=process_graph_event:593, 0)
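
To check whether a node's logs show this symptom, the pengine warning from the excerpt above can be searched for programmatically. The following is only a minimal sketch: the function name and the regular expression are illustrative assumptions derived from the message text shown above, not part of pacemaker or pcs, and the wording of the message may differ between pacemaker versions.

```python
import re

# Assumption: this pattern mirrors the pengine warning exactly as it appears
# in the log excerpt above; other pacemaker versions may word it differently.
RECOVER_RE = re.compile(
    r"pengine\[\d+\]:\s+warning: Recovering container resource "
    r"(?P<resource>\S+)\. "
    r"Resource is unexpectedly running and involves a remote-node\."
)


def find_container_recoveries(lines):
    """Return the sorted, de-duplicated names of container resources
    that pengine reported as being recovered (hypothetical helper)."""
    found = set()
    for line in lines:
        match = RECOVER_RE.search(line)
        if match:
            found.add(match.group("resource"))
    return sorted(found)


if __name__ == "__main__":
    # Two lines copied from the excerpt above: one matching, one not.
    sample = [
        "Feb  1 14:06:41 cs-rh7-3 pengine[18388]: warning: Recovering "
        "container resource remote7-1. Resource is unexpectedly running "
        "and involves a remote-node.",
        "Feb  1 14:06:41 cs-rh7-3 crmd[18389]:  notice: Initiating action "
        "100: stop remote7-1_stop_0 on cs-rh7-3-clust.examplerh.com (local)",
    ]
    print(find_container_recoveries(sample))
```

In practice the lines would come from the system log on the cluster node (for example, by iterating over an open log file) rather than from an inline list.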

Environment

Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
pacemaker
One or more guest remote nodes - VirtualDomain resources with remote-node specified as an attribute
