Abnormal resource removal

Hi,
In a two-node cluster, the master is on node0. While node1 is shut down, we add new resources to the cluster. When node1 is powered back on, the resources that were just added on node0 are deleted. How can we avoid this problem?
The versions of Linux, Pacemaker, and Corosync are as follows:
Linux 3.10.0-693.el7.x86_64, pacemaker-1.1.23-1.el7_9.1.x86_64, corosync-2.4.5-7.el7.x86_64
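For illustration, the sequence looks roughly like this (assuming pcs is used for management; test_res and the Dummy agent below are placeholders, not our real resources):

# On node0, with node1 shut down:
pcs cluster stop node1                            # or power node1 off
pcs resource create test_res ocf:pacemaker:Dummy  # stand-in for the resource added while node1 is down
pcs status resources                              # test_res is running on node0

# Bring node1 back into the cluster:
pcs cluster start node1                           # after the CIBs sync, the newly added resource is deleted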
Thank you in advance for sharing your experience here.

Responses

Hi Shui cc,

Could you give us a little more detail about the problem, by answering the questions below?

Are the resources removed from the pacemaker/corosync configuration, or are they only stopped? If they are stopped, do they get started again or remain stopped once node1 is active?
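For example, something along these lines should show the difference (replace <resource-id> with the name of one of the affected resources):

# Is the resource definition still present in the CIB?
cibadmin --query --scope resources | grep <resource-id>

# What runtime state does the cluster report for it?
crm_mon -1 | grep <resource-id>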

After re-reading your description: this is what I have seen happen with cluman/rgmanager (the older Red Hat cluster stack). Once the two nodes synced their configuration, the resources got restarted. I have not been able to fix that with this software.

I do not know if you can configure pacemaker/corosync to avoid restarting resources during a configuration sync.

I hope experienced users or Red Hatters can answer your question.

Regards,

Jan Gerrit

Thank you for your reply.

In corosync.log we can see the key messages below (xx_pool_H is the name of one of our cluster resources):

Mar 29 11:57:57 [4875] xx pengine: info: rsc_action_digest_cmp: Parameters to xx_pool_H_monitor_0 on node0 changed: was 8b36ac1527c6ce34f76bb21fb642d3cb vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8 (reload:3.0.14) 0:0;8:3:7:c9bb2393-efb3-4248-a9a3-cc175b05e92a

Mar 29 11:57:57 [4875] xx pengine: notice: pe__clear_failcount: Clearing failure of xx_pool_H on node0 because resource parameters have changed | xx_pool_H_clear_failcount_0

Mar 29 11:57:57 xx pengine[4875]: notice: Clearing failure of xx_pool_H on node0 because resource parameters have changed

Mar 29 11:57:57 xx pengine[4875]: warning: Detected active orphan xx_pool_H running on node0

We understand that the root cause is this: when the previously shut-down node1 rejoins the cluster, the cluster compares the cib.xml of the two nodes and decides that xx_pool_H is an orphan resource, so it deletes the newly added, now-orphaned resource. However, we do not know how to control or avoid this behaviour. Can you share what you know?
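One thing worth checking here (a suggestion only, not confirmed for this setup): when the cluster re-forms, Pacemaker keeps the CIB copy with the highest (admin_epoch, epoch, num_updates) tuple, so comparing those counters on both nodes shows whose configuration survives the rejoin. For example:

# On each node, compare the CIB version counters (default RHEL 7 path):
grep -m1 '<cib ' /var/lib/pacemaker/cib/cib.xml

# Or, on a node where the cluster is running, look at the live CIB header:
cibadmin --query | head -n 2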

In the cluster configuration, we set the stop-orphan-resources property to false. Now, when the shut-down node1 rejoins the cluster, the newly added resource XX_pool_H is no longer removed, but it is left running as an unmanaged orphan on node0. The cluster status is shown below (see also the sketch after the status output). This is not what we want, because in this state the resource cannot be started on another node when the cluster fails over. In other words, with stop-orphan-resources set to false, how can we get the resource into the state "Started node0" instead of "ORPHANED Started node0"?

Online: [ node0 node1 ]

Full list of resources:

XX_pool_H (ocf::heartbeat:zpool): ORPHANED Started node0 (unmanaged)
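A possible way to recover from the ORPHANED state (a sketch only, not verified on this exact setup; 'pool=tank' is a placeholder for whatever parameters the zpool agent actually takes) is to re-add the resource definition to the configuration and then clear its operation history so the cluster manages it again:

# Re-create the resource definition so the running instance is no longer an orphan
# (pool=tank is illustrative; use the agent's real parameters):
pcs resource create XX_pool_H ocf:heartbeat:zpool pool=tank

# Clear the stale operation history so the cluster re-probes and manages the resource:
pcs resource cleanup XX_pool_H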