One node reports "A processor failed, forming new configuration" in a RHEL 6 High Availability cluster, but other nodes report nothing, and subsequent processor failures show inconsistent member lists
Issue
- One node reported "A processor failed, forming new configuration" and closed its DLM connection to another node, but it never logged a configuration change. None of the other nodes reported a processor failure or a membership change, and no fencing took place:
Jul 23 11:35:27 node1 corosync[12007]: [TOTEM ] A processor failed, forming new configuration.
Jul 23 11:35:39 node1 dlm_controld[12218]: node_history_cluster_remove no nodeid 2
Jul 23 11:35:39 node1 kernel: dlm: closing connection to node 2
Jul 23 11:35:39 node1 gfs_controld[12265]: receive_start 6:6 add node with started_count 5
Jul 23 11:35:39 node1 gfs_controld[12265]: receive_start 6:6 add node with started_count 5
Jul 23 11:35:39 node1 gfs_controld[12265]: receive_start 6:6 add node with started_count 5
Jul 23 11:35:39 node1 gfs_controld[12265]: receive_start 6:6 add node with started_count 5
- Later, a node did actually reboot, which caused a configuration change across the cluster. However, the one node that had reported the earlier processor failure detected a different node as missing than the rest of the cluster did:
Jul 23 11:38:46 node1 corosync[12007]: [TOTEM ] A processor failed, forming new configuration.
Jul 23 11:38:58 node1 corosync[12007]: [QUORUM] Members[9]: 1 2 3 4 5 7 8 9 10
Jul 23 11:38:58 node1 corosync[12007]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 23 11:38:58 node1 kernel: dlm: closing connection to node 6
Jul 23 11:38:56 node6 corosync[10194]: [TOTEM ] A processor failed, forming new configuration.
Jul 23 11:39:08 node6 corosync[10194]: [QUORUM] Members[9]: 1 3 4 5 6 7 8 9 10
Jul 23 11:39:08 node6 corosync[10194]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 23 11:38:56 node3 corosync[11236]: [TOTEM ] A processor failed, forming new configuration.
Jul 23 11:39:08 node3 corosync[11236]: [QUORUM] Members[9]: 1 3 4 5 6 7 8 9 10
Jul 23 11:39:08 node3 corosync[11236]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
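To spot this kind of disagreement quickly when logs from several nodes have been collected, the [QUORUM] Members lines can be compared per host. The following is a minimal sketch, not a supported tool: it assumes the syslog line format shown above, and the names MEMBERS_RE, latest_members, and group_by_view are hypothetical helpers introduced only for illustration.

```python
import re
from collections import defaultdict

# Assumed line format (taken from the excerpts above):
# Jul 23 11:38:58 node1 corosync[12007]: [QUORUM] Members[9]: 1 2 3 4 5 7 8 9 10
MEMBERS_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[QUORUM\]\s+Members\[\d+\]:\s+(?P<ids>\d+(?:\s+\d+)*)\s*$"
)


def latest_members(lines):
    """Return {host: frozenset of node IDs} using the last Members line per host."""
    views = {}
    for line in lines:
        match = MEMBERS_RE.match(line)
        if match:
            ids = frozenset(int(x) for x in match.group("ids").split())
            views[match.group("host")] = ids  # later lines overwrite earlier ones
    return views


def group_by_view(views):
    """Group hosts by the membership they reported; more than one group means disagreement."""
    groups = defaultdict(list)
    for host, members in views.items():
        groups[members].append(host)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "Jul 23 11:38:58 node1 corosync[12007]: [QUORUM] Members[9]: 1 2 3 4 5 7 8 9 10",
        "Jul 23 11:39:08 node6 corosync[10194]: [QUORUM] Members[9]: 1 3 4 5 6 7 8 9 10",
        "Jul 23 11:39:08 node3 corosync[11236]: [QUORUM] Members[9]: 1 3 4 5 6 7 8 9 10",
    ]
    for members, hosts in group_by_view(latest_members(sample)).items():
        print(sorted(hosts), "->", sorted(members))
```

With the three Members lines from the excerpts above, node1 ends up in a group of its own (its view lacks node 6), while node3 and node6 share a view that lacks node 2.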
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
- Cluster of 3 or more nodes
- corosync-1.4.1-7.el6.x86_64
  - This issue may affect other releases. This release was in use in the one instance where the issue was observed.