One node reports "A processor failed, forming new configuration" in a RHEL 6 High Availability cluster, but other nodes report nothing, and subsequent processor failures show inconsistent member lists

Solution In Progress - Updated -

Issue

  • We saw one node report "A processor failed" and close its connection to another node, but it never reported a configuration change. None of the other nodes reported a processor failure or a membership change, and no fencing took place:
Jul 23 11:35:27 node1 corosync[12007]:   [TOTEM ] A processor failed, forming new configuration.
Jul 23 11:35:39 node1 dlm_controld[12218]: node_history_cluster_remove no nodeid 2
Jul 23 11:35:39 node1 kernel: dlm: closing connection to node 2
Jul 23 11:35:39 node1 gfs_controld[12265]: receive_start 6:6 add node with started_count 5
Jul 23 11:35:39 node1 gfs_controld[12265]: receive_start 6:6 add node with started_count 5
Jul 23 11:35:39 node1 gfs_controld[12265]: receive_start 6:6 add node with started_count 5
Jul 23 11:35:39 node1 gfs_controld[12265]: receive_start 6:6 add node with started_count 5
  • After that single node reported a processor failure that no other node saw, a node that actually did reboot caused a configuration change across the cluster. However, the node that had reported the earlier failure detected a different node as missing than the rest of the cluster did:
Jul 23 11:38:46 node1 corosync[12007]:   [TOTEM ] A processor failed, forming new configuration.
Jul 23 11:38:58 node1 corosync[12007]:   [QUORUM] Members[9]: 1 2 3 4 5 7 8 9 10
Jul 23 11:38:58 node1 corosync[12007]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 23 11:38:58 node1 kernel: dlm: closing connection to node 6
Jul 23 11:38:56 node6 corosync[10194]:   [TOTEM ] A processor failed, forming new configuration.
Jul 23 11:39:08 node6 corosync[10194]:   [QUORUM] Members[9]: 1 3 4 5 6 7 8 9 10
Jul 23 11:39:08 node6 corosync[10194]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 23 11:38:56 node3 corosync[11236]:   [TOTEM ] A processor failed, forming new configuration.
Jul 23 11:39:08 node3 corosync[11236]:   [QUORUM] Members[9]: 1 3 4 5 6 7 8 9 10
Jul 23 11:39:08 node3 corosync[11236]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
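
The mismatch in the excerpts above can be hard to spot by eye across ten nodes. The following is a minimal sketch, not part of any Red Hat tooling, that compares the "[QUORUM] Members[...]" lines reported by each host. It assumes the syslog line layout shown in the excerpts above; the function names (parse_members, missing_per_host, group_by_view) and the SAMPLE_LOG variable are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpts above:
#   <Mon> <day> <time> <host> corosync[<pid>]:   [QUORUM] Members[<n>]: <id> <id> ...
MEMBERS_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[QUORUM\]\s+Members\[\d+\]:\s+(?P<ids>[\d ]+)$"
)

# Membership lines copied from the log excerpts in this article.
SAMPLE_LOG = """\
Jul 23 11:38:58 node1 corosync[12007]:   [QUORUM] Members[9]: 1 2 3 4 5 7 8 9 10
Jul 23 11:39:08 node6 corosync[10194]:   [QUORUM] Members[9]: 1 3 4 5 6 7 8 9 10
Jul 23 11:39:08 node3 corosync[11236]:   [QUORUM] Members[9]: 1 3 4 5 6 7 8 9 10
"""


def parse_members(lines):
    """Return {host: frozenset of node IDs}, keeping the last Members line per host."""
    views = {}
    for line in lines:
        match = MEMBERS_RE.match(line.strip())
        if match:
            ids = frozenset(int(i) for i in match.group("ids").split())
            views[match.group("host")] = ids
    return views


def missing_per_host(views):
    """For each host, list node IDs seen by some host but absent from its own view."""
    all_ids = set().union(*views.values()) if views else set()
    return {host: sorted(all_ids - members) for host, members in views.items()}


def group_by_view(views):
    """Group hosts by the exact membership they reported."""
    groups = defaultdict(list)
    for host, members in views.items():
        groups[members].append(host)
    return dict(groups)


if __name__ == "__main__":
    views = parse_members(SAMPLE_LOG.splitlines())
    for host, missing in missing_per_host(views).items():
        print(f"{host}: missing node IDs {missing}")
    if len(group_by_view(views)) > 1:
        print("Hosts disagree on cluster membership")
```

Run against the membership lines above, this reports node 6 as missing from node1's view and node 2 as missing from the views of node3 and node6, which is the inconsistency described in this issue. To use it on real data, the lines from /var/log/messages on each node would be passed to parse_members in place of SAMPLE_LOG.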

Environment

  • Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
  • Cluster of 3 or more nodes
  • corosync-1.4.1-7.el6.x86_64
    • This issue may affect other releases; this is the release that was in use in the one instance where the issue was observed
