RHEL High Availability cluster nodes will not join when using Cisco network switches
Issue
Problem
- Nodes are online and communicating but can't form a quorate cluster
- Nodes start cman but take several minutes to detect the other node via multicast (openais):
Node1:
Oct 19 04:34:41 node01 openais[4000]: [CLM ] CLM CONFIGURATION CHANGE
Oct 19 04:34:41 node01 openais[4000]: [CLM ] New Configuration:
Oct 19 04:34:41 node01 openais[4000]: [CLM ] r(0) ip(10.16.177.21)
Oct 19 04:34:41 node01 openais[4000]: [CLM ] Members Left:
Oct 19 04:34:41 node01 openais[4000]: [CLM ] Members Joined:
Oct 19 04:34:41 node01 openais[4000]: [CLM ] r(0) ip(10.16.177.21)
Oct 19 04:34:41 node01 openais[4000]: [SYNC ] This node is within the primary component and will provide service.
Oct 19 04:34:41 node01 openais[4000]: [TOTEM] entering OPERATIONAL state.
Oct 19 04:34:41 node01 openais[4000]: [CMAN ] quorum regained, resuming activity
Oct 19 04:34:41 node01 openais[4000]: [CLM ] got nodejoin message 10.16.177.21
Node2:
Oct 19 04:34:45 node02 openais[3365]: [CLM ] CLM CONFIGURATION CHANGE
Oct 19 04:34:45 node02 openais[3365]: [CLM ] New Configuration:
Oct 19 04:34:45 node02 openais[3365]: [CLM ] r(0) ip(10.16.177.22)
Oct 19 04:34:45 node02 openais[3365]: [CLM ] Members Left:
Oct 19 04:34:45 node02 openais[3365]: [CLM ] Members Joined:
Oct 19 04:34:45 node02 openais[3365]: [CLM ] r(0) ip(10.16.177.22)
Oct 19 04:34:45 node02 openais[3365]: [SYNC ] This node is within the primary component and will provide service.
Oct 19 04:34:45 node02 openais[3365]: [TOTEM] entering OPERATIONAL state.
Oct 19 04:34:45 node02 openais[3365]: [CMAN ] quorum regained, resuming activity
Oct 19 04:34:45 node02 openais[3365]: [CLM ] got nodejoin message 10.16.177.22
About 3 minutes later, the nodes see each other and cman is killed on both:
Oct 19 04:37:55 node01 openais[4000]: [CLM ] CLM CONFIGURATION CHANGE
Oct 19 04:37:55 node01 openais[4000]: [CLM ] New Configuration:
Oct 19 04:37:55 node01 openais[4000]: [CLM ] r(0) ip(10.16.177.21)
Oct 19 04:37:55 node01 openais[4000]: [CLM ] r(0) ip(10.16.177.22)
Oct 19 04:37:55 node01 openais[4000]: [CLM ] Members Left:
Oct 19 04:37:55 node01 openais[4000]: [CLM ] Members Joined:
Oct 19 04:37:55 node01 openais[4000]: [CLM ] r(0) ip(10.16.177.22)
Oct 19 04:37:55 node01 openais[4000]: [SYNC ] This node is within the primary component and will provide service.
Oct 19 04:37:55 node01 openais[4000]: [TOTEM] entering OPERATIONAL state.
Oct 19 04:37:55 node01 openais[4000]: [MAIN ] Killing node node02 because it has rejoined the cluster with existing state
Oct 19 04:37:55 node01 openais[4000]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart
Oct 19 04:37:55 node02 openais[3365]: [CLM ] CLM CONFIGURATION CHANGE
Oct 19 04:37:55 node02 openais[3365]: [CLM ] New Configuration:
Oct 19 04:37:55 node02 openais[3365]: [CLM ] r(0) ip(10.16.177.21)
Oct 19 04:37:55 node02 openais[3365]: [CLM ] r(0) ip(10.16.177.22)
Oct 19 04:37:55 node02 openais[3365]: [CLM ] Members Left:
Oct 19 04:37:55 node02 openais[3365]: [CLM ] Members Joined:
Oct 19 04:37:55 node02 openais[3365]: [CLM ] r(0) ip(10.16.177.21)
Oct 19 04:37:55 node02 openais[3365]: [SYNC ] This node is within the primary component and will provide service.
Oct 19 04:37:55 node02 openais[3365]: [TOTEM] entering OPERATIONAL state.
Oct 19 04:37:55 node02 openais[3365]: [MAIN ] Killing node node01 because it has rejoined the cluster with existing state
Oct 19 04:37:55 node02 openais[3365]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart
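To measure how long each node runs alone before it first sees a peer, the CLM "New Configuration:" blocks in syslog can be parsed mechanically. The following is a hypothetical helper (not a Red Hat tool), sketched under these assumptions: the log lines follow the format shown above, a membership block runs from "New Configuration:" to "Members Left:", and syslog's missing year does not matter for the time differences being computed.

```python
import re
from datetime import datetime

# openais syslog lines for the CLM subsystem, e.g.
# "Oct 19 04:34:41 node01 openais[4000]: [CLM ] r(0) ip(10.16.177.21)"
# \s* inside the brackets tolerates "[CLM ]" as well as a padded "[CLM  ]".
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"openais\[\d+\]: \[CLM\s*\] (?P<msg>.*)$"
)
MEMBER_RE = re.compile(r"r\(0\) ip\((?P<ip>[\d.]+)\)")


def parse_stamp(stamp):
    # syslog timestamps carry no year; 2000 is an arbitrary placeholder
    # (a leap year, so a "Feb 29" entry still parses).
    return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")


def configurations(lines):
    """Return {host: [(timestamp, frozenset of member IPs), ...]} in log order."""
    result = {}
    # host -> (timestamp, [IPs]) between "New Configuration:" and "Members Left:"
    open_blocks = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a CLM line (e.g. [CMAN ], [TOTEM]) or not openais at all
        host, msg = m.group("host"), m.group("msg").strip()
        if msg == "New Configuration:":
            open_blocks[host] = (parse_stamp(m.group("stamp")), [])
        elif msg == "Members Left:" and host in open_blocks:
            stamp, ips = open_blocks.pop(host)
            result.setdefault(host, []).append((stamp, frozenset(ips)))
        elif host in open_blocks:
            member = MEMBER_RE.fullmatch(msg)
            if member:
                open_blocks[host][1].append(member.group("ip"))
    return result


def join_delay_seconds(confs):
    """Seconds from each host's first configuration to its first multi-member one
    (None if the host never saw another member)."""
    delays = {}
    for host, entries in confs.items():
        first_seen = entries[0][0]
        joined = next((t for t, members in entries if len(members) > 1), None)
        delays[host] = None if joined is None else (joined - first_seen).total_seconds()
    return delays
```

For the excerpt above, `join_delay_seconds(configurations(lines))` gives 194 seconds for node01 and 190 seconds for node02. On a node, the lines could come from an open file object such as `/var/log/messages`; concatenating logs collected from both nodes works because entries are tracked per host.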
Environment
- Red Hat Enterprise Linux (RHEL) 5, 6, or 7 with the High Availability Add On
- Red Hat Cluster Suite (RHCS) 4
- Cluster communication over multicast
- Network used for cluster communication contains a Cisco switch
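Because the cluster traffic in this environment is multicast across a Cisco switch, it can help to check multicast delivery between the nodes outside the cluster software, for several minutes, and see whether packets arrive continuously, stop, or only start after a delay. Dedicated tools such as omping exist for this; the following is a hypothetical stand-alone sketch using only the Python standard library. It assumes Python 3 (not the default interpreter on RHEL 5 or 6), that the host firewall permits the test port, and that the group address `239.192.99.99` and port `45678` are placeholders chosen so they do not collide with the cluster's own traffic.

```python
import socket
import sys
import time

# Hypothetical test group and port, picked so they do not collide with the
# cluster's own traffic. Substitute values suitable for your network.
GROUP = "239.192.99.99"
PORT = 45678


def membership_request(group, iface_ip="0.0.0.0"):
    """Pack a struct ip_mreq (group address, local interface) for IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def receive(group=GROUP, port=PORT, iface_ip="0.0.0.0"):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Join the group on the chosen interface; the kernel announces the
    # membership with an IGMP report.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    while True:
        data, (sender, _) = sock.recvfrom(1024)
        # Timestamp every packet so gaps or a late start are easy to spot.
        print(time.strftime("%H:%M:%S"), sender,
              data.decode(errors="replace"), flush=True)


def send(group=GROUP, port=PORT, iface_ip="0.0.0.0", interval=1.0):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL 1 keeps the test traffic on the local segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    if iface_ip != "0.0.0.0":
        # Send out of the interface that carries cluster traffic.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(iface_ip))
    seq = 0
    while True:
        sock.sendto(f"{socket.gethostname()} seq={seq}".encode(), (group, port))
        seq += 1
        time.sleep(interval)


def main(argv):
    if len(argv) < 2 or argv[1] not in ("send", "recv"):
        print("usage: mcast_check.py send|recv [local-interface-ip]")
        return 2
    iface = argv[2] if len(argv) > 2 else "0.0.0.0"
    if argv[1] == "send":
        send(iface_ip=iface)
    else:
        receive(iface_ip=iface)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

For example, run `python3 mcast_check.py recv 10.16.177.21` on node01 and `python3 mcast_check.py send 10.16.177.22` on node02, then swap roles. A receiver that prints nothing for minutes and then starts, or one that stops after a while, points at multicast forwarding on the path rather than at the cluster software.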