openais FAILED TO RECEIVE in /var/log/messages in RHEL5

Solution Verified

Issue

  • One cluster node logged the message "FAILED TO RECEIVE" while the other node logged many "Retransmit List" messages. The cluster nodes then went into split-brain.
  • When starting cman on one node after a reboot, it hangs for a long time on 'starting fencing' and never seems to become a member. The logs show several cpg_join errors followed by "FAILED TO RECEIVE":
  Mar 10 15:26:13 node2 openais[11158]: [CLM  ] CLM CONFIGURATION CHANGE 
  Mar 10 15:26:13 node2 openais[11158]: [CLM  ] New Configuration: 
  Mar 10 15:26:13 node2 openais[11158]: [CLM  ]     r(0) ip(192.168.10.10)  
  Mar 10 15:26:13 node2 openais[11158]: [CLM  ]     r(0) ip(192.168.10.11)  
  Mar 10 15:26:13 node2 openais[11158]: [CLM  ] Members Left: 
  Mar 10 15:26:13 node2 openais[11158]: [CLM  ] Members Joined: 
  Mar 10 15:26:13 node2 openais[11158]: [CLM  ]     r(0) ip(192.168.10.10)  
  Mar 10 15:26:13 node2 openais[11158]: [SYNC ] This node is within the primary component and will provide service. 
  Mar 10 15:26:13 node2 openais[11158]: [TOTEM] entering OPERATIONAL state. 
  Mar 10 15:26:13 node2 openais[11158]: [CLM  ] got nodejoin message 192.168.10.10
  Mar 10 15:26:13 node2 openais[11158]: [CLM  ] got nodejoin message 192.168.10.11 
  Mar 10 15:26:25 node2 groupd[11171]: cpg_join error retrying
  Mar 10 15:27:05 node2 last message repeated 4 times
  Mar 10 15:28:15 node2 last message repeated 7 times
  Mar 10 15:29:25 node2 last message repeated 7 times
  Mar 10 15:30:25 node2 last message repeated 6 times
  Mar 10 15:30:35 node2 groupd[11171]: cpg_join error retrying
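
To gauge how often these symptoms appear on a node, the log can be tallied with a short script. The following is a minimal sketch, not part of the original solution: the function name `count_symptoms` is hypothetical, the three search strings are taken from the messages quoted above, and it assumes that a syslog "last message repeated N times" line refers to the line immediately before it in the file.

```python
import re

# Symptom strings quoted in the Issue section above.
PATTERNS = ("FAILED TO RECEIVE", "Retransmit List", "cpg_join error retrying")

# syslog collapses consecutive duplicates into lines of this form.
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_symptoms(lines):
    """Tally each symptom string (hypothetical helper).

    Assumption: a "last message repeated N times" line refers to the
    message on the line directly before it, so N is added to that
    symptom's count.
    """
    counts = {p: 0 for p in PATTERNS}
    last = None  # symptom matched by the most recent ordinary line, if any
    for line in lines:
        repeat = REPEAT_RE.search(line)
        if repeat:
            if last is not None:
                counts[last] += int(repeat.group(1))
            continue  # keep `last`, since repeat lines can follow one another
        last = next((p for p in PATTERNS if p in line), None)
        if last is not None:
            counts[last] += 1
    return counts


if __name__ == "__main__":
    # /var/log/messages is the log file named in the article title;
    # reading it normally requires root privileges.
    with open("/var/log/messages", errors="replace") as log:
        for symptom, n in count_symptoms(log).items():
            print(f"{n:6d}  {symptom}")
```

Run against the excerpt above, this sketch would attribute 26 occurrences to "cpg_join error retrying" (two explicit lines plus 4 + 7 + 7 + 6 repeats). Comparing the counts from both nodes may help confirm which side reported "FAILED TO RECEIVE" and which reported "Retransmit List".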

Environment

  • Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add On
