openais FAILED TO RECEIVE in /var/log/messages in RHEL5
Issue
- A cluster node logged the message "FAILED TO RECEIVE" while the other node logged many "Retransmit List" messages. The cluster nodes then went into split-brain.
- When starting cman on one node after a reboot, it hangs for a long time at 'starting fencing' and never seems to become a member. If I check the logs, I see several cpg_join errors followed by "FAILED TO RECEIVE":
Mar 10 15:26:13 node2 openais[11158]: [CLM ] CLM CONFIGURATION CHANGE
Mar 10 15:26:13 node2 openais[11158]: [CLM ] New Configuration:
Mar 10 15:26:13 node2 openais[11158]: [CLM ] r(0) ip(192.168.10.10)
Mar 10 15:26:13 node2 openais[11158]: [CLM ] r(0) ip(192.168.10.11)
Mar 10 15:26:13 node2 openais[11158]: [CLM ] Members Left:
Mar 10 15:26:13 node2 openais[11158]: [CLM ] Members Joined:
Mar 10 15:26:13 node2 openais[11158]: [CLM ] r(0) ip(192.168.10.10)
Mar 10 15:26:13 node2 openais[11158]: [SYNC ] This node is within the primary component and will provide service.
Mar 10 15:26:13 node2 openais[11158]: [TOTEM] entering OPERATIONAL state.
Mar 10 15:26:13 node2 openais[11158]: [CLM ] got nodejoin message 192.168.10.10
Mar 10 15:26:13 node2 openais[11158]: [CLM ] got nodejoin message 192.168.10.11
Mar 10 15:26:25 node2 groupd[11171]: cpg_join error retrying
Mar 10 15:27:05 node2 last message repeated 4 times
Mar 10 15:28:15 node2 last message repeated 7 times
Mar 10 15:29:25 node2 last message repeated 7 times
Mar 10 15:30:25 node2 last message repeated 6 times
Mar 10 15:30:35 node2 groupd[11171]: cpg_join error retrying
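To gauge how often these symptoms occur on a node, the log can be tallied with a short script. The following is only a minimal sketch in Python, not a Red Hat tool: the function name `count_symptoms` and the `SAMPLE` list are made up for illustration, and it assumes the standard syslog "last message repeated N times" line refers to the most recent symptom line seen.

```python
import re
from collections import Counter

# Symptom strings quoted in this article; matched by plain substring search.
SYMPTOMS = ("FAILED TO RECEIVE", "Retransmit List", "cpg_join error retrying")
REPEAT = re.compile(r"last message repeated (\d+) times")


def count_symptoms(lines):
    """Count symptom messages in an iterable of syslog lines.

    Assumption: a 'last message repeated N times' line is credited to the
    most recent symptom line; it is ignored if the previous message was
    not one of the symptoms.
    """
    counts = Counter()
    last = None  # symptom matched by the latest non-repeat line, if any
    for line in lines:
        match = REPEAT.search(line)
        if match:
            if last is not None:
                counts[last] += int(match.group(1))
            continue
        last = next((s for s in SYMPTOMS if s in line), None)
        if last is not None:
            counts[last] += 1
    return counts


# Hypothetical sample: three lines taken from the excerpt above.
SAMPLE = [
    "Mar 10 15:26:25 node2 groupd[11171]: cpg_join error retrying",
    "Mar 10 15:27:05 node2 last message repeated 4 times",
    "Mar 10 15:30:35 node2 groupd[11171]: cpg_join error retrying",
]

for symptom, total in count_symptoms(SAMPLE).items():
    print(symptom, total)
```

To run it against a real log, pass an open file instead of the sample, for example `count_symptoms(open("/var/log/messages"))`. Comparing the totals from each node can show which side is logging "Retransmit List" and which side reports "FAILED TO RECEIVE".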
Environment
- Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add-On