openais FAILED TO RECEIVE in /var/log/messages in RHEL5
Issue
- One cluster node logged the message "FAILED TO RECEIVE" while the other node logged many "Retransmit List" messages. The cluster nodes then went into split-brain.
- When starting cman on one node after a reboot, it hangs for a long time on 'starting fencing' and never seems to become a member. If I check the logs, I see several cpg_join errors followed by "FAILED TO RECEIVE":
Mar 10 15:26:13 node2 openais[11158]: [CLM ] CLM CONFIGURATION CHANGE
Mar 10 15:26:13 node2 openais[11158]: [CLM ] New Configuration:
Mar 10 15:26:13 node2 openais[11158]: [CLM ] r(0) ip(192.168.10.10)
Mar 10 15:26:13 node2 openais[11158]: [CLM ] r(0) ip(192.168.10.11)
Mar 10 15:26:13 node2 openais[11158]: [CLM ] Members Left:
Mar 10 15:26:13 node2 openais[11158]: [CLM ] Members Joined:
Mar 10 15:26:13 node2 openais[11158]: [CLM ] r(0) ip(192.168.10.10)
Mar 10 15:26:13 node2 openais[11158]: [SYNC ] This node is within the primary component and will provide service.
Mar 10 15:26:13 node2 openais[11158]: [TOTEM] entering OPERATIONAL state.
Mar 10 15:26:13 node2 openais[11158]: [CLM ] got nodejoin message 192.168.10.10
Mar 10 15:26:13 node2 openais[11158]: [CLM ] got nodejoin message 192.168.10.11
Mar 10 15:26:25 node2 groupd[11171]: cpg_join error retrying
Mar 10 15:27:05 node2 last message repeated 4 times
Mar 10 15:28:15 node2 last message repeated 7 times
Mar 10 15:29:25 node2 last message repeated 7 times
Mar 10 15:30:25 node2 last message repeated 6 times
Mar 10 15:30:35 node2 groupd[11171]: cpg_join error retrying
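To gauge how often these symptoms occur on a node, the log can be tallied with a short script. The following is a minimal sketch, not a Red Hat tool: the function name count_symptoms and the regular expressions are assumptions, based only on the message text quoted in this article and the line format of the excerpt above. It also folds syslog "last message repeated N times" lines into the count of the preceding matched message.

```python
import re
from collections import Counter

# Hypothetical patterns derived from the messages quoted in this article.
# The exact wording in a given openais build may differ.
PATTERNS = {
    "failed_to_receive": re.compile(r"openais\[\d+\]:.*FAILED TO RECEIVE"),
    "retransmit_list": re.compile(r"openais\[\d+\]:.*Retransmit List"),
    "cpg_join_retry": re.compile(r"groupd\[\d+\]:\s*cpg_join error retrying"),
}
REPEAT = re.compile(r"last message repeated (\d+) times")


def count_symptoms(lines):
    """Count symptom messages in an iterable of syslog lines."""
    counts = Counter()
    last_key = None  # symptom matched by the most recent non-repeat line
    for line in lines:
        repeat = REPEAT.search(line)
        if repeat:
            # syslog collapsed duplicates of the previous message
            if last_key is not None:
                counts[last_key] += int(repeat.group(1))
            continue
        last_key = None
        for key, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[key] += 1
                last_key = key
                break
    return counts
```

One way to use it is to pass an open file object, for example count_symptoms(open("/var/log/messages")), and compare the per-symptom totals between nodes.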
Environment
- Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add On
