corosync service stops randomly on one node in a RHEL 6 High Availability cluster
Issue
- The corosync service stops randomly on one node
- Suddenly the cluster-related processes from cman and pacemaker all report that the cluster is dead and corosync has died, whereas corosync does not print anything in any of its logs and is simply no longer running:
Jul 8 18:00:21 node1 fenced[39630]: cluster is down, exiting
Jul 8 18:00:21 node1 fenced[39630]: daemon cpg_dispatch error 2
Jul 8 18:00:21 node1 dlm_controld[39654]: cluster is down, exiting
Jul 8 18:00:21 node1 attrd[39839]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul 8 18:00:21 node1 gfs_controld[39701]: cluster is down, exiting
Jul 8 18:00:21 node1 pacemakerd[39830]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul 8 18:00:21 node1 cib[39836]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul 8 18:00:21 node1 crmd[39841]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul 8 18:00:21 node1 stonith-ng[39837]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul 8 18:00:21 node1 gfs_controld[39701]: daemon cpg_dispatch error 2
Jul 8 18:00:21 node1 dlm_controld[39654]: daemon cpg_dispatch error 2
Jul 8 18:00:21 node1 attrd[39839]: crit: attrd_cs_destroy: Lost connection to Corosync service!
Jul 8 18:00:21 node1 stonith-ng[39837]: error: stonith_peer_cs_destroy: Corosync connection terminated
Jul 8 18:00:21 node1 pacemakerd[39830]: error: mcp_cpg_destroy: Connection destroyed
Jul 8 18:00:21 node1 cib[39836]: error: cib_cs_destroy: Corosync connection lost! Exiting.
Jul 8 18:00:21 node1 crmd[39841]: error: crmd_cs_destroy: connection terminated
Jul 8 18:00:21 node1 attrd[39839]: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 2
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 4
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 3
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 6
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 5
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 1
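The pattern in the excerpt above, several cluster daemons on the same host reporting a lost CPG connection within the same second, can be searched for in a syslog file with a short script. The following is a minimal sketch only, not a Red Hat tool: the function name `find_corosync_loss`, the `min_daemons` threshold, and the choice of signature substrings are assumptions for illustration, and the substrings are simply copied from the messages shown in this article.

```python
import re
from collections import defaultdict

# Substrings copied from the log excerpt; each indicates that a daemon
# lost its connection to corosync. This list is an assumption and is
# not exhaustive.
SIGNATURES = (
    "cluster is down, exiting",
    "cpg_dispatch error 2",
    "Connection to the CPG API failed",
)

# Matches classic syslog lines such as
# "Jul 8 18:00:21 node1 fenced[39630]: cluster is down, exiting".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+)\[\d+\]:\s+(?P<msg>.*)$"
)


def find_corosync_loss(lines, min_daemons=3):
    """Return {(timestamp, host): [daemons]} for every second in which at
    least `min_daemons` distinct daemons reported a lost corosync connection."""
    hits = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(sig in m.group("msg") for sig in SIGNATURES):
            hits[(m.group("ts"), m.group("host"))].add(m.group("daemon"))
    return {key: sorted(daemons)
            for key, daemons in hits.items()
            if len(daemons) >= min_daemons}


if __name__ == "__main__":
    # The path is an assumption; adjust it to wherever syslog is written.
    with open("/var/log/messages", errors="replace") as fh:
        for (ts, host), daemons in find_corosync_loss(fh).items():
            print(f"{ts} {host}: corosync connection lost by {', '.join(daemons)}")
```

A match only shows that the daemons lost corosync at that moment. It does not explain why corosync stopped, so the surrounding log entries and any core files still need to be reviewed.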
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
- corosync
