corosync service stops randomly on one node in a RHEL 6 High Availability cluster
Issue
- corosync services stop randomly on one node
- Suddenly the cluster-related processes from cman and pacemaker all report that the cluster is dead and corosync has died, whereas corosync does not print anything in any of its logs and is simply no longer running:
Jul 8 18:00:21 node1 fenced[39630]: cluster is down, exiting
Jul 8 18:00:21 node1 fenced[39630]: daemon cpg_dispatch error 2
Jul 8 18:00:21 node1 dlm_controld[39654]: cluster is down, exiting
Jul 8 18:00:21 node1 attrd[39839]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul 8 18:00:21 node1 gfs_controld[39701]: cluster is down, exiting
Jul 8 18:00:21 node1 pacemakerd[39830]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul 8 18:00:21 node1 cib[39836]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul 8 18:00:21 node1 crmd[39841]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul 8 18:00:21 node1 stonith-ng[39837]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul 8 18:00:21 node1 gfs_controld[39701]: daemon cpg_dispatch error 2
Jul 8 18:00:21 node1 dlm_controld[39654]: daemon cpg_dispatch error 2
Jul 8 18:00:21 node1 attrd[39839]: crit: attrd_cs_destroy: Lost connection to Corosync service!
Jul 8 18:00:21 node1 stonith-ng[39837]: error: stonith_peer_cs_destroy: Corosync connection terminated
Jul 8 18:00:21 node1 pacemakerd[39830]: error: mcp_cpg_destroy: Connection destroyed
Jul 8 18:00:21 node1 cib[39836]: error: cib_cs_destroy: Corosync connection lost! Exiting.
Jul 8 18:00:21 node1 crmd[39841]: error: crmd_cs_destroy: connection terminated
Jul 8 18:00:21 node1 attrd[39839]: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 2
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 4
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 3
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 6
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 5
Jul 8 18:00:23 node1 kernel: dlm: closing connection to node 1
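To check whether a node shows this same pattern, the syslog can be scanned for the messages in the excerpt above. The following is a minimal sketch, not a supported tool: the function names (`find_corosync_loss`, `daemons_affected`) are hypothetical, and the signature list is taken only from the lines shown in this article, so it is illustrative rather than exhaustive.

```python
import re

# Substrings copied from the log excerpt in this article (assumption: these
# are representative; other versions may word the messages differently).
SIGNATURES = (
    "cluster is down, exiting",
    "cpg_dispatch error 2",
    "Connection to the CPG API failed",
    "Lost connection to Corosync service",
    "Corosync connection terminated",
    "Corosync connection lost",
)

# Matches syslog lines of the form "Mon D HH:MM:SS host daemon[pid]: message".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def find_corosync_loss(lines):
    """Return (timestamp, host, daemon) for each line matching a signature."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(sig in m.group("msg") for sig in SIGNATURES):
            hits.append((m.group("ts"), m.group("host"), m.group("daemon")))
    return hits


def daemons_affected(lines):
    """Return the sorted, de-duplicated daemon names that reported the loss."""
    return sorted({daemon for _, _, daemon in find_corosync_loss(lines)})
```

The functions accept any iterable of lines, so an open file handle for the system log (commonly /var/log/messages on RHEL 6, assuming the default syslog configuration) can be passed directly. Several daemons reporting the loss within the same second, with no corosync messages around that time, would be consistent with the symptom described here.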
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
- corosync