corosync service stops randomly on one node in a RHEL 6 High Availability cluster

Solution In Progress - Updated -

Issue

  • The corosync service stops randomly on one node
  • The cluster daemons from cman and pacemaker all suddenly report that the cluster is down and that their connection to corosync has been lost, while corosync itself prints nothing in any of its logs and is simply no longer running:
Jul  8 18:00:21 node1 fenced[39630]: cluster is down, exiting
Jul  8 18:00:21 node1 fenced[39630]: daemon cpg_dispatch error 2
Jul  8 18:00:21 node1 dlm_controld[39654]: cluster is down, exiting
Jul  8 18:00:21 node1 attrd[39839]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul  8 18:00:21 node1 gfs_controld[39701]: cluster is down, exiting
Jul  8 18:00:21 node1 pacemakerd[39830]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul  8 18:00:21 node1 cib[39836]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul  8 18:00:21 node1 crmd[39841]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul  8 18:00:21 node1 stonith-ng[39837]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jul  8 18:00:21 node1 gfs_controld[39701]: daemon cpg_dispatch error 2
Jul  8 18:00:21 node1 dlm_controld[39654]: daemon cpg_dispatch error 2
Jul  8 18:00:21 node1 attrd[39839]:     crit: attrd_cs_destroy: Lost connection to Corosync service!
Jul  8 18:00:21 node1 stonith-ng[39837]:    error: stonith_peer_cs_destroy: Corosync connection terminated
Jul  8 18:00:21 node1 pacemakerd[39830]:    error: mcp_cpg_destroy: Connection destroyed
Jul  8 18:00:21 node1 cib[39836]:    error: cib_cs_destroy: Corosync connection lost!  Exiting.
Jul  8 18:00:21 node1 crmd[39841]:    error: crmd_cs_destroy: connection terminated
Jul  8 18:00:21 node1 attrd[39839]:    error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Jul  8 18:00:23 node1 kernel: dlm: closing connection to node 2
Jul  8 18:00:23 node1 kernel: dlm: closing connection to node 4
Jul  8 18:00:23 node1 kernel: dlm: closing connection to node 3
Jul  8 18:00:23 node1 kernel: dlm: closing connection to node 6
Jul  8 18:00:23 node1 kernel: dlm: closing connection to node 5
Jul  8 18:00:23 node1 kernel: dlm: closing connection to node 1
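
The symptom above can be checked for mechanically. The following is a minimal sketch, not a Red Hat tool: it scans syslog lines for the messages shown in the excerpt and groups the reporting daemons by timestamp. The function name `find_corosync_loss` and the `SIGNATURES` list are hypothetical and were chosen for illustration; the patterns are copied from the excerpt and are not an exhaustive list of what these daemons can log.

```python
import re
from collections import OrderedDict

# Substrings taken from the log excerpt above (assumption: these are enough
# to recognize the symptom; other releases may word the messages differently).
SIGNATURES = (
    "cluster is down, exiting",
    "cpg_dispatch error 2",
    "Connection to the CPG API failed",
    "Lost connection to Corosync service",
    "Corosync connection",
)

# Matches "<timestamp> <host> <daemon>[<pid>]: <message>" syslog lines.
# Kernel lines have no "[pid]" part and are therefore skipped.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def find_corosync_loss(lines):
    """Return an ordered mapping of timestamp -> list of daemon names
    that logged one of the SIGNATURES messages at that time."""
    hits = OrderedDict()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue
        if any(sig in match.group("msg") for sig in SIGNATURES):
            daemons = hits.setdefault(match.group("stamp"), [])
            if match.group("daemon") not in daemons:
                daemons.append(match.group("daemon"))
    return hits
```

On a RHEL 6 node the function could be fed an open handle on /var/log/messages. Several daemons reporting at the same second, with no corosync message around that time, would be consistent with the behavior described in this issue.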

Environment

  • Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
  • corosync
