corosync crashes and pacemaker daemons report "Connection to the CPG API failed: Library error (2)" in RHEL 7 Update 1

Solution Unverified - Updated -

Issue

  • When using the Red Hat High Availability Add-On, services restart without warning
  • corosync appears to crash and is restarted by systemd
  • The cib, attrd, stonith-ng, and pacemakerd daemons report "Connection to the CPG API failed: Library error (2)", while crmd and lrmd report lost connections, for example:
Apr  9 15:15:36 node1 cib[48941]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr  9 15:15:36 node1 crmd[48944]: error: crmd_quorum_destroy: connection terminated
Apr  9 15:15:36 node1 attrd[48943]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr  9 15:15:36 node1 stonith-ng[48942]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr  9 15:15:36 node1 cib[48941]: error: cib_cs_destroy: Corosync connection lost!  Exiting.
Apr  9 15:15:36 node1 stonith-ng[48942]: error: stonith_peer_cs_destroy: Corosync connection terminated
Apr  9 15:15:36 node1 pacemakerd[48940]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr  9 15:15:36 node1 pacemakerd[48940]: error: mcp_cpg_destroy: Connection destroyed
Apr  9 15:15:36 node1 attrd[48943]: crit: attrd_cpg_destroy: Lost connection to Corosync service!
Apr  9 15:15:36 node1 attrd[48943]: notice: main: Cleaning up before exit
Apr  9 15:15:36 node1 attrd[48943]: notice: crm_client_disconnect_all: Disconnecting client 0x1c05600, pid=48944...
Apr  9 15:15:36 node1 crmd[48944]: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Apr  9 15:15:36 node1 lrmd[3060]: error: crm_ipc_read: Connection to stonith-ng failed
Apr  9 15:15:36 node1 lrmd[3060]: error: mainloop_gio_callback: Connection to stonith-ng[0x13743b0] closed (I/O condition=17)
Apr  9 15:15:36 node1 lrmd[3060]: error: stonith_connection_destroy_cb: LRMD lost STONITH connection
Apr  9 15:15:36 node1 lrmd[3060]: error: stonith_connection_failed: STONITH connection failed, finalizing 1 pending operations.
Apr  9 15:15:36 node1 systemd: corosync.service: main process exited, code=killed, status=6/ABRT
Apr  9 15:15:36 node1 systemd: pacemaker.service: main process exited, code=exited, status=107/n/a
Apr  9 15:15:36 node1 systemd: Unit pacemaker.service entered failed state.
Apr  9 15:15:36 node1 lrmd[3060]: warning: qb_ipcs_event_sendv: new_event_notification (3060-48944-8): Bad file descriptor (9)
Apr  9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr  9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr  9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr  9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr  9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr  9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr  9 15:15:36 node1 systemd: Unit corosync.service entered failed state.
Apr  9 15:15:36 node1 systemd: pacemaker.service holdoff time over, scheduling restart.
Apr  9 15:15:36 node1 systemd: Stopping Pacemaker High Availability Cluster Manager...
Apr  9 15:15:36 node1 systemd: Starting Corosync Cluster Engine...
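To check whether other nodes or other time windows show the same failure, a rough triage aid can scan a syslog-style file for the two signatures in the excerpt: a `pcmk_cpg_dispatch` "Library error (2)" from a pacemaker daemon, and corosync being killed with `status=6/ABRT`. The following is a minimal Python sketch, not a Red Hat tool. The function name `scan` and the regex names are hypothetical. The patterns are copied from the log lines shown here and assume the RHEL 7 syslog format (for example `/var/log/messages`), so other pacemaker or corosync versions may word these messages differently.

```python
import re
from typing import Iterable

# Hypothetical patterns, copied from the log excerpt in this article.
# Captures the daemon name (e.g. "cib", "stonith-ng") before "[pid]:".
CPG_ERROR = re.compile(
    r"(\S+)\[\d+\]: error: pcmk_cpg_dispatch: "
    r"Connection to the CPG API failed: Library error \(2\)"
)
# systemd's report that corosync was killed by SIGABRT (signal 6).
COROSYNC_ABRT = re.compile(
    r"corosync\.service: main process exited, code=killed, status=6/ABRT"
)


def scan(lines: Iterable[str]) -> dict:
    """Hypothetical helper: report whether corosync aborted and which
    daemons logged the CPG 'Library error (2)', in first-seen order."""
    daemons = []
    aborted = False
    for line in lines:
        match = CPG_ERROR.search(line)
        if match and match.group(1) not in daemons:
            daemons.append(match.group(1))
        if COROSYNC_ABRT.search(line):
            aborted = True
    return {"corosync_aborted": aborted, "cpg_error_daemons": daemons}
```

For example, `scan(open("/var/log/messages"))` on the node above should report `corosync_aborted` as true and list cib, attrd, stonith-ng, and pacemakerd. This confirms only that the symptom pattern matches. It does not identify the root cause.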

Environment

  • Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
  • corosync
  • libqb-0.17.1-1.el7_1.1
