corosync crashes and pacemaker daemons report "Connection to the CPG API failed: Library error (2)" in RHEL 7 Update 1
Issue
- Using the Red Hat High Availability Add-On, we found services restarting without warning.
- corosync seems to crash and get restarted by systemd.
- cib, crmd, attrd, stonith-ng, and pacemakerd are all reporting "CPG API: Library error" and connection errors:
Apr 9 15:15:36 node1 cib[48941]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 9 15:15:36 node1 crmd[48944]: error: crmd_quorum_destroy: connection terminated
Apr 9 15:15:36 node1 attrd[48943]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 9 15:15:36 node1 stonith-ng[48942]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 9 15:15:36 node1 cib[48941]: error: cib_cs_destroy: Corosync connection lost! Exiting.
Apr 9 15:15:36 node1 stonith-ng[48942]: error: stonith_peer_cs_destroy: Corosync connection terminated
Apr 9 15:15:36 node1 pacemakerd[48940]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 9 15:15:36 node1 pacemakerd[48940]: error: mcp_cpg_destroy: Connection destroyed
Apr 9 15:15:36 node1 attrd[48943]: crit: attrd_cpg_destroy: Lost connection to Corosync service!
Apr 9 15:15:36 node1 attrd[48943]: notice: main: Cleaning up before exit
Apr 9 15:15:36 node1 attrd[48943]: notice: crm_client_disconnect_all: Disconnecting client 0x1c05600, pid=48944...
Apr 9 15:15:36 node1 crmd[48944]: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Apr 9 15:15:36 node1 lrmd[3060]: error: crm_ipc_read: Connection to stonith-ng failed
Apr 9 15:15:36 node1 lrmd[3060]: error: mainloop_gio_callback: Connection to stonith-ng[0x13743b0] closed (I/O condition=17)
Apr 9 15:15:36 node1 lrmd[3060]: error: stonith_connection_destroy_cb: LRMD lost STONITH connection
Apr 9 15:15:36 node1 lrmd[3060]: error: stonith_connection_failed: STONITH connection failed, finalizing 1 pending operations.
Apr 9 15:15:36 node1 systemd: corosync.service: main process exited, code=killed, status=6/ABRT
Apr 9 15:15:36 node1 systemd: pacemaker.service: main process exited, code=exited, status=107/n/a
Apr 9 15:15:36 node1 systemd: Unit pacemaker.service entered failed state.
Apr 9 15:15:36 node1 lrmd[3060]: warning: qb_ipcs_event_sendv: new_event_notification (3060-48944-8): Bad file descriptor (9)
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 systemd: Unit corosync.service entered failed state.
Apr 9 15:15:36 node1 systemd: pacemaker.service holdoff time over, scheduling restart.
Apr 9 15:15:36 node1 systemd: Stopping Pacemaker High Availability Cluster Manager...
Apr 9 15:15:36 node1 systemd: Starting Corosync Cluster Engine...
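To check whether another node shows the same signature, the log lines above can be scanned programmatically. The following is a minimal sketch, not a Red Hat tool: the function name `summarize` and the regular expression are assumptions for illustration, and the regex only targets the syslog layout seen in the excerpt above. It counts which daemons reported the CPG API failure and records the timestamps at which systemd logged corosync exiting with ABRT.

```python
import re
from collections import Counter

# Matches the syslog layout quoted above:
#   "Mon D HH:MM:SS host daemon[pid]: message"   (the [pid] part is optional)
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+)(?:\[(?P<pid>\d+)\])?:\s+(?P<msg>.*)$"
)

# Message fragments copied verbatim from the log excerpt in this article.
CPG_FAILURE = "Connection to the CPG API failed: Library error (2)"
COROSYNC_ABORT = "corosync.service: main process exited, code=killed, status=6/ABRT"


def summarize(lines):
    """Return (Counter of daemons that lost CPG, list of corosync abort timestamps)."""
    cpg_daemons = Counter()
    aborts = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the expected layout
        if CPG_FAILURE in m.group("msg"):
            cpg_daemons[m.group("daemon")] += 1
        if m.group("daemon") == "systemd" and COROSYNC_ABORT in m.group("msg"):
            aborts.append(m.group("ts"))
    return cpg_daemons, aborts
```

Any iterable of log lines can be passed in, for example an open file handle on the system log (the exact log path depends on the host's syslog configuration). A non-empty abort list together with CPG failures from several daemons at the same timestamp matches the pattern described in this issue.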
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
- corosync
- libqb-0.17.1-1.el7_1.1
