corosync crashes and pacemaker daemons report "Connection to the CPG API failed: Library error (2)" in RHEL 7 Update 1
Issue
- Using the Red Hat HA Add-On, we found services restarting without warning.
- corosync seems to crash and get restarted by systemd.
- cib, crmd, attrd, stonith-ng, and pacemakerd are all reporting "CPG API: Library error" and connection errors:
Apr 9 15:15:36 node1 cib[48941]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 9 15:15:36 node1 crmd[48944]: error: crmd_quorum_destroy: connection terminated
Apr 9 15:15:36 node1 attrd[48943]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 9 15:15:36 node1 stonith-ng[48942]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 9 15:15:36 node1 cib[48941]: error: cib_cs_destroy: Corosync connection lost! Exiting.
Apr 9 15:15:36 node1 stonith-ng[48942]: error: stonith_peer_cs_destroy: Corosync connection terminated
Apr 9 15:15:36 node1 pacemakerd[48940]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 9 15:15:36 node1 pacemakerd[48940]: error: mcp_cpg_destroy: Connection destroyed
Apr 9 15:15:36 node1 attrd[48943]: crit: attrd_cpg_destroy: Lost connection to Corosync service!
Apr 9 15:15:36 node1 attrd[48943]: notice: main: Cleaning up before exit
Apr 9 15:15:36 node1 attrd[48943]: notice: crm_client_disconnect_all: Disconnecting client 0x1c05600, pid=48944...
Apr 9 15:15:36 node1 crmd[48944]: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Apr 9 15:15:36 node1 lrmd[3060]: error: crm_ipc_read: Connection to stonith-ng failed
Apr 9 15:15:36 node1 lrmd[3060]: error: mainloop_gio_callback: Connection to stonith-ng[0x13743b0] closed (I/O condition=17)
Apr 9 15:15:36 node1 lrmd[3060]: error: stonith_connection_destroy_cb: LRMD lost STONITH connection
Apr 9 15:15:36 node1 lrmd[3060]: error: stonith_connection_failed: STONITH connection failed, finalizing 1 pending operations.
Apr 9 15:15:36 node1 systemd: corosync.service: main process exited, code=killed, status=6/ABRT
Apr 9 15:15:36 node1 systemd: pacemaker.service: main process exited, code=exited, status=107/n/a
Apr 9 15:15:36 node1 systemd: Unit pacemaker.service entered failed state.
Apr 9 15:15:36 node1 lrmd[3060]: warning: qb_ipcs_event_sendv: new_event_notification (3060-48944-8): Bad file descriptor (9)
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 lrmd[3060]: warning: send_client_notify: Notification of client crmd/6220b3c5-81f7-486f-8d59-7c4a8f2f2a02 failed
Apr 9 15:15:36 node1 systemd: Unit corosync.service entered failed state.
Apr 9 15:15:36 node1 systemd: pacemaker.service holdoff time over, scheduling restart.
Apr 9 15:15:36 node1 systemd: Stopping Pacemaker High Availability Cluster Manager...
Apr 9 15:15:36 node1 systemd: Starting Corosync Cluster Engine...
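To check whether other nodes or other time windows show the same signature, the log excerpt above can be scanned mechanically. The following is a minimal sketch, not a supported tool: the function name summarize, the regular expression, and the choice to correlate events by identical host and timestamp are all assumptions made for illustration. It reports which daemons logged the CPG failure in the same second that systemd recorded corosync exiting with ABRT.

```python
import re
from collections import defaultdict

# Syslog layout as seen in the excerpt:
#   "Apr 9 15:15:36 node1 cib[48941]: error: ..."
#   "Apr 9 15:15:36 node1 systemd: corosync.service: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<proc>[\w.-]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

# Message fragments copied from the log excerpt in this article.
CPG_FAILURE = "Connection to the CPG API failed: Library error (2)"
COROSYNC_ABORT = "corosync.service: main process exited, code=killed, status=6/ABRT"


def summarize(lines):
    """Map (host, timestamp) to the sorted daemons that reported the CPG
    failure, keeping only seconds in which corosync was also killed by ABRT.

    Matching on the exact same second is a simplifying assumption.
    """
    cpg_daemons = defaultdict(set)
    aborts = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog line in the expected layout
        key = (m["host"], m["ts"])
        if CPG_FAILURE in m["msg"]:
            cpg_daemons[key].add(m["proc"])
        elif m["proc"] == "systemd" and COROSYNC_ABORT in m["msg"]:
            aborts.add(key)
    return {key: sorted(names) for key, names in cpg_daemons.items() if key in aborts}


if __name__ == "__main__":
    # /var/log/messages is the usual syslog destination on RHEL 7;
    # adjust the path if logging is configured differently.
    with open("/var/log/messages", errors="replace") as fh:
        for (host, ts), daemons in summarize(fh).items():
            print(f"{ts} {host}: corosync ABRT, CPG failure in {', '.join(daemons)}")
```

An empty result only means this exact pattern was not found; it does not rule out other corosync failures.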
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
- corosync
- libqb-0.17.1-1.el7_1.1
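To compare a node against the environment listed above, the output of rpm -q libqb can be checked for the listed build. This is a minimal sketch under stated assumptions: the helper name libqb_matches_listed is hypothetical, and the check only tests for the exact libqb build named in this article. It says nothing about whether other builds are affected.

```python
import subprocess

# Build string copied from the Environment section of this article.
LISTED_LIBQB = "libqb-0.17.1-1.el7_1.1"


def libqb_matches_listed(rpm_query_output: str) -> bool:
    """Return True if any line of `rpm -q` output names the listed libqb build.

    rpm -q prints NAME-VERSION-RELEASE.ARCH by default, so an architecture
    suffix such as ".x86_64" after the listed string is accepted.
    """
    for line in rpm_query_output.splitlines():
        name = line.strip()
        if name == LISTED_LIBQB or name.startswith(LISTED_LIBQB + "."):
            return True
    return False


if __name__ == "__main__":
    # Requires an RPM-based system; rpm -q exits non-zero if the package
    # is not installed, so the return code is deliberately not checked.
    out = subprocess.run(
        ["rpm", "-q", "libqb"], capture_output=True, text=True
    ).stdout
    print("listed libqb build installed:", libqb_matches_listed(out))
```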