A node fails to join the cluster or start resources and the logs show "notice: mcp_read_config: Configured corosync to accept connections from group 189: Library error (2)" in a RHEL 7 High Availability cluster
Issue
- When we start pacemaker, the nodes in pcs status are shown as "unclean" indefinitely. pacemaker reports a Library error when configuring corosync to accept connections from the hacluster user:
Apr 4 20:07:04 node2 pacemakerd[162412]: notice: mcp_read_config: Configured corosync to accept connections from group 189: Library error (2)
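The group ID in the message above can be mapped back to a name on the affected node. On RHEL 7 the pacemaker packages normally create the hacluster user and haclient group with UID/GID 189, but that is an assumption about this environment and should be confirmed (a minimal check, not part of the original report):

# Confirm which group name GID 189 resolves to (expected: haclient)
getent group 189
# Confirm the hacluster user exists and is a member of that group
id hacluster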
- corosync appears to have crashed with a SIGABRT, and when systemd restarted it the pacemaker daemons could not start, reporting a series of errors beginning with "notice: mcp_read_config: Configured corosync to accept connections from group 189: Library error (2)". corosync repeatedly reports that it denied the connection attempts due to invalid IPC credentials, as shown in the excerpt below; a few generic checks follow the excerpt:
Apr 4 20:07:03 node2 stonith-ng[93181]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 4 20:07:03 node2 attrd[93182]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 4 20:07:03 node2 pacemakerd[93179]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 4 20:07:03 node2 crmd[93183]: error: crmd_quorum_destroy: connection terminated
Apr 4 20:07:03 node2 pacemakerd[93179]: error: mcp_cpg_destroy: Connection destroyed
Apr 4 20:07:03 node2 stonith-ng[93181]: error: stonith_peer_cs_destroy: Corosync connection terminated
Apr 4 20:07:03 node2 attrd[93182]: crit: attrd_cpg_destroy: Lost connection to Corosync service!
Apr 4 20:07:03 node2 attrd[93182]: notice: main: Cleaning up before exit
Apr 4 20:07:03 node2 attrd[93182]: notice: crm_client_disconnect_all: Disconnecting client 0x1bd94a0, pid=93183...
Apr 4 20:07:03 node2 crmd[93183]: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Apr 4 20:07:03 node2 lrmd[7841]: warning: qb_ipcs_event_sendv: new_event_notification (7841-93183-8): Bad file descriptor (9)
Apr 4 20:07:03 node2 lrmd[7841]: warning: send_client_notify: Notification of client crmd/0568b3bf-820e-48bb-afe1-6cf1bfd38ead failed
Apr 4 20:07:03 node2 cib[93180]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 4 20:07:03 node2 cib[93180]: error: cib_cs_destroy: Corosync connection lost! Exiting.
Apr 4 20:07:03 node2 lrmd[7841]: warning: send_client_notify: Notification of client crmd/0568b3bf-820e-48bb-afe1-6cf1bfd38ead failed
Apr 4 20:07:03 node2 systemd: corosync.service: main process exited, code=killed, status=6/ABRT
Apr 4 20:07:03 node2 systemd: pacemaker.service: main process exited, code=exited, status=107/n/a
Apr 4 20:07:03 node2 systemd: Unit pacemaker.service entered failed state.
Apr 4 20:07:03 node2 systemd: Unit corosync.service entered failed state.
Apr 4 20:07:03 node2 systemd: pacemaker.service holdoff time over, scheduling restart.
Apr 4 20:07:03 node2 systemd: Stopping Pacemaker High Availability Cluster Manager...
Apr 4 20:07:03 node2 systemd: Starting Corosync Cluster Engine...
Apr 4 20:07:03 node2 corosync[162405]: [MAIN ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Apr 4 20:07:03 node2 corosync[162405]: [MAIN ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Apr 4 20:07:03 node2 corosync[162406]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Apr 4 20:07:03 node2 corosync[162406]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Apr 4 20:07:03 node2 corosync[162406]: [TOTEM ] The network interface [192.168.10.2] is now up.
Apr 4 20:07:03 node2 corosync[162406]: [SERV ] Service engine loaded: corosync configuration map access [0]
Apr 4 20:07:03 node2 corosync[162406]: [QB ] server name: cmap
Apr 4 20:07:03 node2 corosync[162406]: [SERV ] Service engine loaded: corosync configuration service [1]
Apr 4 20:07:03 node2 corosync[162406]: [QB ] server name: cfg
Apr 4 20:07:03 node2 corosync[162406]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Apr 4 20:07:03 node2 corosync[162406]: [QB ] server name: cpg
Apr 4 20:07:03 node2 corosync[162406]: [SERV ] Service engine loaded: corosync profile loading service [4]
Apr 4 20:07:03 node2 corosync[162406]: [QUORUM] Using quorum provider corosync_votequorum
Apr 4 20:07:03 node2 corosync[162406]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 4 20:07:03 node2 corosync[162406]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Apr 4 20:07:03 node2 corosync[162406]: [QB ] server name: votequorum
Apr 4 20:07:03 node2 corosync[162406]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Apr 4 20:07:03 node2 corosync[162406]: [QB ] server name: quorum
Apr 4 20:07:03 node2 corosync[162406]: [TOTEM ] adding new UDPU member {c}
Apr 4 20:07:03 node2 corosync[162406]: [TOTEM ] adding new UDPU member {192.168.10.2}
Apr 4 20:07:03 node2 corosync[162406]: [TOTEM ] A new membership (192.168.10.2:40) was formed. Members joined: 2
Apr 4 20:07:03 node2 corosync[162406]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 4 20:07:03 node2 corosync[162406]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 4 20:07:03 node2 corosync[162406]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 4 20:07:03 node2 corosync[162406]: [QUORUM] Members[1]: 2
Apr 4 20:07:03 node2 corosync[162406]: [MAIN ] Completed service synchronization, ready to provide service.
Apr 4 20:07:03 node2 corosync[162406]: [TOTEM ] A new membership (192.168.10.1:44) was formed. Members joined: 1
Apr 4 20:07:03 node2 corosync[162406]: [QUORUM] This node is within the primary component and will provide service.
Apr 4 20:07:03 node2 corosync[162406]: [QUORUM] Members[2]: 1 2
Apr 4 20:07:03 node2 corosync[162406]: [MAIN ] Completed service synchronization, ready to provide service.
Apr 4 20:07:03 node2 corosync: Starting Corosync Cluster Engine (corosync): [ OK ]
Apr 4 20:07:04 node2 systemd: Started Corosync Cluster Engine.
Apr 4 20:07:04 node2 systemd: Starting Pacemaker High Availability Cluster Manager...
Apr 4 20:07:04 node2 systemd: Started Pacemaker High Availability Cluster Manager.
Apr 4 20:07:04 node2 pacemakerd[162412]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr 4 20:07:04 node2 pacemakerd: crm_ipc_connect: Could not establish pacemakerd connection: Connection refused (111)
Apr 4 20:07:04 node2 pacemakerd[162412]: notice: mcp_read_config: Configured corosync to accept connections from group 189: Library error (2)
Apr 4 20:07:04 node2 pacemakerd[162412]: notice: main: Starting Pacemaker 1.1.12 (Build: a14efad): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc upstart systemd nagios corosync-native atomic-attrd acls
Apr 4 20:07:04 node2 pacemakerd[162412]: notice: find_and_track_existing_processes: Tracking existing lrmd process (pid=7841)
Apr 4 20:07:04 node2 pacemakerd[162412]: notice: find_and_track_existing_processes: Tracking existing pengine process (pid=7843)
Apr 4 20:07:04 node2 pacemakerd[162412]: notice: cluster_connect_quorum: Quorum acquired
Apr 4 20:07:04 node2 pacemakerd[162412]: notice: crm_update_peer_state: pcmk_quorum_notification: Node web01.int01.moqom.com[1] - state is now member (was (null))
Apr 4 20:07:04 node2 pacemakerd[162412]: notice: crm_update_peer_state: pcmk_quorum_notification: Node node2.int01.moqom.com[2] - state is now member (was (null))
Apr 4 20:07:04 node2 cib[162413]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr 4 20:07:04 node2 stonith-ng[162414]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr 4 20:07:04 node2 stonith-ng[162414]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 4 20:07:04 node2 attrd[162415]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr 4 20:07:04 node2 attrd[162415]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 4 20:07:04 node2 crmd[162416]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr 4 20:07:04 node2 crmd[162416]: notice: main: CRM Git Version: a14efad
Apr 4 20:07:04 node2 corosync[162406]: [MAIN ] Denied connection attempt from 189:189
Apr 4 20:07:04 node2 corosync[162406]: [QB ] Invalid IPC credentials (162406-162415-2).
Apr 4 20:07:04 node2 attrd[162415]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr 4 20:07:04 node2 attrd[162415]: error: main: Cluster connection failed
Apr 4 20:07:04 node2 attrd[162415]: notice: main: Cleaning up before exit
Apr 4 20:07:04 node2 kernel: attrd[162415]: segfault at 1b8 ip 00007fc88ffc2ff0 sp 00007fff5b74d338 error 4 in libqb.so.0.17.1[7fc88ffb4000+22000]
Apr 4 20:07:04 node2 pacemakerd[162412]: error: child_waitpid: Managed process 162415 (attrd) dumped core
Apr 4 20:07:04 node2 pacemakerd[162412]: error: pcmk_child_exit: The attrd process (162415) terminated with signal 11 (core=1)
Apr 4 20:07:04 node2 pacemakerd[162412]: notice: pcmk_process_exit: Respawning failed child process: attrd
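The sequence above suggests corosync aborted and was restarted by systemd while some pacemaker children (for example the lrmd that pacemakerd later tracks as an existing process) kept running from before the crash, after which the restarted corosync rejected their IPC connections. The checks below are illustrative only and are not part of the original report; the abrt check assumes abrt-cli is installed:

# List pacemaker daemons still running, e.g. children left over from before the corosync crash
ps -ef | grep -E 'pacemakerd|cib|stonith|lrmd|attrd|pengine|crmd' | grep -v grep
# See whether corosync currently has any uidgid entries configured (pacemakerd sets one at startup)
corosync-cmapctl | grep -i uidgid
# Check whether abrt captured the corosync SIGABRT or the attrd segfault
abrt-cli list 2>/dev/null
# Verify corosync, pacemaker, and libqb package versions are consistent across the cluster nodes
rpm -q corosync corosynclib pacemaker pacemaker-libs pacemaker-cluster-libs libqb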
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
- No stonith devices are configured in the cluster, or the cluster property stonith-enabled=false is set (see the checks below)
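A quick way to confirm the fencing configuration described above (illustrative commands, not from the original report):

# List any stonith devices defined in the cluster (none expected in this scenario)
pcs stonith show
# Show the stonith-enabled property if it has been set explicitly
pcs property show stonith-enabled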