A node fails to join the cluster or start resources and the logs show "notice: mcp_read_config: Configured corosync to accept connections from group 189: Library error (2)" in a RHEL 7 High Availability cluster

Solution Unverified - Updated

Issue

  • When we start pacemaker, the nodes in pcs status are shown as "unclean" indefinitely.
  • pacemaker reports a Library error when configuring corosync to accept connections from its cluster group (GID 189, the haclient group on a default installation):
Apr  4 20:07:04 node2 pacemakerd[162412]: notice: mcp_read_config: Configured corosync to accept connections from group 189: Library error (2)
  • corosync appears to have crashed with SIGABRT; after systemd restarted it, the pacemaker daemons could not start and reported a series of errors beginning with "notice: mcp_read_config: Configured corosync to accept connections from group 189: Library error (2)". corosync repeatedly reports that it denied a connection due to invalid IPC credentials (see the diagnostic sketch after the log excerpt below).
Apr  4 20:07:03 node2 stonith-ng[93181]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr  4 20:07:03 node2 attrd[93182]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr  4 20:07:03 node2 pacemakerd[93179]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr  4 20:07:03 node2 crmd[93183]: error: crmd_quorum_destroy: connection terminated
Apr  4 20:07:03 node2 pacemakerd[93179]: error: mcp_cpg_destroy: Connection destroyed
Apr  4 20:07:03 node2 stonith-ng[93181]: error: stonith_peer_cs_destroy: Corosync connection terminated
Apr  4 20:07:03 node2 attrd[93182]: crit: attrd_cpg_destroy: Lost connection to Corosync service!
Apr  4 20:07:03 node2 attrd[93182]: notice: main: Cleaning up before exit
Apr  4 20:07:03 node2 attrd[93182]: notice: crm_client_disconnect_all: Disconnecting client 0x1bd94a0, pid=93183...
Apr  4 20:07:03 node2 crmd[93183]: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Apr  4 20:07:03 node2 lrmd[7841]: warning: qb_ipcs_event_sendv: new_event_notification (7841-93183-8): Bad file descriptor (9)
Apr  4 20:07:03 node2 lrmd[7841]: warning: send_client_notify: Notification of client crmd/0568b3bf-820e-48bb-afe1-6cf1bfd38ead failed
Apr  4 20:07:03 node2 cib[93180]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr  4 20:07:03 node2 cib[93180]: error: cib_cs_destroy: Corosync connection lost!  Exiting.
Apr  4 20:07:03 node2 lrmd[7841]: warning: send_client_notify: Notification of client crmd/0568b3bf-820e-48bb-afe1-6cf1bfd38ead failed
Apr  4 20:07:03 node2 systemd: corosync.service: main process exited, code=killed, status=6/ABRT
Apr  4 20:07:03 node2 systemd: pacemaker.service: main process exited, code=exited, status=107/n/a
Apr  4 20:07:03 node2 systemd: Unit pacemaker.service entered failed state.
Apr  4 20:07:03 node2 systemd: Unit corosync.service entered failed state.
Apr  4 20:07:03 node2 systemd: pacemaker.service holdoff time over, scheduling restart.
Apr  4 20:07:03 node2 systemd: Stopping Pacemaker High Availability Cluster Manager...
Apr  4 20:07:03 node2 systemd: Starting Corosync Cluster Engine...
Apr  4 20:07:03 node2 corosync[162405]: [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Apr  4 20:07:03 node2 corosync[162405]: [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Apr  4 20:07:03 node2 corosync[162406]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Apr  4 20:07:03 node2 corosync[162406]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Apr  4 20:07:03 node2 corosync[162406]: [TOTEM ] The network interface [192.168.10.2] is now up.
Apr  4 20:07:03 node2 corosync[162406]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Apr  4 20:07:03 node2 corosync[162406]: [QB    ] server name: cmap
Apr  4 20:07:03 node2 corosync[162406]: [SERV  ] Service engine loaded: corosync configuration service [1]
Apr  4 20:07:03 node2 corosync[162406]: [QB    ] server name: cfg
Apr  4 20:07:03 node2 corosync[162406]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Apr  4 20:07:03 node2 corosync[162406]: [QB    ] server name: cpg
Apr  4 20:07:03 node2 corosync[162406]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Apr  4 20:07:03 node2 corosync[162406]: [QUORUM] Using quorum provider corosync_votequorum
Apr  4 20:07:03 node2 corosync[162406]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr  4 20:07:03 node2 corosync[162406]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Apr  4 20:07:03 node2 corosync[162406]: [QB    ] server name: votequorum
Apr  4 20:07:03 node2 corosync[162406]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Apr  4 20:07:03 node2 corosync[162406]: [QB    ] server name: quorum
Apr  4 20:07:03 node2 corosync[162406]: [TOTEM ] adding new UDPU member {c}
Apr  4 20:07:03 node2 corosync[162406]: [TOTEM ] adding new UDPU member {192.168.10.2}
Apr  4 20:07:03 node2 corosync[162406]: [TOTEM ] A new membership (192.168.10.2:40) was formed. Members joined: 2
Apr  4 20:07:03 node2 corosync[162406]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr  4 20:07:03 node2 corosync[162406]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr  4 20:07:03 node2 corosync[162406]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr  4 20:07:03 node2 corosync[162406]: [QUORUM] Members[1]: 2
Apr  4 20:07:03 node2 corosync[162406]: [MAIN  ] Completed service synchronization, ready to provide service.
Apr  4 20:07:03 node2 corosync[162406]: [TOTEM ] A new membership (192.168.10.1:44) was formed. Members joined: 1
Apr  4 20:07:03 node2 corosync[162406]: [QUORUM] This node is within the primary component and will provide service.
Apr  4 20:07:03 node2 corosync[162406]: [QUORUM] Members[2]: 1 2
Apr  4 20:07:03 node2 corosync[162406]: [MAIN  ] Completed service synchronization, ready to provide service.
Apr  4 20:07:03 node2 corosync: Starting Corosync Cluster Engine (corosync): [  OK  ]
Apr  4 20:07:04 node2 systemd: Started Corosync Cluster Engine.
Apr  4 20:07:04 node2 systemd: Starting Pacemaker High Availability Cluster Manager...
Apr  4 20:07:04 node2 systemd: Started Pacemaker High Availability Cluster Manager.
Apr  4 20:07:04 node2 pacemakerd[162412]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  4 20:07:04 node2 pacemakerd: crm_ipc_connect: Could not establish pacemakerd connection: Connection refused (111)
Apr  4 20:07:04 node2 pacemakerd[162412]: notice: mcp_read_config: Configured corosync to accept connections from group 189: Library error (2)
Apr  4 20:07:04 node2 pacemakerd[162412]: notice: main: Starting Pacemaker 1.1.12 (Build: a14efad):  generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc upstart systemd nagios  corosync-native atomic-attrd acls
Apr  4 20:07:04 node2 pacemakerd[162412]: notice: find_and_track_existing_processes: Tracking existing lrmd process (pid=7841)
Apr  4 20:07:04 node2 pacemakerd[162412]: notice: find_and_track_existing_processes: Tracking existing pengine process (pid=7843)
Apr  4 20:07:04 node2 pacemakerd[162412]: notice: cluster_connect_quorum: Quorum acquired
Apr  4 20:07:04 node2 pacemakerd[162412]: notice: crm_update_peer_state: pcmk_quorum_notification: Node web01.int01.moqom.com[1] - state is now member (was (null))
Apr  4 20:07:04 node2 pacemakerd[162412]: notice: crm_update_peer_state: pcmk_quorum_notification: Node node2.int01.moqom.com[2] - state is now member (was (null))
Apr  4 20:07:04 node2 cib[162413]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  4 20:07:04 node2 stonith-ng[162414]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  4 20:07:04 node2 stonith-ng[162414]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr  4 20:07:04 node2 attrd[162415]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  4 20:07:04 node2 attrd[162415]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr  4 20:07:04 node2 crmd[162416]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  4 20:07:04 node2 crmd[162416]: notice: main: CRM Git Version: a14efad
Apr  4 20:07:04 node2 corosync[162406]: [MAIN  ] Denied connection attempt from 189:189
Apr  4 20:07:04 node2 corosync[162406]: [QB    ] Invalid IPC credentials (162406-162415-2).
Apr  4 20:07:04 node2 attrd[162415]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr  4 20:07:04 node2 attrd[162415]: error: main: Cluster connection failed
Apr  4 20:07:04 node2 attrd[162415]: notice: main: Cleaning up before exit
Apr  4 20:07:04 node2 kernel: attrd[162415]: segfault at 1b8 ip 00007fc88ffc2ff0 sp 00007fff5b74d338 error 4 in libqb.so.0.17.1[7fc88ffb4000+22000]
Apr  4 20:07:04 node2 pacemakerd[162412]: error: child_waitpid: Managed process 162415 (attrd) dumped core
Apr  4 20:07:04 node2 pacemakerd[162412]: error: pcmk_child_exit: The attrd process (162415) terminated with signal 11 (core=1)
Apr  4 20:07:04 node2 pacemakerd[162412]: notice: pcmk_process_exit: Respawning failed child process: attrd
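
The "group 189" and "Invalid IPC credentials" messages both concern the GID that pacemakerd asks corosync to accept IPC connections from (the haclient group, GID 189, on a default RHEL 7 installation). As a hedged diagnostic sketch only, assuming the stock corosync 2.x tooling shown in the logs above, the following commands can confirm that the group exists with the expected GID and show which uid/gid entries corosync currently accepts; the GID value 189 is taken from the log excerpt and may differ on other systems.

# Confirm the hacluster user and haclient group exist with the UID/GID
# that pacemakerd reported (189 in the log excerpt above).
id hacluster
getent group haclient

# Dump corosync's runtime configuration database and look for uidgid
# entries; pacemakerd typically registers a uidgid.gid.<GID> key here
# when it starts successfully.
corosync-cmapctl | grep uidgid

# Check for any static uidgid configuration in corosync.conf or in
# drop-in files under /etc/corosync/uidgid.d/.
grep -r uidgid /etc/corosync/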

Environment

  • Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add On
  • No stonith devices are configured in the cluster, or the cluster property stonith-enabled is set to false (see the commands below to confirm)
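
As an illustrative, hedged example only (not part of the original report), both conditions above can be checked with the pcs tooling that ships with the RHEL 7 High Availability Add On:

# List any stonith devices configured in the cluster.
pcs stonith show

# Show the current value of the stonith-enabled cluster property.
pcs property show stonith-enabled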
