RHEL 7 High Availability cluster nodes frequently getting fenced and lrmd reporting "error: crm_abort: lrmd_ipc_dispatch: Triggered assert at main.c:123 : flags & crm_ipc_client_response" and segfaulting
Issue
- My 3 nodes are all rebooting in a loop and lrmd seems to be constantly segfaulting.
- The nodes in my cluster won't stop fencing each other and I see lrmd reporting "Triggered assert at main.c:123 : flags & crm_ipc_client_response" and segfaulting:
Jul 6 08:57:02 node1 lrmd[4164]: error: crm_abort: lrmd_ipc_dispatch: Triggered assert at main.c:123 : flags & crm_ipc_client_response
Jul 6 08:57:02 node1 lrmd[4164]: error: lrmd_ipc_dispatch: Invalid client request: 0x1219ce0
- I see constant repeating errors from lrmd about notifications failing and crmd crashing after "crit: lrm_connection_destroy: LRM Connection failed":
Jul 6 08:57:12 node1 crmd[33886]: crit: lrm_connection_destroy: LRM Connection failed
Jul 6 08:57:12 node1 crmd[33886]: warning: do_update_resource: Resource pcmk-node1 no longer exists in the lrmd
Jul 6 08:57:12 node1 lrmd[4164]: warning: qb_ipcs_event_sendv: new_event_notification (4164-33886-8): Bad file descriptor (9)
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 crmd[33886]: notice: process_lrm_event: Operation pcmk-node1_stop_0: ok (node=pcmk-node1, call=2, rc=0, cib-update=0, confirmed=true)
Jul 6 08:57:12 node1 attrd[4166]: notice: attrd_peer_remove: Removing all pcmk-node1 attributes for pcmk-node1
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 crmd[33886]: error: do_log: FSA: Input I_ERROR from lrm_connection_destroy() received in state S_NOT_DC
Jul 6 08:57:12 node1 crmd[33886]: notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=lrm_connection_destroy ]
Jul 6 08:57:12 node1 crmd[33886]: warning: do_recover: Fast-tracking shutdown in response to errors
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 crmd[33886]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 attrd[4166]: notice: attrd_peer_remove: Removing all pcmk-slnec1ctl2 attributes for pcmk-slnec1ctl2
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 crmd[33886]: notice: do_lrm_control: Disconnected from the LRM
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 crmd[33886]: notice: terminate_cs_connection: Disconnecting from Corosync
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 crmd[33886]: error: crmd_fast_exit: Could not recover from internal error
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 lrmd[4164]: warning: send_client_notify: Notification of client crmd/8988b67f-ff65-4a39-a330-69efcbf12567 failed
Jul 6 08:57:12 node1 pacemakerd[4050]: error: pcmk_child_exit: The crmd process (33886) exited: Generic Pacemaker error (201)
Jul 6 08:57:12 node1 pacemakerd[4050]: notice: pcmk_process_exit: Respawning failed child process: crmd
Jul 6 08:57:12 node1 crmd[36596]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add On
- One or more stonith devices in the CIB has a name (ID) matching the name of one of the cluster nodes.
- The cluster node name comes either from corosync, as specified in /etc/corosync/corosync.conf, or, if the nodes are specified by IP address in that file, the hostname (uname -n output) of the node is used as the name.
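As an illustration only (the node names, fence agent, and fence parameters below are hypothetical), this condition would look like a corosync.conf nodelist entry and a stonith device created with the same ID:

# /etc/corosync/corosync.conf -- cluster node names taken from the nodelist
nodelist {
    node {
        ring0_addr: node1.example.com
        nodeid: 1
    }
    node {
        ring0_addr: node2.example.com
        nodeid: 2
    }
    node {
        ring0_addr: node3.example.com
        nodeid: 3
    }
}

# Stonith device whose ID collides with the node name "node1.example.com"
# pcs stonith create node1.example.com fence_ipmilan \
#     pcmk_host_list="node1.example.com" ipaddr=192.0.2.10 login=admin passwd=secret

One way to check for this overlap is to compare the cluster node names (crm_node -l, or the nodelist in /etc/corosync/corosync.conf) against the stonith device IDs reported by pcs stonith show.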