lrmd segfaults and the logs report "crm_ipc_read: Connection to lrmd failed" in a RHEL 6 High Availability cluster with pacemaker
Issue
- My services failed over in my cluster after the logs indicated that lrmd dumped core
- lrmd is segfaulting after reporting a long string of garbage characters in the log
Oct 20 04:00:20 [22743] node1 lrmd: info: log_finished: finished - rsc:H\$ÐHl$ØHûLd$àLl$èHÕLt$ðL|$øHìxH
ÿI÷AÌEÆMÍ action:H\$èHl$ðHûLd$øHìH
ÿ± call_id:-1387311296 pid:51 exit-code:0 exec-time:0ms queue-time:-23040ms
Oct 20 04:00:20 [22743] node1 lrmd: error: crm_xml_err: XML Error: string is not in UTF-8
Oct 20 04:00:20 [22744] node1 crmd: error: crm_ipc_read: Connection to lrmd failed
Oct 20 04:00:20 [6855] node1 pacemakerd: error: child_waitpid: Managed process 22743 (lrmd) dumped core
Oct 20 04:00:20 [6855] node1 pacemakerd: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=22743, core=1)
Oct 20 04:00:20 [22744] node1 crmd: error: mainloop_gio_callback: Connection to lrmd[0x184e540] closed (I/O condition=17)
Oct 20 04:00:20 [22744] node1 crmd: info: lrmd_ipc_connection_destroy: IPC connection destroyed
Oct 20 04:00:20 [22744] node1 crmd: crit: lrm_connection_destroy: LRM Connection failed
Oct 20 04:00:20 [22744] node1 crmd: error: do_log: FSA: Input I_ERROR from lrm_connection_destroy() received in state S_NOT_DC
Oct 20 04:00:20 [22744] node1 crmd: notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=lrm_connection_destroy ]
Oct 20 04:00:20 [22744] node1 crmd: warning: do_recover: Fast-tracking shutdown in response to errors
Oct 20 04:00:20 [22744] node1 crmd: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Oct 20 04:00:20 [22744] node1 crmd: info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
Oct 20 04:00:20 [22744] node1 crmd: info: do_shutdown: Disconnecting STONITH...
Oct 20 04:00:20 [6855] node1 pacemakerd: notice: pcmk_process_exit: Respawning failed child process: lrmd
- lrmd is crashing repeatedly, with its backtrace showing it failing in stonith_dispatch_internal
- I'm frequently seeing "Connection to lrmd failed" in the logs
Nov 03 06:00:13 [4805] node1 crmd: error: crm_ipc_read: Connection to lrmd failed
Nov 03 06:00:13 [4805] node1 crmd: error: mainloop_gio_callback: Connection to lrmd[0x965c40] closed (I/O condition=17)
Nov 03 06:00:13 [4786] node1 pacemakerd: error: child_waitpid: Managed process 4802 (lrmd) dumped core
Nov 03 06:00:13 [4805] node1 crmd: info: lrmd_ipc_connection_destroy: IPC connection destroyed
Nov 03 06:00:13 [4805] node1 crmd: crit: lrm_connection_destroy: LRM Connection failed
Nov 03 06:00:13 [4786] node1 pacemakerd: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=4802, core=1)
Nov 03 06:00:13 [4805] node1 crmd: error: do_log: FSA: Input I_ERROR from lrm_connection_destroy() received in state S_TRANSITION_ENGINE
Nov 03 06:00:13 [4805] node1 crmd: warning: do_state_transition: State transition S_TRANSITION_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=lrm_connection_destroy ]
Nov 03 06:00:13 [4805] node1 crmd: warning: do_recover: Fast-tracking shutdown in response to errors
Nov 03 06:00:13 [4786] node1 pacemakerd: notice: pcmk_process_exit: Respawning failed child process: lrmd
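To check how often this symptom occurs, the pacemakerd "terminated with signal" lines shown above can be pulled out of a log with a short script. The following is only a minimal sketch: the function name find_child_crashes is hypothetical, and the regular expression assumes the exact message layout seen in the excerpts in this article, which may differ in other pacemaker versions.

```python
import re

# Matches pacemakerd lines such as:
#   Oct 20 04:00:20 [6855] node1 pacemakerd: notice: pcmk_child_exit:
#   Child process lrmd terminated with signal 11 (pid=22743, core=1)
# The layout is assumed from the log excerpts in this article.
CHILD_EXIT_RE = re.compile(
    r"(?P<timestamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[\d+\] (?P<node>\S+)\s+pacemakerd:\s+notice:\s+pcmk_child_exit:\s+"
    r"Child process (?P<child>\S+) terminated with signal (?P<signal>\d+) "
    r"\(pid=(?P<pid>\d+), core=(?P<core>\d+)\)"
)


def find_child_crashes(lines, child="lrmd"):
    """Return one dict per signal-terminated exit of the named child process."""
    crashes = []
    for line in lines:
        match = CHILD_EXIT_RE.search(line)
        if match and match.group("child") == child:
            crashes.append(
                {
                    "timestamp": match.group("timestamp"),
                    "node": match.group("node"),
                    "child": match.group("child"),
                    "signal": int(match.group("signal")),
                    "pid": int(match.group("pid")),
                    # core=1 in the log means a core file was written
                    "core_dumped": match.group("core") == "1",
                }
            )
    return crashes
```

Because the affected logs contain bytes that are not valid UTF-8, open the file with errors="replace" before passing it in, for example find_child_crashes(open(path, errors="replace")), where path is wherever your cluster writes the pacemaker log (this location is an assumption that depends on your logging configuration).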
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
- pacemaker
