lrmd segfaults and the logs report "crm_ipc_read: Connection to lrmd failed" in a RHEL 6 High Availability cluster with pacemaker
Issue
- My services experienced a failover in my cluster after the logs indicate that lrmd dumped core
- lrmd is segfaulting after reporting a long string of garbage characters in the log
Oct 20 04:00:20 [22743] node1 lrmd: info: log_finished: finished - rsc:H\$ÐHl$ØHûLd$àLl$èHÕLt$ðL|$øHìxH
ÿI÷AÌEÆMÍ action:H\$èHl$ðHûLd$øHìH
ÿ± call_id:-1387311296 pid:51 exit-code:0 exec-time:0ms queue-time:-23040ms
Oct 20 04:00:20 [22743] node1 lrmd: error: crm_xml_err: XML Error: string is not in UTF-8
Oct 20 04:00:20 [22744] node1 crmd: error: crm_ipc_read: Connection to lrmd failed
Oct 20 04:00:20 [6855] node1 pacemakerd: error: child_waitpid: Managed process 22743 (lrmd) dumped core
Oct 20 04:00:20 [6855] node1 pacemakerd: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=22743, core=1)
Oct 20 04:00:20 [22744] node1 crmd: error: mainloop_gio_callback: Connection to lrmd[0x184e540] closed (I/O condition=17)
Oct 20 04:00:20 [22744] node1 crmd: info: lrmd_ipc_connection_destroy: IPC connection destroyed
Oct 20 04:00:20 [22744] node1 crmd: crit: lrm_connection_destroy: LRM Connection failed
Oct 20 04:00:20 [22744] node1 crmd: error: do_log: FSA: Input I_ERROR from lrm_connection_destroy() received in state S_NOT_DC
Oct 20 04:00:20 [22744] node1 crmd: notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=lrm_connection_destroy ]
Oct 20 04:00:20 [22744] node1 crmd: warning: do_recover: Fast-tracking shutdown in response to errors
Oct 20 04:00:20 [22744] node1 crmd: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Oct 20 04:00:20 [22744] node1 crmd: info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
Oct 20 04:00:20 [22744] node1 crmd: info: do_shutdown: Disconnecting STONITH...
Oct 20 04:00:20 [6855] node1 pacemakerd: notice: pcmk_process_exit: Respawning failed child process: lrmd
- lrmd is crashing repeatedly, with its backtrace showing it failing in stonith_dispatch_internal
- I'm frequently seeing "Connection to lrmd failed" in the logs
Nov 03 06:00:13 [4805] node1 crmd: error: crm_ipc_read: Connection to lrmd failed
Nov 03 06:00:13 [4805] node1 crmd: error: mainloop_gio_callback: Connection to lrmd[0x965c40] closed (I/O condition=17)
Nov 03 06:00:13 [4786] node1 pacemakerd: error: child_waitpid: Managed process 4802 (lrmd) dumped core
Nov 03 06:00:13 [4805] node1 crmd: info: lrmd_ipc_connection_destroy: IPC connection destroyed
Nov 03 06:00:13 [4805] node1 crmd: crit: lrm_connection_destroy: LRM Connection failed
Nov 03 06:00:13 [4786] node1 pacemakerd: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=4802, core=1)
Nov 03 06:00:13 [4805] node1 crmd: error: do_log: FSA: Input I_ERROR from lrm_connection_destroy() received in state S_TRANSITION_ENGINE
Nov 03 06:00:13 [4805] node1 crmd: warning: do_state_transition: State transition S_TRANSITION_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=lrm_connection_destroy ]
Nov 03 06:00:13 [4805] node1 crmd: warning: do_recover: Fast-tracking shutdown in response to errors
Nov 03 06:00:13 [4786] node1 pacemakerd: notice: pcmk_process_exit: Respawning failed child process: lrmd
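To check how often this symptom occurs, the log can be scanned for the pacemakerd "dumped core" messages shown above. The following is a minimal sketch, not a supported tool: the function name find_core_dumps and the regular expression are illustrative assumptions based only on the log format in these excerpts, and other pacemaker versions may word the message differently.

```python
import re

# Assumed pattern, derived from the excerpts above, e.g.:
#   "... pacemakerd: error: child_waitpid: Managed process 22743 (lrmd) dumped core"
CORE_DUMP = re.compile(r"child_waitpid: Managed process (\d+) \((\w+)\) dumped core")


def find_core_dumps(lines, daemon="lrmd"):
    """Return a list of (timestamp, pid) tuples for each core dump of `daemon`.

    `find_core_dumps` is a hypothetical helper name, not part of pacemaker.
    """
    hits = []
    for line in lines:
        match = CORE_DUMP.search(line)
        if match and match.group(2) == daemon:
            # The timestamp is the first three whitespace-separated fields,
            # for example "Oct 20 04:00:20".
            timestamp = " ".join(line.split()[:3])
            hits.append((timestamp, int(match.group(1))))
    return hits
```

The function accepts any iterable of lines, so an open file handle for the cluster log can be passed directly. Each returned tuple gives the time of a crash and the PID of the lrmd process that dumped core.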
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
- pacemaker