RHEL8: systemd crashes with a Segmentation Fault error, causing very slow logins and eventually making the system unusable
Issue
All of the following symptoms occur together:
- Logging in to the server over ssh or on the console takes 25 seconds to complete, and the following message appears in the logs:
  [...] pam_systemd(...): Failed to create session: Connection timed out
- Cron jobs take 25 seconds to start, and the following message appears in the logs:
  [...] pam_systemd(crond:session): Failed to create session: Connection timed out
- On OCP cluster version 4.8.z, the following error appears on the nodes:
  [...] Failed to list units: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
- systemd crashes with a SEGV signal, and the following messages appear in the logs:
  [...] systemd-coredump[SOMEPID]: Due to PID 1 having crashed coredump collection will now be turned off.
  [...] systemd[1]: Caught <SEGV>, dumped core as pid SOMEPID.
  [...] systemd[1]: Freezing execution.
- systemd has not crashed yet (none of the messages above are present), but the following kernel stack can be seen:
  # cat /proc/1/stack
  [<0>] futex_wait_queue_me+0xb6/0x110
  [<0>] futex_wait+0x11f/0x210
  [<0>] do_futex+0x317/0x4b0
  [<0>] __x64_sys_futex+0x145/0x1f0
  [<0>] do_syscall_64+0x5b/0x1a0
  [<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
- A coredump of systemd shows one of the following backtraces (this is not an exhaustive list), all of which point to memory allocation problems (addresses will vary):

  #0 0x00007fc726d9f67b in kill () at ../sysdeps/unix/syscall-template.S:78
  #1 0x000055efd5314f7a in crash (sig=6) at ../src/core/main.c:194
  #2 <signal handler called>
  #3 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #4 0x00007fc726d89db5 in __GI_abort () at abort.c:79
  #5 0x00007fc726de24e7 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fc726ef1a0e "%s\n") at ../sysdeps/posix/libc_fatal.c:181
  #6 0x00007fc726de95ec in malloc_printerr (str=str@entry=0x7fc726ef3a88 "malloc(): smallbin double linked list corrupted") at malloc.c:5374
  [...]

  #0 0x00007f2cf051667b in kill () at ../sysdeps/unix/syscall-template.S:78
  #1 0x000055e7679b6f7a in crash (sig=11) at ../src/core/main.c:194
  #2 <signal handler called>
  #3 tcache_get (tc_idx=1) at malloc.c:2951
  #4 __GI___libc_malloc (bytes=bytes@entry=34) at malloc.c:3058
  #5 0x00007f2cf056880e in __GI___strdup (...) at strdup.c:42
  [...]

  #0 0x00007f2f50ac467b in kill () at ../sysdeps/unix/syscall-template.S:78
  #1 0x00005558a6d7bf7a in crash (sig=11) at ../src/core/main.c:194
  #2 <signal handler called>
  #3 0x00007f2f50b11818 in _int_malloc (av=av@entry=0x7f2f50e4cbc0 <main_arena>, bytes=bytes@entry=14) at malloc.c:3683
  #4 0x00007f2f50b12c72 in __GI___libc_malloc (bytes=bytes@entry=14) at malloc.c:3073
  #5 0x00007f2f50b1680e in __GI___strdup (...) at strdup.c:42
  [...]

  #0 0x00007f2f7ad0f67b in kill () at ../sysdeps/unix/syscall-template.S:78
  #1 0x00005571223aef7a in crash (sig=11) at ../src/core/main.c:194
  #2 <signal handler called>
  #3 _int_malloc (av=av@entry=0x7f2f7b097bc0 <main_arena>, bytes=bytes@entry=28) at malloc.c:3655
  #4 0x00007f2f7ad5dc72 in __GI___libc_malloc (bytes=bytes@entry=28) at malloc.c:3073
  #5 0x00007f2f7c4d4261 in malloc_multiply (need=28, size=1) at ../src/basic/alloc-util.h:63
  [...]

  #0 0x00007f7e5221d67b in kill () at ../sysdeps/unix/syscall-template.S:78
  #1 0x0000559c07060f7a in crash (sig=11) at ../src/core/main.c:194
  #2 <signal handler called>
  #3 _int_malloc (av=av@entry=0x7f7e525a5bc0 <main_arena>, bytes=bytes@entry=24) at malloc.c:3655
  #4 0x00007f7e5226c8d6 in __libc_calloc (n=n@entry=1, elem_size=elem_size@entry=24) at malloc.c:3444
  [...]
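The log, stack and backtrace signatures quoted above can be matched mechanically when triaging many hosts. The following is a minimal Python sketch, assuming the journal has already been exported to plain text lines and that the contents of /proc/1/stack and a gdb backtrace have been captured as strings. The function names and symptom labels are hypothetical, chosen for this sketch; the matched substrings are copied from the messages above. It only reports which of the quoted signatures are present and does not diagnose the underlying memory corruption.

```python
import re
from typing import Dict, Iterable, List

# Substrings copied from the log messages quoted in the symptom list.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES: Dict[str, str] = {
    "pam_session_timeout": "Failed to create session: Connection timed out",
    "systemd1_activation_timeout": (
        "Failed to activate service 'org.freedesktop.systemd1': timed out"
    ),
    "pid1_coredump_disabled": (
        "Due to PID 1 having crashed coredump collection will now be turned off"
    ),
    "pid1_segv": "Caught <SEGV>, dumped core as pid",
    "pid1_frozen": "systemd[1]: Freezing execution.",
}

# One frame of /proc/<pid>/stack, e.g. "[<0>] futex_wait+0x11f/0x210";
# group 1 captures the kernel function name.
STACK_FRAME = re.compile(r"\[<[0-9a-fA-F]+>\]\s+(\w+)\+")

# A gdb frame located in glibc's malloc.c, e.g. "... at malloc.c:3655".
MALLOC_FRAME = re.compile(r"\bat malloc\.c:\d+")


def match_symptoms(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each symptom label to the log lines that contain its signature."""
    hits: Dict[str, List[str]] = {}
    for line in lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                hits.setdefault(label, []).append(line.rstrip("\n"))
    return hits


def pid1_waiting_on_futex(stack_text: str) -> bool:
    """True if a /proc/1/stack dump contains a futex_wait frame."""
    return "futex_wait" in STACK_FRAME.findall(stack_text)


def crashed_in_allocator(backtrace: str) -> bool:
    """True if a gdb backtrace has at least one frame located in malloc.c."""
    return MALLOC_FRAME.search(backtrace) is not None
```

The stack regex accepts both the one-frame-per-line layout of /proc/1/stack and frames run together on a single line, since it matches each "[<addr>] name+offset" token independently.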
Environment
- Red Hat Enterprise Linux 8.4
  - systemd-239-45.el8_4.8 and earlier
- Red Hat Enterprise Linux 8.5
  - systemd-239-51.el8_5.1 and earlier
- Red Hat CoreOS in Red Hat OpenShift Container Platform
  - 4.8.35 with systemd-239-45.el8_4.8
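To check whether a host runs one of the builds listed above, its installed systemd version-release can be compared against the highest listed build for its RHEL minor release. The sketch below is a simplified, hypothetical re-implementation of RPM-style version comparison (digit runs compared as integers, letter runs compared as strings, a digit run sorting above a letter run, and the version with more segments winning a tie). It deliberately ignores RPM's "~" and "^" rules, so treat it as an approximation and prefer rpm's own comparison where available. It only tells you whether a build is at or below the listed ceiling; it does not identify which later build contains a fix.

```python
import re
from typing import Dict, List, Union

# Highest affected builds as listed in the Environment section above.
AFFECTED_UP_TO: Dict[str, str] = {
    "8.4": "239-45.el8_4.8",
    "8.5": "239-51.el8_5.1",
}


def _segments(s: str) -> List[Union[int, str]]:
    """Split into digit and letter runs; separators like '.', '_', '-' are dropped."""
    return [int(tok) if tok.isdigit() else tok for tok in re.findall(r"\d+|[A-Za-z]+", s)]


def rpm_like_cmp(a: str, b: str) -> int:
    """Simplified rpmvercmp-style comparison returning -1, 0 or 1.

    Hypothetical helper: does not implement RPM's '~' or '^' handling.
    """
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if type(x) is type(y):
            if x != y:
                return -1 if x < y else 1
        else:
            # A numeric segment sorts as newer than an alphabetic one.
            return 1 if isinstance(x, int) else -1
    # All shared segments are equal: the longer version is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def is_listed_as_affected(rhel_minor: str, version_release: str) -> bool:
    """True if the build is at or below the highest affected build for that release."""
    return rpm_like_cmp(version_release, AFFECTED_UP_TO[rhel_minor]) <= 0
```

On a RHEL host the version-release string can be obtained with rpm -q --qf '%{VERSION}-%{RELEASE}\n' systemd. RHCOS in OCP 4.8.35 ships systemd-239-45.el8_4.8, so it falls under the "8.4" entry.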