[RHEL7] kdump service is reloaded many times during boot phase on a large memory system
Issue
- The kdump service is reloaded many times during the boot phase on a large system.
- The server becomes slow to respond, or even hangs during boot, while systemd-udevd reloads the kdump service over and over.
- Because of these frequent kdump reload attempts during boot, kernfs_mutex, which must be taken when opening files on sysfs, becomes severely contended, and udevd tends to get stuck while walking the /sys/devices/... entries.
-- Reboot --
Mar 15 18:06:30 localhost.localdomain systemd-journal[919]: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 187.9G available → current limit 4.0G).
Mar 15 18:06:30 localhost.localdomain kernel: microcode: microcode updated early to revision 0x2006a08, date = 2020-06-16
Mar 15 18:06:30 localhost.localdomain kernel: Initializing cgroup subsys cpuset
Mar 15 18:06:30 localhost.localdomain kernel: Initializing cgroup subsys cpu
Mar 15 18:06:30 localhost.localdomain kernel: Initializing cgroup subsys cpuacct
Mar 15 18:06:30 localhost.localdomain kernel: Linux version 3.10.0-1160.11.1.rt56.1145.bz1926043.el7.x86_64 (mockbuild@x86-040.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP PREEMPT RT Mon Feb 22 16:24:06 UTC 2021
Mar 15 18:06:30 localhost.localdomain kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.11.1.rt56.1145.bz1926043.el7.x86_64 root=/dev/mapper/rootvg-root ro audit=1 crashkernel=768M spectre_v2=retpoline rd.lvm.lv=rootvg/root selinux=0 ipv6.disable=0 console=ttyS1,115200 cgroup.memory=nokmem raid=noautodetect no_timer_check clock=tsc clocksource=tsc tsc=reliable rcu_nocbs=1-19,21-39,41-59,61-79 rcu_nocb_poll=1 nohz=on nohz_full=1-19,21-39,41-59,61-79 isolcpus=1-19,21-39,41-59,61-79 irqaffinity=0,20,40,60 enforcing=0 noswap default_hugepagesz=1G hugepagesz=1G hugepages=263 mce=off nmi_watchdog=1 fsck.mode=force fsck.repair=yes skew_tick=1 softlockup_panic=0 idle=poll nosoftlockup intel_pstate=disable intel_idle.max_cstate=1 iommu=pt intel_iommu=on pcie_aspm.policy=performance crash_kexec_post_notifiers systemd.log_level=debug systemd.log_target=kmsg log_buf_len=15M skew_tick=1 isolcpus=1-19,21-39,41-59,61-79 intel_pstate=disable nosoftlockup nohz=on nohz_full=1-19,21-39,41-59,61-79 rcu_nocbs=1-19,21-39,41-59,61-79
...
Mar 15 18:06:52 localhost.localdomain systemd-udevd[1690]: RUN '/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --quiet /usr/bin/kdumpctl reload'' /usr/lib/udev/rules.d/98-kexec.rules:14
...
Mar 15 18:06:52 localhost.localdomain systemd-udevd[1692]: RUN '/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --quiet /usr/bin/kdumpctl reload'' /usr/lib/udev/rules.d/98-kexec.rules:14
...
Mar 15 18:06:52 localhost.localdomain systemd-udevd[1691]: RUN '/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --quiet /usr/bin/kdumpctl reload'' /usr/lib/udev/rules.d/98-kexec.rules:14
...
Mar 15 18:06:52 localhost.localdomain systemd-udevd[1689]: RUN '/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --quiet /usr/bin/kdumpctl reload'' /usr/lib/udev/rules.d/98-kexec.rules:14
...
- systemd-udevd runs kdumpctl reload 688 times during boot in this case:
$ cat ./sos_commands/logs/journalctl_--no-pager | awk '/Mar 15 18:/&&/kdumpctl reload/' | wc -l
688
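The same count can be made a little stricter by matching only the systemd-udevd RUN lines rather than any line that mentions kdumpctl reload. The following is a minimal sketch, not part of the original analysis: the function name count_kdump_reloads is hypothetical, and it assumes the journal has been saved as plain text in the format shown in the log excerpt above.

```shell
# Hypothetical helper: count systemd-udevd RUN lines that invoke "kdumpctl reload".
# $1 = journal text file (for example the sosreport journalctl output)
# $2 = optional timestamp prefix to restrict the count (for example "Mar 15 18:")
count_kdump_reloads() {
    journal_file=$1
    date_prefix=${2:-}
    awk -v prefix="$date_prefix" '
        # Keep the line if no prefix was given or the line starts with it,
        # and it is a udevd RUN entry that calls kdumpctl reload.
        (prefix == "" || index($0, prefix) == 1) &&
        /systemd-udevd\[[0-9]+\]: RUN/ && /kdumpctl reload/ { n++ }
        END { print n + 0 }
    ' "$journal_file"
}

# Example invocation (path taken from the command above):
# count_kdump_reloads ./sos_commands/logs/journalctl_--no-pager "Mar 15 18:"
```

Passing a timestamp prefix limits the count to one boot window, which avoids mixing in reloads from earlier boots recorded in the same file.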
Environment
- Red Hat Enterprise Linux 7.3 and newer (non-rt kernel)
- Red Hat Enterprise Linux 7.3 Realtime and newer (kernel-rt)
- kexec-tools older than kexec-tools-2.0.15-51.el7_9.3
- Systems with a large amount of RAM
  - A system with 376GB RAM in one case
  - A system with 2TB RAM in another case
  - Occasionally, a system with only a few gigabytes of RAM
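To check whether an installed kexec-tools package falls into the affected range listed above, a rough comparison against kexec-tools-2.0.15-51.el7_9.3 can be scripted. This is only a sketch: the function name kexec_tools_is_affected is hypothetical, and it relies on GNU sort -V, which approximates but is not identical to RPM's own version comparison.

```shell
# Hypothetical helper: succeed (exit 0) if the given VERSION-RELEASE string
# sorts before 2.0.15-51.el7_9.3, the boundary named in the Environment list.
# Assumption: GNU "sort -V" ordering is close enough to RPM ordering here.
kexec_tools_is_affected() {
    installed=$1
    fixed="2.0.15-51.el7_9.3"
    # An identical version is not "older than" the boundary.
    [ "$installed" = "$fixed" ] && return 1
    # The version that sorts first is the older of the two.
    oldest=$(printf '%s\n%s\n' "$installed" "$fixed" | sort -V | head -n 1)
    [ "$oldest" = "$installed" ]
}

# Example invocation on a live system:
# kexec_tools_is_affected "$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' kexec-tools)" \
#     && echo "kexec-tools is older than 2.0.15-51.el7_9.3"
```

For an authoritative answer, compare the output of rpm -q kexec-tools against the boundary version by hand or with RPM's own tooling.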