RHEL 8/9 debug kernels do not generate a vmcore file through kexec-tools

Environment

  • Red Hat Enterprise Linux 8 kernel-debug
  • Red Hat Enterprise Linux 9 kernel-debug

Issue

  • The debug kernel (kernel-debug) does not create a vmcore file when it crashes on RHEL 8/9.

Resolution

  • For RHEL 8, the issue was tracked in BZ#2006000 and is resolved by erratum RHBA-2022:7705. Update to kexec-tools-2.0.24-5.el8 or later.
  • For RHEL 9, the issue was tracked in BZ#2076425 and is resolved by erratum RHBA-2022:8301. Update to kexec-tools-2.0.24-5.el9 or later.
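
To check whether a system already carries the fixed package, the installed version can be compared against the minimum version named in the errata. The following is a minimal sketch, not an official procedure: the helper name version_ge is hypothetical, and it relies on GNU sort -V, which only approximates RPM's own version ordering.

```shell
#!/bin/bash
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Uses GNU sort -V, which approximates (but is not identical to) RPM ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum fixed version from the errata; use 2.0.24-5.el9 on RHEL 9.
required="2.0.24-5.el8"

# Query the installed package only when rpm and kexec-tools are present.
if command -v rpm >/dev/null 2>&1 && rpm -q kexec-tools >/dev/null 2>&1; then
    installed="$(rpm -q --qf '%{VERSION}-%{RELEASE}' kexec-tools)"
    if version_ge "$installed" "$required"; then
        echo "kexec-tools $installed includes the fix"
    else
        echo "kexec-tools $installed is older than $required; update it"
    fi
fi
```

For an authoritative comparison, RPM's own tooling should be preferred over this string-based approximation.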

  • Note that KASAN is the largest memory consumer when kdump runs with the debug kernel, and other debug features also consume a significant amount of memory. As a result, kdump can still fail even after upgrading to the errata version of kexec-tools.

  • If a crash is triggered with sysrq-c on kernel-debug while crashkernel=auto is set on the kernel command line, kdump hangs after a call trace like the following, and no vmcore file is captured:
# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-477.21.1.el8_8.x86_64+debug root=/dev/mapper/rhel-root ro crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap console=tty0 console=ttyS0,115200 page_poison=1 vsyscall=none slub_debug=P fips=1 boot=UUID=653ce02e-b3fd-49d3-9990-f004ae370585


[  361.598228] sysrq: SysRq : Trigger a crash
[  361.601750] Kernel panic - not syncing: sysrq triggered crash
[  361.601750] 
[  361.607066] CPU: 1 PID: 2672 Comm: bash Kdump: loaded Not tainted 4.18.0-477.21.1.el8_8.x86_64+debug #1
[  361.612904] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[  361.619168] Call Trace:
[  361.621763]  dump_stack+0x5c/0x80
[  361.624454]  panic+0x1dd/0x407
[  361.627052]  ? __warn_printk+0xdf/0xdf
[  361.629804]  ? printk+0x9f/0xc9
[  361.632474]  ? lock_downgrade+0x710/0x710
[  361.635292]  sysrq_handle_crash+0x1b/0x20
[  361.638045]  __handle_sysrq.cold.12+0x183/0x521
[  361.640893]  write_sysrq_trigger+0x4c/0x50
[  361.643646]  proc_reg_write+0x187/0x220
[  361.646278]  ? proc_reg_read+0x220/0x220
[  361.648888]  ? rcu_read_lock_held+0xc0/0xc0
[  361.651574]  vfs_write+0x157/0x460
[  361.654012]  ksys_write+0xb8/0x170
[  361.656431]  ? __ia32_sys_read+0xb0/0xb0
[  361.658933]  ? lockdep_hardirqs_on_prepare+0x298/0x3f0
[  361.661848]  ? do_syscall_64+0x22/0x450
[  361.664273]  do_syscall_64+0xa5/0x450
[  361.666632]  entry_SYSCALL_64_after_hwframe+0x66/0xdb
[  361.669237] RIP: 0033:0x7fb3ac8b55a8
[  361.671544] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 f5 3f 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[  361.680029] RSP: 002b:00007ffdefd1a168 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  361.683676] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fb3ac8b55a8
[  361.686728] RDX: 0000000000000002 RSI: 00005575c158d310 RDI: 0000000000000001
[  361.689747] RBP: 00005575c158d310 R08: 000000000000000a R09: 00007fb3ac915800
[  361.692823] R10: 000000000000000a R11: 0000000000000246 R12: 00007fb3acb556e0
[  361.695840] R13: 0000000000000002 R14: 00007fb3acb50860 R15: 0000000000000002
  • For kdump to work with kernel-debug, try a larger reservation such as crashkernel=768M on the kernel command line, as the following message from kdumpctl suggests:
# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: Trying to use 4.18.0-477.21.1.el8_8.x86_64.
kdump: Fallback to using debug kernel
kdump: Using debug kernel, you may need to set a larger crashkernel than the default value. <<-------
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
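
One way to apply the larger reservation is with grubby, followed by a reboot. The sketch below is illustrative only: the function names crashkernel_value and set_crashkernel are hypothetical, and the 768M value is the one suggested in this article, which may need tuning for a given system.

```shell
#!/bin/bash
# Hypothetical helper: print the crashkernel= value found in a kernel
# command line string (the last occurrence wins; prints nothing if absent).
crashkernel_value() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | tail -n 1
}

# Hypothetical helper: set the reservation for all installed kernels.
# Requires root, and a reboot before the new value takes effect.
set_crashkernel() {
    grubby --update-kernel=ALL --args="crashkernel=$1"
}

# Show the value the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    echo "current crashkernel: $(crashkernel_value "$(cat /proc/cmdline)")"
fi

# Example (run as root):
#   set_crashkernel 768M && reboot
# After the reboot, confirm the reservation and the kdump service state:
#   cat /sys/kernel/kexec_crash_size
#   kdumpctl status
```

The reserved size reported in /sys/kernel/kexec_crash_size is in bytes, so a non-zero value there indicates that the kernel accepted the reservation.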

Root Cause

  • The failure on the debug kernel is a known and expected issue, because the debug kernel consumes much more memory, e.g. during KASAN initialization.

  • KASAN is only the largest memory consumer in kdump with the debug kernel; other debug features also consume a significant amount of memory.
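
To confirm that a system is affected by this root cause, it can help to check whether the running kernel is a debug kernel and whether KASAN is enabled in its build configuration. This is a rough sketch: the helper name is_debug_kernel is hypothetical, and the check assumes the kernel build config is installed at /boot/config-<release>.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a kernel release string names a debug kernel.
# RHEL debug kernels carry a "+debug" suffix, as in the logs in this article.
is_debug_kernel() {
    case "$1" in
        *+debug) return 0 ;;
        *)       return 1 ;;
    esac
}

release="$(uname -r)"
if is_debug_kernel "$release"; then
    echo "$release is a debug kernel"
    # Assumption: the build config is available at /boot/config-<release>.
    if [ -r "/boot/config-$release" ]; then
        grep '^CONFIG_KASAN' "/boot/config-$release"
    fi
else
    echo "$release is not a debug kernel"
fi
```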

Diagnostic Steps

Step:1.
With the following kernel command line and the older kexec-tools-2.0.20-46.el8_4.1.x86_64 package, crashing the server on the debug kernel causes it to hang, and no vmcore is generated.

# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-305.el8.x86_64+debug root=/dev/mapper/rhel-root ro resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap crashkernel=auto systemd.unified_cgroup_hierarchy=1 console=ttyS0,115200 console=tty0 log_buf_len=10M kasan_multi_shot initcall_debug loglevel=8 earlyprintk=ttyS0,115200 print_fatal_signals=1 memblock=debug

[root@localhost ~]# rpm -qa | grep -i kexec
kexec-tools-2.0.20-46.el8_4.1.x86_64
  • The server hung with the following logs:
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=0 from=0x1000000 max_addr=0x0 kasan_populate_pgd+0x6a0/0x7ed
[    0.000000] memblock_reserve: [0x0000000000015000-0x0000000000015fff] memblock_alloc_internal+0x143/0x22a
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=0 from=0x1000000 max_addr=0x0 kasan_populate_pgd+0x6a0/0x7ed
[    0.000000] memblock_reserve: [0x0000000000014000-0x0000000000014fff] memblock_alloc_internal+0x143/0x22a
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=0 from=0x1000000 max_addr=0x0 kasan_populate_pgd+0x6a0/0x7ed
[    0.000000] memblock_reserve: [0x0000000000013000-0x0000000000013fff] memblock_alloc_internal+0x143/0x22a
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=0 from=0x1000000 max_addr=0x0 kasan_populate_pgd+0x6a0/0x7ed
[    0.000000] memblock_reserve: [0x0000000000012000-0x0000000000012fff] memblock_alloc_internal+0x143/0x22a
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=0 from=0x1000000 max_addr=0x0 kasan_populate_pgd+0x6a0/0x7ed
[    0.000000] memblock_reserve: [0x0000000000011000-0x0000000000011fff] memblock_alloc_internal+0x143/0x22a
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=0 from=0x1000000 max_addr=0x0 kasan_populate_pgd+0x6a0/0x7ed
[    0.000000] memblock_reserve: [0x0000000000010000-0x0000000000010fff] memblock_alloc_internal+0x143/0x22a
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=0 from=0x1000000 max_addr=0x0 kasan_populate_pgd+0x6a0/0x7ed
[    0.000000] Kernel panic - not syncing: memblock_alloc_try_nid: Failed to allocate 4096 bytes align=0x1000 nid=0 from=0x1000000 max_addr=0x0
[    0.000000] 
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-305.el8.x86_64+debug #1
[    0.000000] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.7.0+16689+53d59bc2 04/01/2014
[    0.000000] Call Trace:
[    0.000000]  ? dump_stack+0x8e/0xd0
[    0.000000]  ? panic+0x1cc/0x3f3
[    0.000000]  ? __warn_printk+0xdb/0xdb
[    0.000000]  ? memblock_alloc_internal+0x163/0x22a
[    0.000000]  ? memblock_alloc_range+0x11/0x11
[    0.000000]  ? kasan_report+0x20/0x50
[    0.000000]  ? memblock_alloc_try_nid+0x87/0x9d
[    0.000000]  ? kasan_populate_pgd+0x6a0/0x7ed
[    0.000000]  ? kasan_populate_shadow+0xe4/0xf9
[    0.000000]  ? kasan_init+0x47e/0x644
[    0.000000]  ? setup_arch+0x159b/0x1a19
[    0.000000]  ? reserve_standard_io_resources+0x49/0x49
[    0.000000]  ? vprintk_emit+0x164/0x490
[    0.000000]  ? printk+0x9f/0xc5
[    0.000000]  ? kmsg_dump_rewind_nolock+0xd9/0xd9
[    0.000000]  ? cgroup_init_early+0x2ce/0x474
[    0.000000]  ? start_kernel+0xc6/0x7ba
[    0.000000]  ? thread_stack_cache_init+0x6/0x6
[    0.000000]  ? x86_family+0x5/0x20
[    0.000000]  ? load_ucode_bsp+0x49/0x22b
[    0.000000]  ? secondary_startup_64_no_verify+0xc2/0xcb
[    0.000000] ---[ end Kernel panic - not syncing: memblock_alloc_try_nid: Failed to allocate 4096 bytes align=0x1000 nid=0 from=0x1000000 max_addr=0x0
[    0.000000]  ]---
PANIC: early exception 0x0d IP 10:ffffffff8120ddf6 error 763 cr2 0xffff88807d001000

Step:2.
After upgrading kexec-tools (kexec-tools-2.0.24-5.el8), the vmcore was generated successfully, even though memblock_alloc_try_nid messages are still recorded in the console logs.


# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-305.el8.x86 root=/dev/mapper/rhel-root ro resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap crashkernel=auto systemd.unified_cgroup_hierarchy=1 console=ttyS0,115200 console=tty0 log_buf_len=10M kasan_multi_shot initcall_debug loglevel=8 earlyprintk=ttyS0,115200 print_fatal_signals=1 memblock=debug

[user1@localhost ~]$ rpm -qa | grep -i kexec
kexec-tools-2.0.24-6.el8.x86_64
  • The vmcore was generated:
[root@localhost ~]# tree /var/crash/
/var/crash/
└── 127.0.0.1-2023-03-28-20:33:58
    ├── kexec-dmesg.log
    ├── vmcore                  <<<------
    └── vmcore-dmesg.txt
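
The same check can be scripted after a test crash. The sketch below assumes the dump target is the /var/crash directory shown in the tree output above; the helper name find_vmcores is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list vmcore files below a crash directory.
# Defaults to /var/crash, the dump location shown in this article.
find_vmcores() {
    local dir="${1:-/var/crash}"
    [ -d "$dir" ] || return 1
    # Match files named exactly "vmcore" (not vmcore-dmesg.txt).
    find "$dir" -type f -name vmcore
}

if cores="$(find_vmcores /var/crash)" && [ -n "$cores" ]; then
    echo "vmcore files found:"
    echo "$cores"
else
    echo "no vmcore found under /var/crash"
fi
```

If kdump is configured with a different dump path, pass that directory to the helper instead.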

  • Logs from the serial console:
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x1000000 max_addr=0x0 early_alloc.constprop.21+0x44/0x9a
[    0.000000] memblock_reserve: [0x000000007efa3000-0x000000007efa3fff] memblock_alloc_internal+0x143/0x22a
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=0 from=0x1000000 max_addr=0x0 kasan_populate_pgd+0x3ad/0x7ed
[    0.000000] memblock_reserve: [0x000000007efa2000-0x000000007efa2fff] memblock_alloc_internal+0x143/0x22a
[    0.000000] memblock_alloc_try_nid_nopanic: 2097152 bytes align=0x200000 nid=0 from=0x1000000 max_addr=0x0 kasan_populate_pgd+0x525/0x7ed
[    0.000000] memblock_reserve: [0x000000007ec00000-0x000000007edfffff] memblock_alloc_internal+0x143/0x22a
