Crashkernel Reservation Size in the Presence of Third Party Modules


Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 8

Issue

Because crashkernel=auto sizes the crashkernel reservation on a sliding scale based on the amount of memory on the server, it may not accommodate the memory required to load third party modules. Using this setting in the presence of third party modules may result in an undersized crashkernel reservation. When that happens, vmcore dumps fail, which delays both issue resolution and root cause analysis of unexpected reboots, crashes, or hangs.

Resolution

While crashkernel=auto is appropriate for most environments, its algorithm decides the amount of memory to reserve for the crashkernel based on system architecture and total RAM size. As previously stated, this does not account for third party modules that may be installed and loaded on a given system and the memory they require to load and successfully boot the crashkernel.

Since memory requirements can vary from module to module and server to server depending on various factors such as workload, environment, module version, hardware present, and so on, the presence of third party modules on a server may necessitate altering the kdump configuration in order to successfully capture a vmcore. Given the wide range and scope of possible third party modules and their various sizes, there are two best practice recommendations.

For third party modules that are installed and loaded and are required for kdump to successfully dump a vmcore, the crashkernel reservation size should be increased and manually set at the kernel command line. This may include but is not limited to third party networking drivers when dumping remotely via NFS or SSH or third party storage drivers when dumping to a device that utilizes that driver.

For third party modules that are installed and loaded but are not required for kdump to successfully dump a vmcore, the recommended best practice is to blacklist the unnecessary driver in the kdump configuration.

Diagnostic Steps

  1. kdump Deployment

    • Ensure the kexec-tools package that provides kdump is installed and that the kdump service is active:

      # rpm -qa | grep kexec
      kexec-tools-2.0.20-14.el8.x86_64
      
    • For Red Hat Enterprise Linux 7 or later:

      # systemctl status kdump
      ● kdump.service - Crash recovery kernel arming
         Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
         Active: active (exited) since Fri 2021-01-01 02:21:00 EST; 2s ago
      [...]
      
    • For Red Hat Enterprise Linux 6:

      # chkconfig | grep kdump
      kdump           0:off   1:off   2:on    3:on    4:on    5:on    6:off
      
    • If the kexec-tools package is absent or the kdump service is inactive, refer to the Red Hat kdump documentation to install, enable, start, and configure kdump before continuing.
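
    • As a version-independent cross-check, the kernel reports whether a crash kernel is currently loaded in /sys/kernel/kexec_crash_loaded (1 when loaded, 0 otherwise). The following is a minimal sketch of that check; the function name crash_kernel_loaded is hypothetical, and the optional file argument exists only so the function can be tried against a sample file:

```shell
#!/bin/sh
# Report whether a crash (kdump) kernel is currently loaded.
# Reads /sys/kernel/kexec_crash_loaded by default; the optional first
# argument is a stand-in path for testing (an assumption of this sketch).
crash_kernel_loaded() {
    state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    if [ "$(cat "$state_file" 2>/dev/null)" = "1" ]; then
        echo "crash kernel loaded"
        return 0
    fi
    echo "crash kernel NOT loaded"
    return 1
}
```

      Calling crash_kernel_loaded with no argument checks the running system. A non-zero return suggests that the kdump service has not armed a crash kernel, whatever the service status shows.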

  2. Find Third Party Modules

    • If you are unsure whether your system has third party modules installed and loaded, check the /proc/modules file. Third party modules can be identified by a kernel taint flag at the end of their /proc/modules entry. Because taint flags are always parenthesized in the /proc/modules output, grep for an escaped opening parenthesis:

      # grep \( /proc/modules
      hello 16384 0 - Live 0xffffffffc050b000 (OE)
      
    • The output above shows an example third party module named hello that is installed and loaded. The taint flag (OE) marks it as a third party module: O denotes an out-of-tree module and E an unsigned module. For more information on kernel taint flags, refer to the Red Hat Knowledgebase article on kernel taint flags.
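
    • The grep above can be narrowed to just the fields that matter for sizing. The following is a minimal sketch, assuming that any /proc/modules line whose last field is parenthesized belongs to a tainted module; the function name list_tainted_modules is hypothetical:

```shell
#!/bin/sh
# Print "name size_bytes taint_flags" for every module whose line ends
# in a parenthesized taint field, e.g. "(OE)".
# Reads /proc/modules by default; the optional argument is a stand-in
# path for testing (an assumption of this sketch).
list_tainted_modules() {
    awk '$NF ~ /^\(.*\)$/ { print $1, $2, $NF }' "${1:-/proc/modules}"
}
```

      For the example module above, this would print the module name, its size in bytes, and its taint flags on one line.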

  3. Increase crashkernel Reservation for Modules Necessary for kdump to Dump a vmcore

    • Once you've identified the third party modules that are loaded on your server and necessary to dump a vmcore successfully, you need to increase the crashkernel reservation. The module size reported by lsmod or in the /proc/modules file is only the memory needed to load and initialize the module, so it can serve only as a starting point for the increase. The memory a module actually uses while running can be much higher and can vary by environment, including the hardware present. For example, a storage driver that scans 4 LUNs will use less memory than the same driver scanning 400 LUNs.

    • Note that the /proc/modules output does not provide headers, so you must remember that the second space-separated field of each line is the module's size. The lsmod command does use headers, but lsmod does not distinguish between third party modules and Red Hat supported modules:

      # lsmod | head
      Module                  Size  Used by
      hello                  16384  0
      intel_rapl_msr         16384  0
      intel_rapl_common      24576  1 intel_rapl_msr
      isst_if_common         16384  0
      nfit                   65536  0
      libnvdimm             192512  1 nfit
      crct10dif_pclmul       16384  1
      crc32_pclmul           16384  0
      ghash_clmulni_intel    16384  0
      
      
      # grep \( /proc/modules
      hello 16384 0 - Live 0xffffffffc050b000 (OE)
      
    • Once you have determined the amount of memory required to initialize the third party modules installed on your server, you must manually set the crashkernel size. Use the size you have derived from lsmod or /proc/modules as a starting point and test kdump for successful vmcore collection. Setting the crashkernel reservation manually is accomplished in different ways depending on the version of Red Hat Enterprise Linux you are running.
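
    • The starting-point arithmetic can be sketched as follows. This is an illustration only, under two assumptions: the baseline is whatever reservation the system uses today (supplied by the administrator, for example from the value in /sys/kernel/kexec_crash_size converted to MiB), and the increase is the summed load-time size of all tainted modules rounded up to a whole MiB. The function name suggest_crashkernel is hypothetical:

```shell
#!/bin/sh
# Sum the load-time sizes of all tainted modules and add them, rounded
# up to whole MiB, to a baseline reservation.
# $1 = baseline reservation in MiB (must be supplied by the caller)
# $2 = modules file, default /proc/modules (stand-in path for testing)
# The result is only a first value to test, not a guaranteed size.
suggest_crashkernel() {
    base_mib="$1"
    modules_file="${2:-/proc/modules}"
    awk -v base="$base_mib" '
        $NF ~ /^\(.*\)$/ { bytes += $2 }               # tainted modules only
        END {
            extra = int((bytes + 1048575) / 1048576)   # round up to MiB
            printf "crashkernel=%dM\n", base + extra
        }' "$modules_file"
}
```

      Because runtime memory use can far exceed the load-time size, a value produced this way is a lower bound to begin testing from. If the vmcore still fails to dump, increase the reservation further and test again.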

  4. Setting the crashkernel Reservation

    • In Red Hat Enterprise Linux 6

      • The crashkernel reservation is made in Red Hat Enterprise Linux 6 by setting the desired value on the kernel line in /boot/grub/grub.conf, replacing any existing crashkernel=auto entry, then rebooting the system. Here it is set to 132 MB to account for third party modules:

        kernel /vmlinuz-2.6.32-754.el6.x86_64 ro root=/dev/mapper/vg_rhel6-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=vg_rhel6/lv_swap KEYBOARDTYPE=pc KEYTABLE=us rd_LVM_LV=vg_rhel6/lv_root rd_NO_DM console=tty0 console=ttyS0,115200 crashkernel=132M
        
        # reboot
        
    • In Red Hat Enterprise Linux 7

      • In Red Hat Enterprise Linux 7, the crashkernel reservation is specified on the GRUB_CMDLINE_LINUX line in /etc/default/grub. After editing that line, rebuild the GRUB configuration file (its location differs between BIOS-based and UEFI-based machines) and reboot the system:

        GRUB_CMDLINE_LINUX="crashkernel=132M rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap console=tty0 console=ttyS0,115200"   <<<---- crashkernel reservation specified on this line
        
        On BIOS-based machines: # grub2-mkconfig -o /boot/grub2/grub.cfg
        On UEFI-based machines: # grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
        
        # reboot
        
    • In Red Hat Enterprise Linux 8

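      • In Red Hat Enterprise Linux 8, the crashkernel reservation is set through the kernelopts variable stored in /boot/grub2/grubenv, after which the system must be rebooted. The example below shows a reservation of 132 MB placed at a 16 MB offset: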

        # grep crashkernel /boot/grub2/grubenv
        kernelopts=root=/dev/mapper/rhel-root ro crashkernel=132M@16M resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap console=tty0 console=ttyS0,115200
        
        # reboot
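
      • The steps above show where the value is stored, but not a command that changes it. The following sketch uses grubby, the boot entry tool shipped with Red Hat Enterprise Linux 7 and 8. Choosing grubby here is a suggestion of this sketch, not something stated in the steps above. The function names and the APPLY switch are hypothetical. The second function reads /sys/kernel/kexec_crash_size, which holds the reserved size in bytes, to confirm the reservation after the reboot:

```shell
#!/bin/sh
# Set crashkernel=<size> for all installed kernels using grubby.
# Dry run by default: the command is only printed. Set APPLY=1 to run
# it (requires root). The APPLY switch is a convention of this sketch.
set_crashkernel() {
    size="$1"
    if [ "${APPLY:-0}" = "1" ]; then
        grubby --update-kernel=ALL --args="crashkernel=$size"
    else
        echo "grubby --update-kernel=ALL --args=crashkernel=$size"
    fi
}

# After the reboot: print the memory actually reserved, in MiB.
# The optional argument is a stand-in path for testing.
reserved_mib() {
    bytes=$(cat "${1:-/sys/kernel/kexec_crash_size}")
    echo "$((bytes / 1024 / 1024))"
}
```

        Running set_crashkernel 132M prints the command without changing anything. The value reported by reserved_mib after the reboot may differ slightly from the requested size, so treat it as a sanity check rather than an exact match.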
        
  5. Test Kdump

    • The final step after modifying your kdump configuration in any way, including resizing the crashkernel reservation or blacklisting modules, is to test kdump to ensure a vmcore is successfully dumped with the new configuration in use. First, ensure that SysRq is enabled:

      # echo 1 > /proc/sys/kernel/sysrq
      
    • Then trigger a crash by executing the following command. Warning: this command will cause a kernel panic and crash your system!

      # echo c > /proc/sysrq-trigger
      
    • Finally, verify a vmcore exists in the target dump path:

      # ls /var/crash/127.0.0.1-2020-08-21-12\:11\:00/
      vmcore  vmcore-dmesg.txt
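
    • Because each dump lands in a timestamped subdirectory, the newest one is not always obvious. The following is a minimal sketch that looks for the most recently modified, non-empty vmcore one level below the dump path. It assumes the default local path /var/crash unless another directory is passed, and the function name latest_vmcore is hypothetical:

```shell
#!/bin/sh
# Print the path of the most recently modified vmcore one level below
# a dump directory (default /var/crash). Returns non-zero if no
# non-empty vmcore is found. Uses the -nt file test, which bash and
# dash both support.
latest_vmcore() {
    dump_dir="${1:-/var/crash}"
    newest=""
    for f in "$dump_dir"/*/vmcore; do
        [ -s "$f" ] || continue        # skip missing or zero-length files
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && echo "$newest"
}
```

      A non-zero return after a test crash suggests the dump did not complete. An undersized crashkernel reservation is one possible cause.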
      
  6. Blacklisting Third Party Modules Unnecessary for kdump to Successfully Dump a vmcore (Optional)

    • For third party modules that are not needed to dump a vmcore, it is recommended to blacklist the modules within the kdump configuration. This is accomplished in different ways depending on the version of Red Hat Enterprise Linux you are running.

    • For Red Hat Enterprise Linux 6:

      • In Red Hat Enterprise Linux 6, kernel modules can be blacklisted by adding a blacklist line to the /etc/kdump.conf file, as in the example below, and then restarting the kdump service:

        # grep -v ^# /etc/kdump.conf
        path /var/crash
        core_collector makedumpfile -c --message-level 1 -d 31
        blacklist module1 module2
        
        # service kdump restart
        Stopping kdump:                                            [  OK  ]
        Starting kdump:                                            [  OK  ]
        
    • For Red Hat Enterprise Linux 7 and later:

      • In Red Hat Enterprise Linux 7 and later, modules are blacklisted with the rd.driver.blacklist directive on the KDUMP_COMMANDLINE_APPEND line of the /etc/sysconfig/kdump file. The kdump service must then be restarted:

        KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd hest_disable rd.driver.blacklist=module1,module2,module3,module4"
        
        # systemctl restart kdump
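
      • The edit to /etc/sysconfig/kdump can also be scripted. The following is a rough sketch, assuming GNU sed and a KDUMP_COMMANDLINE_APPEND value that sits on a single double-quoted line, as in the example above. The function name add_kdump_blacklist is hypothetical. It declines to touch a file that already carries a blacklist entry:

```shell
#!/bin/sh
# Append rd.driver.blacklist=<modules> to KDUMP_COMMANDLINE_APPEND in a
# kdump sysconfig file, unless a blacklist entry is already present.
# $1 = comma-separated module list
# $2 = file to edit, default /etc/sysconfig/kdump
# Back the file up before running this against a real system.
add_kdump_blacklist() {
    modules="$1"
    conf="${2:-/etc/sysconfig/kdump}"
    if grep -q '^KDUMP_COMMANDLINE_APPEND=.*rd\.driver\.blacklist=' "$conf"; then
        echo "blacklist already set in $conf; edit it by hand" >&2
        return 1
    fi
    # Insert the directive just before the closing double quote.
    sed -i "s/^\(KDUMP_COMMANDLINE_APPEND=\"[^\"]*\)\"/\1 rd.driver.blacklist=$modules\"/" "$conf"
}
```

        After a successful run, for example add_kdump_blacklist module1,module2, restart the kdump service as shown above and test kdump again.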
        
