Chapter 7. Installing and configuring kdump

7.1. What is kdump

kdump is a service providing a crash dumping mechanism. The service enables you to save the contents of the system’s memory for later analysis. kdump uses the kexec system call to boot into the second kernel (a capture kernel) without rebooting; and then captures the contents of the crashed kernel’s memory (a crash dump or a vmcore) and saves it. The second kernel resides in a reserved part of the system memory.

Important

A kernel crash dump can be the only information available in the event of a system failure (a critical bug). Therefore, ensuring that kdump is operational is important in mission-critical environments. Red Hat advise that system administrators regularly update and test kexec-tools in your normal kernel update cycle. This is especially important when new kernel features are implemented.

7.2. Installing kdump

In many cases, the kdump service is installed and activated by default on the new Red Hat Enterprise Linux installations. The Anaconda installer provides a screen for kdump configuration when performing an interactive installation using the graphical or text interface. The installer screen is titled Kdump and is available from the main Installation Summary screen, and only allows limited configuration - you can only select whether kdump is enabled and how much memory is reserved.

Enable kdump during RHEL installation

Some installation options, such as custom Kickstart installations, in some cases do not install or enable kdump by default. If this is the case on your system, follow the procedure below to install kdump.

Prerequisites

  • An active Red Hat Enterprise Linux subscription
  • A repository containing the kexec-tools package for your system CPU architecture
  • Fulfilled kdump requirements

Procedure

  1. Execute the following command to check whether kdump is installed on your system:

    $ rpm -q kexec-tools

    Output if the package is installed:

    kexec-tools-2.0.17-11.el8.x86_64

    Output if the package is not installed:

    package kexec-tools is not installed
  2. Install kdump and other necessary packages by:

    # yum install kexec-tools
Important

Starting with Red Hat Enterprise Linux 7.4 (kernel-3.10.0-693.el7) the Intel IOMMU driver is supported with kdump. For prior versions, Red Hat Enterprise Linux 7.3 (kernel-3.10.0-514[.XYZ].el7) and earlier, it is advised that Intel IOMMU support is disabled, otherwise kdump kernel is likely to become unresponsive.

Additional resources

7.3. Configuring kdump on the command line

7.3.1. Configuring kdump memory usage

The memory reserved for the kdump feature is always reserved during the system boot. The amount of memory is specified in the system’s Grand Unified Bootloader (GRUB) 2 configuration. The procedure below describes how to configure the memory reserved for kdump through the command line.

Prerequisites

Procedure

  1. Edit the /etc/default/grub file using the root permissions.
  2. Set the crashkernel= option to the required value.

    For example, to reserve 128 MB of memory, use the following:

    crashkernel=128M

    Alternatively, you can set the amount of reserved memory to a variable depending on the total amount of installed memory. The syntax for memory reservation into a variable is crashkernel=<range1>:<size1>,<range2>:<size2>. For example:

    crashkernel=512M-2G:64M,2G-:128M

    The above example reserves 64 MB of memory if the total amount of system memory is 512 MB or higher and lower than 2 GB. If the total amount of memory is more than 2 GB, 128 MB is reserved for kdump instead.

    • Offset the reserved memory.

      Some systems require to reserve memory with a certain fixed offset since crashkernel reservation is very early, and it wants to reserve some area for special usage. If the offset is set, the reserved memory begins there. To offset the reserved memory, use the following syntax:

      crashkernel=128M@16M

      The example above means that kdump reserves 128 MB of memory starting at 16 MB (physical address 0x01000000). If the offset parameter is set to 0 or omitted entirely, kdump offsets the reserved memory automatically. This syntax can also be used when setting a variable memory reservation as described above; in this case, the offset is always specified last (for example, crashkernel=512M-2G:64M,2G-:128M@16M).

  3. Use the following command to update the GRUB2 configuration file:

    # grub2-mkconfig -o /boot/grub2/grub.cfg
Note

The alternative way to configure memory for kdump is to append the crashkernel=<SOME_VALUE> parameter to the kernelopts variable with the grub2-editenv which will update all of your boot entries. Or you can use the grubby utility to update kernel command line parameters of just one entry.

Warning

On systems where NVDIMM is configured as storage, the page table metadata can be stored in regular memory or on the NVDIMM with default overhead of 64 bytes per 4k page.

If the metadata is stored in regular memory and the size of NVDIMM is huge, it eventually needs to reserve extra huge system memory in the kdump kernel, but this is not acceptable by kdump kernel.

So to resolve the problem, persistent memory (pmem) is recommended not to be set as dumping target, if pmem namespace is created with the --map mem argument on the system.

Additional resources

7.3.2. Configuring the kdump target

When a kernel crash is captured, the core dump can be either stored as a file in a local file system, written directly to a device, or sent over a network using the NFS (Network File System) or SSH (Secure Shell) protocol. Only one of these options can be set at a time, and the default behavior is to store the vmcore file in the /var/crash/ directory of the local file system.

Prerequisites

Procedure

To store the vmcore file in /var/crash/ directory of the local file system:

  • Edit the /etc/kdump.conf file and specify the path:

    path /var/crash

    The option path /var/crash represents the path to the file system in which kdump saves the vmcore file. When you specify a dump target in the /etc/kdump.conf file, then the path is relative to the specified dump target.

    If you do not specify a dump target in the /etc/kdump.conf file, then the path represents the absolute path from the root directory. Depending on what is mounted in the current system, the dump target and the adjusted dump path are taken automatically.

Warning

kdump saves the vmcore file in /var/crash/var/crash directory, when the dump target is mounted at /var/crash and the option path is also set as /var/crash in the /etc/kdump.conf file. For example, in the following instance, the ext4 file system is already mounted at /var/crash and the path are set as /var/crash:

grep -v ^# etc/kdump.conf | grep -v ^$
ext4 /dev/mapper/vg00-varcrashvol
path /var/crash
core_collector makedumpfile -c --message-level 1 -d 31

This results in the /var/crash/var/crash path. To solve this problem, use the option path / instead of path /var/crash

To change the local directory in which the core dump is to be saved, as root, edit the /etc/kdump.conf configuration file as described below.

  1. Remove the hash sign ("#") from the beginning of the #path /var/crash line.
  2. Replace the value with the intended directory path. For example:

    path /usr/local/cores
    Important

    In Red Hat Enterprise Linux 8, the directory defined as the kdump target using the path directive must exist when the kdump systemd service is started - otherwise the service fails. This behavior is different from earlier releases of Red Hat Enterprise Linux, where the directory was being created automatically if it did not exist when starting the service.

To write the file to a different partition, as root, edit the /etc/kdump.conf configuration file as described below.

  1. Remove the hash sign ("#") from the beginning of the #ext4 line, depending on your choice.

    • device name (the #ext4 /dev/vg/lv_kdump line)
    • file system label (the #ext4 LABEL=/boot line)
    • UUID (the #ext4 UUID=03138356-5e61-4ab3-b58e-27507ac41937 line)
  2. Change the file system type as well as the device name, label or UUID to the desired values. For example:

    ext4 UUID=03138356-5e61-4ab3-b58e-27507ac41937
    Important

    It is recommended to specify storage devices using a LABEL= or UUID=. Disk device names such as /dev/sda3 are not guaranteed to be consistent across reboot.

    Important

    When dumping to Direct Access Storage Device (DASD) on IBM Z hardware, it is essential that the dump devices are correctly specified in /etc/dasd.conf before proceeding.

To write the dump directly to a device:

  1. Remove the hash sign ("#") from the beginning of the #raw /dev/vg/lv_kdump line.
  2. Replace the value with the intended device name. For example:

    raw /dev/sdb1

To store the dump to a remote machine using the NFS protocol:

  1. Remove the hash sign ("#") from the beginning of the #nfs my.server.com:/export/tmp line.
  2. Replace the value with a valid hostname and directory path. For example:

    nfs penguin.example.com:/export/cores

To store the dump to a remote machine using the SSH protocol:

  1. Remove the hash sign ("#") from the beginning of the #ssh user@my.server.com line.
  2. Replace the value with a valid username and hostname.
  3. Include your SSH key in the configuration.

    • Remove the hash sign from the beginning of the #sshkey /root/.ssh/kdump_id_rsa line.
    • Change the value to the location of a key valid on the server you are trying to dump to. For example:

      ssh john@penguin.example.com
      sshkey /root/.ssh/mykey

Additional resources

7.3.3. Configuring the core collector

kdump uses a program specified as core collector to capture the vmcore. Currently, the only fully supported core collector is the makedumpfile utility. It has several configurable options, which affect the collection process. For example the extent of collected data, or whether the resulting vmcore should be compressed.

To enable and configure the core collector, follow the procedure below.

Prerequisites

Procedure

  1. As root, edit the /etc/kdump.conf configuration file and remove the hash sign ("#") from the beginning of the #core_collector makedumpfile -l --message-level 1 -d 31.
  2. Add the -c parameter. For example:

    core_collector makedumpfile -c

    The command above enables the dump file compression.

  3. Add the -d value parameter. For example:

    core_collector makedumpfile -d 17 -c

    The command above removes both zero and free pages from the dump. The value represents a bitmask, where each bit is associated with a certain type of memory pages and determines whether that type of pages will be collected. For description of respective bits see Section 7.5.4, “Supported kdump filtering levels”.

Additional resources

  • See the makedumpfile(8) man page for a complete list of available options.

7.3.4. Configuring the kdump default failure responses

By default, when kdump fails to create a vmcore file at the target location specified in Section 7.3.2, “Configuring the kdump target”, the system reboots, and the dump is lost in the process. To change this behavior, follow the procedure below.

Prerequisites

Procedure

  1. As root, remove the hash sign ("#") from the beginning of the #failure_action line in the /etc/kdump.conf configuration file.
  2. Replace the value with a desired action as described in Section 7.5.5, “Supported default failure responses”. For example:

    failure_action poweroff

7.3.5. Enabling and disabling the kdump service

To start the kdump service at boot time, follow the procedure below.

Prerequisites

Procedure

  1. To enable the kdump service, use the following command:

    # systemctl enable kdump.service

    This enables the service for multi-user.target.

  2. To start the service in the current session, use the following command:

    # systemctl start kdump.service
  3. To stop the kdump service, type the following command:

    # systemctl stop kdump.service
  4. To disable the kdump service, execute the following command:

    # systemctl disable kdump.service
Warning

It is recommended to set kptr_restrict=1 as default. When kptr_restrict is set to (1) as default, the kdumpctl service loads the crash kernel even if Kernel Address Space Layout (KASLR) is enabled or not enabled.

Troubleshooting step

When kptr_restrict is not set to (1), and if KASLR is enabled, the contents of /proc/kore file are generated as all zeros. Consequently, the kdumpctl service fails to access the /proc/kcore and load the crash kernel.

To work around this problem, the kexec-kdump-howto.txt file displays a warning message, which specifies to keep the recommended setting as kptr_restrict=1.

To ensure that kdumpctl service loads the crash kernel, verify that:

  • Kernel kptr_restrict=1 in the sysctl.conf file.

Additional resources

7.4. Configuring kdump in the web console

Setup and test the kdump configuration in the RHEL 8 web console.

The web console is part of a default installation of Red Hat Enterprise Linux 8 and enables or disables the kdump service at boot time. Further, the web console conveniently enables you to configure the reserved memory for kdump; or to select the vmcore saving location in an uncompressed or compressed format.

Prerequisites

7.4.1. Configuring kdump memory usage and target location in web console

The procedure below shows you how to use the Kernel Dump tab in the Red Hat Enterprise Linux web console interface to configure the amount of memory that is reserved for the kdump kernel. The procedure also describes how to specify the target location of the vmcore dump file and how to test your configuration.

Prerequisites

Procedure

  1. Open the Kernel Dump tab and start the kdump service.
  2. Configure the kdump memory usage through the command line.
  3. Click the link next to the Crash dump location option.

    web console initial screen
  4. Select the Local Filesystem option from the drop-down and specify the directory you want to save the dump in.

    web console crashdump target
    • Alternatively, select the Remote over SSH option from the drop-down to send the vmcore to a remote machine using the SSH protocol.

      Fill the Server, ssh key, and Directory fields with the remote machine address, ssh key location, and a target directory.

    • Another choice is to select the Remote over NFS option from the drop-down and fill the Mount field to send the vmcore to a remote machine using the NFS protocol.

      Note

      Tick the Compression check box to reduce the size of the vmcore file.

  5. Test your configuration by crashing the kernel.

    web console test kdump config
    Warning

    This step disrupts execution of the kernel and results in a system crash and loss of data.

Additional resources

7.5. Supported kdump configurations and targets

7.5.1. Memory requirements for kdump

In order for kdump to be able to capture a kernel crash dump and save it for further analysis, a part of the system memory has to be permanently reserved for the capture kernel. When reserved, this part of the system memory is not available to the main kernel.

The memory requirements vary based on certain system parameters. One of the major factors is the system’s hardware architecture. To find out the exact machine architecture (such as Intel 64 and AMD64, also known as x86_64) and print it to standard output, use the following command:

$ uname -m

The table below contains a list of minimum memory requirements to automatically reserve a memory size for kdump. The size changes according to the system’s architecture and total available physical memory.

Table 7.1. Minimum Amount of Reserved Memory Required for kdump

ArchitectureAvailable MemoryMinimum Reserved Memory

AMD64 and Intel 64 (x86_64)

1 GB to 4 GB

160 MB of RAM.

4 GB to 64 GB

192 MB of RAM.

64 GB to 1 TB

256 MB of RAM.

1 TB and more

512 MB of RAM.

64-bit ARM architecture (arm64)

2 GB and more

448 MB of RAM.

IBM Power Systems (ppc64le)

2 GB to 4 GB

384 MB of RAM.

4 GB to 16 GB

512 MB of RAM.

16 GB to 64 GB

1 GB of RAM.

64 GB to 128 GB

2 GB of RAM.

128 GB and more

4 GB of RAM.

IBM Z (s390x)

1 GB to 4 GB

160 MB of RAM.

4 GB to 64 GB

192 MB of RAM.

64 GB to 1 TB

256 MB of RAM.

1 TB and more

512 MB of RAM.

On many systems, kdump is able to estimate the amount of required memory and reserve it automatically. This behavior is enabled by default, but only works on systems that have more than a certain amount of total available memory, which varies based on the system architecture.

Important

The automatic configuration of reserved memory based on the total amount of memory in the system is a best effort estimation. The actual required memory may vary due to other factors such as I/O devices. Using not enough of memory might cause that a debug kernel is not able to boot as a capture kernel in case of a kernel panic. To avoid this problem, sufficiently increase the crash kernel memory.

Additional resources

7.5.2. Minimum threshold for automatic memory reservation

On some systems, it is possible to allocate memory for kdump automatically, either by using the crashkernel=auto parameter in the boot loader configuration file, or by enabling this option in the graphical configuration utility. For this automatic reservation to work, however, a certain amount of total memory needs to be available in the system. The amount differs based on the system’s architecture.

The table below lists the thresholds for automatic memory allocation. If the system has less memory than specified in the table, the memory needs to be reserved manually.

Table 7.2. Minimum Amount of Memory Required for Automatic Memory Reservation

ArchitectureRequired Memory

AMD64 and Intel 64 (x86_64)

2 GB

IBM Power Systems (ppc64le)

2 GB

IBM  Z (s390x)

4 GB

Additional resources

7.5.3. Supported kdump targets

When a kernel crash is captured, the vmcore dump file can be either written directly to a device, stored as a file on a local file system, or sent over a network. The table below contains a complete list of dump targets that are currently supported or explicitly unsupported by kdump.

Table 7.3. Supported kdump Targets

TypeSupported TargetsUnsupported Targets

Raw device

All locally attached raw disks and partitions.

 

Local file system

ext2, ext3, ext4, and xfs file systems on directly attached disk drives, hardware RAID logical drives, LVM devices, and mdraid arrays.

Any local file system not explicitly listed as supported in this table, including the auto type (automatic file system detection).

Remote directory

Remote directories accessed using the NFS or SSH protocol over IPv4.

Remote directories on the rootfs file system accessed using the NFS protocol.

Remote directories accessed using the iSCSI protocol over both hardware and software initiators.

Remote directories accessed using the iSCSI protocol on be2iscsi hardware.

Multipath-based storages.

 

Remote directories accessed over IPv6.

 

Remote directories accessed using the SMB or CIFS protocol.

 

Remote directories accessed using the FCoE (Fibre Channel over Ethernet) protocol.

 

Remote directories accessed using wireless network interfaces.

Important

Utilizing firmware assisted dump (fadump) to capture a vmcore and store it to a remote machine using SSH or NFS protocol causes renaming of the network interface to kdump-<interface-name>. The renaming happens if the <interface-name> is generic, for example *eth#, net#, and so on. This problem occurs because the vmcore capture scripts in the initial RAM disk (initrd) add the kdump- prefix to the network interface name to secure persistent naming. Since the same initrd is used also for a regular boot, the interface name is changed for the production kernel too.

Additional resources

7.5.4. Supported kdump filtering levels

To reduce the size of the dump file, kdump uses the makedumpfile core collector to compress the data and optionally to omit unwanted information. The table below contains a complete list of filtering levels that are currently supported by the makedumpfile utility.

Table 7.4. Supported Filtering Levels

OptionDescription

1

Zero pages

2

Cache pages

4

Cache private

8

User pages

16

Free pages

Note

The makedumpfile command supports removal of transparent huge pages and hugetlbfs pages. Consider both these types of hugepages User Pages and remove them using the -8 level.

Additional resources

7.5.5. Supported default failure responses

By default, when kdump fails to create a core dump, the operating system reboots. You can, however, configure kdump to perform a different operation in case it fails to save the core dump to the primary target. The table below lists all default actions that are currently supported.

Table 7.5. Supported Default Actions

OptionDescription

dump_to_rootfs

Attempt to save the core dump to the root file system. This option is especially useful in combination with a network target: if the network target is unreachable, this option configures kdump to save the core dump locally. The system is rebooted afterwards.

reboot

Reboot the system, losing the core dump in the process.

halt

Halt the system, losing the core dump in the process.

poweroff

Power off the system, losing the core dump in the process.

shell

Run a shell session from within the initramfs, allowing the user to record the core dump manually.

Additional resources

7.5.6. Estimating kdump size

When planning and building your kdump environment, it is necessary to know how much space is required for the dump file before one is produced.

The makedumpfile --mem-usage command provides a useful report about excludable pages, and can be used to determine which dump level you want to assign. Run this command when the system is under representative load, otherwise makedumpfile --mem-usage returns a smaller value than is expected in your production environment.

[root@hostname ~]# makedumpfile --mem-usage /proc/kcore

TYPE            PAGES                   EXCLUDABLE      DESCRIPTION
----------------------------------------------------------------------
ZERO            501635                  yes             Pages filled with zero
CACHE           51657                   yes             Cache pages
CACHE_PRIVATE   5442                    yes             Cache pages + private
USER            16301                   yes             User process pages
FREE            77738211                yes             Free pages
KERN_DATA       1333192                 no              Dumpable kernel data
Important

The makedumpfile --mem-usage command reports in pages. This means that you have to calculate the size of memory in use against the kernel page size. By default the Red Hat Enterprise Linux kernel uses 4 KB sized pages for AMD64 and Intel 64 architectures, and 64 KB sized pages for IBM POWER architectures.

7.6. Testing the kdump configuration

The following procedure describes how to test that the kernel dump process works and is valid before the machine enters production.

Warning

The commands below cause the kernel to crash. Use caution when following these steps, and never carelessly use them on active production system.

Procedure

  1. Reboot the system with kdump enabled.
  2. Make sure that kdump is running:

    ~]# systemctl is-active kdump
    active
  3. Force the Linux kernel to crash:

    echo 1 > /proc/sys/kernel/sysrq
    echo c > /proc/sysrq-trigger
    Warning

    The command above crashes the kernel and a reboot is required.

    Once booted again, the address-YYYY-MM-DD-HH:MM:SS/vmcore file is created at the location you have specified in /etc/kdump.conf (by default to /var/crash/).

    Note

    In addition to confirming the validity of the configuration, it is possible to use this action to record how long it takes for a crash dump to complete, while a representative load was running.

7.7. Using kexec to reboot the kernel

The kexec system call enables loading and booting into another kernel from the currently running kernel, thus performing a function of a boot loader from within the kernel.

The kexec utility loads the kernel and the initramfs image for the kexec system call to boot into another kernel.

The following procedure describes how to manually invoke the kexec system call when using the kexec utility to reboot into another kernel.

Procedure

  1. Execute the kexec utility:

    # kexec -l /boot/vmlinuz-3.10.0-1040.el7.x86_64 --initrd=/boot/initramfs-3.10.0-1040.el7.x86_64.img --reuse-cmdline

    The command manually loads the kernel and the initramfs image for the kexec system call.

  2. Reboot the system:

    # reboot

    The command detects the kernel, shuts down all services and then calls the kexec system call to reboot into the kernel you provided in the previous step.

Warning

When you use the kexec -e command to reboot the kernel, the system does not go through the standard shutdown sequence before starting the next kernel, which may cause data loss or an unresponsive system.

7.8. Blacklisting kernel drivers for kdump

Blacklisting kernel drivers for kdump is a mechanism to prevent the intended kernel drivers from loading. Blacklisting kernel drivers prevents the oom killer or other crash kernel failures.

To blacklist the kernel drivers, you may update the KDUMP_COMMANDLINE_APPEND= variable in the /etc/sysconfig/kdump file and specify one of the following blacklisting option:

  • rd.driver.blacklist=<modules>
  • modprobe.blacklist=<modules>

When you blacklist drivers in /etc/sysconfig/kdump file, it prevents the kdump initramfs from loading the blacklisted modules.

The following procedure describes how to blacklist a kernel driver to prevent crash kernel failures.

Procedure

  1. Select the kernel module that you intend to blacklist:

    $ lsmod
    
    Module                  Size  Used by
    fuse                  126976  3
    xt_CHECKSUM            16384  1
    ipt_MASQUERADE         16384  1
    uinput                 20480  1
    xt_conntrack           16384  1

    The lsmod command displays a list of modules that are loaded to the currently running kernel.

  2. Update the KDUMP_COMMANDLINE_APPEND= line in the /etc/sysconfig/kdump file as follows:

    KDUMP_COMMANDLINE_APPEND="rd.driver.blacklist=hv_vmbus,hv_storvsc,hv_utils,hv_netvsc,hid-hyperv"
  3. You can also update the KDUMP_COMMANDLINE_APPEND= line in the /etc/sysconfig/kdump file as follows:

    KDUMP_COMMANDLINE_APPEND="modprobe.blacklist=emcp modprobe.blacklist=bnx2fc modprobe.blacklist=libfcoe modprobe.blacklist=fcoe"
  4. Restart the kdump service:

    $ systemctl restart kdump

Additional resources

  • For more information concerning the oom killer, see the following Knowledge Article.
  • The dracut.cmdline manpage for modules blacklist options.

7.9. Running kdump on systems with encrypted disk

When running an encrypted partition created by the Logical Volume Manager (LVM) tool, systems require a certain amount of available memory. If the system has less than the required amount of available memory, the cryptsetup utility fails to mount the partition. As a result, capturing the vmcore file to a local kdump target location (with LVM and enabled encryption), fails in the second kernel (capture kernel).

This procedure describes the running kdump mechanism by increasing the crashkernel= value, using a remote kdump target, or using a key derivation function (KDF).

Procedure

Run the kdump mechanism using one of the following procedures:

  • To run the kdump define one of the following:

    • Configure a remote kdump target.
    • Define the dump to an unencrypted partition.
    • Specify an increased crashkernel= value to the required level.
  • Add an extra key slot by using a key derivation function (KDF):

    1. cryptsetup luksAddKey --pbkdf pbkdf2 /dev/vda2
    2. cryptsetup config --key-slot 1 --priority prefer /dev/vda2
    3. cryptsetup luksDump /dev/vda2

Using the default KDF of the encrypted partition may consume a lot of memory. You must manually provide the password in the second kernel (capture), even if you encounter an Out of Memory (OOM) error message.

Warning

Adding an extra key slot can have a negative effect on security, as multiple keys can decrypt an encrypted volume. This may cause a potential risk to the volume.

7.10. Firmware assisted dump mechanisms

Firmware assisted dump (fadump) is a dump capturing mechanism, provided as an alternative to the kdump mechanism on IBM POWER systems. The kexec and kdump mechanisms are useful for capturing core dumps on AMD64 and Intel 64 systems. However, some hardware such as mini systems and mainframe computers, leverage the onboard firmware to isolate regions of memory and prevent any accidental overwriting of data that is important to the crash analysis. This section covers fadump mechanisms and how they integrate with RHEL. The fadump utility is optimized for these expanded dumping features on IBM POWER systems.

7.10.1. Firmware assisted dump on IBM PowerPC hardware

The fadump utility captures the vmcore file from a fully-reset system with PCI and I/O devices. This mechanism uses firmware to preserve memory regions during a crash and then reuses the kdump userspace scripts to save the vmcore file. The memory regions consist of all system memory contents, except the boot memory, system registers, and hardware Page Table Entries (PTEs).

The fadump mechanism offers improved reliability over the traditional dump type, by rebooting the partition and using a new kernel to dump the data from the previous kernel crash. The fadump requires an IBM POWER6 processor-based or later version hardware platform.

For further details about the fadump mechanism, including PowerPC specific methods of resetting hardware, see the /usr/share/doc/kexec-tools/fadump-howto.txt file.

Note

The area of memory that is not preserved, known as boot memory, is the amount of RAM required to successfully boot the kernel after a crash event. By default, the boot memory size is 256MB or 5% of total system RAM, whichever is larger.

Unlike kexec-initiated event, the fadump mechanism uses the production kernel to recover a crash dump. When booting after a crash, PowerPC hardware makes the device node /proc/device-tree/rtas/ibm.kernel-dump available to the proc filesystem (procfs). The fadump-aware kdump scripts, check for the stored vmcore, and then complete the system reboot cleanly.

7.10.2. Enabling firmware assisted dump mechanism

The crash dumping capabilities of IBM POWER systems can be enhanced by enabling the firmware assisted dump (fadump) mechanism.

Procedure

  1. Install and configure kdump as described in Chapter 7, "Installing and configuring kdump".
  2. Add fadump=on to the GRUB_CMDLINE_LINUX line in /etc/default/grub file:

    GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto
    rd.lvm.lv=rhel/root rhgb quiet fadump=on"
  3. (Optional) If you want to specify reserved boot memory instead of using the defaults, configure crashkernel=xxM to GRUB_CMDLINE_LINUX in /etc/default/grub, where xx is the amount of the memory required in megabytes:

    GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=xxM rd.lvm.lv=rhel/root rhgb quiet fadump=on"
    Important

    Red Hat recommends to test all boot configuration options before you execute them. If you observe Out of Memory (OOM) errors when booting from the crash kernel, increase the value specified in crashkernel= argument until the crash kernel can boot cleanly. Some trial and error may be required in this case.

7.10.3. Firmware assisted dump mechanisms on IBM Z hardware

IBM Z systems support the following firmware assisted dump mechanisms:

  • Stand-alone dump (sadump)
  • VMDUMP

The kdump infrastructure is supported and utilized on IBM Z systems. To configure kdump from RHEL, see, Chapter 7, "Installing and configuring kdump".

However, using one of the firmware assisted dump (fadump) methods for IBM Z can provide various benefits:

  • The sadump mechanism is initiated and controlled from the system console, and is stored on an IPL bootable device.
  • The VMDUMP mechanism is similar to sadump. This tool is also initiated from the system console, but retrieves the resulting dump from hardware and copies it to the system for analysis.
  • These methods (similarly to other hardware based dump mechanisms) have the ability to capture the state of a machine in the early boot phase, before the kdump service starts.
  • Although VMDUMP contains a mechanism to receive the dump file into a Red Hat Enterprise Linux system, the configuration and control of VMDUMP is managed from the IBM Z Hardware console.

IBM discusses sadump in detail in the Stand-alone dump program article and VMDUMP in the Creating dumps on z/VM with VMDUMP article.

IBM also has a documentation set for using the dump tools on Red Hat Enterprise Linux 7 in the Using the Dump Tools on Red Hat Enterprise Linux 7.4 article.

7.10.4. Using sadump on Fujitsu PRIMEQUEST systems

The Fujitsu sadump mechanism is designed to provide a fallback dump capture in an event when kdump is unable to complete successfully. The sadump mechanism is invoked manually from the system Management Board (MMB) interface. Using MMB, configure kdump like for an Intel 64 or AMD 64 server and then perform the following additional steps to enable sadump.

Procedure

  1. Add or edit the following lines in the /etc/sysctl.conf file to ensure that kdump starts as expected for sadump:

    kernel.panic=0
    kernel.unknown_nmi_panic=1
    Warning

    In particular, ensure that after kdump, the system does not reboot. If the system reboots after kdump has fails to save the vmcore file, then it is not possible to invoke the sadump.

  2. Set the failure_action parameter in /etc/kdump.conf appropriately as halt or shell.

    failure_action shell

Additional resources

For details on configuring your hardware for sadump, see the FUJITSU Server PRIMEQUEST 2000 Series Installation Manual.

7.11. Analyzing a core dump

To determine the cause of the system crash, you can use the crash utility, which provides an interactive prompt very similar to the GNU Debugger (GDB). This utility allows you to interactively analyze a core dump created by kdump, netdump, diskdump or xendump as well as a running Linux system. Alternatively, you have the option to use the Kdump Helper or Kernel Oops Analyzer.

7.11.1. Installing the crash utility

The following procedure describes how to install the crash analyzing tool.

Procedure

  1. Enable the relevant repositories:

    # subscription-manager repos --enable baseos repository
    # subscription-manager repos --enable appstream repository
    # subscription-manager repos --enable rhel-8-for-x86_64-baseos-debug-rpms
  2. Install the crash package:

    # yum install crash
  3. Install the kernel-debuginfo package:

    # yum install kernel-debuginfo

    The package corresponds to your running kernel and provides the data necessary for the dump analysis.

Additional resources

7.11.2. Running and exiting the crash utility

The following procedure describes how to start the crash utility for analyzing the cause of the system crash.

Prerequisites

  • Identify the currently running kernel (for example 4.18.0-5.el8.x86_64).

Procedure

  1. To start the crash utility, two necessary parameters need to be passed to the command:

    • The debug-info (a decompressed vmlinuz image), for example /usr/lib/debug/lib/modules/4.18.0-5.el8.x86_64/vmlinux provided through a specific kernel-debuginfo package.
    • The actual vmcore file, for example /var/crash/127.0.0.1-2018-10-06-14:05:33/vmcore

      The resulting crash command then looks like this:

      # crash /usr/lib/debug/lib/modules/4.18.0-5.el8.x86_64/vmlinux /var/crash/127.0.0.1-2018-10-06-14:05:33/vmcore

      Use the same <kernel> version that was captured by kdump.

      Example 7.1. Running the crash utility

      The following example shows analyzing a core dump created on October 6 2018 at 14:05 PM, using the 4.18.0-5.el8.x86_64 kernel.

      ...
      WARNING: kernel relocated [202MB]: patching 90160 gdb minimal_symbol values
      
            KERNEL: /usr/lib/debug/lib/modules/4.18.0-5.el8.x86_64/vmlinux
          DUMPFILE: /var/crash/127.0.0.1-2018-10-06-14:05:33/vmcore  [PARTIAL DUMP]
              CPUS: 2
              DATE: Sat Oct  6 14:05:16 2018
            UPTIME: 01:03:57
      LOAD AVERAGE: 0.00, 0.00, 0.00
             TASKS: 586
          NODENAME: localhost.localdomain
           RELEASE: 4.18.0-5.el8.x86_64
           VERSION: #1 SMP Wed Aug 29 11:51:55 UTC 2018
           MACHINE: x86_64  (2904 Mhz)
            MEMORY: 2.9 GB
             PANIC: "sysrq: SysRq : Trigger a crash"
               PID: 10635
           COMMAND: "bash"
              TASK: ffff8d6c84271800  [THREAD_INFO: ffff8d6c84271800]
               CPU: 1
             STATE: TASK_RUNNING (SYSRQ)
      
      crash>
  2. To exit the interactive prompt and terminate crash, type exit or q.

    Example 7.2. Exiting the crash utility

    crash> exit
    ~]#
Note

The crash command can also be used as a powerful tool for debugging a live system. However use it with caution so as not to break your system.

7.11.3. Displaying various indicators in the crash utility

The following procedures describe how to use the crash utility and display various indicators, such as a kernel message buffer, a backtrace, a process status, virtual memory information and open files.

Displaying the message buffer
  • To display the kernel message buffer, type the log command at the interactive prompt as displayed in the example below:
crash> log
... several lines omitted ...
EIP: 0060:[<c068124f>] EFLAGS: 00010096 CPU: 2
EIP is at sysrq_handle_crash+0xf/0x20
EAX: 00000063 EBX: 00000063 ECX: c09e1c8c EDX: 00000000
ESI: c0a09ca0 EDI: 00000286 EBP: 00000000 ESP: ef4dbf24
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process bash (pid: 5591, ti=ef4da000 task=f196d560 task.ti=ef4da000)
Stack:
 c068146b c0960891 c0968653 00000003 00000000 00000002 efade5c0 c06814d0
<0> fffffffb c068150f b7776000 f2600c40 c0569ec4 ef4dbf9c 00000002 b7776000
<0> efade5c0 00000002 b7776000 c0569e60 c051de50 ef4dbf9c f196d560 ef4dbfb4
Call Trace:
 [<c068146b>] ? __handle_sysrq+0xfb/0x160
 [<c06814d0>] ? write_sysrq_trigger+0x0/0x50
 [<c068150f>] ? write_sysrq_trigger+0x3f/0x50
 [<c0569ec4>] ? proc_reg_write+0x64/0xa0
 [<c0569e60>] ? proc_reg_write+0x0/0xa0
 [<c051de50>] ? vfs_write+0xa0/0x190
 [<c051e8d1>] ? sys_write+0x41/0x70
 [<c0409adc>] ? syscall_call+0x7/0xb
Code: a0 c0 01 0f b6 41 03 19 d2 f7 d2 83 e2 03 83 e0 cf c1 e2 04 09 d0 88 41 03 f3 c3 90 c7 05 c8 1b 9e c0 01 00 00 00 0f ae f8 89 f6 <c6> 05 00 00 00 00 01 c3 89 f6 8d bc 27 00 00 00 00 8d 50 d0 83
EIP: [<c068124f>] sysrq_handle_crash+0xf/0x20 SS:ESP 0068:ef4dbf24
CR2: 0000000000000000

Type help log for more information on the command usage.

Note

The kernel message buffer includes the most essential information about the system crash and, as such, it is always dumped first in to the vmcore-dmesg.txt file. This is useful when an attempt to get the full vmcore file failed, for example because of lack of space on the target location. By default, vmcore-dmesg.txt is located in the /var/crash/ directory.

Displaying a backtrace
  • To display the kernel stack trace, use the bt command.
crash> bt
PID: 5591   TASK: f196d560  CPU: 2   COMMAND: "bash"
 #0 [ef4dbdcc] crash_kexec at c0494922
 #1 [ef4dbe20] oops_end at c080e402
 #2 [ef4dbe34] no_context at c043089d
 #3 [ef4dbe58] bad_area at c0430b26
 #4 [ef4dbe6c] do_page_fault at c080fb9b
 #5 [ef4dbee4] error_code (via page_fault) at c080d809
    EAX: 00000063  EBX: 00000063  ECX: c09e1c8c  EDX: 00000000  EBP: 00000000
    DS:  007b      ESI: c0a09ca0  ES:  007b      EDI: 00000286  GS:  00e0
    CS:  0060      EIP: c068124f  ERR: ffffffff  EFLAGS: 00010096
 #6 [ef4dbf18] sysrq_handle_crash at c068124f
 #7 [ef4dbf24] __handle_sysrq at c0681469
 #8 [ef4dbf48] write_sysrq_trigger at c068150a
 #9 [ef4dbf54] proc_reg_write at c0569ec2
#10 [ef4dbf74] vfs_write at c051de4e
#11 [ef4dbf94] sys_write at c051e8cc
#12 [ef4dbfb0] system_call at c0409ad5
    EAX: ffffffda  EBX: 00000001  ECX: b7776000  EDX: 00000002
    DS:  007b      ESI: 00000002  ES:  007b      EDI: b7776000
    SS:  007b      ESP: bfcb2088  EBP: bfcb20b4  GS:  0033
    CS:  0073      EIP: 00edc416  ERR: 00000004  EFLAGS: 00000246

Type bt <pid> to display the backtrace of a specific process or type help bt for more information on bt usage.

Displaying a process status
  • To display the status of processes in the system, use the ps command.
crash> ps
   PID    PPID  CPU   TASK    ST  %MEM     VSZ    RSS  COMM
>     0      0   0  c09dc560  RU   0.0       0      0  [swapper]
>     0      0   1  f7072030  RU   0.0       0      0  [swapper]
      0      0   2  f70a3a90  RU   0.0       0      0  [swapper]
>     0      0   3  f70ac560  RU   0.0       0      0  [swapper]
      1      0   1  f705ba90  IN   0.0    2828   1424  init
... several lines omitted ...
   5566      1   1  f2592560  IN   0.0   12876    784  auditd
   5567      1   2  ef427560  IN   0.0   12876    784  auditd
   5587   5132   0  f196d030  IN   0.0   11064   3184  sshd
>  5591   5587   2  f196d560  RU   0.0    5084   1648  bash

Use ps <pid> to display the status of a single specific process. Use help ps for more information on ps usage.

Displaying virtual memory information
  • To display basic virtual memory information, type the vm command at the interactive prompt.
crash> vm
PID: 5591   TASK: f196d560  CPU: 2   COMMAND: "bash"
   MM       PGD      RSS    TOTAL_VM
f19b5900  ef9c6000  1648k    5084k
  VMA       START      END    FLAGS  FILE
f1bb0310    242000    260000 8000875  /lib/ld-2.12.so
f26af0b8    260000    261000 8100871  /lib/ld-2.12.so
efbc275c    261000    262000 8100873  /lib/ld-2.12.so
efbc2a18    268000    3ed000 8000075  /lib/libc-2.12.so
efbc23d8    3ed000    3ee000 8000070  /lib/libc-2.12.so
efbc2888    3ee000    3f0000 8100071  /lib/libc-2.12.so
efbc2cd4    3f0000    3f1000 8100073  /lib/libc-2.12.so
efbc243c    3f1000    3f4000 100073
efbc28ec    3f6000    3f9000 8000075  /lib/libdl-2.12.so
efbc2568    3f9000    3fa000 8100071  /lib/libdl-2.12.so
efbc2f2c    3fa000    3fb000 8100073  /lib/libdl-2.12.so
f26af888    7e6000    7fc000 8000075  /lib/libtinfo.so.5.7
f26aff2c    7fc000    7ff000 8100073  /lib/libtinfo.so.5.7
efbc211c    d83000    d8f000 8000075  /lib/libnss_files-2.12.so
efbc2504    d8f000    d90000 8100071  /lib/libnss_files-2.12.so
efbc2950    d90000    d91000 8100073  /lib/libnss_files-2.12.so
f26afe00    edc000    edd000 4040075
f1bb0a18   8047000   8118000 8001875  /bin/bash
f1bb01e4   8118000   811d000 8101873  /bin/bash
f1bb0c70   811d000   8122000 100073
f26afae0   9fd9000   9ffa000 100073
... several lines omitted ...

Use vm <pid> to display information on a single specific process, or use help vm for more information on vm usage.

Displaying open files
  • To display information about open files, use the files command.
crash> files
PID: 5591   TASK: f196d560  CPU: 2   COMMAND: "bash"
ROOT: /    CWD: /root
 FD    FILE     DENTRY    INODE    TYPE  PATH
  0  f734f640  eedc2c6c  eecd6048  CHR   /pts/0
  1  efade5c0  eee14090  f00431d4  REG   /proc/sysrq-trigger
  2  f734f640  eedc2c6c  eecd6048  CHR   /pts/0
 10  f734f640  eedc2c6c  eecd6048  CHR   /pts/0
255  f734f640  eedc2c6c  eecd6048  CHR   /pts/0

Use files <pid> to display files opened by only one selected process, or use help files for more information on files usage.

7.11.4. Using Kernel Oops Analyzer

The Kernel Oops Analyzer is a tool that analyzes the crash dump by comparing the oops messages with known issues in the knowledge base.

Prerequisites

  • Secure an oops message to feed the Kernel Oops Analyzer by following instructions in Red Hat Labs.

Procedure

  1. Follow the Kernel Oops Analyzer link to access the tool.
  2. Browse for the oops message by hitting the Browse button.

    Kernel oops analyzer
  3. Click the DETECT button to compare the oops message based on information from makedumpfile against known solutions.

7.12. Using early kdump to capture boot time crashes

As a system administrator, you can utilize the early kdump support of the kdump service to capture a vmcore file of the crashing kernel during the early stages of the booting process. This section describes what early kdump is, how to configure it, and how to check the status of this mechanism.

7.12.1. What is early kdump

Kernel crashes during the booting phase occur when the kdump service is not yet started, and cannot facilitate capturing and saving the contents of the crashed kernel’s memory. Therefore, the vital information for troubleshooting is lost.

To address this problem, RHEL 8 introduced the early kdump feature as a part of the kdump service.

Additional resources

7.12.2. Enabling early kdump

This section describes how to enable the early kdump feature to eliminate the risk of losing information about the early boot kernel crashes.

Prerequisites

  • An active Red Hat Enterprise Linux subscription
  • A repository containing the kexec-tools package for your system CPU architecture
  • Fulfilled kdump requirements

Procedure

  1. Verify that the kdump service is enabled and active:

    # systemctl is-enabled kdump.service && systemctl is-active kdump.service enabled active

    If kdump is not enabled and running see, Section 7.3.5, “Enabling and disabling the kdump service”.

  2. Rebuild the initramfs image of the booting kernel with the early kdump functionality:

    dracut -f --add earlykdump
  3. Add the rd.earlykdump kernel command line parameter:

    grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="rd.earlykdump"
  4. Reboot the system to reflect the changes

    reboot
  5. Optionally, verify that rd.earlykdump was successfully added and early kdump feature was enabled:

    # cat /proc/cmdline
    BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-187.el8.x86_64 root=/dev/mapper/rhel-root ro crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet rd.earlykdump
    
    # journalctl -x | grep early-kdump
    Mar 20 15:44:41 redhat dracut-cmdline[304]: early-kdump is enabled.
    Mar 20 15:44:42 redhat dracut-cmdline[304]: kexec: loaded early-kdump kernel

Additional resources