Why is kdump hanging on a RHEL5 system with 1TB of RAM?

Issue

  • The kdump facility is hanging when it runs on a Red Hat Enterprise Linux 5 system with 1TB of RAM.

Environment

  • Red Hat Enterprise Linux 5.5 or earlier, with a kexec-tools package older than kexec-tools-1.102pre-96.el5_5.2

  • A system with 1TB of RAM installed

Resolution

  • The kexec-tools package needs to be updated to properly handle dumps when using Red Hat Enterprise Linux 5 (RHEL 5) on x86_64 systems with 1TB of memory.
  • The fixed package is kexec-tools-1.102pre-96.el5_5.2, which was released as part of errata http://rhn.redhat.com/errata/RHBA-2010-0438.html.
  • Once the package is updated, add the following parameter to the kernel command line in the boot loader configuration:
    crashkernel=512M@32M

  • The fix is discussed in Bugzilla ticket 590547 [kdump fails to save vmcore on machine with 1TB memory].
  • Note that RHEL 6 is not affected by this bug.
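To confirm that the crashkernel setting from the resolution is actually on the kernel command line, the parameter can be parsed and checked. The following is only a minimal sketch: the function name parse_crashkernel and the sample command line (including its root device) are hypothetical, and it handles only the simple crashkernel=SIZE[@OFFSET] form used in this article.

```python
import re

# Multipliers for the size suffixes accepted by this sketch.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Matches only the simple crashkernel=SIZE[@OFFSET] form.
_PATTERN = re.compile(
    r"(?:^|\s)crashkernel=(\d+)([KMG]?)(?:@(\d+)([KMG]?))?(?=\s|$)"
)


def parse_crashkernel(cmdline):
    """Return (size_bytes, offset_bytes) for crashkernel=, or None if absent.

    offset_bytes is None when no @OFFSET part is given.
    """
    match = _PATTERN.search(cmdline)
    if match is None:
        return None
    size = int(match.group(1)) * _UNITS[match.group(2)]
    offset = None
    if match.group(3) is not None:
        offset = int(match.group(3)) * _UNITS[match.group(4)]
    return size, offset


# Hypothetical command line for illustration; on a running system the
# real one can be read from /proc/cmdline.
example = "ro root=/dev/VolGroup00/LogVol00 crashkernel=512M@32M"
print(parse_crashkernel(example))  # (536870912, 33554432)
```

A result of None means the parameter is missing from the command line that was checked, so the change to the boot loader configuration has not taken effect.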

Root Cause

RHEL 5 has a 40-bit physical address limit (2^40 bytes, or 1TB), and addresses wrap around if the operating system attempts to exceed it. Though a system may physically have 1TB of RAM, the BIOS can reserve some low memory for such things as DMI tables (the data that dmidecode reads), video memory, PCI card addresses, etc., which offsets the addresses at which the operating system sees its RAM. This can make a system with 1TB of RAM appear to have 1.1TB or more, depending on the amount of low memory that was reserved. Code was added to the operating system to prevent it from attempting to address memory above the 40-bit limit.
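The arithmetic behind the limit can be sketched as follows. This is a simplified model, not how the kernel or BIOS computes its memory map: it assumes all RAM displaced by the reserved region is remapped directly above it, and the 2GiB reserved size and the function names are hypothetical values chosen for illustration.

```python
ADDRESS_BITS = 40
LIMIT = 1 << ADDRESS_BITS  # 2**40 bytes = 1099511627776 (1 TiB)

GIB = 1 << 30
TIB = 1 << 40


def top_of_ram(installed_bytes, reserved_bytes):
    """End of the highest RAM address range in this simplified model.

    Assumption: RAM displaced by the reserved region is remapped
    above it, so the top address grows by the reserved amount.
    """
    return installed_bytes + reserved_bytes


def exceeds_limit(installed_bytes, reserved_bytes, limit=LIMIT):
    """True if some RAM would sit above the address limit."""
    return top_of_ram(installed_bytes, reserved_bytes) > limit


def wrap_address(address, bits=ADDRESS_BITS):
    """Value an address takes if truncated to the given bit width."""
    return address & ((1 << bits) - 1)


# Hypothetical example: 1 TiB installed, 2 GiB reserved by the BIOS.
top = top_of_ram(TIB, 2 * GIB)
print(exceeds_limit(TIB, 2 * GIB))  # True
print(wrap_address(top))            # 2147483648
```

In this model, the end address just past the limit truncates to 2GiB, which illustrates why software that does not enforce the 40-bit limit can end up working with the wrong addresses.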

The kexec-tools package, which is used during a kdump process to save a vmcore, did not enforce the same limit in RHEL 5.5 and earlier. As a result, kdump can hang while attempting a dump operation on a 1TB system.
