Why does dynamically adding memory in a VMware guest lead to kdump errors?

Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • VMware as a host platform

Issue

  • When memory is hot-added to a Red Hat Enterprise Linux system running in a VMware environment, the system may attempt to reload the kdump kernel and regenerate the kdump initrd.

  • After the hot-add, many mkdumprd processes may appear, all attempting to create the initrd file. The messages below may appear in the logs:

Oct 26 06:56:43 xxxx kdump: kexec: unloaded kdump kernel
Oct 26 06:56:43 xxxx kdump: stopped
Oct 26 07:04:56 xxxx kdump: mkdumprd: failed to make kdump initrd
Oct 26 07:06:50 xxxx kdump: failed to start up
  • Many processes like the following are also spawned:
/bin/bash --norc /sbin/mkdumprd -d -f /boot/initrd-2.6.32-279.5.1.el6.x86_64kdump.img 2.6.32-279.5.1.el6.x86_64

Resolution

Workaround

  • Modify /etc/udev/rules.d/98-kexec.rules so that the file reads:
SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/etc/init.d/kdump condrestart"
SUBSYSTEM=="cpu", ACTION=="offline", PROGRAM="/etc/init.d/kdump condrestart"
SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/etc/init.d/kdump condrestart"
SUBSYSTEM=="memory", ACTION=="remove", PROGRAM="/etc/init.d/kdump condrestart"
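
The same change can be scripted. The following is a minimal sketch, not an official tool: the function name patch_kexec_rules is hypothetical, and it assumes the rules file still contains the default "/etc/init.d/kdump restart" entries. It keeps a copy of the original file with a .bak suffix.

```shell
#!/bin/bash
# Hypothetical helper (not part of the kdump package): switch the udev rules
# from an unconditional "restart" to "condrestart".
patch_kexec_rules() {
    # Default path is the rules file named in this article.
    local rules="${1:-/etc/udev/rules.d/98-kexec.rules}"
    # Edit in place and keep the original as <file>.bak.
    # Lines that already say "condrestart" do not match and are left alone.
    sed -i.bak 's#/etc/init.d/kdump restart#/etc/init.d/kdump condrestart#' "$rules"
}
```

Run as root with no argument to patch the default file (patch_kexec_rules), or pass a path to try it on a copy first.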

Root Cause

  • The problem is that multiple udev events are fired for a single module of RAM that is added. Each kernel event is intercepted by udev and handled by the rules in /etc/udev/rules.d/98-kexec.rules:
$ cat /etc/udev/rules.d/98-kexec.rules 
SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/etc/init.d/kdump restart"
SUBSYSTEM=="cpu", ACTION=="offline", PROGRAM="/etc/init.d/kdump restart"
SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/etc/init.d/kdump restart"
SUBSYSTEM=="memory", ACTION=="remove", PROGRAM="/etc/init.d/kdump restart"
  • When memory is added, the system's rules restart kdump (service kdump restart). When kdump is restarted after a hardware change, the initrd used by kdump is regenerated. Because each event triggers its own restart, many mkdumprd processes can run at the same time.

  • Kdump is restarted regardless of whether it was configured to run. This is unexpected behavior. The "condrestart" parameter was suggested upstream so that the service is restarted only if it was set to start.

Diagnostic Steps

  • Messages similar to the following appear in the system logs:
kexec: unloaded kdump kernel
kdump: stopped
kdump: mkdumprd: failed to make kdump initrd
kdump: failed to start up
  • The system may become sluggish or appear overloaded
  • Many "mkdumprd" processes running
  • System recovers after all mkdumprd processes have terminated.
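
The checks above can be sketched as two small shell helpers. The function names are hypothetical, and the default log path /var/log/messages is an assumption about the local syslog configuration.

```shell
#!/bin/bash
# Hypothetical helpers for the diagnostic checks; names are illustrative.

# Count kdump failure messages in a log file.
# The default path /var/log/messages is an assumption.
count_kdump_failures() {
    grep -cE 'kdump: (mkdumprd: failed to make kdump initrd|failed to start up)' \
        "${1:-/var/log/messages}"
}

# Count mkdumprd command lines read from standard input.
count_mkdumprd() {
    awk '/\/sbin\/mkdumprd/ { n++ } END { print n + 0 }'
}
```

For example, ps -eo args= | count_mkdumprd prints the number of mkdumprd processes in the current process list, and count_kdump_failures prints the number of matching failure lines in the log.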
