Why does my Atos/Bull system with large amounts of installed RAM reboot during kdump core collection?

Environment

  • Red Hat Enterprise Linux 6.x or 7.x
  • An Atos/Bull system with large amounts of RAM (tens of gigabytes or more)
  • An active watchdog timer on the system

Issue

An Atos/Bull system with a large amount of installed RAM reboots while kdump is still collecting the core dump, before the dump is complete.

Resolution

Add the following options to the system's /etc/kdump.conf file to prevent the watchdog timer from triggering a reboot during the kdump operation:

 extra_modules ipmi_watchdog
 blacklist iTCO_wdt
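
As a minimal sketch of applying the two options above, the following helper appends each line only if it is not already present, so it can be run more than once. The function name add_kdump_option and the KDUMP_CONF variable are hypothetical, introduced here for illustration. The sketch defaults to a scratch file in the current directory, and it does not merge with an existing extra_modules or blacklist line, so review the file by hand if one is already there.

```shell
#!/bin/sh
# Append a line to a config file only if that exact line is not already present.
# Usage: add_kdump_option FILE "LINE"   (hypothetical helper, not part of kexec-tools)
add_kdump_option() {
    conf_file=$1
    option_line=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$option_line" "$conf_file"; then
        printf '%s\n' "$option_line" >> "$conf_file"
    fi
}

# Defaults to a scratch file so the sketch is safe to try; set
# KDUMP_CONF=/etc/kdump.conf (as root) to change the real file.
KDUMP_CONF=${KDUMP_CONF:-./kdump.conf.test}
touch "$KDUMP_CONF"
add_kdump_option "$KDUMP_CONF" "extra_modules ipmi_watchdog"
add_kdump_option "$KDUMP_CONF" "blacklist iTCO_wdt"
```

After changing /etc/kdump.conf, restart the kdump service so that the kdump initramfs is rebuilt with the new settings: "service kdump restart" on Red Hat Enterprise Linux 6, or "systemctl restart kdump" on Red Hat Enterprise Linux 7.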

Root Cause

The reboot is most likely caused by the watchdog timer expiring during the dump. The purpose of the watchdog timer is to reboot the system when it becomes unresponsive. On a system with a large amount of RAM, writing the core file can take longer than the timer's timeout period, so the watchdog reboots the system before the core dump is complete.
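
To make the timing argument concrete, here is a rough back-of-the-envelope sketch. The RAM size, write throughput, and watchdog timeout below are hypothetical figures chosen only for illustration, and the function name estimate_dump_seconds is likewise an assumption. The estimate treats the dump as unfiltered and uncompressed, so a real dump that filters or compresses pages may finish sooner.

```shell
#!/bin/sh
# Rough estimate of dump time in seconds for an unfiltered, uncompressed dump:
# RAM size (converted from GiB to MiB) divided by write throughput in MiB/s.
# Usage: estimate_dump_seconds RAM_GIB WRITE_MIB_PER_SEC
estimate_dump_seconds() {
    ram_gib=$1
    write_mib_per_sec=$2
    echo $(( ram_gib * 1024 / write_mib_per_sec ))
}

# Hypothetical figures, for illustration only.
dump_seconds=$(estimate_dump_seconds 256 100)
watchdog_timeout=60

if [ "$dump_seconds" -gt "$watchdog_timeout" ]; then
    echo "dump (~${dump_seconds}s) outlasts watchdog timeout (${watchdog_timeout}s)"
fi
```

With figures of this order, the estimated dump time is far longer than the timeout, which matches the behavior described above.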
