[RHEL 4.6] The system does not halt after diskdump completes


Environment

  • Red Hat Enterprise Linux 4 update 6

  • kernel versions: 2.6.9-67.EL and 2.6.9-89.0.26.EL

  • diskdumputils-1.4.1-2

Issue

If the watchdog timer has been started by the ipmi_watchdog module, the system does not halt after diskdump completes, even though kernel.panic=0 is set in /etc/sysctl.conf for exactly that purpose. Instead, the system reboots 255 seconds later.
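The issue depends on kernel.panic being 0. As a quick sanity check, the sketch below scans the contents of /etc/sysctl.conf for the last kernel.panic assignment.

sysctl_panic_value() is a hypothetical helper, not part of procps. It assumes simple "key = value" lines and skips lines starting with # or ;. It ignores any other configuration source, so /proc/sys/kernel/panic remains the authoritative runtime value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of procps): returns the value of the last
 * "kernel.panic" assignment in the given sysctl.conf contents, or -1 if
 * the key is never set. Simplified parser: one "key = value" per line,
 * '#' and ';' comment lines skipped, no include or override handling. */
static int sysctl_panic_value(const char *conf)
{
    const char *key = "kernel.panic";
    size_t keylen = strlen(key);
    const char *line = conf;
    int value = -1;

    assert(conf != NULL);
    while (line && *line) {
        const char *eol = strchr(line, '\n');
        const char *p = line;

        while (*p == ' ' || *p == '\t')          /* skip indentation */
            p++;
        if (*p != '#' && *p != ';' && strncmp(p, key, keylen) == 0) {
            const char *q = p + keylen;
            while (*q == ' ' || *q == '\t')
                q++;
            /* Requiring '=' here rejects keys like kernel.panic_on_oops. */
            if (*q == '=')
                value = (int)strtol(q + 1, NULL, 10);
        }
        line = eol ? eol + 1 : NULL;
    }
    return value;
}
```

For example, passing "kernel.panic = 0\n" returns 0, while contents that only set kernel.panic_on_oops return -1.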

Resolution

diskdump is working as designed (i.e., NOTABUG), and ipmi_watchdog is also working as designed. Therefore, this issue does not require a fix.

Root Cause

In diskdump.c, start_disk_dump() calls notifier_call_chain(&panic_notifier_list) after it finishes dumping but before it enters its halt loop:

static void start_disk_dump(struct pt_regs *regs)
{

        ... [ snip ] ...

        platform_start_crashdump(diskdump_stack, disk_dump, regs);

        ... [ snip ] ...

        notifier_call_chain(&panic_notifier_list, 0, NULL);

        ... [ snip ] ...

        for (;;) {
                touch_nmi_watchdog();
                machine_halt();
                diskdump_mdelay(1000);
        }
}
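The ordering that matters can be shown with a small userspace model. Any handler on panic_notifier_list runs after the dump is written and before the halt loop is reached.

This is a sketch only. The kernel's struct notifier_block, notifier_call_chain() and the NOTIFY_* constants are re-declared locally to mimic the 2.6.9 behaviour. fake_watchdog_handler() and start_disk_dump_model() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins modelled on the kernel's notifier API; not the real headers. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

struct notifier_block {
    int (*notifier_call)(struct notifier_block *nb, unsigned long event, void *data);
    struct notifier_block *next;
};

static char event_log[4][32];   /* zero-initialised, so strings stay terminated */
static int event_count;

static void log_event(const char *what)
{
    assert(event_count < 4);
    strncpy(event_log[event_count], what, sizeof event_log[0] - 1);
    event_count++;
}

/* Walks the chain in order and stops early if a handler sets NOTIFY_STOP_MASK. */
static int notifier_call_chain(struct notifier_block **list,
                               unsigned long event, void *data)
{
    int ret = NOTIFY_DONE;
    struct notifier_block *nb;

    for (nb = *list; nb != NULL; nb = nb->next) {
        ret = nb->notifier_call(nb, event, data);
        if (ret & NOTIFY_STOP_MASK)
            break;
    }
    return ret;
}

/* Hypothetical stand-in for a registered panic notifier such as the IPMI watchdog's. */
static int fake_watchdog_handler(struct notifier_block *nb, unsigned long event, void *data)
{
    (void)nb; (void)event; (void)data;
    log_event("watchdog armed");
    return NOTIFY_OK;
}

static struct notifier_block watchdog_nb = { fake_watchdog_handler, NULL };
static struct notifier_block *panic_notifier_list = &watchdog_nb;

/* Mirrors the order of operations in start_disk_dump(). The real halt
 * loop never returns; here it is only recorded as an event. */
static void start_disk_dump_model(void)
{
    log_event("dump written");
    notifier_call_chain(&panic_notifier_list, 0, NULL);
    log_event("halt loop entered");
}
```

After start_disk_dump_model() runs, the log reads "dump written", "watchdog armed", "halt loop entered". The watchdog has been re-armed by the time the system sits in the halt loop.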

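In ipmi_watchdog.c, wdog_panic_handler() is registered on panic_notifier_list. It first checks two conditions: that watchdog_user has been set, which happens when the watchdog is registered, and that the panic has not already been handled. If both hold, it sets the watchdog timeout to 255 seconds with a reset action and applies the setting through panic_halt_ipmi_set_timeout().

The halt loop in start_disk_dump() only calls touch_nmi_watchdog(), which services the NMI watchdog rather than the IPMI watchdog. Nothing restarts the IPMI timer, so the system is reset about 255 seconds later: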

static int wdog_panic_handler(struct notifier_block *this,
                              unsigned long         event,
                              void                  *unused)
{
        static int panic_event_handled = 0;

        /* On a panic, if we have a panic timeout, make sure that the thing
           reboots, even if it hangs during that panic. */
        if (watchdog_user && !panic_event_handled) {
                /* Make sure the panic doesn't hang, and make sure we
                   do this only once. */
                panic_event_handled = 1;

                timeout = 255;
                pretimeout = 0;
                ipmi_watchdog_state = WDOG_TIMEOUT_RESET;
                panic_halt_ipmi_set_timeout();
        }

        return NOTIFY_OK;
}

Diagnostic Steps

Steps to reproduce:

  1. Set up diskdump.
  2. Set kernel.panic to 0 in /etc/sysctl.conf.
  3. Run set.sh in the extracted "panic" directory to build panic.ko, a kernel module that causes a panic.
  4. Reboot the system.
  5. Execute the following commands:
    # /sbin/modprobe ipmi_si type=kcs ports=0xd80 trydefaults=0
    # /sbin/modprobe ipmi_devintf
    # /sbin/modprobe ipmi_watchdog timeout=1800 start_now=1
    
  6. Load the panic module (built from panic.tar.gz):
    # insmod panic.ko
    
  7. Wait about 5 minutes after diskdump finishes. Instead of staying halted, the system reboots about 255 seconds after the dump completes.
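Before loading panic.ko in step 6, it can help to confirm that the modules from step 5 are actually loaded.

The following is a minimal sketch built around a hypothetical helper, module_loaded(). It checks the first field of each line in /proc/modules contents that are passed in as a string. Reading the file itself is left out so the matching logic stays easy to test.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` is the module name (first,
 * space-terminated field) of any line in the given /proc/modules
 * contents, else 0. A prefix such as "ipmi" does not match "ipmi_si". */
static int module_loaded(const char *proc_modules, const char *name)
{
    const char *line = proc_modules;
    size_t len;

    assert(proc_modules != NULL && name != NULL);
    len = strlen(name);
    while (line && *line) {
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return 1;
        line = strchr(line, '\n');
        if (line)
            line++;               /* advance to the start of the next line */
    }
    return 0;
}
```

For example, module_loaded(buf, "ipmi_watchdog") should return 1 once step 5 has succeeded. Running /sbin/lsmod and looking for the ipmi_ entries gives the same answer interactively.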


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
