Kernel Panic with Megasas driver

I am getting a kernel panic with RHEL6 on an HP ProLiant DL350 Gen8. I can boot 2.6.32-573.el6.x86_64 with no issues, but 2.6.32-642.15.1.el6.x86_64 panics during boot.

The DL350 has 384 GB of RAM and four disks in a hardware RAID 1+0. The hardware RAID controller is a SmartArray P420i v3.54. The HP BIOS is P71, 03/01/2013.

When I boot the -642 kernel, there is a delay of about 30 seconds with the cursor blinking in the upper-left corner of the screen. (This delay is less than 15 seconds on the -573 kernel.) The horizontal bars then show up at the bottom of the screen, and the boot ultimately ends in a kernel panic.

Call Trace:
<IRQ> [<ffffffff8107c6f1>] ? warn_slowpath_common+0x91/0xe0
[<ffffffff8017c75a>] ? warn_slowpath_null+0x1a/0x20
[<ffffffff8103739c>] ? native_smp_send_reschedule+0x5c/0x60
[<ffffffff810685a8>] ? scheduler_tick+0x208/0x260
[<ffffffff810b89e0>] ? tick_sched_timer+0x0/0xc0
[<ffffffff8108f15e>] ? update_process_times+0x6e/0x90
[<ffffffff810b8a46>] ? tick_sched_timer+0x66/0xc0
[<ffffffff810ab12e>] ? __run_hrtimer+0x8e/0x1d0
[<ffffffff810ab4ce>] ? hrtimer_interrupt+0xee/0x270
[<ffffffff8103879d>] ? local_apic_timer_interrupt+0x13/0x70
[<ffffffff8100bc13>] ? smp_apic_timer_interrupt+0x45/0x60
<EOI> [<ffffffff81548590>] ? panic+0x156/0x179
[<ffffffff8154851d>] ? panic+0xe3/0x179
[<ffffffff8112b080>] ? perf_event_exit_task+0xc0/0x340
[<ffffffff81081f97>] ? do_exit+0x867/0x870
[<ffffffff8119b915>] ? fput+0x25/0x30
[<ffffffff81081ff8>] ? do_group_exit+0x58/0xd0
[<ffffffff81082087>] ? sys_exit_group+0x17/0x20
[<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x16
---[ end trace 823bbcc7f0cc64c9 ]---

Attempting to resolve the issue, I also installed the -642 firmware and headers packages, as well as the perf and python-perf packages mentioned in RHSA-2017:0307. (My previous kernel has only the kernel RPM installed, not the firmware or the headers.) The errata suggests to me that the issue may be caused by changes in the MegaSAS driver. The updated initscripts (RHBA-2017:0313) include the workaround for /etc/init.d/halt.

Update 1
With rdinitdebug on the kernel command line, the -573 kernel boots without any issues. The -642 kernel keeps failing at this line:

udevadm settle --exit-if-exists=/dev/sdb1

Did the later kernel change how disk partitions are numbered or something? Why would it work on the -573 kernel and not the -642?
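
One way to check the partition-numbering question is to compare the block device names each kernel registers. The following is only a minimal sketch, assuming a shell can be reached on each boot. The function names and the default snapshot directory are hypothetical and chosen for illustration; /proc/partitions is the kernel's own list of block devices and partitions.

```shell
#!/bin/sh
# Print the block device and partition names the kernel has registered,
# one per line. Reads /proc/partitions by default; pass a saved copy as
# the first argument to inspect a snapshot instead.
list_partitions() {
    # Data rows have four fields and start with a numeric major number,
    # which skips the header line and the blank line after it.
    awk 'NF == 4 && $1 ~ /^[0-9]+$/ { print $4 }' "${1:-/proc/partitions}"
}

# Save the current list to a file named after the running kernel.
# The directory argument is a hypothetical choice; any writable path works.
snapshot_partitions() {
    snapshot_dir="${1:-/root}"
    list_partitions > "$snapshot_dir/partitions-$(uname -r).txt"
}
```

Running snapshot_partitions under the -573 kernel and then comparing that file with the output of list_partitions under the -642 kernel would show whether sdb1 is missing or has simply appeared under a different name.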

End Update 1

Thank you,
David

Responses