HBA controlled by smartpqi stuck in SHOST_RECOVERY
Issue
A RHEL system hangs, or crashes if hung_task_panic is enabled, when a host bus adapter controlled by the smartpqi driver enters the SHOST_RECOVERY state and cannot complete error recovery.
A reset message from the smartpqi driver, such as:
[227855.373525] smartpqi 0000:38:00.0: resetting scsi 0:1:0:0
may be followed by the hung task watchdog crashing the system when hung_task_panic is enabled, for example:
[228013.798259] INFO: task xfsaild/dm-0:2122 blocked for more than 120 seconds.
[228013.811168] Not tainted 4.18.0-147.rt24.93.el8.x86_64 #1
[228013.825325] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[228013.839620] xfsaild/dm-0 D 0 2122 2 0x80000000
...
[228013.843201] Kernel panic - not syncing: hung_task: blocked tasks
[228013.843203] CPU: 3 PID: 312 Comm: khungtaskd Kdump: loaded Not tainted 4.18.0-147.rt24.93.el8.x86_64 #1
[228013.843204] Hardware name: HPE ProLiant BL460c Gen10/ProLiant BL460c Gen10, BIOS I41 03/09/2020
[228013.843204] Call Trace:
[228013.843207] dump_stack+0x5c/0x80
[228013.843211] panic+0xe7/0x247
[228013.843214] watchdog+0x234/0x320
[228013.843216] ? hungtask_pm_notify+0x40/0x40
[228013.843218] kthread+0x112/0x130
[228013.843220] ? kthread_flush_work_fn+0x10/0x10
[228013.843222] ret_from_fork+0x35/0x40
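The pattern above (a smartpqi reset message followed later by a hung task report) can be searched for in a saved kernel log. The following is a minimal sketch, not a supported diagnostic tool: the function name `find_reset_then_hung` and both regular expressions are illustrative assumptions modeled only on the two message formats quoted in this article, and other kernel versions may word these messages differently.

```python
import re

# Patterns modeled on the two log lines quoted in this article (assumption:
# other kernels or driver versions may format these messages differently).
RESET_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+smartpqi \S+: resetting scsi (\S+)")
HUNG_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+INFO: task (.+?):(\d+) blocked for more than (\d+) seconds"
)


def find_reset_then_hung(log_text):
    """Return (reset_time, device, hung_time, task) for every hung task
    report that appears after a smartpqi reset message in log_text."""
    last_reset = None
    matches = []
    for line in log_text.splitlines():
        m = RESET_RE.search(line)
        if m:
            # Remember the most recent reset: (timestamp, H:C:T:L address).
            last_reset = (float(m.group(1)), m.group(2))
            continue
        m = HUNG_RE.search(line)
        if m and last_reset is not None:
            matches.append(
                (last_reset[0], last_reset[1], float(m.group(1)), m.group(2))
            )
    return matches
```

A hit only shows that the two messages occur in that order. It does not prove that the reset caused the blocked task, so the result is a hint for further analysis rather than a diagnosis.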
Environment
- Red Hat Enterprise Linux 8.1 or earlier (detected with RHEL8-RT kernel 4.18.0-147.rt24.93.el8.x86_64)
- HBA controlled by the smartpqi driver
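On a system that is still responsive, one way to check whether a smartpqi-controlled host is currently in a recovery state is to read the SCSI host attributes in sysfs. The sketch below assumes that `/sys/class/scsi_host/hostN/proc_name` reports `smartpqi` for these adapters and that `/sys/class/scsi_host/hostN/state` contains the word `recovery` while the host is in SHOST_RECOVERY. The helper name `hosts_in_recovery` is hypothetical, and the attribute values should be verified on the affected kernel.

```python
from pathlib import Path


def hosts_in_recovery(sysfs_root="/sys/class/scsi_host", driver="smartpqi"):
    """Return (host_name, state) for each SCSI host owned by `driver`
    whose sysfs state string mentions recovery."""
    stuck = []
    for host in sorted(Path(sysfs_root).glob("host*")):
        try:
            proc_name = (host / "proc_name").read_text().strip()
            state = (host / "state").read_text().strip()
        except OSError:
            # Attribute missing or unreadable: skip this host.
            continue
        # Assumption: recovery states are reported with "recovery" in the text.
        if proc_name == driver and "recovery" in state:
            stuck.append((host.name, state))
    return stuck


if __name__ == "__main__":
    for name, state in hosts_in_recovery():
        print(f"{name}: {state}")
```

A host that briefly reports a recovery state may simply be performing normal error handling. The condition described in this article applies when the host stays in that state and recovery never completes.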