HBA controlled by smartpqi stuck in SHOST_RECOVERY

Solution Verified - Updated -

Issue

A RHEL system hangs, or panics if hung_task_panic is enabled, when a host bus adapter (HBA) controlled by the smartpqi driver enters the SHOST_RECOVERY state and cannot complete error recovery.

A message from the smartpqi driver about a SCSI reset, such as:

[227855.373525] smartpqi 0000:38:00.0: resetting scsi 0:1:0:0

may be followed by the hung task watchdog panicking the system when hung_task_panic is enabled, for example:

[228013.798259] INFO: task xfsaild/dm-0:2122 blocked for more than 120 seconds.
[228013.811168]       Not tainted 4.18.0-147.rt24.93.el8.x86_64 #1
[228013.825325] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[228013.839620] xfsaild/dm-0    D    0  2122      2 0x80000000
...
[228013.843201] Kernel panic - not syncing: hung_task: blocked tasks
[228013.843203] CPU: 3 PID: 312 Comm: khungtaskd Kdump: loaded Not tainted 4.18.0-147.rt24.93.el8.x86_64 #1
[228013.843204] Hardware name: HPE ProLiant BL460c Gen10/ProLiant BL460c Gen10, BIOS I41 03/09/2020
[228013.843204] Call Trace:
[228013.843207]  dump_stack+0x5c/0x80
[228013.843211]  panic+0xe7/0x247
[228013.843214]  watchdog+0x234/0x320
[228013.843216]  ? hungtask_pm_notify+0x40/0x40
[228013.843218]  kthread+0x112/0x130
[228013.843220]  ? kthread_flush_work_fn+0x10/0x10
[228013.843222]  ret_from_fork+0x35/0x40
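
The condition described above can be checked on a running system with a short script. The following is a minimal sketch, not a supported diagnostic tool. It assumes the standard sysfs attributes /sys/class/scsi_host/hostN/state and /sys/class/scsi_host/hostN/proc_name are present and that the smartpqi driver reports "smartpqi" in proc_name. The function names are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def read_attr(path):
    """Return the stripped contents of a sysfs/procfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def scsi_host_states(sysfs_root="/sys/class/scsi_host"):
    """Return {host_name: (driver_name, state)} for every SCSI host under sysfs_root."""
    hosts = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return hosts
    for host_dir in sorted(root.glob("host*")):
        hosts[host_dir.name] = (
            read_attr(host_dir / "proc_name"),
            read_attr(host_dir / "state"),
        )
    return hosts


def hosts_in_recovery(hosts, driver="smartpqi"):
    """List hosts of the given driver whose state string mentions recovery.

    Assumption: the kernel reports SHOST_RECOVERY as "recovery" (or a
    combined form such as "cancel/recovery") in the state attribute.
    """
    return [
        name
        for name, (drv, state) in hosts.items()
        if drv == driver and state is not None and "recovery" in state
    ]


def hung_task_panic_enabled(path="/proc/sys/kernel/hung_task_panic"):
    """Return True if the hung task watchdog is configured to panic."""
    value = read_attr(path)
    return value is not None and value != "0"


if __name__ == "__main__":
    stuck = hosts_in_recovery(scsi_host_states())
    print("smartpqi hosts in recovery:", ", ".join(stuck) if stuck else "none")
    print("hung_task_panic enabled:", hung_task_panic_enabled())
```

A host that is listed here briefly during a reset is not necessarily affected. The issue applies when the host stays in the recovery state and blocked tasks begin to exceed hung_task_timeout_secs.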

Environment

  • Red Hat Enterprise Linux 8.1 or earlier
    (observed with the RHEL 8 RT kernel 4.18.0-147.rt24.93.el8.x86_64)
  • HBA controlled by the smartpqi driver
