HBA controlled by the smartpqi driver is stuck in the SHOST_RECOVERY state
Issue
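When a host bus adapter (HBA) controlled by the smartpqi driver enters the SHOST_RECOVERY state and cannot complete error recovery, the RHEL system hangs, or panics if hung_task_panic is enabled.
The smartpqi driver logs messages about resetting the SCSI HBA, such as: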
[227855.373525] smartpqi 0000:38:00.0: resetting scsi 0:1:0:0
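To check whether a kernel log contains this symptom, the reset messages can be picked out with a regular expression. The following is a minimal sketch that assumes the log text (for example, saved output of `dmesg` or `journalctl -k`) uses exactly the line format shown above; the names `RESET_RE` and `find_smartpqi_resets` are hypothetical helpers, not part of any Red Hat tool.

```python
import re

# Assumes the format shown in this article, e.g.:
# "[227855.373525] smartpqi 0000:38:00.0: resetting scsi 0:1:0:0"
RESET_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+smartpqi\s+(?P<pci>\S+):\s+"
    r"resetting scsi\s+(?P<hctl>\d+:\d+:\d+:\d+)"
)


def find_smartpqi_resets(log_text):
    """Return (timestamp, PCI address, SCSI host:channel:target:lun)
    for every smartpqi reset message found in log_text."""
    return [
        (float(m.group("ts")), m.group("pci"), m.group("hctl"))
        for m in RESET_RE.finditer(log_text)
    ]
```

Repeated resets of the same device in the returned list, followed later by hung task reports, match the pattern described in this article.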
If hung_task_panic is enabled, the hung task watchdog panics the system, for example:
[228013.798259] INFO: task xfsaild/dm-0:2122 blocked for more than 120 seconds.
[228013.811168] Not tainted 4.18.0-147.rt24.93.el8.x86_64 #1
[228013.825325] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[228013.839620] xfsaild/dm-0 D 0 2122 2 0x80000000
...
[228013.843201] Kernel panic - not syncing: hung_task: blocked tasks
[228013.843203] CPU: 3 PID: 312 Comm: khungtaskd Kdump: loaded Not tainted 4.18.0-147.rt24.93.el8.x86_64 #1
[228013.843204] Hardware name: HPE ProLiant BL460c Gen10/ProLiant BL460c Gen10, BIOS I41 03/09/2020
[228013.843204] Call Trace:
[228013.843207] dump_stack+0x5c/0x80
[228013.843211] panic+0xe7/0x247
[228013.843214] watchdog+0x234/0x320
[228013.843216] ? hungtask_pm_notify+0x40/0x40
[228013.843218] kthread+0x112/0x130
[228013.843220] ? kthread_flush_work_fn+0x10/0x10
[228013.843222] ret_from_fork+0x35/0x40
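Whether the system only hangs or actually panics depends on the `kernel.hung_task_panic` and `kernel.hung_task_timeout_secs` sysctls; a timeout of 0 disables the hung task message, as the log above notes. A minimal sketch, assuming the standard procfs paths under `/proc/sys/kernel` and the report format shown above; all function names are hypothetical illustrations.

```python
import re
from pathlib import Path

# Assumes the hung task report format shown in this article, e.g.:
# "INFO: task xfsaild/dm-0:2122 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def parse_hung_task(line):
    """Return (command, pid, seconds) for a hung task report, else None."""
    m = HUNG_TASK_RE.search(line)
    if m is None:
        return None
    return m.group("comm"), int(m.group("pid")), int(m.group("secs"))


def hung_task_would_panic(panic_value, timeout_value):
    """True if a detected hung task would panic the kernel.

    Arguments are the raw contents of hung_task_panic and
    hung_task_timeout_secs; a timeout of 0 disables detection.
    """
    return int(panic_value.strip()) != 0 and int(timeout_value.strip()) != 0


def read_hung_task_settings(sysctl_dir="/proc/sys/kernel"):
    """Read both sysctls from procfs (Linux only; paths are assumed)."""
    base = Path(sysctl_dir)
    return (
        (base / "hung_task_panic").read_text(),
        (base / "hung_task_timeout_secs").read_text(),
    )
```

On a live host, `hung_task_would_panic(*read_hung_task_settings())` reports whether a task blocked behind the stuck HBA would trigger a panic like the one above rather than a plain hang.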
Environment
- Red Hat Enterprise Linux 8.1 or earlier (observed on the RHEL 8 RT kernel 4.18.0-147.rt24.93.el8.x86_64)
- An HBA controlled by the smartpqi driver