System hang on stalled I/O with bad vring_virtqueue length
Issue
- System hang with stalled virtio_scsi I/O with bad vring_virtqueue length:
INFO: task auditd:1419 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
auditd D ffff9f81cce026e0 0 1419 1 0x00000000
Call Trace:
[<ffffffff98f8a700>] ? bit_wait+0x50/0x50
[<ffffffff98f8c3f9>] schedule+0x29/0x70
[<ffffffff98f8a0c1>] schedule_timeout+0x221/0x2d0
[<ffffffff98b624ac>] ? blk_mq_flush_plug_list+0x19c/0x200
[<ffffffff9886d39e>] ? kvm_clock_get_cycles+0x1e/0x20
[<ffffffff98f8a700>] ? bit_wait+0x50/0x50
[<ffffffff98f8bcad>] io_schedule_timeout+0xad/0x130
[<ffffffff98f8bd48>] io_schedule+0x18/0x20
[<ffffffff98f8a711>] bit_wait_io+0x11/0x50
[<ffffffff98f8a237>] __wait_on_bit+0x67/0x90
[<ffffffff989bd411>] wait_on_page_bit+0x81/0xa0
[<ffffffff988c7140>] ? wake_bit_function+0x40/0x40
[<ffffffff989bd541>] __filemap_fdatawait_range+0x111/0x190
[<ffffffff989cb9f1>] ? do_writepages+0x21/0x50
[<ffffffff989bd5d4>] filemap_fdatawait_range+0x14/0x30
[<ffffffff989bffd6>] filemap_write_and_wait_range+0x56/0x90
[<ffffffffc04f59fa>] ext4_sync_file+0xba/0x320 [ext4]
[<ffffffff98a8409f>] generic_write_sync+0x4f/0x70
[<ffffffff989c0bd7>] generic_file_aio_write+0x77/0xa0
[<ffffffffc04f55c8>] ext4_file_write+0x348/0x600 [ext4]
[<ffffffff98b911d4>] ? timerqueue_del+0x24/0x70
[<ffffffff98a4da23>] do_sync_write+0x93/0xe0
[<ffffffff98a4e4b0>] vfs_write+0xc0/0x1f0
[<ffffffff98a4f235>] SyS_write+0x55/0xd0
[<ffffffff98f99f92>] system_call_fastpath+0x25/0x2a
F 4319475.332/230124170529 oracle_58007_de[58007] oracleafd:06:0837:Instance is fenced: [13] [6]
F 4319478.320/230124170532 oracle_58018_de[58018] oracleafd:06:0837:Instance is fenced: [13] [6]
F 4319531.574/230124170626 oracle_49225_de[49225] oracleafd:06:0837:Instance is fenced: [13] [6]
F 4319531.730/230124170626 oracle_49227_de[49227] oracleafd:06:0837:Instance is fenced: [13] [6]
F 4319531.870/230124170626 oracle_49229_de[49229] oracleafd:06:0837:Instance is fenced: [13] [6]
F 4319532.042/230124170626 oracle_49231_de[49231] oracleafd:06:0837:Instance is fenced: [13] [6]
SysRq : Trigger a crash
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff98c75396>] sysrq_handle_crash+0x16/0x20
PGD 0
Oops: 0002 [#1] SMP
....
CPU: 0 PID: 19080 Comm: cssdmonitor Kdump: loaded Tainted: P OE ------------ T 3.10.0-1160.76.1.el7.x86_64 #1
Hardware name: Nutanix AHV, BIOS nutanix-ahv-2.20220304.0.2429.el7 04/01/2014
task: ffff9f7e581ae300 ti: ffff9f7e3a498000 task.ti: ffff9f7e3a498000
RIP: 0010:[<ffffffff98c75396>] [<ffffffff98c75396>] sysrq_handle_crash+0x16/0x20
....
Call Trace:
[<ffffffff98c75bbd>] __handle_sysrq+0x10d/0x170
[<ffffffff98c76028>] write_sysrq_trigger+0x28/0x40
[<ffffffff98ac7560>] proc_reg_write+0x40/0x80
[<ffffffff98a4e4b0>] vfs_write+0xc0/0x1f0
[<ffffffff98a4f235>] SyS_write+0x55/0xd0
[<ffffffff98f99f92>] system_call_fastpath+0x25/0x2a
Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 0f 1f 44 00 00 55 48 89 e5 c7 05 41 2b 7d 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 be
RIP [<ffffffff98c75396>] sysrq_handle_crash+0x16/0x20
....
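The hung-task warning at the top of the log above has a fixed format, so similar entries can be pulled out of a saved kernel log for triage. The following is a minimal sketch, not part of the original article: the function name `find_hung_tasks` and the regular expression are illustrative assumptions based only on the message format shown in this log.

```python
import re

# Assumed pattern, derived from the hung-task line shown in the log above:
#   INFO: task auditd:1419 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(lines):
    """Return a list of (comm, pid, secs) tuples, one per hung-task warning."""
    hits = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


if __name__ == "__main__":
    # Sample lines copied from the log in this article.
    sample = [
        "INFO: task auditd:1419 blocked for more than 120 seconds.",
        "auditd D ffff9f81cce026e0 0 1419 1 0x00000000",
        "SysRq : Trigger a crash",
    ]
    for comm, pid, secs in find_hung_tasks(sample):
        print(f"task={comm} pid={pid} blocked_for_more_than={secs}s")
```

This only identifies which tasks were reported as blocked. It does not diagnose the cause of the stalled I/O.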
Environment
- Red Hat Enterprise Linux 7