System hang on stalled I/O with bad vring_virtqueue length

Solution Unverified

Issue

  • The system hangs with virtio_scsi I/O stalled and a bad vring_virtqueue length. The kernel log reports hung tasks, followed by a crash triggered manually through SysRq:
INFO: task auditd:1419 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
auditd          D ffff9f81cce026e0     0  1419      1 0x00000000
Call Trace:
 [<ffffffff98f8a700>] ? bit_wait+0x50/0x50
 [<ffffffff98f8c3f9>] schedule+0x29/0x70
 [<ffffffff98f8a0c1>] schedule_timeout+0x221/0x2d0
 [<ffffffff98b624ac>] ? blk_mq_flush_plug_list+0x19c/0x200
 [<ffffffff9886d39e>] ? kvm_clock_get_cycles+0x1e/0x20
 [<ffffffff98f8a700>] ? bit_wait+0x50/0x50
 [<ffffffff98f8bcad>] io_schedule_timeout+0xad/0x130
 [<ffffffff98f8bd48>] io_schedule+0x18/0x20
 [<ffffffff98f8a711>] bit_wait_io+0x11/0x50
 [<ffffffff98f8a237>] __wait_on_bit+0x67/0x90
 [<ffffffff989bd411>] wait_on_page_bit+0x81/0xa0
 [<ffffffff988c7140>] ? wake_bit_function+0x40/0x40
 [<ffffffff989bd541>] __filemap_fdatawait_range+0x111/0x190
 [<ffffffff989cb9f1>] ? do_writepages+0x21/0x50
 [<ffffffff989bd5d4>] filemap_fdatawait_range+0x14/0x30
 [<ffffffff989bffd6>] filemap_write_and_wait_range+0x56/0x90
 [<ffffffffc04f59fa>] ext4_sync_file+0xba/0x320 [ext4]
 [<ffffffff98a8409f>] generic_write_sync+0x4f/0x70
 [<ffffffff989c0bd7>] generic_file_aio_write+0x77/0xa0
 [<ffffffffc04f55c8>] ext4_file_write+0x348/0x600 [ext4]
 [<ffffffff98b911d4>] ? timerqueue_del+0x24/0x70
 [<ffffffff98a4da23>] do_sync_write+0x93/0xe0
 [<ffffffff98a4e4b0>] vfs_write+0xc0/0x1f0
 [<ffffffff98a4f235>] SyS_write+0x55/0xd0
 [<ffffffff98f99f92>] system_call_fastpath+0x25/0x2a
F 4319475.332/230124170529 oracle_58007_de[58007] oracleafd:06:0837:Instance is fenced: [13] [6] 
F 4319478.320/230124170532 oracle_58018_de[58018] oracleafd:06:0837:Instance is fenced: [13] [6] 
F 4319531.574/230124170626 oracle_49225_de[49225] oracleafd:06:0837:Instance is fenced: [13] [6] 
F 4319531.730/230124170626 oracle_49227_de[49227] oracleafd:06:0837:Instance is fenced: [13] [6] 
F 4319531.870/230124170626 oracle_49229_de[49229] oracleafd:06:0837:Instance is fenced: [13] [6] 
F 4319532.042/230124170626 oracle_49231_de[49231] oracleafd:06:0837:Instance is fenced: [13] [6] 
SysRq : Trigger a crash
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff98c75396>] sysrq_handle_crash+0x16/0x20
PGD 0 
Oops: 0002 [#1] SMP 
....
CPU: 0 PID: 19080 Comm: cssdmonitor Kdump: loaded Tainted: P           OE  ------------ T 3.10.0-1160.76.1.el7.x86_64 #1
Hardware name: Nutanix AHV, BIOS nutanix-ahv-2.20220304.0.2429.el7 04/01/2014
task: ffff9f7e581ae300 ti: ffff9f7e3a498000 task.ti: ffff9f7e3a498000
RIP: 0010:[<ffffffff98c75396>]  [<ffffffff98c75396>] sysrq_handle_crash+0x16/0x20
....
Call Trace:
 [<ffffffff98c75bbd>] __handle_sysrq+0x10d/0x170
 [<ffffffff98c76028>] write_sysrq_trigger+0x28/0x40
 [<ffffffff98ac7560>] proc_reg_write+0x40/0x80
 [<ffffffff98a4e4b0>] vfs_write+0xc0/0x1f0
 [<ffffffff98a4f235>] SyS_write+0x55/0xd0
 [<ffffffff98f99f92>] system_call_fastpath+0x25/0x2a
Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 0f 1f 44 00 00 55 48 89 e5 c7 05 41 2b 7d 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 be 
RIP  [<ffffffff98c75396>] sysrq_handle_crash+0x16/0x20
....
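
To check whether another log shows the same hung-task signature, the header lines can be extracted with a short script. This is only a minimal sketch: the function name `find_hung_tasks` and the regular expression are illustrative assumptions based on the message format shown above, not part of any Red Hat tooling.

```python
import re

# Matches the hung-task header line as it appears in the log above, e.g.
# "INFO: task auditd:1419 blocked for more than 120 seconds."
# (Pattern is an assumption derived from that single sample line.)
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (command, pid, seconds) for each hung-task report in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


if __name__ == "__main__":
    sample = "INFO: task auditd:1419 blocked for more than 120 seconds."
    for comm, pid, secs in find_hung_tasks(sample):
        print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

The script only identifies which tasks were reported as blocked; it does not inspect the call trace or the virtqueue state.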

Environment

  • Red Hat Enterprise Linux 7
