KVM VMs running with "passthrough" cache mode crash frequently with "kernel BUG at kernel/sched/rt.c:1798!" or "kernel BUG at kernel/sched/rt.c:1794!"
Issue
- KVM VMs configured with the "passthrough" cache mode crash frequently with either "kernel BUG at kernel/sched/rt.c:1798!" or "kernel BUG at kernel/sched/rt.c:1794!". The affected guests use a CPU definition like the following:
<cpu mode='host-passthrough' check='none' migratable='on'>
<topology sockets='1' dies='1' cores='19' threads='2'/>
<cache mode='passthrough'/>
</cpu>
- The panic messages from the crashing VMs:
[13235.101159] kernel BUG at kernel/sched/rt.c:1798!
[13235.101782] invalid opcode: 0000 [#1] SMP NOPTI
[13235.102356] CPU: 24 PID: 94639 Comm: appLoader Kdump: loaded Tainted: G W --------- - - 4.18.0-477.43.1.el8_8.x86_64 #1
[13235.103823] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.8.0+16781+9f4724c2 04/01/2014
[13235.104972] RIP: 0010:pick_next_pushable_task+0x61/0x70
[13235.105561] Code: 88 09 00 00 74 26 83 ba b0 fb ff ff 01 7e 1f 83 ba 38 f8 ff ff 01 75 18 83 ba 3c f8 ff ff 63 7e 04 0f 0b 31 c0 e9 7f 03 cc 00 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 41 56
[13235.107613] RSP: 0018:ffffb764ad08fbd8 EFLAGS: 00010016
[13235.108210] RAX: ffff9409277d0000 RBX: ffff9412efa32fc0 RCX: 000000000000000e
[13235.109029] RDX: ffff9409277d0828 RSI: 0000000000000000 RDI: ffff9412efa32fc0
[13235.109812] RBP: ffff9412efa32fc0 R08: ffff9412ef7f3878 R09: ffff93ffc67e11b8
[13235.110680] R10: ffffb764b15ffed0 R11: 0000000000000000 R12: 0000000000000000
[13235.111496] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[13235.112308] FS: 00007f9375ffb700(0000) GS:ffff9412efa00000(0000) knlGS:0000000000000000
[13235.113341] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13235.114122] CR2: 00007f416c028e98 CR3: 0000000a0afe0002 CR4: 0000000000770ee0
[13235.114953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13235.115624] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13235.116303] PKRU: 55555554
[13235.116582] Call Trace:
[13235.116847] push_rt_task.part.56+0x1c/0x200
[13235.117287] push_rt_tasks.part.57+0x1d/0x30
[13235.117797] finish_task_switch+0x114/0x2e0
[13235.118183] __schedule+0x2d9/0x870
[13235.118640] ? hrtimer_start_range_ns+0x11b/0x310
[13235.119166] schedule+0x55/0xf0
[13235.119569] futex_wait_queue_me+0xa3/0x100
[13235.120101] futex_wait+0x11f/0x210
[13235.120558] ? hrtimer_init_sleeper+0x90/0x90
[13235.121102] do_futex+0x143/0x4e0
[13235.121502] ? __x64_sys_futex+0x14e/0x200
[13235.122048] __x64_sys_futex+0x14e/0x200
[13235.122506] do_syscall_64+0x5b/0x1b0
[13235.123022] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[13235.123631] RIP: 0033:0x7f93cc2777aa
[13235.124072] Code: 67 e8 9a 2f 00 00 89 ee 41 b9 ff ff ff ff 45 31 c0 89 44 24 40 81 f6 89 01 00 00 4d 89 f2 31 d2 4c 89 ef b8 ca 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 42 01 00 00 8b 7c 24 40 e8 c1 2f 00 00 48
[13235.126351] RSP: 002b:00007f9375ff84a0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[13235.127252] RAX: ffffffffffffffda RBX: 00007f93880032f0 RCX: 00007f93cc2777aa
[13235.128131] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00007f9388003318
[13235.129102] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
[13235.130040] R10: 00007f9375ff8590 R11: 0000000000000246 R12: 00007f93880032c8
[13235.130825] R13: 00007f9388003318 R14: 00007f9375ff8590 R15: 0000000000000000
[13235.131681] Modules linked in: ...
- In another crash, two panic messages, "BUG: unable to handle kernel paging request at 0000000080000000" and "kernel BUG at kernel/sched/rt.c:1794!", appear interleaved in the console output:
[10784.498949] BUG: unable to handle kernel paging request at 0000000080000000
[10784.499179] ------------[ cut here ]------------
[10784.500642] PGD 844eca067
[10784.501214] kernel BUG at kernel/sched/rt.c:1794!
[10784.501215] P4D 844eca067 PUD 0
[10784.502414] Oops: 0010 [#1] SMP NOPTI
[10784.502816] CPU: 4 PID: 36088 Comm: appLoader Kdump: loaded Not tainted 4.18.0-477.10.1.el8_8.x86_64 #1
[10784.503833] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.8.0+16781+9f4724c2 04/01/2014
[10784.504846] RIP: 0010:0x80000000
[10784.505222] Code: Unable to access opcode bytes at RIP 0x7fffffd6.
[10784.505898] RSP: 0018:ffffaecc49c67bf0 EFLAGS: 00057486
[10784.506493] RAX: ffff9c58868b4000 RBX: ffff9c5b1855f6b0 RCX: 0000000000000000
[10784.507303] RDX: 0000000000000000 RSI: ffff9c58868c3a00 RDI: 0000000000000000
[10784.508114] RBP: 0000000080000000 R08: 0000000000002ecc R09: 0000000000002ecc
[10784.508886] R10: ffffaecc6f807e40 R11: 0000000000000001 R12: 0000000000000000
[10784.509705] R13: 0000000000000000 R14: fffffc838442d600 R15: 0000000000000001
[10784.510533] FS: 00007fa6e7fff700(0000) GS:ffff9c6baf500000(0000) knlGS:0000000000000000
[10784.511464] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10784.512114] CR2: 000000007fffffd6 CR3: 00000009f5b68002 CR4: 0000000000770ee0
[10784.512929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10784.513700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10784.514393] PKRU: 55555554
[10784.514638] Call Trace:
[10784.514898] ? memcg_slab_free_hook+0x141/0x1b0
[10784.515361] ? unix_stream_read_generic+0x83f/0x8b0
[10784.515763] ? kmem_cache_free+0x2d6/0x300
[10784.516123] ? unix_stream_read_generic+0x409/0x8b0
[10784.516530] ? finish_wait+0x80/0x80
[10784.516828] ? unix_stream_recvmsg+0x53/0x80
[10784.517185] ? unix_set_peek_off+0x50/0x50
[10784.517521] ? update_curr_rt+0x210/0x3c0
[10784.517861] ? plist_add+0xc1/0x100
[10784.518161] ? plist_del+0x5f/0xc0
[10784.518442] ? __schedule+0x2d1/0x870
[10784.518744] ? syscall_trace_enter+0x1ff/0x2d0
[10784.519114] ? schedule+0x55/0xf0
[10784.519389] ? exit_to_usermode_loop+0x5c/0x100
[10784.519759] ? do_syscall_64+0x19c/0x1b0
[10784.520082] ? entry_SYSCALL_64_after_hwframe+0x61/0xc6
[10784.520513] Modules linked in: ...
[10784.526217] CR2: 0000000080000000
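To find guests that match the CPU definition shown in the first item above, the domain XML can be inspected for a `<cache mode='passthrough'/>` element inside `<cpu>`. The following is a minimal sketch, not an official tool: the function name `uses_cache_passthrough` and the sample domain (including the guest name `example-guest`) are made up for illustration, and the XML is assumed to come from something like `virsh dumpxml`.

```python
import xml.etree.ElementTree as ET


def uses_cache_passthrough(domain_xml: str) -> bool:
    """Return True if any <cpu> element contains <cache mode='passthrough'/>."""
    root = ET.fromstring(domain_xml)
    # iter() also yields the root element itself, so this works for a full
    # <domain> document as well as for a bare <cpu> fragment.
    for cpu in root.iter("cpu"):
        for cache in cpu.findall("cache"):
            if cache.get("mode") == "passthrough":
                return True
    return False


# Hypothetical sample: the <cpu> block is the one from this article, the
# surrounding <domain> wrapper and guest name are invented for illustration.
SAMPLE = """<domain type='kvm'>
  <name>example-guest</name>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='19' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
</domain>"""

if __name__ == "__main__":
    print(uses_cache_passthrough(SAMPLE))
```

A guest whose `<cpu>` element has no `<cache>` child, or a different `mode`, is reported as not matching.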
Environment
- Red Hat Enterprise Linux (RHEL) 8, 9 and 10
- kernels running on hypervisors and guests:
  - RHEL 8: versions before 4.18.0-553.54.1.el8_10 are affected
  - RHEL 10: versions before 6.12.0-55.18.1.el10_0 are affected
  - issue observed on 8.8.z kernel version 4.18.0-477.10.1.el8_8
  - issue observed on 8.8.z kernel version 4.18.0-477.43.1.el8_8
  - issue observed on 8.10.z kernel version 4.18.0-553.44.1.el8_10
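The version thresholds above can be checked against a kernel release string such as the output of `uname -r`. This is a rough sketch under stated assumptions: it compares only the numeric fields before the `.elN` tag and is not a full RPM version comparison, the names `FIXED` and `is_affected` are hypothetical, and because no fixed version for RHEL 9 is listed here, the function returns `None` for RHEL 9 kernels instead of guessing.

```python
import re

# First unaffected builds per major release, taken from the Environment list.
# No RHEL 9 threshold is listed there, so none is defined here.
FIXED = {
    8: (4, 18, 0, 553, 54, 1),
    10: (6, 12, 0, 55, 18, 1),
}

# Matches e.g. "4.18.0-477.43.1.el8_8.x86_64": version, build, major release.
RELEASE_RE = re.compile(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el(\d+)")


def is_affected(release: str):
    """Return True/False, or None if no threshold is known for that release."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version, build, major = match.groups()
    fixed = FIXED.get(int(major))
    if fixed is None:
        return None
    # Simplification: treat version and build as one tuple of integers.
    current = tuple(int(part) for part in f"{version}.{build}".split("."))
    return current < fixed


if __name__ == "__main__":
    import platform
    # On a non-RHEL system the release string may not match and raise ValueError.
    print(is_affected(platform.release()))
```

For example, `is_affected("4.18.0-477.43.1.el8_8.x86_64")` evaluates to `True`, while a kernel at or above the listed threshold for its major release evaluates to `False`.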