KVM VMs running with "passthrough" cache mode crash frequently with "kernel BUG at kernel/sched/rt.c:1798!" or "kernel BUG at kernel/sched/rt.c:1794!"

Solution Verified

Issue

  • KVM VMs running with the "passthrough" cache mode crash frequently with either "kernel BUG at kernel/sched/rt.c:1798!" or "kernel BUG at kernel/sched/rt.c:1794!". The affected guests use a CPU definition such as:
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='19' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  • The panic messages from the crashing VMs:
[13235.101159] kernel BUG at kernel/sched/rt.c:1798!
[13235.101782] invalid opcode: 0000 [#1] SMP NOPTI
[13235.102356] CPU: 24 PID: 94639 Comm: appLoader Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-477.43.1.el8_8.x86_64 #1
[13235.103823] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.8.0+16781+9f4724c2 04/01/2014
[13235.104972] RIP: 0010:pick_next_pushable_task+0x61/0x70
[13235.105561] Code: 88 09 00 00 74 26 83 ba b0 fb ff ff 01 7e 1f 83 ba 38 f8 ff ff 01 75 18 83 ba 3c f8 ff ff 63 7e 04 0f 0b 31 c0 e9 7f 03 cc 00 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 41 56
[13235.107613] RSP: 0018:ffffb764ad08fbd8 EFLAGS: 00010016
[13235.108210] RAX: ffff9409277d0000 RBX: ffff9412efa32fc0 RCX: 000000000000000e
[13235.109029] RDX: ffff9409277d0828 RSI: 0000000000000000 RDI: ffff9412efa32fc0
[13235.109812] RBP: ffff9412efa32fc0 R08: ffff9412ef7f3878 R09: ffff93ffc67e11b8
[13235.110680] R10: ffffb764b15ffed0 R11: 0000000000000000 R12: 0000000000000000
[13235.111496] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[13235.112308] FS:  00007f9375ffb700(0000) GS:ffff9412efa00000(0000) knlGS:0000000000000000
[13235.113341] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13235.114122] CR2: 00007f416c028e98 CR3: 0000000a0afe0002 CR4: 0000000000770ee0
[13235.114953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13235.115624] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13235.116303] PKRU: 55555554
[13235.116582] Call Trace:
[13235.116847]  push_rt_task.part.56+0x1c/0x200
[13235.117287]  push_rt_tasks.part.57+0x1d/0x30
[13235.117797]  finish_task_switch+0x114/0x2e0
[13235.118183]  __schedule+0x2d9/0x870
[13235.118640]  ? hrtimer_start_range_ns+0x11b/0x310
[13235.119166]  schedule+0x55/0xf0
[13235.119569]  futex_wait_queue_me+0xa3/0x100
[13235.120101]  futex_wait+0x11f/0x210
[13235.120558]  ? hrtimer_init_sleeper+0x90/0x90
[13235.121102]  do_futex+0x143/0x4e0
[13235.121502]  ? __x64_sys_futex+0x14e/0x200
[13235.122048]  __x64_sys_futex+0x14e/0x200
[13235.122506]  do_syscall_64+0x5b/0x1b0
[13235.123022]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[13235.123631] RIP: 0033:0x7f93cc2777aa
[13235.124072] Code: 67 e8 9a 2f 00 00 89 ee 41 b9 ff ff ff ff 45 31 c0 89 44 24 40 81 f6 89 01 00 00 4d 89 f2 31 d2 4c 89 ef b8 ca 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 42 01 00 00 8b 7c 24 40 e8 c1 2f 00 00 48
[13235.126351] RSP: 002b:00007f9375ff84a0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[13235.127252] RAX: ffffffffffffffda RBX: 00007f93880032f0 RCX: 00007f93cc2777aa
[13235.128131] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00007f9388003318
[13235.129102] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
[13235.130040] R10: 00007f9375ff8590 R11: 0000000000000246 R12: 00007f93880032c8
[13235.130825] R13: 00007f9388003318 R14: 00007f9375ff8590 R15: 0000000000000000
[13235.131681] Modules linked in: ...
  • In another crash incident, two panic messages were interleaved in the console output: "BUG: unable to handle kernel paging request at 0000000080000000" and "kernel BUG at kernel/sched/rt.c:1794!":
[10784.498949] BUG: unable to handle kernel paging request at 0000000080000000
[10784.499179] ------------[ cut here ]------------
[10784.500642] PGD 844eca067 
[10784.501214] kernel BUG at kernel/sched/rt.c:1794!
[10784.501215] P4D 844eca067 PUD 0 
[10784.502414] Oops: 0010 [#1] SMP NOPTI
[10784.502816] CPU: 4 PID: 36088 Comm: appLoader Kdump: loaded Not tainted 4.18.0-477.10.1.el8_8.x86_64 #1
[10784.503833] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.8.0+16781+9f4724c2 04/01/2014
[10784.504846] RIP: 0010:0x80000000
[10784.505222] Code: Unable to access opcode bytes at RIP 0x7fffffd6.
[10784.505898] RSP: 0018:ffffaecc49c67bf0 EFLAGS: 00057486
[10784.506493] RAX: ffff9c58868b4000 RBX: ffff9c5b1855f6b0 RCX: 0000000000000000
[10784.507303] RDX: 0000000000000000 RSI: ffff9c58868c3a00 RDI: 0000000000000000
[10784.508114] RBP: 0000000080000000 R08: 0000000000002ecc R09: 0000000000002ecc
[10784.508886] R10: ffffaecc6f807e40 R11: 0000000000000001 R12: 0000000000000000
[10784.509705] R13: 0000000000000000 R14: fffffc838442d600 R15: 0000000000000001
[10784.510533] FS:  00007fa6e7fff700(0000) GS:ffff9c6baf500000(0000) knlGS:0000000000000000
[10784.511464] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10784.512114] CR2: 000000007fffffd6 CR3: 00000009f5b68002 CR4: 0000000000770ee0
[10784.512929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10784.513700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10784.514393] PKRU: 55555554
[10784.514638] Call Trace:
[10784.514898]  ? memcg_slab_free_hook+0x141/0x1b0
[10784.515361]  ? unix_stream_read_generic+0x83f/0x8b0
[10784.515763]  ? kmem_cache_free+0x2d6/0x300
[10784.516123]  ? unix_stream_read_generic+0x409/0x8b0
[10784.516530]  ? finish_wait+0x80/0x80
[10784.516828]  ? unix_stream_recvmsg+0x53/0x80
[10784.517185]  ? unix_set_peek_off+0x50/0x50
[10784.517521]  ? update_curr_rt+0x210/0x3c0
[10784.517861]  ? plist_add+0xc1/0x100
[10784.518161]  ? plist_del+0x5f/0xc0
[10784.518442]  ? __schedule+0x2d1/0x870
[10784.518744]  ? syscall_trace_enter+0x1ff/0x2d0
[10784.519114]  ? schedule+0x55/0xf0
[10784.519389]  ? exit_to_usermode_loop+0x5c/0x100
[10784.519759]  ? do_syscall_64+0x19c/0x1b0
[10784.520082]  ? entry_SYSCALL_64_after_hwframe+0x61/0xc6
[10784.520513] Modules linked in: ...
[10784.526217] CR2: 0000000080000000
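
To check whether a guest matches the CPU configuration shown in the first bullet of this section, the domain XML can be inspected for a <cache mode='passthrough'/> element under <cpu>. The following is a minimal Python sketch, not a supported tool. The function name uses_cache_passthrough and the variable SAMPLE_CPU are hypothetical, and it assumes the XML text was obtained separately, for example with "virsh dumpxml" on the hypervisor.

```python
import xml.etree.ElementTree as ET

# CPU fragment copied from the Issue section of this article.
SAMPLE_CPU = """
<cpu mode='host-passthrough' check='none' migratable='on'>
  <topology sockets='1' dies='1' cores='19' threads='2'/>
  <cache mode='passthrough'/>
</cpu>
"""


def uses_cache_passthrough(domain_xml):
    """Return True if any <cpu> element has a <cache mode='passthrough'/> child.

    Accepts either a full <domain> document or a bare <cpu> fragment,
    because Element.iter() also yields the root element itself.
    """
    root = ET.fromstring(domain_xml)
    return any(
        cache.get("mode") == "passthrough"
        for cpu in root.iter("cpu")
        for cache in cpu.findall("cache")
    )


if __name__ == "__main__":
    print(uses_cache_passthrough(SAMPLE_CPU))
```

A result of True only indicates that the guest uses the cache mode named in this article. It does not by itself confirm that a crash was caused by this issue.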

Environment

  • Red Hat Enterprise Linux (RHEL) 8, 9 and 10
  • kernels running on hypervisors and guests:
    • RHEL 8: versions before 4.18.0-553.54.1.el8_10 are affected
    • RHEL 10: versions before 6.12.0-55.18.1.el10_0 are affected
    • issue observed on 8.8.z kernel version 4.18.0-477.10.1.el8_8
    • issue observed on 8.8.z kernel version 4.18.0-477.43.1.el8_8
    • issue observed on 8.10.z kernel version 4.18.0-553.44.1.el8_10
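
The version thresholds above can be turned into a quick check against a kernel release string such as the output of "uname -r". The sketch below is an illustration under stated assumptions: the names FIXED, parse_release and is_affected are hypothetical, and the comparison is a simplified numeric one rather than the full RPM version comparison algorithm. No threshold is listed above for RHEL 9, so the function returns None for el9 kernels instead of guessing.

```python
import re

# First unaffected kernels, copied from the Environment list above.
# RHEL 9 has no threshold listed there, so it is deliberately absent.
FIXED = {
    "el8": "4.18.0-553.54.1.el8_10",
    "el10": "6.12.0-55.18.1.el10_0",
}


def parse_release(release):
    """Split e.g. '4.18.0-477.43.1.el8_8.x86_64' into ('el8', (4, 18, 0, 477, 43, 1))."""
    m = re.match(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.(el\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    numbers = tuple(int(p) for p in (m.group(1) + "." + m.group(2)).split("."))
    return m.group(3), numbers


def is_affected(release):
    """Return True/False for RHEL 8 and 10 kernels, None when no threshold is known."""
    dist, numbers = parse_release(release)
    fixed = FIXED.get(dist)
    if fixed is None:
        return None
    # Simplified comparison: numeric fields only, compared left to right.
    return numbers < parse_release(fixed)[1]


if __name__ == "__main__":
    for rel in ("4.18.0-477.43.1.el8_8.x86_64", "4.18.0-553.54.1.el8_10.x86_64"):
        print(rel, is_affected(rel))
```

Because both hypervisors and guests are listed as affected, such a check would need to be run on each side.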
