The kernel crashes in pde_subdir_find() due to an RCU stall. The NVIDIA driver is installed and loaded.

Solution Unverified - Updated -

Issue

  • The kernel crashes in pde_subdir_find() due to an RCU stall. The NVIDIA driver is installed and loaded, and the panic occurs while rmmod is unloading the nvidia module.
[  942.185947] Kernel panic - not syncing: RCU Stall

[  942.260382] CPU: 46 PID: 138498 Comm: rmmod Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-305.25.1.el8_4.x86_64 #1
[  942.402943] Hardware name: HPE ProLiant XL675d Gen10 Plus/ProLiant XL675d Gen10 Plus, BIOS A47 02/23/2021
[  942.518252] Call Trace:
[  942.547614]  <IRQ>
[  942.571741]  dump_stack+0x5c/0x80
[  942.611588]  panic+0xe7/0x2a9
[  942.647247]  rcu_sched_clock_irq.cold.92+0x266/0x3d3
[  942.707006]  ? timekeeping_advance+0x372/0x5a0
[  942.760474]  ? tick_sched_do_timer+0x60/0x60
[  942.811844]  update_process_times+0x24/0x60
[  942.862166]  tick_sched_handle+0x22/0x60
[  942.909341]  tick_sched_timer+0x37/0x70
[  942.955474]  __hrtimer_run_queues+0x100/0x280
[  943.007893]  hrtimer_interrupt+0x100/0x220
[  943.057172]  smp_apic_timer_interrupt+0x6a/0x130
[  943.112735]  apic_timer_interrupt+0xf/0x20
[  943.162009]  </IRQ>
[  943.187182] RIP: 0010:pde_subdir_find+0x2d/0x70
[  943.241696] Code: 00 00 41 55 41 54 55 53 48 8b 9f 80 00 00 00 48 85 db 74 3d 49 89 f5 89 d5 eb 0b 74 37 48 8b 5b 08 48 85 db 74 2b 0f b6 43 22 <39> c5 72 1a 77 ed 4c 8d a3 78 ff ff ff 89 ea 4c 89 ef 4c 89 e6 e8
[  943.468107] RSP: 0018:ffffb60fb469fd58 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
[  943.559307] RAX: 0000000000000004 RBX: ffff8c2c9d3ad308 RCX: 0000000000000000
[  943.645262] RDX: 000000000000000c RSI: ffff8c4d153198ab RDI: ffff8c2c9d3aeb00
[  943.731217] RBP: 000000000000000c R08: ffff8c2cbfed86d8 R09: ffff8c2cbfed8730
[  943.817173] R10: 0000000000000000 R11: ffffffffb985f308 R12: dead000000000200
[  943.903135] R13: ffff8c4d153198ab R14: ffff8c4c2f258140 R15: ffff8c4c2f258140
[  943.989097]  remove_proc_subtree+0x74/0x160
[  944.039494]  nvswitch_procfs_device_remove+0x25/0x60 [nvidia]
[  944.108776]  nvswitch_remove.cold.31+0x149/0x177 [nvidia]
[  944.173774]  pci_device_remove+0x3b/0xc0
[  944.220955]  device_release_driver_internal+0x103/0x1f0
[  944.283854]  driver_detach+0x54/0x88
[  944.326842]  bus_remove_driver+0x77/0xc9
[  944.374022]  pci_unregister_driver+0x2d/0xb0
[  944.425472]  nvswitch_exit+0x2c/0x70 [nvidia]
[  944.477964]  nv_module_exit+0x47/0x60 [nvidia]
[  944.531502]  nvidia_exit_module+0x2b/0x50 [nvidia]
[  944.589165]  __x64_sys_delete_module+0x139/0x280
[  944.644733]  do_syscall_64+0x5b/0x1a0
[  944.688766]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  944.749572] RIP: 0033:0x154fe2d9687b
[  944.792558] Code: 73 01 c3 48 8b 0d 0d f6 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd f5 2b 00 f7 d8 64 89 01 48
[  945.018969] RSP: 002b:00007ffde3c4f1f8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  945.110170] RAX: ffffffffffffffda RBX: 00005562134ce7c0 RCX: 0000154fe2d9687b
[  945.196129] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005562134ce828
[  945.282085] RBP: 0000000000000000 R08: 00007ffde3c4e171 R09: 0000000000000000
[  945.368040] R10: 0000154fe2e079a0 R11: 0000000000000206 R12: 00007ffde3c4f420
[  945.454000] R13: 00007ffde3c50a53 R14: 00005562134ce2a0 R15: 00005562134ce7c0
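
As a reading aid, the call path in the log above can be pulled out programmatically. The following is a minimal Python sketch, not part of the original report: the names parse_frames, FRAME_RE, RIP_RE and SAMPLE are hypothetical, and the regular expressions assume the dmesg layout shown in this particular trace (timestamp in brackets, symbol+offset/size, optional module name in brackets).

```python
import re

# Assumed layout of a stack frame line, e.g.
#   "[  944.039494]  nvswitch_procfs_device_remove+0x25/0x60 [nvidia]"
# A leading "? " marks a frame the unwinder is not sure about.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)

# Assumed layout of the kernel-mode instruction pointer line, e.g.
#   "[  943.187182] RIP: 0010:pde_subdir_find+0x2d/0x70"
RIP_RE = re.compile(r"RIP: 0010:([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_frames(log_text):
    """Return (rip_symbol, frames) extracted from a kernel log excerpt."""
    rip_symbol = None
    frames = []
    for line in log_text.splitlines():
        rip = RIP_RE.search(line)
        if rip and rip_symbol is None:
            rip_symbol = rip.group(1)
            continue
        m = FRAME_RE.match(line)
        if m:
            frames.append(
                {
                    "symbol": m.group(2),
                    "module": m.group(3),        # None for core kernel symbols
                    "uncertain": bool(m.group(1)),
                }
            )
    return rip_symbol, frames


# A few lines copied from the trace in this article.
SAMPLE = """\
[  942.707006]  ? timekeeping_advance+0x372/0x5a0
[  943.187182] RIP: 0010:pde_subdir_find+0x2d/0x70
[  943.989097]  remove_proc_subtree+0x74/0x160
[  944.039494]  nvswitch_procfs_device_remove+0x25/0x60 [nvidia]
[  944.589165]  __x64_sys_delete_module+0x139/0x280
"""

if __name__ == "__main__":
    rip_symbol, frames = parse_frames(SAMPLE)
    print("interrupted in:", rip_symbol)
    for frame in frames:
        origin = frame["module"] or "kernel"
        flag = " (uncertain)" if frame["uncertain"] else ""
        print(f"  {frame['symbol']} [{origin}]{flag}")
```

Run against the full log, this kind of filter makes it easier to see that the interrupted code is pde_subdir_find(), reached from remove_proc_subtree() on the nvidia module's exit path.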

Environment

  • Red Hat Enterprise Linux 8.4.z - kernel-4.18.0-305.25.1.el8_4
  • NVIDIA driver (nvidia kernel module)
