Kernel panic observed during module unload with "kernel BUG at kernel/cpu.c:1955!" error message

Solution Verified - Updated -

Issue

  • We are getting a panic on kernel-4.18.0-372.36.1.el8_6.x86_64 while unloading the VxFS kernel module, with the following message:
PANIC: "kernel BUG at kernel/cpu.c:1955!"
  • In VxFS, during post-module-load processing, we call cpuhp_setup_state(). The call returns a very large value:
vx_cpu_state_key = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
                                             "vxfs:pvec_init",
                                             delayed_pvec_online_call,
                                             delayed_pvec_offline_call);
  • During module unload we pass the returned value to cpuhp_remove_state(), but the system panics due to the BUG_ON() in __cpuhp_remove_state_cpuslocked(), with the stack trace below:
crash> bt
PID: 26211  TASK: ffff95f77e3dc000  CPU: 7   COMMAND: "rmmod"
 #0 [ffffaa0506447bf8] machine_kexec at ffffffff89867e8e
 #1 [ffffaa0506447c50] __crash_kexec at ffffffff899ae65a
 #2 [ffffaa0506447d10] crash_kexec at ffffffff899af591
 #3 [ffffaa0506447d28] oops_end at ffffffff898274f1
 #4 [ffffaa0506447d48] do_trap at ffffffff898239a7
 #5 [ffffaa0506447d90] do_invalid_op at ffffffff898244b6
 #6 [ffffaa0506447db0] invalid_op at ffffffff8a200d64
    [exception RIP: __cpuhp_remove_state_cpuslocked+246]
    RIP: ffffffff898f3906  RSP: ffffaa0506447e60  RFLAGS: 00010286
    RAX: 00000000ffffffef  RBX: 0000000000000001  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: 00000000fffffff0
    RBP: fffffffffffffd80   R8: 00000000000095bb   R9: 00000000000095bb
    R10: 0000000000000005  R11: 000000b6673b7e00  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffaa0506447e80] __cpuhp_remove_state at ffffffff898f393e
 #8 [ffffaa0506447e98] vx_delayed_pvec_deinit_v2 at ffffffffc13fc0cd [vxfs]
 #9 [ffffaa0506447ea8] vx_osdep_deinit at ffffffffc13a71fa [vxfs]
#10 [ffffaa0506447eb8] cleanup_module at ffffffffc1498999 [vxfs]
#11 [ffffaa0506447ee0] __x64_sys_delete_module at ffffffff899a8d2d
#12 [ffffaa0506447f38] do_syscall_64 at ffffffff898043ab
#13 [ffffaa0506447f50] entry_SYSCALL_64_after_hwframe at ffffffff8a2000a9
    RIP: 00007f8dba1b20ab  RSP: 00007fff91f6ea18  RFLAGS: 00000206
    RAX: ffffffffffffffda  RBX: 00005589689637c0  RCX: 00007f8dba1b20ab
    RDX: 000000000000000a  RSI: 0000000000000800  RDI: 0000558968963828
    RBP: 0000000000000000   R8: 00007fff91f6d991   R9: 0000000000000000
    R10: 00007f8dba2e9460  R11: 0000000000000206  R12: 00007fff91f6ec40
    R13: 00007fff91f6ef11  R14: 00005589689632a0  R15: 00005589689637c0
    ORIG_RAX: 00000000000000b0  CS: 0033  SS: 002b
  • Value of vx_cpu_state_key:
crash> rd vx_cpu_state_key
ffffffffc160d560:  00000000fffffff0                    ........

crash> rd -d vx_cpu_state_key
ffffffffc160d560:       4294967280 
  • The BUG_ON() is hit because cpuhp_cb_check() returns -EINVAL:
static int cpuhp_cb_check(enum cpuhp_state state)
{
        if (state <= CPUHP_OFFLINE || state >= CPUHP_ONLINE)
                return -EINVAL;
        return 0;
}

Environment

  • Red Hat Enterprise Linux
    • RHEL 8.6 kernel versions from 4.18.0-372.36.1.el8_6 to 4.18.0-372.43.1.el8_6
    • RHEL 8.7 kernel versions from GA up to 4.18.0-425.13.1.el8_7
  • A vxfs module built against one of the affected kernels above works correctly on those kernels, but breaks on fixed kernels as well as on older kernels that predate the breakage
  • Any other third-party module that uses the kernel's CPU hotplug features; this issue is not limited to vxfs
