kernel BUG at lib/list_debug.c:51!
Description: The kernel crashes under heavy load from the application. Once the application pods are deployed and begin I/O on CephFS-backed PVCs, the kernel Ceph client is stressed and hits this issue on the current kernel version.
The kernel crashes with the following log:
[22804.148350] list_del corruption. prev->next should be ffff8975aa023438, but was ffff896fce4cefe8
[22804.151071] ------------[ cut here ]------------
[22804.151075] kernel BUG at lib/list_debug.c:51!
[22804.153355] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[22804.155567] CPU: 6 PID: 37577 Comm: 10_dirty_io_sch Kdump: loaded Not tainted 5.14.0-284.59.1.el9_2.x86_64 #1
[22804.157309] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
[22804.159243] RIP: 0010:__list_del_entry_valid.cold+0x31/0x47
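For context on what the first log line means, here is a minimal user-space sketch of the kind of consistency check that `__list_del_entry_valid` performs before an entry is unlinked. It is not the kernel source: `list_init`, `list_del_entry_valid` and `list_del_checked` are simplified stand-ins written for illustration, and the real kernel calls BUG() and poisons the removed entry's pointers instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry directly after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    assert(entry != NULL && head != NULL);
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Both neighbours must still point back at entry, otherwise the list
 * was modified behind our back (race, use-after-free, stray write). */
static bool list_del_entry_valid(struct list_head *entry)
{
    struct list_head *prev = entry->prev;
    struct list_head *next = entry->next;

    if (prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)prev->next);
        return false;
    }
    if (next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)next->prev);
        return false;
    }
    return true;
}

/* Unlink entry only if the check passes (assumption: this sketch reports
 * failure to the caller, where the kernel would stop with BUG()). */
static bool list_del_checked(struct list_head *entry)
{
    if (!list_del_entry_valid(entry))
        return false;
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry;
    entry->prev = entry;
    return true;
}
```

In the log above, the entry being removed is at ffff8975aa023438, but its predecessor's `next` pointer already referred to ffff896fce4cefe8, so the first comparison failed. That pattern usually means something else changed the list while this entry was still linked, although the log alone does not say what.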
Kernel version: 5.14.0-284.59.1.el9_2.x86_64
Additional Info:
According to Red Hat Solution 7050813, this issue is currently unresolved.
[+] https://access.redhat.com/solutions/7050813
Interestingly, the Red Hat article lists kernel-5.14.0-362.13.1.el9_3.x86_64 as affected, while I am running an earlier version (5.14.0-284.59.1.el9_2.x86_64).
Question:
Would changing the kernel version help mitigate or resolve this issue?
Responses