Bare-metal nodes in the RHOCP cluster start to hang after upgrading from v4.19.14 to v4.19.18
Issue
- Bare-metal nodes in the RHOCP cluster hang frequently after upgrading from v4.19.14 to v4.19.18.
...
#0 [ff7d29bf8e7eb900] __schedule at ffffffff96d328d9
#1 [ff7d29bf8e7eb968] schedule at ffffffff96d32b8e
#2 [ff7d29bf8e7eb980] schedule_preempt_disabled at ffffffff96d32e21
#3 [ff7d29bf8e7eb988] rwsem_down_write_slowpath at ffffffff96d3602d
#4 [ff7d29bf8e7eba20] down_write at ffffffff96d36358
#5 [ff7d29bf8e7eba30] prealloc_memcg_shrinker at ffffffff9638010a
#6 [ff7d29bf8e7eba70] __prealloc_shrinker at ffffffff96380448
#7 [ff7d29bf8e7eba80] prealloc_shrinker at ffffffff96380b3e
#8 [ff7d29bf8e7eba90] alloc_super at ffffffff9646abf9
#9 [ff7d29bf8e7ebac0] sget_fc at ffffffff9646bcb7
#10 [ff7d29bf8e7ebaf8] get_tree_nodev at ffffffff9646c6a3
#11 [ff7d29bf8e7ebb28] vfs_get_tree at ffffffff9646a052
#12 [ff7d29bf8e7ebb48] do_new_mount at ffffffff9649863a
#13 [ff7d29bf8e7ebb98] __x64_sys_mount at ffffffff96499aa7
#14 [ff7d29bf8e7ebbe0] do_syscall_64 at ffffffff96d228dc
#15 [ff7d29bf8e7ebf50] entry_SYSCALL_64_after_hwframe at ffffffff96e00130
RIP: 000055b32c21348e RSP: 000000c0002bb410 RFLAGS: 00000216
RAX: ffffffffffffffda RBX: 000000c0003b5216 RCX: 000055b32c21348e
RDX: 000000c0003b5220 RSI: 000000c0003f25a0 RDI: 000000c0003b5216
RBP: 000000c0002bb450 R8: 000000c00036bf40 R9: 0000000000000000
R10: 0000000000000001 R11: 0000000000000216 R12: 000000c0003f25a0
R13: 00000000000000aa R14: 000000c000002380 R15: 0000000000000000
ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b
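The backtrace above can be read mechanically. The following is a minimal sketch, not an official diagnostic: it extracts the function names from `bt`-style frame lines and checks whether a mount() call is sleeping in down_write below prealloc_shrinker, as in the trace above. The helper names (`parse_frames`, `matches_signature`) and the choice of which frames make up the "signature" are illustrative assumptions. It also decodes ORIG_RAX, which holds the system call number: 0xa5 is 165, which is mount on x86_64 and agrees with the __x64_sys_mount frame.

```python
import re

# One frame line of `bt` output looks like:
#   "#3 [ff7d29bf8e7eb988] rwsem_down_write_slowpath at ffffffff96d3602d"
FRAME_RE = re.compile(r"#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")

# Frames copied from the backtrace in this article.
SAMPLE_BT = """\
#0 [ff7d29bf8e7eb900] __schedule at ffffffff96d328d9
#1 [ff7d29bf8e7eb968] schedule at ffffffff96d32b8e
#2 [ff7d29bf8e7eb980] schedule_preempt_disabled at ffffffff96d32e21
#3 [ff7d29bf8e7eb988] rwsem_down_write_slowpath at ffffffff96d3602d
#4 [ff7d29bf8e7eba20] down_write at ffffffff96d36358
#5 [ff7d29bf8e7eba30] prealloc_memcg_shrinker at ffffffff9638010a
#6 [ff7d29bf8e7eba70] __prealloc_shrinker at ffffffff96380448
#7 [ff7d29bf8e7eba80] prealloc_shrinker at ffffffff96380b3e
#8 [ff7d29bf8e7eba90] alloc_super at ffffffff9646abf9
#9 [ff7d29bf8e7ebac0] sget_fc at ffffffff9646bcb7
#10 [ff7d29bf8e7ebaf8] get_tree_nodev at ffffffff9646c6a3
#11 [ff7d29bf8e7ebb28] vfs_get_tree at ffffffff9646a052
#12 [ff7d29bf8e7ebb48] do_new_mount at ffffffff9649863a
#13 [ff7d29bf8e7ebb98] __x64_sys_mount at ffffffff96499aa7
#14 [ff7d29bf8e7ebbe0] do_syscall_64 at ffffffff96d228dc
#15 [ff7d29bf8e7ebf50] entry_SYSCALL_64_after_hwframe at ffffffff96e00130
"""

# Illustrative signature (an assumption, innermost frame first): a write-lock
# wait reached from prealloc_shrinker while mount() sets up a new superblock.
SIGNATURE = ["down_write", "prealloc_shrinker", "alloc_super", "__x64_sys_mount"]


def parse_frames(bt_text):
    """Return the function names of all frames, innermost first."""
    return [m.group(3) for m in FRAME_RE.finditer(bt_text)]


def matches_signature(functions, signature=SIGNATURE):
    """True if every signature function appears in the trace, in order."""
    pos = -1
    for name in signature:
        try:
            pos = functions.index(name, pos + 1)
        except ValueError:
            return False
    return True


if __name__ == "__main__":
    funcs = parse_frames(SAMPLE_BT)
    print("frames parsed:", len(funcs))
    print("matches signature:", matches_signature(funcs))
    # ORIG_RAX from the register dump is the system call number.
    print("syscall number:", int("00000000000000a5", 16))
```

To compare other blocked tasks against the same pattern, pass their saved `bt` text to `parse_frames` in place of `SAMPLE_BT`.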
Environment
- Red Hat OpenShift Container Platform v4.19.18 and newer
- Bare-metal RHCOS nodes with 128 CPUs, running kernel-5.14.0-570.64.1.el9_6.x86_64
- The issue starts occurring after upgrading from RHOCP v4.19.14 to v4.19.18 or v4.19.19.