System hangs with rcu_sched CPU stall, hung task, and soft lockup messages after updating the kernel to version 4.18.0-425.10.1.el8_7 or later.


Issue

  • System hangs with rcu_sched CPU stall, hung task, and soft lockup messages after updating the kernel from version 4.18.0-425.3.1.el8 to 4.18.0-425.10.1.el8_7 or later.
  • A RHEL 8.7 system running kernel 4.18.0-425.10.1.el8_7 or later hangs with the following call traces as soon as the drbd driver is loaded:
[   34.033101] drbd: loading out-of-tree module taints kernel.
[   34.072966] drbd: initialized. Version: 9.1.12 (api:2/proto:86-121)
[   34.072971] drbd: GIT-hash: bfc1691453a36de688e013239ce5c586263933fa build by @buildsystem, 2022-11-14 13:30:59
[   34.072975] drbd: registered as block device major 147
[   34.690710] drbd was02d: Starting worker thread (from drbdsetup [4786])
[   34.726826] drbd msg01tqm01: Starting worker thread (from drbdsetup [4794])
[   94.691893] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   94.691906]  (detected by 3, t=60002 jiffies, g=20409, q=2556)
[   94.691911] rcu: All QSes seen, last rcu_sched kthread activity 59998 (4294762429-4294702431), jiffies_till_next_fqs=3, root ->qsmask 0x0
[   94.691918] rcu: rcu_sched kthread timer wakeup didn't happen for 59995 jiffies! g20409 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
[   94.691925] rcu:     Possible timer handling issue on cpu=0 timer-softirq=1664
[   94.691929] rcu: rcu_sched kthread starved for 59998 jiffies! g20409 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=0
[   94.691936] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[   94.691939] rcu: RCU grace-period kthread stack dump:
[   94.691942] task:rcu_sched       state:R stack:    0 pid:   13 ppid:     2 flags:0x80004000
[   94.691950] Call Trace:
[   94.691956]  __schedule+0x2d1/0x860
[   94.691968]  schedule+0x35/0xa0
[   94.691974]  schedule_timeout+0x197/0x300
[   94.691984]  ? __next_timer_interrupt+0xf0/0xf0
[   94.691993]  ? __prepare_to_swait+0x4f/0x80
[   94.692003]  rcu_gp_kthread+0x4ee/0x790
[   94.692014]  ? rcu_gp_cleanup+0x370/0x370
[   94.692021]  kthread+0x10b/0x130
[   94.692030]  ? set_kthread_struct+0x50/0x50
[   94.692039]  ret_from_fork+0x1f/0x40
[   94.692053] rcu: Stack dump where RCU GP kthread last ran:
[   94.692056] Sending NMI from CPU 3 to CPUs 0:
[   94.693065] NMI backtrace for cpu 0
[   94.693068] CPU: 0 PID: 4787 Comm: drbd_w_was02d Kdump: loaded Tainted: POE ----------- 4.18.0-425.10.1.el8_7.x86_64 #1
[   94.693073] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[   94.693076] RIP: 0010:native_queued_spin_lock_slowpath+0x24/0x1c0
[   94.693080] Code: ff ff 0f 1f 40 00 0f 1f 44 00 00 0f 1f 44 00 00 ba 01 00 00 00 8b 07 85 c0 75 0d f0 0f b1 17 85 c0 75 f2 e9 6e b7 aa 00 f3 90 <eb> e9 81 fe 00 01 00 00 74 44 81 e6 00 ff ff ff 75 71 f0 0f ba 2f
[   94.693085] RSP: 0018:ffffb16fc6fabe10 EFLAGS: 00000002
[   94.693089] RAX: 0000000000000001 RBX: ffff9dd1d3a77330 RCX: ffffffffc0d2dd70
[   94.693092] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9dd1d3a77340
[   94.693095] RBP: ffffb16fc6fabe68 R08: 0000000000000c0d R09: ffff9dd1d3a774b0
[   94.693098] R10: 00000000fffbf55f R11: 000000000000b992 R12: ffff9dd1d3a77340
[   94.693101] R13: ffff9dd1d3a77360 R14: ffff9dd1d3a77468 R15: ffff9dd1d3a77470
[   94.693104] FS:  0000000000000000(0000) GS:ffff9dd6a5e00000(0000) knlGS:0000000000000000
[   94.693107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   94.693110] CR2: 00007fdd3f01c1a0 CR3: 0000000444410005 CR4: 00000000001706f0
[   94.693112] Call Trace:
[   94.693114]  _raw_spin_lock_irq+0x25/0x2c
[   94.693116]  dequeue_work_batch+0x1b/0x70 [drbd]
[   94.693118]  drbd_worker+0x2fe/0x490 [drbd]
[   94.693121]  ? __switch_to_asm+0x43/0x80
[   94.693122]  ? __switch_to+0x10c/0x450
[   94.693125]  ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
[   94.693127]  drbd_thread_setup+0x5e/0x160 [drbd]
[   94.693129]  ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
[   94.693132]  kthread+0x10b/0x130
[   94.693134]  ? set_kthread_struct+0x50/0x50
[   94.693136]  ret_from_fork+0x1f/0x40
[  246.310379] INFO: task rpm:2874 blocked for more than 120 seconds.
[  246.310387]       Tainted: POE  ----------- 4.18.0-425.10.1.el8_7.x86_64 #1
[  246.310392] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.310394] task:rpm             state:D stack:    0 pid: 2874 ppid:  2867 flags:0x80000182
[  246.310403] Call Trace:
[  246.310410]  __schedule+0x2d1/0x860
[  246.310421]  ? select_task_rq_fair+0x355/0x990
[  246.310433]  schedule+0x35/0xa0
[  246.310439]  schedule_timeout+0x278/0x300
[  246.310449]  ? try_to_wake_up+0x1b4/0x4e0
[  246.310460]  wait_for_completion+0x96/0x100
[  246.310468]  __wait_rcu_gp+0x10a/0x130
[  246.310477]  synchronize_rcu+0x6c/0x80
[  246.310486]  ? rcu_cpu_kthread+0x200/0x200
[  246.310493]  ? __bpf_trace_rcu_stall_warning+0x10/0x10
[  246.310501]  namespace_unlock+0x6b/0x80
[  246.310511]  put_mnt_ns+0x21/0x30
[  246.310518]  free_nsproxy+0x17/0xe0
[  246.310525]  do_exit+0x348/0xb10
[  246.310536]  ? syscall_trace_enter+0x1ff/0x2d0
[  246.310548]  do_group_exit+0x3a/0xa0
[  246.310557]  __x64_sys_exit_group+0x14/0x20
[  246.310566]  do_syscall_64+0x5b/0x1b0
[  246.310575]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[  246.310585] RIP: 0033:0x7f6463a3ec56
[  246.310596] Code: Unable to access opcode bytes at RIP 0x7f6463a3ec2c.
[  246.310599] RSP: 002b:00007ffee2f2f818 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  246.310606] RAX: ffffffffffffffda RBX: 00007f6463d00860 RCX: 00007f6463a3ec56
[  246.310611] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[  246.310615] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff60
[  246.310619] R10: 0000000000000017 R11: 0000000000000246 R12: 00007f6463d00860
[  246.310623] R13: 0000000000000003 R14: 00007f6463d09508 R15: 0000000000000000
...
[  567.583696] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [sshd:4823]
[  567.583708] Modules linked in: drbd(OE) nf_conntrack_netlink nft_counter nft_log nfnetlink_queue nfnetlink_log xt_LOG nf_log_syslog nf_conntrack_tftp nf_conntrack_ftp falcon_lsm_serviceable(PE) falcon_nf_netcontain(PE) falcon_kal(E) falcon_lsm_pinned_14812(E) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common sb_edac crct10dif_pclmul crc32_pclmul vmw_balloon ghash_clmulni_intel rapl joydev pcspkr vmw_vmci i2c_piix4 auth_rpcgss sunrpc xfs libcrc32c ata_generic vmwgfx sd_mod t10_pi drm_ttm_helper ttm sg drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel drm ahci libahci serio_raw ata_piix vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod fuse
[  567.583917] CPU: 3 PID: 4823 Comm: sshd Kdump: loaded Tainted: POE ----------- 4.18.0-425.10.1.el8_7.x86_64 #1
[  567.583925] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[  567.583935] RIP: 0010:smp_call_function_many_cond+0x256/0x290
[  567.583948] Code: 89 c7 e8 ed 15 82 00 3b 05 fb be c0 01 0f 83 34 fe ff ff 48 63 d0 49 8b 0e 48 03 0c d5 40 c8 3a 83 8b 11 83 e2 01 74 09 f3 90 <8b> 11 83 e2 01 75 f7 eb c9 48 c7 c2 20 0e fb 83 4c 89 ee 44 89 e7
[  567.583954] RSP: 0018:ffffb16fc39dbb60 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[  567.583961] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9dd6a5e32980
[  567.583965] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9dd1836c9c00
[  567.583969] RBP: 0000000000000000 R08: 0000000080000000 R09: 0000000000000000
[  567.583974] R10: 000000000000000f R11: 0000000000000000 R12: ffff9dd6a5eb2980
[  567.583978] R13: 0000000000000002 R14: ffff9dd6a5eec100 R15: 0000000000000080
[  567.583982] FS:  00007f1958b9b980(0000) GS:ffff9dd6a5ec0000(0000) knlGS:0000000000000000
[  567.583988] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  567.583992] CR2: 00007ffc964b2221 CR3: 00000001ca2a6005 CR4: 00000000001706e0
[  567.584028] Call Trace:
[  567.584034]  ? load_new_mm_cr3+0xe0/0xe0
[  567.584045]  ? load_new_mm_cr3+0xe0/0xe0
[  567.584054]  on_each_cpu+0x2b/0x60
[  567.584061]  flush_tlb_kernel_range+0x48/0x90
[  567.584072]  ? cpumask_next+0x17/0x20
[  567.584081]  __purge_vmap_area_lazy+0x70/0x730
[  567.584093]  _vm_unmap_aliases+0xe5/0x120
[  567.584102]  change_page_attr_set_clr+0xa5/0x1a0
[  567.584113]  set_memory_ro+0x26/0x30
[  567.584120]  bpf_int_jit_compile+0x486/0x4c0
[  567.584131]  bpf_prog_select_runtime+0xca/0x130
[  567.584139]  bpf_prepare_filter+0x52c/0x5a0
[  567.584150]  bpf_prog_create_from_user+0xbb/0x110
[  567.584157]  ? hardlockup_detector_perf_cleanup.cold.9+0x12/0x12
[  567.584167]  do_seccomp+0x24e/0x6d0
[  567.584176]  __x64_sys_prctl+0x35f/0x600
[  567.584187]  do_syscall_64+0x5b/0x1b0
[  567.584197]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[  567.584208] RIP: 0033:0x7f19563a641e
[  567.584215] Code: 48 8b 0d 6d 5a 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 9d 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3a 5a 38 00 f7 d8 64 89 01 48
[  567.584221] RSP: 002b:00007ffc964afc48 EFLAGS: 00000246 ORIG_RAX: 000000000000009d
[  567.584227] RAX: ffffffffffffffda RBX: 00007ffc964afc50 RCX: 00007f19563a641e
[  567.584231] RDX: 000055f3a55dfdc0 RSI: 0000000000000002 RDI: 0000000000000016
[  567.584236] RBP: 000055f3a75a50a0 R08: 0000000000000000 R09: 00007f19564ecd40
[  567.584240] R10: 00007f19563a641e R11: 0000000000000246 R12: 000055f3a75a0c80
[  567.584244] R13: 00007ffc964afd20 R14: 0000000000000000 R15: 000055f3a53a1355
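
The three watchdogs firing in the traces above correspond to tunable kernel thresholds: the RCU stall detector (t=60002 jiffies, roughly 60 seconds), the hung-task detector ("blocked for more than 120 seconds"), and the soft-lockup watchdog ("stuck for 22s", which fires at about twice the configured threshold). As a diagnostic sketch using standard kernel interfaces, the active thresholds on an affected system can be read with:

```shell
# Hung-task timeout in seconds; the "blocked for more than 120 seconds"
# message above implies the default of 120.
cat /proc/sys/kernel/hung_task_timeout_secs

# Soft-lockup watchdog threshold in seconds; the soft-lockup report fires
# at roughly 2x this value, consistent with the "stuck for 22s" message.
cat /proc/sys/kernel/watchdog_thresh

# RCU CPU stall timeout in seconds, exposed as a parameter of the
# built-in rcupdate code.
cat /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
```

Changing these values only changes when the warnings are printed; it does not address the underlying stall shown in the drbd_worker spinlock trace.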

Environment

  • Red Hat Enterprise Linux release 8.7 (Ootpa)
  • kernel-4.18.0-425.10.1.el8_7 or later
  • Out-of-tree (O-tainted) kernel module: drbd
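
To confirm a system matches this environment, the running kernel build, the presence of the out-of-tree drbd module, and the taint flags shown in the traces (P, O, E) can be checked as follows. This is a sketch; the bit values cited are the standard kernel taint bits, and the expected version string is taken from the traces above.

```shell
# Running kernel version; affected builds are 4.18.0-425.10.1.el8_7 and later.
uname -r

# Check whether the out-of-tree drbd module is loaded.
lsmod | grep -w drbd

# Kernel taint bitmask. The "POE" taint in the traces corresponds to
# bit 0 (P, proprietary), bit 12 (O, out-of-tree), and bit 13 (E, unsigned).
cat /proc/sys/kernel/tainted
```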
