System hangs with rcu_sched CPU stall, hung task, and soft lockup after updating the kernel to version 4.18.0-425.10.1.el8_7 or higher.
Issue
- System hangs with rcu_sched CPU stall, hung task, and soft lockup after updating the kernel from version
4.18.0-425.3.1.el8
to4.18.0-425.10.1.el8_7
or higher - RHEL8.7 system with the kernel version
4.18.0-425.10.1.el8_7
or higher hangs with the following call traces as soon as thedrbd
driver is loaded.
[ 34.033101] drbd: loading out-of-tree module taints kernel.
[ 34.072966] drbd: initialized. Version: 9.1.12 (api:2/proto:86-121)
[ 34.072971] drbd: GIT-hash: bfc1691453a36de688e013239ce5c586263933fa build by @buildsystem, 2022-11-14 13:30:59
[ 34.072975] drbd: registered as block device major 147
[ 34.690710] drbd was02d: Starting worker thread (from drbdsetup [4786])
[ 34.726826] drbd msg01tqm01: Starting worker thread (from drbdsetup [4794])
[ 94.691893] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 94.691906] (detected by 3, t=60002 jiffies, g=20409, q=2556)
[ 94.691911] rcu: All QSes seen, last rcu_sched kthread activity 59998 (4294762429-4294702431), jiffies_till_next_fqs=3, root ->qsmask 0x0
[ 94.691918] rcu: rcu_sched kthread timer wakeup didn't happen for 59995 jiffies! g20409 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
[ 94.691925] rcu: Possible timer handling issue on cpu=0 timer-softirq=1664
[ 94.691929] rcu: rcu_sched kthread starved for 59998 jiffies! g20409 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=0
[ 94.691936] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 94.691939] rcu: RCU grace-period kthread stack dump:
[ 94.691942] task:rcu_sched state:R stack: 0 pid: 13 ppid: 2 flags:0x80004000
[ 94.691950] Call Trace:
[ 94.691956] __schedule+0x2d1/0x860
[ 94.691968] schedule+0x35/0xa0
[ 94.691974] schedule_timeout+0x197/0x300
[ 94.691984] ? __next_timer_interrupt+0xf0/0xf0
[ 94.691993] ? __prepare_to_swait+0x4f/0x80
[ 94.692003] rcu_gp_kthread+0x4ee/0x790
[ 94.692014] ? rcu_gp_cleanup+0x370/0x370
[ 94.692021] kthread+0x10b/0x130
[ 94.692030] ? set_kthread_struct+0x50/0x50
[ 94.692039] ret_from_fork+0x1f/0x40
[ 94.692053] rcu: Stack dump where RCU GP kthread last ran:
[ 94.692056] Sending NMI from CPU 3 to CPUs 0:
[ 94.693065] NMI backtrace for cpu 0
[ 94.693068] CPU: 0 PID: 4787 Comm: drbd_w_was02d Kdump: loaded Tainted: POE ----------- 4.18.0-425.10.1.el8_7.x86_64 #1
[ 94.693073] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 94.693076] RIP: 0010:native_queued_spin_lock_slowpath+0x24/0x1c0
[ 94.693080] Code: ff ff 0f 1f 40 00 0f 1f 44 00 00 0f 1f 44 00 00 ba 01 00 00 00 8b 07 85 c0 75 0d f0 0f b1 17 85 c0 75 f2 e9 6e b7 aa 00 f3 90 <eb> e9 81 fe 00 01 00 00 74 44 81 e6 00 ff ff ff 75 71 f0 0f ba 2f
[ 94.693085] RSP: 0018:ffffb16fc6fabe10 EFLAGS: 00000002
[ 94.693089] RAX: 0000000000000001 RBX: ffff9dd1d3a77330 RCX: ffffffffc0d2dd70
[ 94.693092] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9dd1d3a77340
[ 94.693095] RBP: ffffb16fc6fabe68 R08: 0000000000000c0d R09: ffff9dd1d3a774b0
[ 94.693098] R10: 00000000fffbf55f R11: 000000000000b992 R12: ffff9dd1d3a77340
[ 94.693101] R13: ffff9dd1d3a77360 R14: ffff9dd1d3a77468 R15: ffff9dd1d3a77470
[ 94.693104] FS: 0000000000000000(0000) GS:ffff9dd6a5e00000(0000) knlGS:0000000000000000
[ 94.693107] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 94.693110] CR2: 00007fdd3f01c1a0 CR3: 0000000444410005 CR4: 00000000001706f0
[ 94.693112] Call Trace:
[ 94.693114] _raw_spin_lock_irq+0x25/0x2c
[ 94.693116] dequeue_work_batch+0x1b/0x70 [drbd]
[ 94.693118] drbd_worker+0x2fe/0x490 [drbd]
[ 94.693121] ? __switch_to_asm+0x43/0x80
[ 94.693122] ? __switch_to+0x10c/0x450
[ 94.693125] ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
[ 94.693127] drbd_thread_setup+0x5e/0x160 [drbd]
[ 94.693129] ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
[ 94.693132] kthread+0x10b/0x130
[ 94.693134] ? set_kthread_struct+0x50/0x50
[ 94.693136] ret_from_fork+0x1f/0x40
[ 246.310379] INFO: task rpm:2874 blocked for more than 120 seconds.
[ 246.310387] Tainted: POE ----------- 4.18.0-425.10.1.el8_7.x86_64 #1
[ 246.310392] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.310394] task:rpm state:D stack: 0 pid: 2874 ppid: 2867 flags:0x80000182
[ 246.310403] Call Trace:
[ 246.310410] __schedule+0x2d1/0x860
[ 246.310421] ? select_task_rq_fair+0x355/0x990
[ 246.310433] schedule+0x35/0xa0
[ 246.310439] schedule_timeout+0x278/0x300
[ 246.310449] ? try_to_wake_up+0x1b4/0x4e0
[ 246.310460] wait_for_completion+0x96/0x100
[ 246.310468] __wait_rcu_gp+0x10a/0x130
[ 246.310477] synchronize_rcu+0x6c/0x80
[ 246.310486] ? rcu_cpu_kthread+0x200/0x200
[ 246.310493] ? __bpf_trace_rcu_stall_warning+0x10/0x10
[ 246.310501] namespace_unlock+0x6b/0x80
[ 246.310511] put_mnt_ns+0x21/0x30
[ 246.310518] free_nsproxy+0x17/0xe0
[ 246.310525] do_exit+0x348/0xb10
[ 246.310536] ? syscall_trace_enter+0x1ff/0x2d0
[ 246.310548] do_group_exit+0x3a/0xa0
[ 246.310557] __x64_sys_exit_group+0x14/0x20
[ 246.310566] do_syscall_64+0x5b/0x1b0
[ 246.310575] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 246.310585] RIP: 0033:0x7f6463a3ec56
[ 246.310596] Code: Unable to access opcode bytes at RIP 0x7f6463a3ec2c.
[ 246.310599] RSP: 002b:00007ffee2f2f818 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 246.310606] RAX: ffffffffffffffda RBX: 00007f6463d00860 RCX: 00007f6463a3ec56
[ 246.310611] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 246.310615] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff60
[ 246.310619] R10: 0000000000000017 R11: 0000000000000246 R12: 00007f6463d00860
[ 246.310623] R13: 0000000000000003 R14: 00007f6463d09508 R15: 0000000000000000
...
[ 567.583696] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [sshd:4823]
[ 567.583708] Modules linked in: drbd(OE) nf_conntrack_netlink nft_counter nft_log nfnetlink_queue nfnetlink_log xt_LOG nf_log_syslog nf_conntrack_tftp nf_conntrack_ftp falcon_lsm_serviceable(PE) falcon_nf_netcontain(PE) falcon_kal(E) falcon_lsm_pinned_14812(E) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common sb_edac crct10dif_pclmul crc32_pclmul vmw_balloon ghash_clmulni_intel rapl joydev pcspkr vmw_vmci i2c_piix4 auth_rpcgss sunrpc xfs libcrc32c ata_generic vmwgfx sd_mod t10_pi drm_ttm_helper ttm sg drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel drm ahci libahci serio_raw ata_piix vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod fuse
[ 567.583917] CPU: 3 PID: 4823 Comm: sshd Kdump: loaded Tainted: POE ----------- 4.18.0-425.10.1.el8_7.x86_64 #1
[ 567.583925] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 567.583935] RIP: 0010:smp_call_function_many_cond+0x256/0x290
[ 567.583948] Code: 89 c7 e8 ed 15 82 00 3b 05 fb be c0 01 0f 83 34 fe ff ff 48 63 d0 49 8b 0e 48 03 0c d5 40 c8 3a 83 8b 11 83 e2 01 74 09 f3 90 <8b> 11 83 e2 01 75 f7 eb c9 48 c7 c2 20 0e fb 83 4c 89 ee 44 89 e7
[ 567.583954] RSP: 0018:ffffb16fc39dbb60 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 567.583961] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9dd6a5e32980
[ 567.583965] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9dd1836c9c00
[ 567.583969] RBP: 0000000000000000 R08: 0000000080000000 R09: 0000000000000000
[ 567.583974] R10: 000000000000000f R11: 0000000000000000 R12: ffff9dd6a5eb2980
[ 567.583978] R13: 0000000000000002 R14: ffff9dd6a5eec100 R15: 0000000000000080
[ 567.583982] FS: 00007f1958b9b980(0000) GS:ffff9dd6a5ec0000(0000) knlGS:0000000000000000
[ 567.583988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 567.583992] CR2: 00007ffc964b2221 CR3: 00000001ca2a6005 CR4: 00000000001706e0
[ 567.584028] Call Trace:
[ 567.584034] ? load_new_mm_cr3+0xe0/0xe0
[ 567.584045] ? load_new_mm_cr3+0xe0/0xe0
[ 567.584054] on_each_cpu+0x2b/0x60
[ 567.584061] flush_tlb_kernel_range+0x48/0x90
[ 567.584072] ? cpumask_next+0x17/0x20
[ 567.584081] __purge_vmap_area_lazy+0x70/0x730
[ 567.584093] _vm_unmap_aliases+0xe5/0x120
[ 567.584102] change_page_attr_set_clr+0xa5/0x1a0
[ 567.584113] set_memory_ro+0x26/0x30
[ 567.584120] bpf_int_jit_compile+0x486/0x4c0
[ 567.584131] bpf_prog_select_runtime+0xca/0x130
[ 567.584139] bpf_prepare_filter+0x52c/0x5a0
[ 567.584150] bpf_prog_create_from_user+0xbb/0x110
[ 567.584157] ? hardlockup_detector_perf_cleanup.cold.9+0x12/0x12
[ 567.584167] do_seccomp+0x24e/0x6d0
[ 567.584176] __x64_sys_prctl+0x35f/0x600
[ 567.584187] do_syscall_64+0x5b/0x1b0
[ 567.584197] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 567.584208] RIP: 0033:0x7f19563a641e
[ 567.584215] Code: 48 8b 0d 6d 5a 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 9d 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3a 5a 38 00 f7 d8 64 89 01 48
[ 567.584221] RSP: 002b:00007ffc964afc48 EFLAGS: 00000246 ORIG_RAX: 000000000000009d
[ 567.584227] RAX: ffffffffffffffda RBX: 00007ffc964afc50 RCX: 00007f19563a641e
[ 567.584231] RDX: 000055f3a55dfdc0 RSI: 0000000000000002 RDI: 0000000000000016
[ 567.584236] RBP: 000055f3a75a50a0 R08: 0000000000000000 R09: 00007f19564ecd40
[ 567.584240] R10: 00007f19563a641e R11: 0000000000000246 R12: 000055f3a75a0c80
[ 567.584244] R13: 00007ffc964afd20 R14: 0000000000000000 R15: 000055f3a53a1355
Environment
- Red Hat Enterprise Linux release 8.7 (Ootpa)
- kernel-4.18.0-425.10.1.el8_7 or higher
- Out-of-tree (O) kernel module:
[drbd]
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.