ppc64le kernel hangs with soft lockups and rcu_sched CPU stalls: one CPU is stuck waiting on the rq.lock spinlock of another CPU, but the spinlock is not locked
Issue
- A ppc64le kernel hangs with soft lockups and rcu_sched CPU stalls. One CPU is stuck waiting on the rq.lock spinlock of another CPU, but the spinlock is not locked.
[3849417.502681] watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [migration/19:127]
[3849417.502702] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables_set rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nfsv3 nfs_acl nfs lockd grace fscache bonding nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink sunrpc pseries_rng xts vmx_crypto binfmt_misc ext4 mbcache jbd2 dm_service_time sr_mod cdrom sd_mod t10_pi sg ibmvfc ibmveth scsi_transport_fc ibmvscsi scsi_transport_srp dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: nft_reject]
[3849417.502754] CPU: 19 PID: 127 Comm: migration/19 Kdump: loaded Not tainted 4.18.0-305.65.1.el8_4.ppc64le #1
[3849417.502757] NIP: c0000000002b25bc LR: c0000000002b2680 CTR: c0000000002b2520
[3849417.502761] REGS: c0000017fcb6b990 TRAP: 0901 Not tainted (4.18.0-305.65.1.el8_4.ppc64le)
[3849417.502762] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002422 XER: 20040000
[3849417.502768] CFAR: c0000000002b2684 IRQMASK: 0
GPR00: c0000000002b2680 c0000017fcb6bc20 c000000001c11500 0000000000000100
GPR04: c000001fc7ef38e8 c000001fc7ef38e8 0000000000000000 c0000017feffad58
GPR08: 0000000000000004 c0000017fcbfc000 00000000056083d2 0000000000000001
GPR12: 0000000000000000 c0000017ffffb700
[3849417.502784] NIP [c0000000002b25bc] multi_cpu_stop+0x9c/0x220
[3849417.502787] LR [c0000000002b2680] multi_cpu_stop+0x160/0x220
[3849417.502789] Call Trace:
[3849417.502792] [c0000017fcb6bc20] [c0000017fcb6bc90] 0xc0000017fcb6bc90 (unreliable)
[3849417.502795] [c0000017fcb6bc90] [c0000000002b28ec] cpu_stopper_thread+0x14c/0x240
[3849417.502798] [c0000017fcb6bd40] [c0000000001ab5f8] smpboot_thread_fn+0x1e8/0x2a0
[3849417.502802] [c0000017fcb6bdb0] [c0000000001a3520] kthread+0x1b0/0x1c0
[3849417.502806] [c0000017fcb6be20] [c00000000000b7d8] ret_from_kernel_thread+0x5c/0x64
...
[3849429.512727] watchdog: BUG: soft lockup - CPU#48 stuck for 22s! [lparstat:103399]
[3849429.512749] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables_set rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nfsv3 nfs_acl nfs lockd grace fscache bonding nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink sunrpc pseries_rng xts vmx_crypto binfmt_misc ext4 mbcache jbd2 dm_service_time sr_mod cdrom sd_mod t10_pi sg ibmvfc ibmveth scsi_transport_fc ibmvscsi scsi_transport_srp dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: nft_reject]
[3849429.512809] CPU: 48 PID: 103399 Comm: lparstat Kdump: loaded Tainted: G L --------- - - 4.18.0-305.65.1.el8_4.ppc64le #1
[3849429.512813] NIP: c000000000273804 LR: c000000000273824 CTR: c0000000000bf1b0
[3849429.512816] REGS: c0000016846af810 TRAP: 0901 Tainted: G L --------- - - (4.18.0-305.65.1.el8_4.ppc64le)
[3849429.512819] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44024244 XER: 00000000
[3849429.512823] CFAR: c000000000273810 IRQMASK: 0
GPR00: c000000000273824 c0000016846afaa0 c000000001c11500 0000000000000008
GPR04: 0000000000000008 0000000000000008 0000000000000037 c000000001c42e00
GPR08: 0000000000000008 0000000000000001 c000000ffedeee00 c000000001c47058
GPR12: c0000000000bf1b0 c0000017ffff5300
[3849429.512840] NIP [c000000000273804] smp_call_function_many_cond+0x464/0x4e0
[3849429.512842] LR [c000000000273824] smp_call_function_many_cond+0x484/0x4e0
[3849429.512844] Call Trace:
[3849429.512847] [c0000016846afaa0] [c000000000273824] smp_call_function_many_cond+0x484/0x4e0 (unreliable)
[3849429.512849] [c0000016846afb30] [c000000000273958] on_each_cpu+0x58/0xb0
[3849429.512853] [c0000016846afb70] [c00000000011d904] pseries_lparcfg_data.isra.0+0xa74/0x1040
[3849429.512857] [c0000016846afcf0] [c00000000059390c] seq_read+0x1cc/0x720
[3849429.512860] [c0000016846afd90] [c00000000062a3e0] proc_reg_read+0x90/0x1a0
[3849429.512863] [c0000016846afdc0] [c000000000545ef8] sys_read+0x118/0x320
[3849429.512867] [c0000016846afe20] [c00000000000b408] system_call+0x5c/0x70
...
PID: 770004 TASK: c000000c25b0d800 CPU: 8 COMMAND: "GC Thread#49"
R0: 0000000044022482 R1: c000000be2183340 R2: c000000001c11500
R3: 0000000000000000 R4: 0000000000000038 R5: 0000000027b2cd29
R6: 000000000000003f R7: c000000001c46e00 R8: 00000000000001c0
R9: c000001fffff2700 R10: 0000000080000038 R11: 0000002000000014
R12: 0000000000000000 R13: c000000ffffff300 R14: c000000001c42e00
R15: 00000000000001f8 R16: 000000000000003f R17: c000000be21836f0
R18: 0000000000000002 R19: 0000000000000003 R20: 0000000000000004
R21: c000000c18e5d000 R22: 0000000000000001 R23: 0000000000000005
R24: c000000001dfdb80 R25: c000000fa2308e00 R26: c000000001c47514
R27: c000000fa2308e30 R28: c000000001c42e00 R29: 0000000000000001
R30: 0000000000000001 R31: c000001ffda69d80
NIP: c000000000104ea8 MSR: 8000000000181033 OR3: 000000000000011c
CTR: 0000000000000000 LR: c0000000000b5684 XER: 0000000000000000
CCR: 0000000024022482 MQ: 0000000000000001 DAR: 000000000000000c
DSISR: 0000000000000000 Syscall Result: 0000000000000000
[NIP : plpar_hcall_norets+28]
[LR : __spin_yield+148]
#0 [c000000be2183340] (null) at c000000be2183370 (unreliable)
#1 [c000000be21833a0] _raw_spin_lock_irqsave at c000000000ee5708
#2 [c000000be21833e0] update_blocked_averages at c0000000001c6f80
#3 [c000000be2183480] find_busiest_group at c0000000001db070
#4 [c000000be2183660] load_balance at c0000000001db7ec
#5 [c000000be21837f0] newidle_balance at c0000000001dd3b0
#6 [c000000be21838b0] pick_next_task_fair at c0000000001dd85c
#7 [c000000be2183960] __schedule at c000000000edd984
#8 [c000000be2183a30] schedule at c000000000ede2a8
#9 [c000000be2183a60] futex_wait_queue_me at c00000000026d3b8
#10 [c000000be2183ab0] futex_wait at c00000000026da08
#11 [c000000be2183c00] do_futex at c000000000271350
#12 [c000000be2183d90] sys_futex at c000000000272374
#13 [c000000be2183e20] system_call at c00000000000b408
System Call [c00] exception frame:
R0: 00000000000000dd R1: 00007fff2a38e140 R2: 00007fff9a937f00
R3: 000000014b79c078 R4: 0000000000000080 R5: 0000000000000000
R6: 0000000000000000 R7: 00007fff2a38f278 R8: 0000000000000002
R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
R12: 0000000000000000 R13: 00007fff2a3968e0 R14: 0000000000000000
R15: 0000000000000001 R16: 0000000000000000 R17: 000000014b79c078
R18: 0000000000000000 R19: 00007fff2a38e198 R20: 00007fff9a9014b0
R21: 0000000000000080 R22: 0000000000107f3e R23: 0000000000000000
R24: 000000000020fe7c R25: 000000014b79c028 R26: 00007fff2a38e178
R27: 0000000000000000 R28: 0000000000000002 R29: 000000014b79c078
R30: 000000014b79c050 R31: 000000014b79c060
NIP: 00007fff9a90174c MSR: 800000000280f033 OR3: 000000014b79c078
CTR: 0000000000000000 LR: 00007fff9a90172c XER: 0000000000000000
CCR: 0000000044024888 MQ: 0000000000000000 DAR: 00007fff3d94db30
DSISR: 0000000008000000 Syscall Result: 0000000000000000
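When a log contains many reports like the ones above, it can help to reduce them to one record per soft lockup (CPU, stuck time, task name, PID) before looking at individual backtraces. The following is a minimal sketch, not a supported tool: the names `SOFT_LOCKUP_RE`, `find_soft_lockups`, and `SAMPLE_LOG` are hypothetical, and the pattern assumes the watchdog message format exactly as it appears in the log lines quoted in this article.

```python
import re

# Assumption: the watchdog line looks like the ones quoted in this article:
# [<timestamp>] watchdog: BUG: soft lockup - CPU#<n> stuck for <s>s! [<comm>:<pid>]
SOFT_LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup report found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = SOFT_LOCKUP_RE.search(line)
        if match is None:
            continue
        events.append({
            "timestamp": float(match.group("ts")),
            "cpu": int(match.group("cpu")),
            "stuck_seconds": int(match.group("secs")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return events


# The two watchdog lines from the log in this article, used as sample input.
SAMPLE_LOG = """\
[3849417.502681] watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [migration/19:127]
[3849417.502784] NIP [c0000000002b25bc] multi_cpu_stop+0x9c/0x220
[3849429.512727] watchdog: BUG: soft lockup - CPU#48 stuck for 22s! [lparstat:103399]
"""

events = find_soft_lockups(SAMPLE_LOG)
for event in events:
    print(event["cpu"], event["stuck_seconds"], event["comm"], event["pid"])
```

In practice the input would be the saved kernel log (for example the dmesg text captured alongside the vmcore) instead of `SAMPLE_LOG`.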
Environment
- Red Hat Enterprise Linux 8.4.z for Power, Little Endian