A soft lockup occurs on one CPU while a task on another CPU is calling qed_mcp_trace_dump()
Issue
- A soft lockup occurs on one CPU while a task on another CPU is calling qed_mcp_trace_dump().
[ 2864.907964] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["idle_chk"]
[ 2864.917265] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["idle_chk"]
[ 2864.926660] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["reg_fifo"]
[ 2864.926696] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["igu_fifo"]
[ 2864.926726] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["protection_override"]
[ 2864.926778] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["fw_asserts"]
[ 2864.927005] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["ilt"]
[ 2864.927487] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["grc"]
[ 2865.284942] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["mcp_trace"]
[ 2892.037354] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [migration/5:45]
[ 2892.037401] Modules linked in: binfmt_misc mptcp_diag tcp_diag udp_diag raw_diag inet_diag bonding tls intel_rapl_msr intel_rapl_common ipmi_ssif nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate vfat fat intel_uncore pcspkr isst_if_mbox_pci acpi_ipmi ses isst_if_mmio ioatdma enclosure isst_if_common scsi_transport_sas hpilo hpwdt intel_vsec dca wmi ipmi_si acpi_tad acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper qede syscopyarea sysfillrect sysimgblt fb_sys_fops qed crc32c_intel drm megaraid_sas crc8 dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler
[ 2892.037463] CPU: 5 PID: 45 Comm: migration/5 Kdump: loaded Not tainted 4.18.0-477.13.1.el8_8.x86_64 #1
[ 2892.037466] Hardware name: HPE ProLiant DL360 Gen10 Plus/ProLiant DL360 Gen10 Plus, BIOS U46 02/02/2023
[ 2892.037467] RIP: 0010:stop_machine_yield+0x2/0x10
[ 2892.037477] Code: 75 14 48 8d 65 f0 5b 41 5c 5d e9 19 10 c3 00 b8 fe ff ff ff eb dc e8 ed 67 f2 ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 90 <e9> f9 0f c3 00 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41
[ 2892.037479] RSP: 0000:ff4429a18c843e68 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 2892.037482] RAX: ff336254ffb73d40 RBX: ff4429a1aef3fb50 RCX: dead000000000200
[ 2892.037483] RDX: 0000000000000002 RSI: ff4429a1aef3fad0 RDI: ffffffff9aa17de0
[ 2892.037484] RBP: ff4429a1aef3fb74 R08: 0000000000000001 R09: 0000000000000001
[ 2892.037486] R10: 0000000000000012 R11: 000000000000002f R12: 0000000000000001
[ 2892.037487] R13: ffffffff9aa17de0 R14: 0000000000000000 R15: 0000000000000001
[ 2892.037488] FS: 0000000000000000(0000) GS:ff336254ffb40000(0000) knlGS:0000000000000000
[ 2892.037490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2892.037491] CR2: 00007f9907743a08 CR3: 0000004e01010005 CR4: 0000000000771ee0
[ 2892.037493] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2892.037494] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2892.037496] PKRU: 55555554
[ 2892.037497] Call Trace:
[ 2892.037500] multi_cpu_stop+0x54/0x110
[ 2892.037505] ? stop_machine_yield+0x10/0x10
[ 2892.037507] cpu_stopper_thread+0x48/0x100
[ 2892.037511] ? sort_range+0x20/0x20
[ 2892.037515] smpboot_thread_fn+0xb5/0x150
[ 2892.037517] kthread+0x134/0x150
[ 2892.037520] ? set_kthread_struct+0x50/0x50
[ 2892.037523] ret_from_fork+0x1f/0x40
[ 2892.037529] Kernel panic - not syncing: softlockup: hung tasks
[ 2892.037553] CPU: 5 PID: 45 Comm: migration/5 Kdump: loaded Tainted: G L --------- - - 4.18.0-477.13.1.el8_8.x86_64 #1
[ 2892.037596] Hardware name: HPE ProLiant DL360 Gen10 Plus/ProLiant DL360 Gen10 Plus, BIOS U46 02/02/2023
[ 2892.037621] Call Trace:
[ 2892.037630] <IRQ>
[ 2892.037638] dump_stack+0x41/0x60
[ 2892.037652] panic+0xe7/0x2ac
[ 2892.037666] ? __switch_to_asm+0x51/0x80
[ 2892.037679] watchdog_timer_fn.cold.10+0x85/0x9e
[ 2892.037697] ? watchdog+0x30/0x30
[ 2892.037709] __hrtimer_run_queues+0x101/0x280
[ 2892.037726] hrtimer_interrupt+0x100/0x220
[ 2892.037741] smp_apic_timer_interrupt+0x6a/0x130
[ 2892.037757] apic_timer_interrupt+0xf/0x20
[ 2892.037772] </IRQ>
[ 2892.037779] RIP: 0010:stop_machine_yield+0x2/0x10
[ 2892.037796] Code: 75 14 48 8d 65 f0 5b 41 5c 5d e9 19 10 c3 00 b8 fe ff ff ff eb dc e8 ed 67 f2 ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 90 <e9> f9 0f c3 00 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41
[ 2892.037843] RSP: 0000:ff4429a18c843e68 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 2892.037865] RAX: ff336254ffb73d40 RBX: ff4429a1aef3fb50 RCX: dead000000000200
[ 2892.037889] RDX: 0000000000000002 RSI: ff4429a1aef3fad0 RDI: ffffffff9aa17de0
[ 2892.037917] RBP: ff4429a1aef3fb74 R08: 0000000000000001 R09: 0000000000000001
[ 2892.037945] R10: 0000000000000012 R11: 000000000000002f R12: 0000000000000001
[ 2892.037974] R13: ffffffff9aa17de0 R14: 0000000000000000 R15: 0000000000000001
[ 2892.038003] multi_cpu_stop+0x54/0x110
[ 2892.038021] ? stop_machine_yield+0x10/0x10
[ 2892.038039] cpu_stopper_thread+0x48/0x100
[ 2892.038057] ? sort_range+0x20/0x20
[ 2892.038348] smpboot_thread_fn+0xb5/0x150
[ 2892.038590] kthread+0x134/0x150
[ 2892.038776] ? set_kthread_struct+0x50/0x50
[ 2892.038952] ret_from_fork+0x1f/0x40
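When triaging logs like the one above, it can help to see which qed debug feature was being collected when the watchdog fired. The following is a minimal sketch, not a supported tool: the function name `summarize_soft_lockups` and the sample variable are hypothetical, and the regular expressions assume the exact message formats shown in this excerpt. The pairing of a lockup with the last "Collecting a debug feature" message is only a heuristic and does not by itself prove causation.

```python
import re

# Patterns assume the message formats seen in the log excerpt of this article.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)
FEATURE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[qed_dbg_dump:\d+\((?P<dev>[^)]+)\)\]"
    r'Collecting a debug feature \["(?P<feature>[^"]+)"\]'
)


def summarize_soft_lockups(log_text):
    """Return one dict per soft lockup, annotated with the most recent
    qed debug feature collection logged before it (if any)."""
    events = []
    last_feature = None  # (timestamp, device, feature) of the latest qed dump line
    for line in log_text.splitlines():
        m = FEATURE_RE.search(line)
        if m:
            last_feature = (float(m["ts"]), m["dev"], m["feature"])
            continue
        m = LOCKUP_RE.search(line)
        if m:
            event = {
                "timestamp": float(m["ts"]),
                "cpu": int(m["cpu"]),
                "stuck_seconds": int(m["secs"]),
                "comm": m["comm"],
                "pid": int(m["pid"]),
                "last_feature": None,
                "device": None,
                "gap_seconds": None,
            }
            if last_feature is not None:
                ts, dev, feature = last_feature
                event["last_feature"] = feature
                event["device"] = dev
                # Time between the start of the last dump step and the lockup report.
                event["gap_seconds"] = event["timestamp"] - ts
            events.append(event)
    return events


# Three lines copied from the log excerpt above.
SAMPLE_LOG = """\
[ 2864.927487] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["grc"]
[ 2865.284942] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["mcp_trace"]
[ 2892.037354] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [migration/5:45]
"""

events = summarize_soft_lockups(SAMPLE_LOG)
for e in events:
    print(e)
```

For the sample lines, the sketch pairs the CPU#5 lockup with the "mcp_trace" collection on ens10f1, which matches the sequence visible in the excerpt. To use it on a full log, pass the contents of a saved console or vmcore-dmesg text file instead of `SAMPLE_LOG`.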
Environment
- Red Hat Enterprise Linux 9 - kernel-5.14.0-427.4.1.el9_4
- Red Hat Enterprise Linux 8 - kernel-4.18.0-477.13.1.el8_8
- Red Hat Enterprise Linux 7 - kernel-3.10.0-1160.92.1.el7
- qed NIC driver
- HPE ProLiant DL360 Gen10 Plus