A soft lockup occurs on one CPU while a task on another CPU is calling qed_mcp_trace_dump()

Solution Verified - Updated -

Issue

  • A soft lockup occurs on one CPU while a task on another CPU is calling qed_mcp_trace_dump(), and the kernel then panics:
[ 2864.907964] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["idle_chk"]
[ 2864.917265] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["idle_chk"]
[ 2864.926660] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["reg_fifo"]
[ 2864.926696] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["igu_fifo"]
[ 2864.926726] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["protection_override"]
[ 2864.926778] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["fw_asserts"]
[ 2864.927005] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["ilt"]
[ 2864.927487] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["grc"]
[ 2865.284942] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["mcp_trace"]
[ 2892.037354] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [migration/5:45]
[ 2892.037401] Modules linked in: binfmt_misc mptcp_diag tcp_diag udp_diag raw_diag inet_diag bonding tls intel_rapl_msr intel_rapl_common ipmi_ssif nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate vfat fat intel_uncore pcspkr isst_if_mbox_pci acpi_ipmi ses isst_if_mmio ioatdma enclosure isst_if_common scsi_transport_sas hpilo hpwdt intel_vsec dca wmi ipmi_si acpi_tad acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper qede syscopyarea sysfillrect sysimgblt fb_sys_fops qed crc32c_intel drm megaraid_sas crc8 dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler
[ 2892.037463] CPU: 5 PID: 45 Comm: migration/5 Kdump: loaded Not tainted 4.18.0-477.13.1.el8_8.x86_64 #1
[ 2892.037466] Hardware name: HPE ProLiant DL360 Gen10 Plus/ProLiant DL360 Gen10 Plus, BIOS U46 02/02/2023
[ 2892.037467] RIP: 0010:stop_machine_yield+0x2/0x10
[ 2892.037477] Code: 75 14 48 8d 65 f0 5b 41 5c 5d e9 19 10 c3 00 b8 fe ff ff ff eb dc e8 ed 67 f2 ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 90 <e9> f9 0f c3 00 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41
[ 2892.037479] RSP: 0000:ff4429a18c843e68 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 2892.037482] RAX: ff336254ffb73d40 RBX: ff4429a1aef3fb50 RCX: dead000000000200
[ 2892.037483] RDX: 0000000000000002 RSI: ff4429a1aef3fad0 RDI: ffffffff9aa17de0
[ 2892.037484] RBP: ff4429a1aef3fb74 R08: 0000000000000001 R09: 0000000000000001
[ 2892.037486] R10: 0000000000000012 R11: 000000000000002f R12: 0000000000000001
[ 2892.037487] R13: ffffffff9aa17de0 R14: 0000000000000000 R15: 0000000000000001
[ 2892.037488] FS:  0000000000000000(0000) GS:ff336254ffb40000(0000) knlGS:0000000000000000
[ 2892.037490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2892.037491] CR2: 00007f9907743a08 CR3: 0000004e01010005 CR4: 0000000000771ee0
[ 2892.037493] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2892.037494] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2892.037496] PKRU: 55555554
[ 2892.037497] Call Trace:
[ 2892.037500]  multi_cpu_stop+0x54/0x110
[ 2892.037505]  ? stop_machine_yield+0x10/0x10
[ 2892.037507]  cpu_stopper_thread+0x48/0x100
[ 2892.037511]  ? sort_range+0x20/0x20
[ 2892.037515]  smpboot_thread_fn+0xb5/0x150
[ 2892.037517]  kthread+0x134/0x150
[ 2892.037520]  ? set_kthread_struct+0x50/0x50
[ 2892.037523]  ret_from_fork+0x1f/0x40
[ 2892.037529] Kernel panic - not syncing: softlockup: hung tasks
[ 2892.037553] CPU: 5 PID: 45 Comm: migration/5 Kdump: loaded Tainted: G             L   --------- -  - 4.18.0-477.13.1.el8_8.x86_64 #1
[ 2892.037596] Hardware name: HPE ProLiant DL360 Gen10 Plus/ProLiant DL360 Gen10 Plus, BIOS U46 02/02/2023
[ 2892.037621] Call Trace:
[ 2892.037630]  <IRQ>
[ 2892.037638]  dump_stack+0x41/0x60
[ 2892.037652]  panic+0xe7/0x2ac
[ 2892.037666]  ? __switch_to_asm+0x51/0x80
[ 2892.037679]  watchdog_timer_fn.cold.10+0x85/0x9e
[ 2892.037697]  ? watchdog+0x30/0x30
[ 2892.037709]  __hrtimer_run_queues+0x101/0x280
[ 2892.037726]  hrtimer_interrupt+0x100/0x220
[ 2892.037741]  smp_apic_timer_interrupt+0x6a/0x130
[ 2892.037757]  apic_timer_interrupt+0xf/0x20
[ 2892.037772]  </IRQ>
[ 2892.037779] RIP: 0010:stop_machine_yield+0x2/0x10
[ 2892.037796] Code: 75 14 48 8d 65 f0 5b 41 5c 5d e9 19 10 c3 00 b8 fe ff ff ff eb dc e8 ed 67 f2 ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 90 <e9> f9 0f c3 00 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41
[ 2892.037843] RSP: 0000:ff4429a18c843e68 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 2892.037865] RAX: ff336254ffb73d40 RBX: ff4429a1aef3fb50 RCX: dead000000000200
[ 2892.037889] RDX: 0000000000000002 RSI: ff4429a1aef3fad0 RDI: ffffffff9aa17de0
[ 2892.037917] RBP: ff4429a1aef3fb74 R08: 0000000000000001 R09: 0000000000000001
[ 2892.037945] R10: 0000000000000012 R11: 000000000000002f R12: 0000000000000001
[ 2892.037974] R13: ffffffff9aa17de0 R14: 0000000000000000 R15: 0000000000000001
[ 2892.038003]  multi_cpu_stop+0x54/0x110
[ 2892.038021]  ? stop_machine_yield+0x10/0x10
[ 2892.038039]  cpu_stopper_thread+0x48/0x100
[ 2892.038057]  ? sort_range+0x20/0x20
[ 2892.038348]  smpboot_thread_fn+0xb5/0x150
[ 2892.038590]  kthread+0x134/0x150
[ 2892.038776]  ? set_kthread_struct+0x50/0x50
[ 2892.038952]  ret_from_fork+0x1f/0x40
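To check whether another captured console log shows the same pattern, the two message types above can be correlated programmatically. The Python sketch below is a hypothetical triage helper, not a Red Hat or qed tool. The function name triage and both regular expressions are illustrative assumptions, written against the exact message formats in the log above ("watchdog: BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]" and "[qed_dbg_dump:...]Collecting a debug feature [...]"). It reports the first soft lockup and the most recent qed debug feature collected before it. Other kernel versions may format these messages differently.

```python
import re

# Assumed format of the watchdog message, taken from the log above, e.g.
# "[ 2892.037354] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [migration/5:45]"
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Assumed format of the qed debug-collection message, taken from the log above, e.g.
# '[ 2865.284942] [qed_dbg_dump:8021(ens10f1)]Collecting a debug feature ["mcp_trace"]'
FEATURE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] \[qed_dbg_dump:\d+\((?P<iface>[^)]+)\)\]"
    r'Collecting a debug feature \["(?P<feature>[^"]+)"\]'
)


def triage(log_text):
    """Hypothetical helper: return (lockup, last_feature).

    lockup is a dict describing the first soft lockup found, or None.
    last_feature is (timestamp, interface, feature) for the most recent
    qed debug feature collected before that lockup, or None.
    """
    last_feature = None
    for line in log_text.splitlines():
        m = FEATURE_RE.search(line)
        if m:
            # Remember the latest qed debug feature seen so far.
            last_feature = (float(m["ts"]), m["iface"], m["feature"])
            continue
        m = LOCKUP_RE.search(line)
        if m:
            lockup = {
                "ts": float(m["ts"]),
                "cpu": int(m["cpu"]),
                "stuck_secs": int(m["secs"]),
                "comm": m["comm"],
                "pid": int(m["pid"]),
            }
            # Stop at the first lockup; later lines are panic output.
            return lockup, last_feature
    return None, last_feature
```

Run against the log in this article, it would report CPU 5 stuck for 22 seconds in migration/5 (PID 45), with mcp_trace on ens10f1 as the last debug feature being collected. A match only shows that the timing is consistent with this issue; it does not by itself establish the cause.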

Environment

  • Red Hat Enterprise Linux 9 - kernel-5.14.0-427.4.1.el9_4
  • Red Hat Enterprise Linux 8 - kernel-4.18.0-477.13.1.el8_8
  • Red Hat Enterprise Linux 7 - kernel-3.10.0-1160.92.1.el7
  • qed NIC driver
  • HPE ProLiant DL360 Gen10 Plus
