Hard lockup or soft lockup occurs in the system due to an issue with IOMMU
Issue
- A soft lockup occurred on the system during a stress test:
Oct 4 23:17:34 localhost kernel: NMI watchdog: BUG: soft lockup - CPU#61 stuck for 21s! [iomonkey:10809]
[....]
Oct 4 23:17:34 localhost kernel: CPU: 61 PID: 10809 Comm: iomonkey Kdump: loaded Not tainted 3.10.0-1062.el7.x86_64 #1
Oct 4 23:17:34 localhost kernel: Hardware name: Dell Inc. PowerEdge R7525/04D5GJ, BIOS 0.40.10 08/28/2019
Oct 4 23:17:34 localhost kernel: task: ffff93dcfde89070 ti: ffff93dd785d0000 task.ti: ffff93dd785d0000
Oct 4 23:17:34 localhost kernel: RIP: 0010:[<ffffffffb67817c5>] [<ffffffffb67817c5>] _raw_spin_unlock_irqrestore+0x15/0x20
Oct 4 23:17:34 localhost kernel: RSP: 0018:ffff93dcfe443e28 EFLAGS: 00000286
Oct 4 23:17:34 localhost kernel: RAX: 0000000000000000 RBX: ffffffffb65f2432 RCX: 000000018040003f
Oct 4 23:17:34 localhost kernel: RDX: 0000000180400040 RSI: 0000000000000286 RDI: 0000000000000286
Oct 4 23:17:34 localhost kernel: RBP: ffff93dcfe443e28 R08: ffff93dadeeae800 R09: 000000018040003f
Oct 4 23:17:34 localhost kernel: R10: 0000000000000001 R11: ffff93dadeeae800 R12: ffff93dcfe443d98
Oct 4 23:17:34 localhost kernel: R13: ffffffffb678cef2 R14: ffff93dcfe443e28 R15: ffffd732df802218
Oct 4 23:17:34 localhost kernel: FS: 00007fb82e7fc700(0000) GS:ffff93dcfe440000(0000) knlGS:0000000000000000
Oct 4 23:17:34 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 4 23:17:34 localhost kernel: CR2: 0000000012999000 CR3: 0000000278fd0000 CR4: 0000000000340fe0
Oct 4 23:17:34 localhost kernel: Call Trace:
Oct 4 23:17:34 localhost kernel: <IRQ>
Oct 4 23:17:34 localhost kernel: [<ffffffffb65f4ed6>] queue_flush_timeout+0x66/0xa0
Oct 4 23:17:34 localhost kernel: [<ffffffffb65f4e70>] ? dma_ops_domain_flush_tlb+0x40/0x40
Oct 4 23:17:34 localhost kernel: [<ffffffffb60ab238>] call_timer_fn+0x38/0x110
Oct 4 23:17:34 localhost kernel: [<ffffffffb65f4e70>] ? dma_ops_domain_flush_tlb+0x40/0x40
Oct 4 23:17:34 localhost kernel: [<ffffffffb60ad69d>] run_timer_softirq+0x24d/0x300
Oct 4 23:17:34 localhost kernel: [<ffffffffb60a41e5>] __do_softirq+0xf5/0x280
Oct 4 23:17:34 localhost kernel: [<ffffffffb678f42c>] call_softirq+0x1c/0x30
Oct 4 23:17:34 localhost kernel: [<ffffffffb602f675>] do_softirq+0x65/0xa0
Oct 4 23:17:34 localhost kernel: [<ffffffffb60a4565>] irq_exit+0x105/0x110
Oct 4 23:17:34 localhost kernel: [<ffffffffb67907f8>] smp_apic_timer_interrupt+0x48/0x60
Oct 4 23:17:34 localhost kernel: [<ffffffffb678cef2>] apic_timer_interrupt+0x162/0x170
Oct 4 23:17:34 localhost kernel: <EOI>
Oct 4 23:17:34 localhost kernel: [<ffffffffb67817c5>] ? _raw_spin_unlock_irqrestore+0x15/0x20
Oct 4 23:17:34 localhost kernel: [<ffffffffb65f20d3>] alloc_iova+0x153/0x180
Oct 4 23:17:34 localhost kernel: [<ffffffffb65f2dbb>] alloc_iova_fast+0x4b/0xb0
Oct 4 23:17:34 localhost kernel: [<ffffffffb65f457a>] dma_ops_alloc_iova.isra.23+0x7a/0x90
Oct 4 23:17:34 localhost kernel: [<ffffffffb65f78b5>] map_sg+0x75/0x2f0
Oct 4 23:17:34 localhost kernel: [<ffffffffc0284490>] nvme_queue_rq+0x320/0x820 [nvme]
Oct 4 23:17:34 localhost kernel: [<ffffffffb63b3dbd>] ? sbitmap_get+0x5d/0xb0
Oct 4 23:17:34 localhost kernel: [<ffffffffb635d461>] ? __blk_mq_get_tag+0x21/0x90
Oct 4 23:17:34 localhost kernel: [<ffffffffb635aad5>] __blk_mq_try_issue_directly+0x135/0x1a0
Oct 4 23:17:34 localhost kernel: [<ffffffffb635ab6d>] blk_mq_try_issue_directly+0x2d/0xb0
Oct 4 23:17:34 localhost kernel: [<ffffffffb635af75>] blk_mq_make_request+0x385/0x630
Oct 4 23:17:34 localhost kernel: [<ffffffffb634ebf7>] generic_make_request+0x147/0x380
Oct 4 23:17:34 localhost kernel: [<ffffffffb634eea0>] submit_bio+0x70/0x150
Oct 4 23:17:34 localhost kernel: [<ffffffffc0527cd1>] xfs_submit_ioend.isra.12+0x61/0xe0 [xfs]
Oct 4 23:17:34 localhost kernel: [<ffffffffc052800f>] xfs_vm_writepages+0x7f/0xa0 [xfs]
Oct 4 23:17:34 localhost kernel: [<ffffffffb61c8d31>] do_writepages+0x21/0x50
Oct 4 23:17:34 localhost kernel: [<ffffffffb61bd4b5>] __filemap_fdatawrite_range+0x65/0x80
Oct 4 23:17:34 localhost kernel: [<ffffffffb61bd601>] filemap_write_and_wait_range+0x41/0x90
Oct 4 23:17:34 localhost kernel: [<ffffffffc05329a6>] xfs_file_fsync+0x66/0x1c0 [xfs]
Oct 4 23:17:34 localhost kernel: [<ffffffffb627d9f7>] do_fsync+0x67/0xb0
Oct 4 23:17:34 localhost kernel: [<ffffffffb627dce0>] SyS_fsync+0x10/0x20
Oct 4 23:17:34 localhost kernel: [<ffffffffb678bede>] system_call_fastpath+0x25/0x2a
- This was followed by a kernel panic due to a hard lockup:
[12664.516901] Kernel panic - not syncing: Hard LOCKUP
[12664.516917] CPU: 129 PID: 0 Comm: swapper/129 Kdump: loaded Tainted: G W L ------------ 3.10.0-1127.el7.x86_64 #1
[12664.516945] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/18/2019
[12664.516968] Call Trace:
[12664.516976] <NMI> [<ffffffffb2b7ff85>] dump_stack+0x19/0x1b
[12664.516998] [<ffffffffb2b79521>] panic+0xe8/0x21f
[12664.517015] [<ffffffffb242f958>] ? show_regs+0x58/0x290
[12664.517033] [<ffffffffb249ba6f>] nmi_panic+0x3f/0x40
[12664.517050] [<ffffffffb254f751>] watchdog_overflow_callback+0x121/0x140
[12664.517071] [<ffffffffb25a9037>] __perf_event_overflow+0x57/0x100
[12664.517091] [<ffffffffb25b2834>] perf_event_overflow+0x14/0x20
[12664.517112] [<ffffffffb2405585>] x86_pmu_handle_irq+0x125/0x180
[12664.517130] [<ffffffffb24066e5>] amd_pmu_handle_irq+0x35/0x80
[12664.517147] [<ffffffffb2b89031>] perf_event_nmi_handler+0x31/0x50
[12664.517166] [<ffffffffb2b8a93c>] nmi_handle.isra.0+0x8c/0x150
[12664.517182] [<ffffffffb2b8ac18>] do_nmi+0x218/0x460
[12664.517197] [<ffffffffb2b89d9c>] end_repeat_nmi+0x1e/0x81
[12664.517215] [<ffffffffb2517fd2>] ? native_queued_spin_lock_slowpath+0x122/0x200
[12664.517429] [<ffffffffb2517fd2>] ? native_queued_spin_lock_slowpath+0x122/0x200
[12664.517448] [<ffffffffb2517fd2>] ? native_queued_spin_lock_slowpath+0x122/0x200
[12664.517469] <EOE> <IRQ> [<ffffffffb2b7a004>] queued_spin_lock_slowpath+0xb/0xf
[12664.517493] [<ffffffffb2b88737>] _raw_spin_lock_irqsave+0x37/0x40
[12664.517512] [<ffffffffb29f7fe9>] find_iova+0x19/0x40
[12664.517526] [<ffffffffb29f87b2>] free_iova+0x12/0x30
[12664.517541] [<ffffffffb29f8a56>] free_iova_fast+0x26/0x30
[12664.517556] [<ffffffffb29fa744>] queue_ring_free_flushed+0x64/0xb0
[12664.517574] [<ffffffffb29fc855>] __unmap_single.isra.22+0x1e5/0x200
[12664.517591] [<ffffffffb29fdbdf>] unmap_sg+0x5f/0x70
[12664.517610] [<ffffffffb28ed341>] scsi_dma_unmap+0x61/0x80
[12664.517630] [<ffffffffc03558df>] pqi_aio_io_complete+0x1f/0x130 [smartpqi]
[12664.517651] [<ffffffffc034df18>] pqi_irq_handler+0x128/0x740 [smartpqi]
[12664.517671] [<ffffffffb25507c4>] __handle_irq_event_percpu+0x44/0x1c0
[12664.517690] [<ffffffffb2550972>] handle_irq_event_percpu+0x32/0x80
[12664.517708] [<ffffffffb25509fc>] handle_irq_event+0x3c/0x60
[12664.517725] [<ffffffffb25537ef>] handle_edge_irq+0x7f/0x150
[12664.517742] [<ffffffffb242f5f4>] handle_irq+0xe4/0x1a0
[12664.517757] [<ffffffffb2b9786d>] do_IRQ+0x4d/0xf0
[12664.517772] [<ffffffffb2b8936a>] common_interrupt+0x16a/0x16a
[12664.517789] [<ffffffffb29fb2dd>] ? queue_flush_timeout+0x6d/0xa0
[12664.518514] [<ffffffffb29fb2d6>] ? queue_flush_timeout+0x66/0xa0
[12664.519026] [<ffffffffb29fb270>] ? dma_ops_domain_flush_tlb+0x40/0x40
[12664.519537] [<ffffffffb24ac7c8>] call_timer_fn+0x38/0x110
[12664.520048] [<ffffffffb29fb270>] ? dma_ops_domain_flush_tlb+0x40/0x40
[12664.520548] [<ffffffffb24aec5d>] run_timer_softirq+0x24d/0x300
[12664.521039] [<ffffffffb24a5695>] __do_softirq+0xf5/0x280
[12664.521506] [<ffffffffb2b9642c>] call_softirq+0x1c/0x30
[12664.521951] [<ffffffffb242f715>] do_softirq+0x65/0xa0
[12664.522389] [<ffffffffb24a5a15>] irq_exit+0x105/0x110
[12664.522820] [<ffffffffb2b979c8>] smp_apic_timer_interrupt+0x48/0x60
[12664.523244] [<ffffffffb2b93efa>] apic_timer_interrupt+0x16a/0x170
[12664.523651] <EOI> [<ffffffffb2b87c20>] ? __cpuidle_text_start+0x8/0x8
[12664.524061] [<ffffffffb2b87e6b>] ? native_safe_halt+0xb/0x20
[12664.524470] [<ffffffffb2b87c3e>] default_idle+0x1e/0xc0
[12664.524872] [<ffffffffb2437c80>] arch_cpu_idle+0x20/0xc0
[12664.525275] [<ffffffffb2501c2a>] cpu_startup_entry+0x14a/0x1e0
[12664.525680] [<ffffffffb245a517>] start_secondary+0x1f7/0x270
[12664.526083] [<ffffffffb24000d5>] start_cpu+0x5/0x14
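To check whether a lockup captured on another system resembles the traces above, the call trace can be scanned for the same IOMMU/IOVA functions. The following is a minimal sketch and not a Red Hat tool. The symbol list is copied from the two traces in this article, and treating those symbols as "the signature" is an assumption. The names `IOVA_SYMBOLS`, `iommu_frames` and `matches_signature` are hypothetical helpers introduced only for illustration.

```python
import re

# Symbols copied from the call traces in this article. Treating their
# presence as the signature of this issue is an assumption.
IOVA_SYMBOLS = {
    "alloc_iova", "alloc_iova_fast", "dma_ops_alloc_iova",
    "find_iova", "free_iova", "free_iova_fast",
    "queue_flush_timeout", "queue_ring_free_flushed",
}

# Matches the soft lockup and hard lockup messages shown in the logs above.
LOCKUP_RE = re.compile(r"BUG: soft lockup|Kernel panic - not syncing: Hard LOCKUP")

# Matches one call-trace frame, e.g. "[<ffffffffb65f20d3>] alloc_iova+0x153/0x180",
# including frames marked unreliable with a leading "? ".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def iommu_frames(log_text):
    """Return the IOVA-related symbols found in call-trace frames,
    in order of first appearance and without duplicates."""
    found = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if not match:
            continue
        # Drop compiler-generated suffixes such as ".isra.23".
        symbol = match.group(1).split(".")[0]
        if symbol in IOVA_SYMBOLS and symbol not in found:
            found.append(symbol)
    return found


def matches_signature(log_text):
    """True if the log contains a lockup message and at least one IOVA frame."""
    return bool(LOCKUP_RE.search(log_text)) and bool(iommu_frames(log_text))
```

Passing the contents of a saved kernel log (for example, text read from a file with `open(path).read()`) to `matches_signature` gives a quick first indication only. A match does not confirm the root cause, and the trace still needs to be reviewed by hand.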
Environment
- Red Hat Enterprise Linux 7
- AMD processors with IOMMU enabled