Hard lockup while scheduling out the 'migration/x' thread or while handling a page fault with migrate_swap()->stop_two_cpus()
Issue
The system panics because of a hard lockup on a CPU, logging one of the following messages:
NMI watchdog: Watchdog detected hard LOCKUP on cpu 18
...
CPU: 18 PID: 98 Comm: migration/18 Not tainted 3.10.0-693.2.1.el7.x86_64 #1
...
Call Trace:
<NMI> [<ffffffff816a3db1>] dump_stack+0x19/0x1b
...
[<ffffffff810fa3d0>] ? native_queued_spin_lock_slowpath+0x1c0/0x1e0
<<EOE>> [<ffffffff8169e61f>] queued_spin_lock_slowpath+0xb/0xf
[<ffffffff816abbf7>] _raw_spin_lock_irqsave+0x37/0x40
[<ffffffff81116730>] cpu_stop_queue_work+0x30/0x70
[<ffffffff81116f00>] stop_one_cpu_nowait+0x30/0x40
[<ffffffff810d227d>] load_balance+0x81d/0x9a0
[<ffffffff810d2a61>] idle_balance+0x1d1/0x250
[<ffffffff816a948a>] __schedule+0x87a/0x8b0
[<ffffffff816a94e9>] schedule+0x29/0x70
[<ffffffff810b9045>] smpboot_thread_fn+0xd5/0x180
...
[<ffffffff810b098f>] kthread+0xcf/0xe0
...
[<ffffffff816b4f58>] ret_from_fork+0x58/0x90
...
Kernel panic - not syncing: Hard LOCKUP
or
NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
...
CPU: 1 PID: 13 Comm: migration/1 Tainted: G OE ------------ 3.10.0-514.36.5.el7.x86_64 #1
...
Call Trace:
<NMI> [<ffffffff8168b98f>] dump_stack+0x19/0x1b
...
[<ffffffff81693350>] ? _raw_spin_lock_irqsave+0x40/0x60
<<EOE>> [<ffffffff811196c0>] cpu_stop_queue_work+0x30/0x70
[<ffffffff81119ef0>] stop_one_cpu_nowait+0x30/0x40
[<ffffffff810d5f3d>] load_balance+0x80d/0x990
[<ffffffff810d6749>] idle_balance+0x1c9/0x250
[<ffffffff81690f1a>] __schedule+0x87a/0x940
[<ffffffff81691009>] schedule+0x29/0x70
[<ffffffff810bc23d>] smpboot_thread_fn+0xdd/0x180
...
[<ffffffff810b371f>] kthread+0xcf/0xe0
...
[<ffffffff8169d2d8>] ret_from_fork+0x58/0x90
...
Kernel panic - not syncing: Hard LOCKUP
or
NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
...
CPU: 1 PID: 101010 Comm: stress Not tainted 3.10.0-693.25.2.el7.x86_64 #1
...
Call Trace:
<NMI> [<ffffffff816ae7d8>] dump_stack+0x19/0x1b
...
[<ffffffff810c784d>] ? try_to_wake_up+0x6d/0x350
<<EOE>> [<ffffffff810c7b45>] wake_up_process+0x15/0x20
[<ffffffff8111a3d5>] __cpu_stop_queue_work+0x25/0x30
[<ffffffff8111ab57>] stop_two_cpus+0x197/0x230
...
[<ffffffff810c5257>] migrate_swap+0xb7/0x130
[<ffffffff810ce19a>] task_numa_migrate+0x1ea/0x5b0
[<ffffffff810ce5b3>] numa_migrate_preferred+0x53/0x60
[<ffffffff810d0680>] task_numa_fault+0x8d0/0xbb0
[<ffffffff811b5399>] do_numa_page+0x159/0x1e0
[<ffffffff811b6787>] handle_mm_fault+0x6b7/0xfa0
...
[<ffffffff816bb504>] __do_page_fault+0x154/0x450
[<ffffffff816bb835>] do_page_fault+0x35/0x90
[<ffffffff816b7768>] page_fault+0x28/0x30
Kernel panic - not syncing: Hard LOCKUP
or
NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
...
CPU: 3 PID: 9100 Comm: java Kdump: loaded Tainted: P W OE ------------ 3.10.0-862.14.4.el7.x86_64 #1
...
Call Trace:
<NMI> [<ffffffff93f13754>] dump_stack+0x19/0x1b
...
[<ffffffff938d1c6a>] ? try_to_wake_up+0x6a/0x350
<<EOE>> [<ffffffff938d1f65>] wake_up_process+0x15/0x20
[<ffffffff93929775>] __cpu_stop_queue_work+0x25/0x30
[<ffffffff93929ef7>] stop_two_cpus+0x197/0x230
...
[<ffffffff938cf277>] migrate_swap+0xb7/0x130
[<ffffffff938d82fa>] task_numa_migrate+0x1ea/0x5b0
...
[<ffffffff938d8713>] numa_migrate_preferred+0x53/0x60
[<ffffffff938da7f0>] task_numa_fault+0x8d0/0xbb0
[<ffffffff939c81be>] do_numa_page+0x1be/0x250
[<ffffffff939c8566>] handle_pte_fault+0x316/0xd10
[<ffffffff939caefd>] handle_mm_fault+0x39d/0x9b0
...
[<ffffffff93f20547>] __do_page_fault+0x197/0x4f0
[<ffffffff93f208d5>] do_page_fault+0x35/0x90
[<ffffffff93f1c758>] page_fault+0x28/0x30
Kernel panic - not syncing: Hard LOCKUP
The function names in the call traces above are what matter. The addresses and offsets are less important, because they can differ depending on kernel version and configuration.
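Because only the function names are stable across kernel builds, a captured trace can be compared against the ones above by name alone. The following is a minimal sketch of that idea in Python, not a Red Hat tool: the helper names (`frame_functions`, `is_subsequence`, `classify`) and the two signature labels are hypothetical, and the function lists are simply taken from the traces shown in this article.

```python
import re

# Matches one frame such as "[<ffffffff81116730>] ? cpu_stop_queue_work+0x30/0x70"
# and captures only the function name, ignoring the address and offsets.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Hypothetical labels; the function names are copied from the traces above,
# listed in the order they appear (innermost frame first).
SIGNATURES = {
    "idle_balance": [
        "cpu_stop_queue_work",
        "stop_one_cpu_nowait",
        "load_balance",
        "idle_balance",
    ],
    "numa_migrate_swap": [
        "stop_two_cpus",
        "migrate_swap",
        "task_numa_migrate",
    ],
}


def frame_functions(trace):
    """Return the function names found in a call trace, in order."""
    return FRAME_RE.findall(trace)


def is_subsequence(needle, haystack):
    """True if every name in needle appears in haystack in the same order."""
    remaining = iter(haystack)
    return all(name in remaining for name in needle)


def classify(trace):
    """Return the labels of all signatures whose functions appear in the trace."""
    funcs = frame_functions(trace)
    return [label for label, sig in SIGNATURES.items() if is_subsequence(sig, funcs)]
```

Passing the text of a `Call Trace:` section to `classify()` returns the matching labels, or an empty list when neither pattern is present. Matching on an ordered subsequence rather than on exact lines tolerates the elided (`...`) frames and differing offsets between kernel versions.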
Environment
- Red Hat Enterprise Linux 7