Hard lockup while scheduling out the 'migration/x' thread or while handling a page fault with migrate_swap()->stop_two_cpus()

Solution Verified

Issue

The machine panics because of a hard lockup on a CPU, and the kernel log shows one of the following call traces:

NMI watchdog: Watchdog detected hard LOCKUP on cpu 18
...
CPU: 18 PID: 98 Comm: migration/18 Not tainted 3.10.0-693.2.1.el7.x86_64 #1
...
Call Trace:
 <NMI>  [<ffffffff816a3db1>] dump_stack+0x19/0x1b
...
 [<ffffffff810fa3d0>] ? native_queued_spin_lock_slowpath+0x1c0/0x1e0
 <<EOE>>  [<ffffffff8169e61f>] queued_spin_lock_slowpath+0xb/0xf
 [<ffffffff816abbf7>] _raw_spin_lock_irqsave+0x37/0x40
 [<ffffffff81116730>] cpu_stop_queue_work+0x30/0x70
 [<ffffffff81116f00>] stop_one_cpu_nowait+0x30/0x40
 [<ffffffff810d227d>] load_balance+0x81d/0x9a0
 [<ffffffff810d2a61>] idle_balance+0x1d1/0x250
 [<ffffffff816a948a>] __schedule+0x87a/0x8b0
 [<ffffffff816a94e9>] schedule+0x29/0x70
 [<ffffffff810b9045>] smpboot_thread_fn+0xd5/0x180
...
 [<ffffffff810b098f>] kthread+0xcf/0xe0
...
 [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
...
Kernel panic - not syncing: Hard LOCKUP

or

NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
...
CPU: 1 PID: 13 Comm: migration/1 Tainted: G           OE  ------------   3.10.0-514.36.5.el7.x86_64 #1
...
Call Trace:
 <NMI>  [<ffffffff8168b98f>] dump_stack+0x19/0x1b
...
 [<ffffffff81693350>] ? _raw_spin_lock_irqsave+0x40/0x60
 <<EOE>>  [<ffffffff811196c0>] cpu_stop_queue_work+0x30/0x70
 [<ffffffff81119ef0>] stop_one_cpu_nowait+0x30/0x40
 [<ffffffff810d5f3d>] load_balance+0x80d/0x990
 [<ffffffff810d6749>] idle_balance+0x1c9/0x250
 [<ffffffff81690f1a>] __schedule+0x87a/0x940
 [<ffffffff81691009>] schedule+0x29/0x70
 [<ffffffff810bc23d>] smpboot_thread_fn+0xdd/0x180
...
 [<ffffffff810b371f>] kthread+0xcf/0xe0
...
 [<ffffffff8169d2d8>] ret_from_fork+0x58/0x90
...
Kernel panic - not syncing: Hard LOCKUP

or

NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
...
CPU: 1 PID: 101010 Comm: stress Not tainted 3.10.0-693.25.2.el7.x86_64 #1
...
Call Trace:
 <NMI>  [<ffffffff816ae7d8>] dump_stack+0x19/0x1b
...
 [<ffffffff810c784d>] ? try_to_wake_up+0x6d/0x350
 <<EOE>>  [<ffffffff810c7b45>] wake_up_process+0x15/0x20
 [<ffffffff8111a3d5>] __cpu_stop_queue_work+0x25/0x30
 [<ffffffff8111ab57>] stop_two_cpus+0x197/0x230
...
 [<ffffffff810c5257>] migrate_swap+0xb7/0x130
 [<ffffffff810ce19a>] task_numa_migrate+0x1ea/0x5b0
 [<ffffffff810ce5b3>] numa_migrate_preferred+0x53/0x60
 [<ffffffff810d0680>] task_numa_fault+0x8d0/0xbb0
 [<ffffffff811b5399>] do_numa_page+0x159/0x1e0
 [<ffffffff811b6787>] handle_mm_fault+0x6b7/0xfa0
...
 [<ffffffff816bb504>] __do_page_fault+0x154/0x450
 [<ffffffff816bb835>] do_page_fault+0x35/0x90
 [<ffffffff816b7768>] page_fault+0x28/0x30
Kernel panic - not syncing: Hard LOCKUP

or

NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
...
CPU: 3 PID: 9100 Comm: java Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-862.14.4.el7.x86_64 #1
...
Call Trace:
 <NMI>  [<ffffffff93f13754>] dump_stack+0x19/0x1b
...
 [<ffffffff938d1c6a>] ? try_to_wake_up+0x6a/0x350
 <EOE>  [<ffffffff938d1f65>] wake_up_process+0x15/0x20
 [<ffffffff93929775>] __cpu_stop_queue_work+0x25/0x30
 [<ffffffff93929ef7>] stop_two_cpus+0x197/0x230
...
 [<ffffffff938cf277>] migrate_swap+0xb7/0x130
 [<ffffffff938d82fa>] task_numa_migrate+0x1ea/0x5b0
...
 [<ffffffff938d8713>] numa_migrate_preferred+0x53/0x60
 [<ffffffff938da7f0>] task_numa_fault+0x8d0/0xbb0
 [<ffffffff939c81be>] do_numa_page+0x1be/0x250
 [<ffffffff939c8566>] handle_pte_fault+0x316/0xd10
 [<ffffffff939caefd>] handle_mm_fault+0x39d/0x9b0
...
 [<ffffffff93f20547>] __do_page_fault+0x197/0x4f0
 [<ffffffff93f208d5>] do_page_fault+0x35/0x90
 [<ffffffff93f1c758>] page_fault+0x28/0x30
Kernel panic - not syncing: Hard LOCKUP

The function names in the call traces above are what identify this issue. The addresses and offsets matter much less, because they differ depending on kernel version and configuration.
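As a rough illustration of matching on function names rather than addresses, the following Python sketch strips addresses and offsets from a call trace and checks whether the remaining function names contain one of the two sequences seen above. The helper names (trace_functions, classify), the SIGNATURES table, and its labels are hypothetical and chosen only for this example. The sequences are taken from the traces in this article and are not an official or complete signature for the issue.

```python
import re

# Matches one frame such as " [<ffffffff81116730>] cpu_stop_queue_work+0x30/0x70".
# Group 1 captures the optional "? " marker (unreliable frame), group 2 the function name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Assumption: these sequences are copied from the traces in this article.
SIGNATURES = {
    "migration thread scheduling out": [
        "cpu_stop_queue_work", "stop_one_cpu_nowait",
        "load_balance", "idle_balance", "__schedule",
    ],
    "NUMA page fault": [
        "stop_two_cpus", "migrate_swap", "task_numa_migrate",
    ],
}


def trace_functions(text):
    """Return the function names of a call trace in order, skipping '?' frames."""
    names = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group(1):
            names.append(match.group(2))
    return names


def is_subsequence(needle, haystack):
    """True if every name in needle occurs in haystack in the same order."""
    remaining = iter(haystack)
    return all(name in remaining for name in needle)


def classify(text):
    """Return the labels of all signatures found in the given trace text."""
    funcs = trace_functions(text)
    return [label for label, sig in SIGNATURES.items()
            if is_subsequence(sig, funcs)]
```

Calling classify() on the text of a kernel log returns the labels of the sequences that were found, or an empty list if none match. Frames marked with "?" are skipped because the kernel flags them as unreliable.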

Environment

  • Red Hat Enterprise Linux 7
