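The kernel panics because of a blocked task waiting for kthreadd to create a worker. kthreadd, which is responsible for creating the worker, is busy isolating pages for zone compaction.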
Issue
- The kernel panics because of a blocked task that is waiting for kthreadd to create a worker.
[1140818.458383] INFO: task kworker/2:0:70006 blocked for more than 120 seconds.
[1140818.458453] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1140818.458511] kworker/2:0 D ffff8ebf56c05780 0 70006 2 0x00000080
[1140818.458523] Call Trace:
[1140818.458534] [<ffffffffb2f89199>] schedule+0x29/0x70
[1140818.458538] [<ffffffffb2f86e61>] schedule_timeout+0x221/0x2d0
[1140818.458544] [<ffffffffb28d7402>] ? check_preempt_curr+0x92/0xa0
[1140818.458546] [<ffffffffb28d7429>] ? ttwu_do_wakeup+0x19/0xe0
[1140818.458548] [<ffffffffb2f8954d>] wait_for_completion+0xfd/0x140
[1140818.458551] [<ffffffffb28dadd0>] ? wake_up_state+0x20/0x20
[1140818.458556] [<ffffffffb28c5c9a>] kthread_create_on_node+0xaa/0x140
[1140818.458562] [<ffffffffb28bee60>] ? manage_workers.isra.26+0x2a0/0x2a0
[1140818.458564] [<ffffffffb28bea2b>] create_worker+0xeb/0x200
[1140818.458566] [<ffffffffb28becb6>] manage_workers.isra.26+0xf6/0x2a0
[1140818.458568] [<ffffffffb28bf1e3>] worker_thread+0x383/0x3c0
[1140818.458571] [<ffffffffb28bee60>] ? manage_workers.isra.26+0x2a0/0x2a0
[1140818.458572] [<ffffffffb28c5e41>] kthread+0xd1/0xe0
[1140818.458575] [<ffffffffb28c5d70>] ? insert_kthread_work+0x40/0x40
[1140818.458580] [<ffffffffb2f95ddd>] ret_from_fork_nospec_begin+0x7/0x21
[1140818.458582] [<ffffffffb28c5d70>] ? insert_kthread_work+0x40/0x40
...
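The hung-task report above starts with a fixed header line, so blocked tasks can be picked out of a saved console log or vmcore-dmesg file. The following is a minimal Python sketch, assuming the log is available as text. The function name `find_hung_tasks` is hypothetical and not part of any Red Hat tool.

```python
import re

# Matches the header line printed by the hung task detector, e.g.
# "INFO: task kworker/2:0:70006 blocked for more than 120 seconds."
# The task name may itself contain ':' (as in "kworker/2:0"), so the
# PID is taken as the last ":<digits>" before " blocked".
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) for each hung-task header found."""
    results = []
    for line in log_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            results.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return results
```

Passing the contents of a log file to `find_hung_tasks` gives the list of tasks to look up when reading the call traces that follow each header.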
- At that time, kthreadd, which is responsible for creating the worker, is busy isolating pages for zone compaction.
[1140818.459880] CPU: 5 PID: 813 Comm: systemd-udevd Kdump: loaded Tainted: P OE ------------ 3.10.0-1160.36.2.el7.x86_64 #1
[1140818.459881] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
[1140818.459882] task: ffff8ed030cb2100 ti: ffff8ed030e68000 task.ti: ffff8ed030e68000
[1140818.459883] RIP: 0010:[<ffffffffb29e6e63>] [<ffffffffb29e6e63>] isolate_freepages_block+0x83/0x320
[1140818.459884] RSP: 0018:ffff8ed030e6b9a8 EFLAGS: 00000206
[1140818.459886] RAX: 00000000ffffffff RBX: 0000000000cfd9d1 RCX: ffff8ed030e6bbf0
[1140818.459887] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8ed07ffda068
[1140818.459887] RBP: ffff8ed030e6ba40 R08: ffff8ed030e6bc08 R09: 0000000000cfda00
[1140818.459888] R10: ffff8ed07ffda000 R11: ffff8ed030e6bfd8 R12: ffffc9a3f3f60000
[1140818.459889] R13: ffffc9a3f3f67440 R14: 0000000000000000 R15: 00000000000001d1
[1140818.459890] FS: 00007fb3be3348c0(0000) GS:ffff8ed035740000(0000) knlGS:0000000000000000
[1140818.459891] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1140818.459892] CR2: 00007f1bcb8a5000 CR3: 0000001120442000 CR4: 00000000003607e0
[1140818.459893] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1140818.459893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1140818.459894] Call Trace:
[1140818.459895] [<ffffffffb29e7468>] compaction_alloc+0x248/0x360
[1140818.459896] [<ffffffffb2a305ac>] migrate_pages+0x21c/0x7f0
[1140818.459897] [<ffffffffb29e7220>] ? __reset_isolation_suitable+0x120/0x120
[1140818.459904] [<ffffffffb29e838d>] compact_zone+0x34d/0x4d0
[1140818.459904] [<ffffffffb29e857d>] compact_zone_order+0x6d/0xa0
[1140818.459905] [<ffffffffb29e8921>] try_to_compact_pages+0x121/0x1a0
[1140818.459906] [<ffffffffb2f7ea53>] __alloc_pages_direct_compact+0xac/0x193
[1140818.459907] [<ffffffffb29c951c>] __alloc_pages_nodemask+0x7bc/0xbe0
[1140818.459908] [<ffffffffb289880d>] copy_process+0x1dd/0x1a80
[1140818.459909] [<ffffffffb289a261>] do_fork+0x91/0x330
[1140818.459910] [<ffffffffb289a586>] SyS_clone+0x16/0x20
[1140818.459911] [<ffffffffb2f96374>] stub_clone+0x44/0x70
[1140818.459911] [<ffffffffb2f95f92>] ? system_call_fastpath+0x25/0x2a
[1140818.459913] Code: 4f 58 48 83 c0 40 45 89 c6 45 31 e4 c7 45 b4 00 00 00 00 45 31 ff 48 89 4d a0 48 89 45 98 f6 c3 1f 0f 84 b1 00 00 00 41 8b 45 18 <41> 83 c7 01 4d 85 e4 4d 0f 44 e5 83 f8 80 0f 84 d1 00 00 00 45
...
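The trace above goes through `__alloc_pages_direct_compact`, which means the allocating task is running compaction itself. On a live system, the `compact_*` counters in `/proc/vmstat` give a rough view of how often this happens. The following is a minimal Python sketch, assuming the kernel is built with compaction support and exposes those counters. The exact set of counter names varies by kernel version, and the function name `compaction_counters` is hypothetical.

```python
def compaction_counters(vmstat_text):
    """Extract the compact_* counters from the text of /proc/vmstat.

    Each line of /proc/vmstat is "<name> <value>". Only names starting
    with "compact_" are kept; which names exist depends on the kernel.
    """
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("compact_"):
            counters[parts[0]] = int(parts[1])
    return counters
```

Reading `/proc/vmstat` twice, some time apart, and comparing the two results shows whether counters such as `compact_stall` are growing. A steadily growing value suggests that tasks are frequently stalling in direct compaction.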
- Finally, the kernel panics because of the blocked task.
[1140818.459915] Kernel panic - not syncing: hung_task: blocked tasks
[1140818.459981] CPU: 0 PID: 40 Comm: khungtaskd Kdump: loaded Tainted: P OE ------------ 3.10.0-1160.36.2.el7.x86_64 #1
[1140818.460114] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
[1140818.460159] Call Trace:
[1140818.460205] [<ffffffffb2f83559>] dump_stack+0x19/0x1b
[1140818.460254] [<ffffffffb2f7d261>] panic+0xe8/0x21f
[1140818.460299] [<ffffffffb294e93e>] watchdog+0x26e/0x2c0
[1140818.460358] [<ffffffffb294e6d0>] ? reset_hung_task_detector+0x20/0x20
[1140818.460419] [<ffffffffb28c5e41>] kthread+0xd1/0xe0
[1140818.460475] [<ffffffffb28c5d70>] ? insert_kthread_work+0x40/0x40
[1140818.460500] [<ffffffffb2f95ddd>] ret_from_fork_nospec_begin+0x7/0x21
[1140818.471874] [<ffffffffb28c5d70>] ? insert_kthread_work+0x40/0x40
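The `hung_task: blocked tasks` panic is raised by khungtaskd only when the `kernel.hung_task_panic` sysctl is set to 1. Otherwise the detector only logs the blocked task. The following is a minimal Python sketch for checking the current settings, assuming the standard `/proc/sys/kernel` files are present. The function names are hypothetical.

```python
import os


def read_hung_task_settings(base="/proc/sys/kernel"):
    """Read the hung task detector sysctls; a value is None if unreadable."""
    settings = {}
    for name in ("hung_task_panic", "hung_task_timeout_secs"):
        path = os.path.join(base, name)
        try:
            with open(path) as f:
                settings[name] = int(f.read().strip())
        except (OSError, ValueError):
            # File missing (detector not built in) or unexpected content.
            settings[name] = None
    return settings


def panics_on_hung_task(settings):
    """True if a task blocked longer than the timeout would panic the kernel."""
    timeout = settings.get("hung_task_timeout_secs") or 0
    # A timeout of 0 disables the detector, as the log message itself notes.
    return settings.get("hung_task_panic") == 1 and timeout > 0
```

On the system in this report, `panics_on_hung_task(read_hung_task_settings())` would be expected to return `True`, since the detector panicked after the 120-second timeout.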
Environment
- Red Hat Enterprise Linux 7