RHEL 7 kernel-rt panics on hung task detection while a cgroup reparenting is stuck in a loop indefinitely
Issue
- RHEL 7 kernel-rt panics when the hung task detector (khungtaskd) finds blocked tasks:
[2985306.437976] Kernel panic - not syncing: hung_task: blocked tasks
[2985306.437978] CPU: 60 PID: 798 Comm: khungtaskd Kdump: loaded Tainted: G W OE ------------ T 3.10.0-1160.24.1.rt56.1161.el7.x86_64 #1
[2985306.437979] Hardware name: Quanta Cloud Technology Inc. QuantaGrid D52BE-2U 1S5BU9Z003W/S5BE-MB 3UPI (LBG-1G), BIOS 3B13.RTN05 04/20/2020
[2985306.437979] Call Trace:
[2985306.437983] [<ffffffffb6177fe5>] dump_stack+0x19/0x1b
[2985306.437987] [<ffffffffb6172145>] panic+0xe8/0x21f
[2985306.437991] [<ffffffffb5b44d50>] watchdog+0x2b0/0x330
[2985306.437994] [<ffffffffb5b44aa0>] ? reset_hung_task_detector+0x20/0x20
[2985306.437997] [<ffffffffb5ab9271>] kthread+0xd1/0xe0
[2985306.437999] [<ffffffffb5ab91a0>] ? kthread_worker_fn+0x170/0x170
[2985306.438001] [<ffffffffb618a077>] ret_from_fork_nospec_begin+0x21/0x21
[2985306.438004] [<ffffffffb5ab91a0>] ? kthread_worker_fn+0x170/0x170
- A cgroup reparenting, triggered by a cgroup removal (cgroup_rmdir), is stuck in a loop in mem_cgroup_reparent_charges() while holding cgroup_mutex:
PID: 164110 TASK: ffff9900b0a1a2c0 CPU: 0 COMMAND: "runc"
#0 [ffff9907477a7af0] __schedule at ffffffffb617d256
#1 [ffff9907477a7b88] schedule at ffffffffb617d790
#2 [ffff9907477a7ba0] schedule_timeout at ffffffffb617b63c
#3 [ffff9907477a7c48] wait_for_completion at ffffffffb617c5c4
#4 [ffff9907477a7c98] wait_rcu_gp at ffffffffb5ab5abe
#5 [ffff9907477a7cf8] synchronize_rcu at ffffffffb5b503df
#6 [ffff9907477a7d08] synchronize_rcu at ffffffffb5b50418
#7 [ffff9907477a7d18] mem_cgroup_start_move at ffffffffb5c2985c
#8 [ffff9907477a7d28] mem_cgroup_reparent_charges at ffffffffb5c2d215
#9 [ffff9907477a7d98] mem_cgroup_css_offline at ffffffffb5c2d6f3
#10 [ffff9907477a7dd0] cgroup_destroy_locked at ffffffffb5b21d87
#11 [ffff9907477a7e18] cgroup_rmdir at ffffffffb5b22065
...
- Because the looping task never releases cgroup_mutex, many other tasks are blocked waiting for that mutex, with backtraces like this:
PID: 1 TASK: ffff990763d20000 CPU: 1 COMMAND: "systemd"
#0 [ffff98ef68fbbc00] __schedule at ffffffffb617d256
#1 [ffff98ef68fbbc98] schedule at ffffffffb617d790
#2 [ffff98ef68fbbcb0] __rt_mutex_slowlock at ffffffffb617e31d
#3 [ffff98ef68fbbd10] rt_mutex_slowlock_locked at ffffffffb617e7c3
#4 [ffff98ef68fbbd60] rt_mutex_slowlock at ffffffffb617e93c
#5 [ffff98ef68fbbdf8] rt_mutex_lock at ffffffffb617ea2f
#6 [ffff98ef68fbbe10] _mutex_lock at ffffffffb618018e
#7 [ffff98ef68fbbe20] proc_cgroup_show at ffffffffb5b232c6
#8 [ffff98ef68fbbe68] seq_read at ffffffffb5c661f0
...
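The pattern above (one task looping in mem_cgroup_reparent_charges(), many others blocked in the rt_mutex slow path) can be looked for in a vmcore by scanning saved backtrace output. The following is a minimal sketch, assuming the text comes from the crash utility's "bt" or "foreach bt" output in the format shown above. The function names parse_backtraces and classify are hypothetical helpers, not part of crash. The frame names alone do not prove which mutex a task is waiting on, so the result is only a list of candidates to inspect further.

```python
import re

# Header line that crash prints before each task's backtrace.
HEADER = re.compile(
    r'^PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+CPU:\s*(\d+)\s+COMMAND:\s*"([^"]*)"'
)
# One stack frame, e.g. "#7 [ffff98ef68fbbe20] proc_cgroup_show at ffffffffb5b232c6".
FRAME = re.compile(r'^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')


def parse_backtraces(text):
    """Split crash backtrace text into tasks with their frame function names."""
    tasks = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            current = {
                "pid": int(header.group(1)),
                "comm": header.group(4),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            current["frames"].append(frame.group(1))
    return tasks


def classify(task):
    """Label a task by the frames seen in this article (heuristic only)."""
    frames = task["frames"]
    if "mem_cgroup_reparent_charges" in frames:
        return "reparenting"    # candidate holder of cgroup_mutex
    if "rt_mutex_lock" in frames and "_mutex_lock" in frames:
        return "mutex_waiter"   # blocked on some mutex; which one is not shown
    return "other"


# Abbreviated sample built from the backtraces in this article.
SAMPLE = '''
PID: 164110 TASK: ffff9900b0a1a2c0 CPU: 0 COMMAND: "runc"
#7 [ffff9907477a7d18] mem_cgroup_start_move at ffffffffb5c2985c
#8 [ffff9907477a7d28] mem_cgroup_reparent_charges at ffffffffb5c2d215
PID: 1 TASK: ffff990763d20000 CPU: 1 COMMAND: "systemd"
#5 [ffff98ef68fbbdf8] rt_mutex_lock at ffffffffb617ea2f
#6 [ffff98ef68fbbe10] _mutex_lock at ffffffffb618018e
#7 [ffff98ef68fbbe20] proc_cgroup_show at ffffffffb5b232c6
'''

if __name__ == "__main__":
    for task in parse_backtraces(SAMPLE):
        print(task["pid"], task["comm"], classify(task))
```

Saving the "foreach bt" output to a file and passing its contents to parse_backtraces() gives a quick count of how many tasks fall into each group. Confirming that the waiters are really queued on cgroup_mutex still requires inspecting the mutex itself in crash.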
Environment
- Red Hat Enterprise Linux 7 Realtime (kernel-rt-3.10.0-1160.24.1.rt56.1161.el7.x86_64)