RHEL7 kernel-rt panics on blocked task detection when cgroup reparenting is stuck in a loop indefinitely

Solution Unverified - Updated -

Issue

  • RHEL7 kernel-rt panics when khungtaskd detects blocked tasks:
[2985306.437976] Kernel panic - not syncing: hung_task: blocked tasks
[2985306.437978] CPU: 60 PID: 798 Comm: khungtaskd Kdump: loaded Tainted: G        W  OE  ------------ T 3.10.0-1160.24.1.rt56.1161.el7.x86_64 #1
[2985306.437979] Hardware name: Quanta Cloud Technology Inc. QuantaGrid D52BE-2U 1S5BU9Z003W/S5BE-MB 3UPI (LBG-1G), BIOS 3B13.RTN05 04/20/2020
[2985306.437979] Call Trace:
[2985306.437983]  [<ffffffffb6177fe5>] dump_stack+0x19/0x1b
[2985306.437987]  [<ffffffffb6172145>] panic+0xe8/0x21f
[2985306.437991]  [<ffffffffb5b44d50>] watchdog+0x2b0/0x330
[2985306.437994]  [<ffffffffb5b44aa0>] ? reset_hung_task_detector+0x20/0x20
[2985306.437997]  [<ffffffffb5ab9271>] kthread+0xd1/0xe0
[2985306.437999]  [<ffffffffb5ab91a0>] ? kthread_worker_fn+0x170/0x170
[2985306.438001]  [<ffffffffb618a077>] ret_from_fork_nospec_begin+0x21/0x21
[2985306.438004]  [<ffffffffb5ab91a0>] ? kthread_worker_fn+0x170/0x170
  • A cgroup reparenting operation is looping in mem_cgroup_reparent_charges() while holding cgroup_mutex:
PID: 164110  TASK: ffff9900b0a1a2c0  CPU: 0   COMMAND: "runc"
 #0 [ffff9907477a7af0] __schedule at ffffffffb617d256
 #1 [ffff9907477a7b88] schedule at ffffffffb617d790
 #2 [ffff9907477a7ba0] schedule_timeout at ffffffffb617b63c
 #3 [ffff9907477a7c48] wait_for_completion at ffffffffb617c5c4
 #4 [ffff9907477a7c98] wait_rcu_gp at ffffffffb5ab5abe
 #5 [ffff9907477a7cf8] synchronize_rcu at ffffffffb5b503df
 #6 [ffff9907477a7d08] synchronize_rcu at ffffffffb5b50418
 #7 [ffff9907477a7d18] mem_cgroup_start_move at ffffffffb5c2985c
 #8 [ffff9907477a7d28] mem_cgroup_reparent_charges at ffffffffb5c2d215
 #9 [ffff9907477a7d98] mem_cgroup_css_offline at ffffffffb5c2d6f3
#10 [ffff9907477a7dd0] cgroup_destroy_locked at ffffffffb5b21d87
#11 [ffff9907477a7e18] cgroup_rmdir at ffffffffb5b22065
 ...
  • Because the reparenting task loops while holding cgroup_mutex, many other tasks are blocked waiting for that mutex, with backtraces like this:
PID: 1      TASK: ffff990763d20000  CPU: 1   COMMAND: "systemd"
 #0 [ffff98ef68fbbc00] __schedule at ffffffffb617d256
 #1 [ffff98ef68fbbc98] schedule at ffffffffb617d790
 #2 [ffff98ef68fbbcb0] __rt_mutex_slowlock at ffffffffb617e31d
 #3 [ffff98ef68fbbd10] rt_mutex_slowlock_locked at ffffffffb617e7c3
 #4 [ffff98ef68fbbd60] rt_mutex_slowlock at ffffffffb617e93c
 #5 [ffff98ef68fbbdf8] rt_mutex_lock at ffffffffb617ea2f
 #6 [ffff98ef68fbbe10] _mutex_lock at ffffffffb618018e
 #7 [ffff98ef68fbbe20] proc_cgroup_show at ffffffffb5b232c6
 #8 [ffff98ef68fbbe68] seq_read at ffffffffb5c661f0
 ...
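
When a vmcore contains many blocked tasks, it can help to sort saved backtrace text (for example, the output of the crash utility's `foreach bt` command) by the functions on each stack. The following is a minimal sketch, not a Red Hat tool: the helper names `parse_backtraces` and `tasks_with_frames` are hypothetical, the line formats are assumed to match the backtraces shown above, and the function names used for filtering are taken from those backtraces purely for illustration.

```python
import re

# Task header line as printed by crash, e.g.
# PID: 1      TASK: ffff990763d20000  CPU: 1   COMMAND: "systemd"
TASK_RE = re.compile(
    r'^PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+CPU:\s*(\d+)\s+COMMAND:\s*"(.*)"'
)
# Stack frame line, e.g.
#  #7 [ffff98ef68fbbe20] proc_cgroup_show at ffffffffb5b232c6
FRAME_RE = re.compile(r'^\s*#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')

# Abbreviated sample copied from the backtraces in this article.
SAMPLE = '''
PID: 164110  TASK: ffff9900b0a1a2c0  CPU: 0   COMMAND: "runc"
 #7 [ffff9907477a7d18] mem_cgroup_start_move at ffffffffb5c2985c
 #8 [ffff9907477a7d28] mem_cgroup_reparent_charges at ffffffffb5c2d215
 #9 [ffff9907477a7d98] mem_cgroup_css_offline at ffffffffb5c2d6f3
#10 [ffff9907477a7dd0] cgroup_destroy_locked at ffffffffb5b21d87
 ...
PID: 1      TASK: ffff990763d20000  CPU: 1   COMMAND: "systemd"
 #5 [ffff98ef68fbbdf8] rt_mutex_lock at ffffffffb617ea2f
 #6 [ffff98ef68fbbe10] _mutex_lock at ffffffffb618018e
 #7 [ffff98ef68fbbe20] proc_cgroup_show at ffffffffb5b232c6
 ...
'''


def parse_backtraces(text):
    """Split backtrace text into tasks, each with its list of frame functions."""
    tasks = []
    current = None
    for line in text.splitlines():
        header = TASK_RE.match(line)
        if header:
            current = {
                "pid": int(header.group(1)),
                "comm": header.group(4),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            current["frames"].append(frame.group(2))
    return tasks


def tasks_with_frames(tasks, required):
    """Return the tasks whose stack contains every function name in `required`."""
    wanted = set(required)
    return [t for t in tasks if wanted <= set(t["frames"])]


if __name__ == "__main__":
    tasks = parse_backtraces(SAMPLE)
    # Task looping in the reparenting path.
    for t in tasks_with_frames(tasks, ["mem_cgroup_reparent_charges"]):
        print("reparenting:", t["pid"], t["comm"])
    # Tasks blocked on a mutex in the /proc cgroup read path.
    for t in tasks_with_frames(tasks, ["rt_mutex_lock", "proc_cgroup_show"]):
        print("waiting:", t["pid"], t["comm"])
```

Matching on function names only narrows the candidates. Confirming that a given task is waiting on cgroup_mutex specifically still requires inspecting the lock in the vmcore.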

Environment

  • Red Hat Enterprise Linux 7 Realtime (kernel-rt-3.10.0-1160.24.1.rt56.1161.el7.x86_64)
