RHEL 7 kernel-rt panics on blocked task detection while cgroup reparenting is stuck in a loop indefinitely

Solution Verified

Issue

  • RHEL 7 kernel-rt panics when khungtaskd detects blocked tasks:
[2985306.437976] Kernel panic - not syncing: hung_task: blocked tasks
[2985306.437978] CPU: 60 PID: 798 Comm: khungtaskd Kdump: loaded Tainted: G        W  OE  ------------ T 3.10.0-1160.24.1.rt56.1161.el7.x86_64 #1
[2985306.437979] Hardware name: Quanta Cloud Technology Inc. QuantaGrid D52BE-2U 1S5BU9Z003W/S5BE-MB 3UPI (LBG-1G), BIOS 3B13.RTN05 04/20/2020
[2985306.437979] Call Trace:
[2985306.437983]  [<ffffffffb6177fe5>] dump_stack+0x19/0x1b
[2985306.437987]  [<ffffffffb6172145>] panic+0xe8/0x21f
[2985306.437991]  [<ffffffffb5b44d50>] watchdog+0x2b0/0x330
[2985306.437994]  [<ffffffffb5b44aa0>] ? reset_hung_task_detector+0x20/0x20
[2985306.437997]  [<ffffffffb5ab9271>] kthread+0xd1/0xe0
[2985306.437999]  [<ffffffffb5ab91a0>] ? kthread_worker_fn+0x170/0x170
[2985306.438001]  [<ffffffffb618a077>] ret_from_fork_nospec_begin+0x21/0x21
[2985306.438004]  [<ffffffffb5ab91a0>] ? kthread_worker_fn+0x170/0x170
  • Cgroup reparenting is stuck in a loop in mem_cgroup_reparent_charges() while holding cgroup_mutex:
PID: 164110  TASK: ffff9900b0a1a2c0  CPU: 0   COMMAND: "runc"
 #0 [ffff9907477a7af0] __schedule at ffffffffb617d256
 #1 [ffff9907477a7b88] schedule at ffffffffb617d790
 #2 [ffff9907477a7ba0] schedule_timeout at ffffffffb617b63c
 #3 [ffff9907477a7c48] wait_for_completion at ffffffffb617c5c4
 #4 [ffff9907477a7c98] wait_rcu_gp at ffffffffb5ab5abe
 #5 [ffff9907477a7cf8] synchronize_rcu at ffffffffb5b503df
 #6 [ffff9907477a7d08] synchronize_rcu at ffffffffb5b50418
 #7 [ffff9907477a7d18] mem_cgroup_start_move at ffffffffb5c2985c
 #8 [ffff9907477a7d28] mem_cgroup_reparent_charges at ffffffffb5c2d215
 #9 [ffff9907477a7d98] mem_cgroup_css_offline at ffffffffb5c2d6f3
#10 [ffff9907477a7dd0] cgroup_destroy_locked at ffffffffb5b21d87
#11 [ffff9907477a7e18] cgroup_rmdir at ffffffffb5b22065
 ...
  • Because the reparenting loop never releases cgroup_mutex, many other tasks are blocked waiting for it, with backtraces like this:
PID: 1      TASK: ffff990763d20000  CPU: 1   COMMAND: "systemd"
 #0 [ffff98ef68fbbc00] __schedule at ffffffffb617d256
 #1 [ffff98ef68fbbc98] schedule at ffffffffb617d790
 #2 [ffff98ef68fbbcb0] __rt_mutex_slowlock at ffffffffb617e31d
 #3 [ffff98ef68fbbd10] rt_mutex_slowlock_locked at ffffffffb617e7c3
 #4 [ffff98ef68fbbd60] rt_mutex_slowlock at ffffffffb617e93c
 #5 [ffff98ef68fbbdf8] rt_mutex_lock at ffffffffb617ea2f
 #6 [ffff98ef68fbbe10] _mutex_lock at ffffffffb618018e
 #7 [ffff98ef68fbbe20] proc_cgroup_show at ffffffffb5b232c6
 #8 [ffff98ef68fbbe68] seq_read at ffffffffb5c661f0
 ...
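
When a vmcore contains many blocked tasks, sorting the backtraces by hand is tedious. The following is a minimal sketch, not a Red Hat tool, that parses backtrace text in the format shown above and separates the task in the reparenting path from tasks sleeping in rt_mutex_lock. The helper names parse_backtraces and classify are hypothetical, and the frame-name match is only a heuristic: it does not confirm which lock a task is waiting on.

```python
import re

# Header and frame patterns for backtrace text in the format shown above.
HEADER = re.compile(r'^PID:\s*(\d+)\s+TASK:\s*(\S+)\s+CPU:\s*(\d+)\s+COMMAND:\s*"([^"]*)"')
FRAME = re.compile(r'^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')


def parse_backtraces(text):
    """Return a list of (pid, command, [function names]) tuples."""
    tasks = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            tasks.append((int(header.group(1)), header.group(4), []))
            continue
        frame = FRAME.match(line)
        if frame and tasks:
            tasks[-1][2].append(frame.group(1))
    return tasks


def classify(tasks):
    """Heuristic split: tasks in the reparenting path vs. tasks in rt_mutex_lock."""
    reparenting, waiting = [], []
    for pid, comm, funcs in tasks:
        if "mem_cgroup_reparent_charges" in funcs:
            reparenting.append((pid, comm))
        elif "rt_mutex_lock" in funcs:
            # Assumption: a task sleeping here may be waiting for cgroup_mutex;
            # the frame name alone does not prove which lock it is.
            waiting.append((pid, comm))
    return reparenting, waiting


# Abbreviated copies of the two backtraces from this article.
SAMPLE = """
PID: 164110  TASK: ffff9900b0a1a2c0  CPU: 0   COMMAND: "runc"
 #7 [ffff9907477a7d18] mem_cgroup_start_move at ffffffffb5c2985c
 #8 [ffff9907477a7d28] mem_cgroup_reparent_charges at ffffffffb5c2d215
 #9 [ffff9907477a7d98] mem_cgroup_css_offline at ffffffffb5c2d6f3
#10 [ffff9907477a7dd0] cgroup_destroy_locked at ffffffffb5b21d87
PID: 1      TASK: ffff990763d20000  CPU: 1   COMMAND: "systemd"
 #5 [ffff98ef68fbbdf8] rt_mutex_lock at ffffffffb617ea2f
 #6 [ffff98ef68fbbe10] _mutex_lock at ffffffffb618018e
 #7 [ffff98ef68fbbe20] proc_cgroup_show at ffffffffb5b232c6
"""

if __name__ == "__main__":
    reparenting, waiting = classify(parse_backtraces(SAMPLE))
    print("in reparenting path:", reparenting)
    print("sleeping in rt_mutex_lock:", waiting)
```

In practice, the input would be backtrace output saved from a crash session rather than the embedded SAMPLE string.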

Environment

  • Red Hat Enterprise Linux 7 Realtime (kernel-rt-3.10.0-1160.24.1.rt56.1161.el7.x86_64)
