RHEL7: System crash due to mem_cgroup_idr corruption

Solution Verified - Updated 2022-12-22T07:55:36+00:00

Environment

Red Hat Enterprise Linux 7

Issue

The mem_cgroup_idr object can be updated in an uncoordinated manner, which can lead to corruption and undefined behavior. The following call trace from an Oops is observed:

[1367899.105815] Call Trace:
[1367899.106437]  [<ffffffffb3dd418b>] shrink_zone+0x6b/0x1a0
[1367899.107072]  [<ffffffffb3dd4680>] do_try_to_free_pages+0xf0/0x520
[1367899.107789]  [<ffffffffb3dd4d0a>] try_to_free_mem_cgroup_pages+0xda/0x190
[1367899.108446]  [<ffffffffb3e3c7ce>] mem_cgroup_reclaim+0x4e/0x120
[1367899.109156]  [<ffffffffb3e3d19c>] __mem_cgroup_try_charge+0x4ec/0x670
[1367899.109871]  [<ffffffffb3e3e9cb>] __mem_cgroup_try_charge_swapin+0x9b/0xd0
[1367899.110544]  [<ffffffffb3e3f117>] mem_cgroup_try_charge_swapin+0x57/0x70
[1367899.111235]  [<ffffffffb3df1401>] handle_pte_fault+0x471/0xe20
[1367899.111948]  [<ffffffffb3df3ecd>] handle_mm_fault+0x39d/0x9b0
[1367899.112765]  [<ffffffffb4388653>] __do_page_fault+0x213/0x500
[1367899.113488]  [<ffffffffb4388975>] do_page_fault+0x35/0x90
[1367899.114269]  [<ffffffffb4384778>] page_fault+0x28/0x30

Messages like the following (or similar) may also appear in dmesg:

  [617070.629636] <86>CPU: 19 PID: 33803 Comm: kworker/19:1 Kdump: loaded Not tainted 3.10.0-1062.9.1.el7.x86_64 #1
  [617070.629637] <86>Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/13/2019
  [617070.629646] <82>Workqueue: events free_work
  [617070.629647] <82>Call Trace:
  [617070.629658] <82> [<ffffffff9bb7ac23>] dump_stack+0x19/0x1b
  [617070.629664] <82> [<ffffffff9b782812>] idr_remove+0x282/0x290
  [617070.629666] <82> [<ffffffff9b639052>] __mem_cgroup_free+0x122/0x250
  [617070.629668] <82> [<ffffffff9b639195>] free_work+0x15/0x20
  [617070.629673] <82> [<ffffffff9b4be21f>] process_one_work+0x17f/0x440
  [617070.629676] <82> [<ffffffff9b4bf336>] worker_thread+0x126/0x3c0
  [617070.629678] <82> [<ffffffff9b4bf210>] ? manage_workers.isra.26+0x2a0/0x2a0
  [617070.629681] <82> [<ffffffff9b4c61f1>] kthread+0xd1/0xe0
  [617070.629684] <82> [<ffffffff9b4c6120>] ? insert_kthread_work+0x40/0x40
  [617070.629688] <82> [<ffffffff9bb8dd1d>] ret_from_fork_nospec_begin+0x7/0x21
  [617070.629690] <82> [<ffffffff9b4c6120>] ? insert_kthread_work+0x40/0x40

[3945264.164925] idr_remove called for id=1 which is not allocated.
[3945264.164936] CPU: 16 PID: 6827 Comm: kworker/16:2 Kdump: loaded Tainted: P           OE  ------------ T 3.10.0-957.el7.x86_64 #1
[3945264.164940] Hardware name: HPE ProLiant ML350 Gen10/ProLiant ML350 Gen10, BIOS U41 07/16/2020
[3945264.164953] Workqueue: events free_work
[3945264.164957] Call Trace:
[3945264.164972]  [<ffffffffbe161dc1>] dump_stack+0x19/0x1b
[3945264.164980]  [<ffffffffbdd76520>] idr_remove+0x160/0x290
[3945264.164988]  [<ffffffffbdc2ff22>] __mem_cgroup_free+0x122/0x250
[3945264.164995]  [<ffffffffbdc30065>] free_work+0x15/0x20
[3945264.165004]  [<ffffffffbdab9d4f>] process_one_work+0x17f/0x440
[3945264.165011]  [<ffffffffbdabade6>] worker_thread+0x126/0x3c0
[3945264.165018]  [<ffffffffbdabacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
[3945264.165024]  [<ffffffffbdac1c31>] kthread+0xd1/0xe0
[3945264.165033]  [<ffffffffbdb0fda0>] ? SyS_futex+0x80/0x190
[3945264.165039]  [<ffffffffbdac1b60>] ? insert_kthread_work+0x40/0x40
[3945264.165048]  [<ffffffffbe174c1d>] ret_from_fork_nospec_begin+0x7/0x21
[3945264.165054]  [<ffffffffbdac1b60>] ? insert_kthread_work+0x40/0x40

Resolution

This issue has been fixed in the following Red Hat Enterprise Linux (RHEL) versions:

RHEL version	Errata	Kernel version
7	RHSA-2020:4060	kernel-3.10.0-1160.el7
7.7 (EUS)	RHSA-2021:1531	kernel-3.10.0-1062.49.1.el7
7.6 (TUS)	RHSA-2021:2355	kernel-3.10.0-957.76.1.el7
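
The table above can be turned into a quick check of a kernel release string. The following is a minimal sketch, not a Red Hat tool: the function name release_has_fix_sketch is hypothetical, and it assumes that only the three streams listed in the table carry the fix, so any other release is reported as not fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if a RHEL 7 release string (as printed by `uname -r`)
 * is at or beyond one of the fixed kernels listed in the table.
 * Assumption: streams not listed in the table are treated as unfixed.
 */
static bool release_has_fix_sketch(const char *release)
{
    int build = 0, minor = 0, patch = 0;

    /* "3.10.0-957.el7" parses one field; "3.10.0-1062.9.1.el7" parses three. */
    if (sscanf(release, "3.10.0-%d.%d.%d", &build, &minor, &patch) < 1)
        return false;                      /* not a RHEL 7 style release */

    if (build >= 1160)                     /* RHSA-2020:4060 */
        return true;
    if (build == 1062)                     /* 7.7 EUS, RHSA-2021:1531 */
        return minor > 49 || (minor == 49 && patch >= 1);
    if (build == 957)                      /* 7.6 TUS, RHSA-2021:2355 */
        return minor > 76 || (minor == 76 && patch >= 1);
    return false;
}

/* Usage example with the two kernels from the dmesg output in the Issue section. */
static void release_check_example(void)
{
    assert(!release_has_fix_sketch("3.10.0-1062.9.1.el7.x86_64"));
    assert(!release_has_fix_sketch("3.10.0-957.el7.x86_64"));
    assert(release_has_fix_sketch("3.10.0-1160.el7.x86_64"));
}
```

Both kernels seen in the dmesg output predate the fixed builds of their streams, which is consistent with the corruption being observed on them.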

Root Cause

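The mem_cgroup_idr object was corrupted. It contains only a single entry, and that entry points to itself; that is, it is not an actual struct mem_cgroup object.

A source code review revealed that, in the context of mm/memcontrol.c, the operations that can modify mem_cgroup_idr are not properly serialized. The lib/idr.c code leaves it solely to its users to ensure exclusive synchronization of all operations that can modify a given struct idr object.

The following is an example of how mem_cgroup_idr can be modified in an uncoordinated manner: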


                            Thread 0                                            Thread 1

    cgroup_create
      for_each_subsys(root, ss)
        //ss->css_alloc(cgrp)
        mem_cgroup_alloc
        {
          id = idr_alloc(&mem_cgroup_idr, NULL,
                         1, MEM_CGROUP_ID_MAX,
                         GFP_KERNEL)
          if (id < 0)
            goto fail

          memcg->id = id

          memcg->stat = alloc_percpu(struct mem_cgroup_stat_cpu)

          if (!memcg->stat)
            goto out_free
                                                                        free_work
        out_free:                                                         __mem_cgroup_free
          if (memcg->id > 0) {                                              mem_cgroup_id_put
            idr_remove(&mem_cgroup_idr, memcg->id)                            idr_remove(&mem_cgroup_idr, memcg->id)
          }
        }
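
The requirement that the idr user serialize every modifying operation can be illustrated in userspace. The following is a minimal sketch under stated assumptions, not the kernel fix (this article does not describe the actual patch): struct id_table_sketch and its helpers are hypothetical stand-ins for struct idr, idr_alloc() and idr_remove(), and a pthread mutex stands in for whatever lock the kernel code uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define ID_MAX_SKETCH 64   /* hypothetical table size, not MEM_CGROUP_ID_MAX */

/* Hypothetical stand-in for a struct idr plus the lock its user must supply. */
struct id_table_sketch {
    pthread_mutex_t lock;
    bool used[ID_MAX_SKETCH];
};

static void id_table_init_sketch(struct id_table_sketch *t)
{
    pthread_mutex_init(&t->lock, NULL);
    for (size_t i = 0; i < ID_MAX_SKETCH; i++)
        t->used[i] = false;
}

/* Allocate the lowest free id >= 1 under the lock; -1 if the table is full. */
static int id_alloc_sketch(struct id_table_sketch *t)
{
    int id = -1;

    pthread_mutex_lock(&t->lock);
    for (int i = 1; i < ID_MAX_SKETCH; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            id = i;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return id;
}

/* Remove an id under the lock; false means it was not allocated. */
static bool id_remove_sketch(struct id_table_sketch *t, int id)
{
    bool was_used = false;

    pthread_mutex_lock(&t->lock);
    if (id > 0 && id < ID_MAX_SKETCH) {
        was_used = t->used[id];
        t->used[id] = false;
    }
    pthread_mutex_unlock(&t->lock);
    return was_used;
}

/* Usage example: a second removal of the same id is detected, not silently applied. */
static void id_table_example(void)
{
    struct id_table_sketch t;

    id_table_init_sketch(&t);
    int id = id_alloc_sketch(&t);
    assert(id == 1);
    assert(id_remove_sketch(&t, id));
    assert(!id_remove_sketch(&t, id));
}
```

Because every read-modify-write of the table happens with the lock held, two threads that remove entries at the same moment, as Thread 0 and Thread 1 do in the diagram, cannot interleave inside the data structure.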

Diagnostic Steps

See mem_cgroup_id_put(). In the context of mem_cgroup_id_put(), the id field of the given mem_cgroup object is set to 0 after the memcg CSS ID and the corresponding mem_cgroup entry have been removed from mem_cgroup_idr. Taking the task "java" (PID 316) as an example, we can confirm that it belongs to a memory cgroup, yet mem_cgroup_idr has no entry for it.

crash> ps -p 316
PID: 0      TASK: ffffffff9c018480  CPU: 0   COMMAND: "swapper/0"
 PID: 1      TASK: ffff9d2f53928000  CPU: 2   COMMAND: "systemd"
  PID: 10238  TASK: ffff9d4e6b2820e0  CPU: 34  COMMAND: "dockerd-current"
   PID: 23091  TASK: ffff9d4e6eb941c0  CPU: 13  COMMAND: "docker-containe"
    PID: 40799  TASK: ffff9d24536041c0  CPU: 25  COMMAND: "docker-containe"
     PID: 316    TASK: ffff9d238d70c1c0  CPU: 35  COMMAND: "java"

crash> enum mem_cgroup_subsys_id
enum cgroup_subsys_id = 3

crash> p ((struct task_struct *)0xffff9d238d70c1c0)->cgroups.subsys[3].cgroup.dentry
$3 = (struct dentry *) 0xffff9d1836a36e40

crash> files -d 0xffff9d1836a36e40
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff9d1836a36e40 ffff9d2ca8acfa90 ffff9d4e7d6a6800 DIR  /sys/fs/cgroup/memory/system.slice/docker-2bba4e4ecfb057067701715bd458a17213013059e53b2bf3b09f3f1bf4dd7cf7.scope

See mem_cgroup_from_css().

crash> p &((struct mem_cgroup *)0x0)->css
$4 = (struct cgroup_subsys_state *) 0x0

crash> p ((struct task_struct *)0xffff9d238d70c1c0)->cgroups.subsys[3]
$5 = (struct cgroup_subsys_state *) 0xffff9d3179a4f400

crash> pd ((struct mem_cgroup *)0xffff9d3179a4f400)->id
$6 = 157
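
The crash commands above cast the cgroup_subsys_state pointer directly to struct mem_cgroup, which works because css sits at offset 0 ($4). The following is a minimal sketch of that conversion, assuming mem_cgroup_from_css() performs a container_of()-style lookup; the miniature structures and the names ending in _sketch are hypothetical and much smaller than the real kernel types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature types; the real kernel structures are far larger. */
struct css_sketch {
    int refcnt;
};

struct memcg_sketch {
    struct css_sketch css;   /* first member, so its offset is 0 */
    int id;
};

/* Userspace equivalent of the kernel's container_of() idiom. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct memcg_sketch *memcg_from_css_sketch(struct css_sketch *css)
{
    return container_of_sketch(css, struct memcg_sketch, css);
}

/* Usage example: recover the enclosing object and read its id. */
static void memcg_from_css_example(void)
{
    struct memcg_sketch m = { .css = { .refcnt = 1 }, .id = 157 };

    assert(offsetof(struct memcg_sketch, css) == 0);
    assert(memcg_from_css_sketch(&m.css) == &m);
    assert(memcg_from_css_sketch(&m.css)->id == 157);
}
```

With a member offset of 0 the subtraction is a no-op, which is why the address printed as $5 can be reused unchanged in the cast that produces $6.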

The number 157 is not present in idr_layer::bitmap; only the number 1 is present. See ida_remove().

crash> sym mem_cgroup_idr
ffffffff9c6277e0 (b) mem_cgroup_idr

crash> p *(struct idr *)0xffffffff9c6277e0
$7 = {
  hint = 0x0,
  top = 0xffff9d2e68672940,
  id_free = 0x0,
  layers = 0x2,
  id_free_cnt = 0x0,
  cur = 0x0,
  lock = {
    {
      rlock = {
        raw_lock = {
          val = {
            counter = 0x0
          }
        }
      }
    }
  }
}

crash> p *(struct idr_layer *)0xffff9d2e68672940
$8 = {
  prefix = 0x100,
  bitmap = {0x2, 0x0, 0x0, 0x0},
  ary = {0x0, 0xffff9d2e68672940, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
  count = 0x2,
  layer = 0x0,
  callback_head = {
    next = 0x0,
    func = 0x0
  }
}
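
The bitmap in the dump above can be checked by hand: 157 = 2 * 64 + 29, so id 157 would be bit 29 of the third 64-bit word, which is 0x0, while 0x2 in the first word is bit 1. The following is a minimal sketch of that arithmetic; the helper name bitmap_has_id_sketch is hypothetical, and it assumes the bitmap is an array of four 64-bit words with bit 0 as the least significant bit, as the x86_64 dump suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BITMAP_WORDS_SKETCH 4   /* four 64-bit words, as in the dump */

/* Return true if bit `id` is set in the bitmap. */
static bool bitmap_has_id_sketch(const uint64_t bitmap[BITMAP_WORDS_SKETCH],
                                 unsigned int id)
{
    if (id >= BITMAP_WORDS_SKETCH * 64)
        return false;                       /* outside this layer's range */
    return (bitmap[id / 64] >> (id % 64)) & 1u;
}

/* Usage example with the bitmap values taken from the idr_layer dump. */
static void bitmap_example(void)
{
    const uint64_t bitmap[BITMAP_WORDS_SKETCH] = { 0x2, 0x0, 0x0, 0x0 };

    assert(bitmap_has_id_sketch(bitmap, 1));     /* the only id present */
    assert(!bitmap_has_id_sketch(bitmap, 157));  /* id of the java task's memcg */
}
```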

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
