Chapter 17. Using cgroups-v2 to control distribution of CPU time for applications

Some applications use too much CPU time, which can negatively impact the overall health of your environment. You can put your applications into control groups version 2 (cgroups-v2) and configure CPU limits for those control groups. As a result, you can regulate how much CPU time your applications consume.

You can regulate the distribution of CPU time allocated to a control group in two ways:

  • Setting CPU bandwidth (editing the cpu.max controller file)
  • Setting CPU weight (editing the cpu.weight controller file)

17.1. Mounting cgroups-v2

During the boot process, RHEL 9 mounts the cgroup-v2 virtual filesystem by default. If you need cgroup-v1 functionality to limit resources for your applications, you must configure the system manually.

Prerequisites

  • You have root permissions.

Verification steps

  1. Optionally, verify that the cgroups-v2 filesystem was mounted:

    # mount -l | grep cgroup
    cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)

    The cgroups-v2 filesystem was successfully mounted on the /sys/fs/cgroup/ directory.

  2. Optionally, inspect the contents of the /sys/fs/cgroup/ directory:

    # ll /sys/fs/cgroup/
    -r--r--r--.  1 root root 0 Apr 29 12:03 cgroup.controllers
    -rw-r--r--.  1 root root 0 Apr 29 12:03 cgroup.max.depth
    -rw-r--r--.  1 root root 0 Apr 29 12:03 cgroup.max.descendants
    -rw-r--r--.  1 root root 0 Apr 29 12:03 cgroup.procs
    -r--r--r--.  1 root root 0 Apr 29 12:03 cgroup.stat
    -rw-r--r--.  1 root root 0 Apr 29 12:18 cgroup.subtree_control
    -rw-r--r--.  1 root root 0 Apr 29 12:03 cgroup.threads
    -rw-r--r--.  1 root root 0 Apr 29 12:03 cpu.pressure
    -r--r--r--.  1 root root 0 Apr 29 12:03 cpuset.cpus.effective
    -r--r--r--.  1 root root 0 Apr 29 12:03 cpuset.mems.effective
    -r--r--r--.  1 root root 0 Apr 29 12:03 cpu.stat
    drwxr-xr-x.  2 root root 0 Apr 29 12:03 init.scope
    -rw-r--r--.  1 root root 0 Apr 29 12:03 io.pressure
    -r--r--r--.  1 root root 0 Apr 29 12:03 io.stat
    -rw-r--r--.  1 root root 0 Apr 29 12:03 memory.pressure
    -r--r--r--.  1 root root 0 Apr 29 12:03 memory.stat
    drwxr-xr-x. 69 root root 0 Apr 29 12:03 system.slice
    drwxr-xr-x.  3 root root 0 Apr 29 12:18 user.slice

    By default, the /sys/fs/cgroup/ directory, also called the root control group, contains interface files (starting with cgroup) and controller-specific files such as cpuset.cpus.effective. In addition, there are some directories related to systemd, such as /sys/fs/cgroup/init.scope, /sys/fs/cgroup/system.slice, and /sys/fs/cgroup/user.slice.
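
The mount check in step 1 can also be scripted. The following is a minimal sketch, assuming the `mount -l` output format shown above (`<source> on <target> type <fstype> (<options>)`). The helper name is_cgroup2_mounted is hypothetical and not part of any system utility.

```shell
# Sketch: decide whether mount-style output read from stdin shows a
# cgroup2 filesystem on the given mount point.
# is_cgroup2_mounted is a hypothetical helper name used for illustration.
# $1: mount point to look for (assumed default: /sys/fs/cgroup)
is_cgroup2_mounted() {
    awk -v mp="${1:-/sys/fs/cgroup}" '
        # Expected fields: <source> on <target> type <fstype> (<options>)
        $2 == "on" && $3 == mp && $4 == "type" && $5 == "cgroup2" { found = 1 }
        END { exit !found }
    '
}

# Usage on a live system:
#   mount -l | is_cgroup2_mounted /sys/fs/cgroup && echo "cgroups-v2 is mounted"
```

The function only inspects text, so you can try it on saved command output before you run it against a live system.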

17.2. Preparing the cgroup for distribution of CPU time

To control the CPU consumption of your applications, you need to enable specific CPU controllers and create two levels of child control groups inside the /sys/fs/cgroup/ root control group. The root control group already contains some of the resource controllers by default. Therefore, two levels of child control groups are advisable to keep the cgroup files clearly organized.

Prerequisites

  • You have at least two applications whose CPU consumption you want to regulate.
  • You have root permissions.
  • You have mounted the cgroups-v2 filesystem.

Procedure

  1. Identify the process IDs (PIDs) of applications whose CPU consumption you want to limit:

    # top
    Tasks: 104 total,   3 running, 101 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 17.6 us, 81.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.8 hi,  0.0 si,  0.0 st
    MiB Mem :   3737.4 total,   3312.7 free,    133.3 used,    291.4 buff/cache
    MiB Swap:   4060.0 total,   4060.0 free,      0.0 used.   3376.1 avail Mem
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      34578 root      20   0   18720   1756   1468 R  99.0   0.0   0:31.09 sha1sum
      34579 root      20   0   18720   1772   1480 R  99.0   0.0   0:30.54 sha1sum
          1 root      20   0  186192  13940   9500 S   0.0   0.4   0:01.60 systemd
          2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd
          3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
          4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
    ...

    The example output shows that PID 34578 and PID 34579 (two illustrative sha1sum processes) each consume about 99% of the CPU time. Both are example applications used to demonstrate the cgroups-v2 functionality.

  2. Verify that the cpu and cpuset controllers are available in the /sys/fs/cgroup/cgroup.controllers file:

    # cat /sys/fs/cgroup/cgroup.controllers
    cpuset cpu io memory hugetlb pids rdma

  3. Enable CPU-related controllers:

    # echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control
    # echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control

    These commands enable the cpu and cpuset controllers for the immediate child groups of the /sys/fs/cgroup/ root control group. A child group is where you can specify processes and apply control checks to each of the processes based on your criteria.

    You can read the contents of the cgroup.subtree_control file at any level to see which controllers are available for enablement in the immediate child group.

    Note

    By default, the /sys/fs/cgroup/cgroup.subtree_control file in the root control group contains memory and pids controllers.

  4. Create the /sys/fs/cgroup/Example/ directory:

    # mkdir /sys/fs/cgroup/Example/

    The /sys/fs/cgroup/Example/ directory defines a child group. Also, the previous step enabled the cpu and cpuset controllers for this child group.

    When you create the /sys/fs/cgroup/Example/ directory, some cgroups-v2 interface files and cpu and cpuset controller-specific files are automatically created in the directory. The /sys/fs/cgroup/Example/ directory also contains controller-specific files for the memory and pids controllers.

  5. Optionally, inspect the newly created child control group:

    # ll /sys/fs/cgroup/Example/
    -r--r--r--. 1 root root 0 Jun  1 10:33 cgroup.controllers
    -r--r--r--. 1 root root 0 Jun  1 10:33 cgroup.events
    -rw-r--r--. 1 root root 0 Jun  1 10:33 cgroup.freeze
    -rw-r--r--. 1 root root 0 Jun  1 10:33 cgroup.max.depth
    -rw-r--r--. 1 root root 0 Jun  1 10:33 cgroup.max.descendants
    -rw-r--r--. 1 root root 0 Jun  1 10:33 cgroup.procs
    -r--r--r--. 1 root root 0 Jun  1 10:33 cgroup.stat
    -rw-r--r--. 1 root root 0 Jun  1 10:33 cgroup.subtree_control
    …
    -rw-r--r--. 1 root root 0 Jun  1 10:33 cpuset.cpus
    -r--r--r--. 1 root root 0 Jun  1 10:33 cpuset.cpus.effective
    -rw-r--r--. 1 root root 0 Jun  1 10:33 cpuset.cpus.partition
    -rw-r--r--. 1 root root 0 Jun  1 10:33 cpuset.mems
    -r--r--r--. 1 root root 0 Jun  1 10:33 cpuset.mems.effective
    -r--r--r--. 1 root root 0 Jun  1 10:33 cpu.stat
    -rw-r--r--. 1 root root 0 Jun  1 10:33 cpu.weight
    -rw-r--r--. 1 root root 0 Jun  1 10:33 cpu.weight.nice
    …
    -r--r--r--. 1 root root 0 Jun  1 10:33 memory.events.local
    -rw-r--r--. 1 root root 0 Jun  1 10:33 memory.high
    -rw-r--r--. 1 root root 0 Jun  1 10:33 memory.low
    …
    -r--r--r--. 1 root root 0 Jun  1 10:33 pids.current
    -r--r--r--. 1 root root 0 Jun  1 10:33 pids.events
    -rw-r--r--. 1 root root 0 Jun  1 10:33 pids.max

    The example output shows files such as cpuset.cpus and cpu.max. These files are specific to the cpuset and cpu controllers, which you manually enabled for the direct child control groups of the root (/sys/fs/cgroup/) by using the /sys/fs/cgroup/cgroup.subtree_control file.

    The directory also includes general cgroup control interface files such as cgroup.procs or cgroup.controllers, which are common to all control groups, regardless of enabled controllers.

    Files such as memory.high and pids.max relate to the memory and pids controllers, which are enabled in the root control group (/sys/fs/cgroup/) by default.

    By default, the newly created child group inherits access to all of the system’s CPU and memory resources, without any limits.

  6. Enable the CPU-related controllers in /sys/fs/cgroup/Example/ to obtain controllers that are relevant only to CPU:

    # echo "+cpu" >> /sys/fs/cgroup/Example/cgroup.subtree_control
    # echo "+cpuset" >> /sys/fs/cgroup/Example/cgroup.subtree_control

    These commands ensure that the immediate child control group has only the controllers that regulate CPU time distribution, and not the memory or pids controllers.

  7. Create the /sys/fs/cgroup/Example/tasks/ directory:

    # mkdir /sys/fs/cgroup/Example/tasks/

    The /sys/fs/cgroup/Example/tasks/ directory defines a child group with files that relate purely to cpu and cpuset controllers.

  8. Optionally, inspect another child control group:

    # ll /sys/fs/cgroup/Example/tasks
    -r--r--r--. 1 root root 0 Jun  1 11:45 cgroup.controllers
    -r--r--r--. 1 root root 0 Jun  1 11:45 cgroup.events
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cgroup.freeze
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cgroup.max.depth
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cgroup.max.descendants
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cgroup.procs
    -r--r--r--. 1 root root 0 Jun  1 11:45 cgroup.stat
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cgroup.subtree_control
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cgroup.threads
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cgroup.type
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cpu.max
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cpu.pressure
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cpuset.cpus
    -r--r--r--. 1 root root 0 Jun  1 11:45 cpuset.cpus.effective
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cpuset.cpus.partition
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cpuset.mems
    -r--r--r--. 1 root root 0 Jun  1 11:45 cpuset.mems.effective
    -r--r--r--. 1 root root 0 Jun  1 11:45 cpu.stat
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cpu.weight
    -rw-r--r--. 1 root root 0 Jun  1 11:45 cpu.weight.nice
    -rw-r--r--. 1 root root 0 Jun  1 11:45 io.pressure
    -rw-r--r--. 1 root root 0 Jun  1 11:45 memory.pressure

  9. Ensure the processes that you want to control for CPU time compete on the same CPU:

    # echo "1" > /sys/fs/cgroup/Example/tasks/cpuset.cpus

    The previous command ensures that the processes you place in the Example/tasks child control group compete on the same CPU. This setting is important for the cpu controller to activate.

    Important

    The cpu controller is only activated if the relevant child control group has at least 2 processes which compete for time on a single CPU.
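
The procedure above can be collected into one function. The following is a minimal sketch, not a supported tool: the function name prepare_cpu_cgroup and its arguments are hypothetical, and the default values only mirror the Example/tasks layout and CPU 1 used in this section.

```shell
# Sketch of the preparation steps as one function.
# prepare_cpu_cgroup and its arguments are hypothetical names for illustration.
# $1: cgroup root (assumed default: /sys/fs/cgroup)
# $2: first-level group name (default: Example)
# $3: second-level group name (default: tasks)
# $4: CPU list written to cpuset.cpus (default: 1)
prepare_cpu_cgroup() {
    root="${1:-/sys/fs/cgroup}"
    parent="$root/${2:-Example}"
    child="$parent/${3:-tasks}"
    cpus="${4:-1}"

    # Enable the cpu and cpuset controllers for the children of the root group.
    echo "+cpu" >> "$root/cgroup.subtree_control" || return 1
    echo "+cpuset" >> "$root/cgroup.subtree_control" || return 1

    # Create the first-level group and pass only cpu and cpuset further down.
    mkdir -p "$parent" || return 1
    echo "+cpu" >> "$parent/cgroup.subtree_control" || return 1
    echo "+cpuset" >> "$parent/cgroup.subtree_control" || return 1

    # Create the second-level group and restrict it to the chosen CPU.
    mkdir -p "$child" || return 1
    echo "$cpus" > "$child/cpuset.cpus"
}

# Usage on a live system (requires root permissions):
#   prepare_cpu_cgroup /sys/fs/cgroup Example tasks 1
```

Because the cgroup root is a parameter, you can point the function at a scratch directory to review which paths it writes before you run it as root. In a scratch directory the control files are ordinary files, so they do not behave like the kernel interface files.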

Verification steps

  1. Optional: ensure that the CPU-related controllers are enabled for the immediate children cgroups:

    # cat /sys/fs/cgroup/cgroup.subtree_control /sys/fs/cgroup/Example/cgroup.subtree_control
    cpuset cpu memory pids
    cpuset cpu

  2. Optional: ensure the processes that you want to control for CPU time compete on the same CPU:

    # cat /sys/fs/cgroup/Example/tasks/cpuset.cpus
    1
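
The controller checks from the verification steps can be automated. The following is a minimal sketch, assuming that the control file holds a single line of space-separated controller names, as in the example output above. The helper name has_controllers is hypothetical.

```shell
# Sketch: check that every named controller is listed in a cgroup control
# file such as cgroup.controllers or cgroup.subtree_control.
# has_controllers is a hypothetical helper name used for illustration.
# $1: path to the control file; remaining arguments: controller names
has_controllers() {
    file="$1"
    shift
    for ctrl in "$@"; do
        # Split the space-separated list into lines and match whole names
        # only, so that "cpu" does not match "cpuset".
        tr ' ' '\n' < "$file" | grep -qx -- "$ctrl" || return 1
    done
    return 0
}

# Usage on a live system:
#   has_controllers /sys/fs/cgroup/Example/cgroup.subtree_control cpu cpuset \
#       && echo "CPU-related controllers are enabled for the child groups"
```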

17.3. Controlling distribution of CPU time for applications by adjusting CPU bandwidth

You need to assign values to the relevant files of the cpu controller to regulate distribution of the CPU time to applications under the specific cgroup tree.

Prerequisites

  • You have root permissions.
  • You have at least two applications for which you want to control distribution of CPU time.
  • You ensured the relevant applications compete for CPU time on the same CPU as described in Preparing the cgroup for distribution of CPU time.
  • You mounted the cgroups-v2 filesystem as described in Mounting cgroups-v2.
  • You enabled the cpu and cpuset controllers both in the parent control group and in the child control group as described in Preparing the cgroup for distribution of CPU time.
  • You created two levels of child control groups inside the /sys/fs/cgroup/ root control group as in the example below:

    …​
      ├── Example
      │   ├── tasks
    …​

Procedure

  1. Configure CPU bandwidth to achieve resource restrictions within the control group:

    # echo "200000 1000000" > /sys/fs/cgroup/Example/tasks/cpu.max

    The first value is the allowed time quota in microseconds for which all processes in a child group can collectively run during one period. The second value specifies the length of the period in microseconds.

    During a single period, when processes in a control group collectively exhaust the time specified by this quota, they are throttled for the remainder of the period and not allowed to run until the next period.

    This command sets CPU time distribution controls so that all processes in the /sys/fs/cgroup/Example/tasks child group can collectively run on the CPU for only 0.2 seconds of every 1 second, that is, one fifth of each second.

  2. Optionally, verify the time quotas:

    # cat /sys/fs/cgroup/Example/tasks/cpu.max
    200000 1000000

  3. Add the applications' PIDs to the Example/tasks child group:

    # echo "34578" > /sys/fs/cgroup/Example/tasks/cgroup.procs
    # echo "34579" > /sys/fs/cgroup/Example/tasks/cgroup.procs

    The example commands ensure that desired applications become members of the Example/tasks child group and do not exceed the CPU time distribution configured for this child group.
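
To see what a given cpu.max setting means as a percentage, you can divide the quota by the period. The following is a minimal sketch of that arithmetic. The helper name cpu_max_percent is hypothetical. The literal value max, which the kernel uses in cpu.max to indicate no limit, is reported as unlimited.

```shell
# Sketch: convert a cpu.max value ("<quota> <period>", both in microseconds)
# into the percentage of one CPU that the whole group can use.
# cpu_max_percent is a hypothetical helper name used for illustration.
# $1: quota, or the literal "max" for no limit; $2: period
cpu_max_percent() {
    if [ "$1" = "max" ]; then
        echo "unlimited"
        return 0
    fi
    awk -v q="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * q / p }'
}

cpu_max_percent 200000 1000000   # prints 20.0

# Usage on a live system (word splitting of the file content is intentional):
#   cpu_max_percent $(cat /sys/fs/cgroup/Example/tasks/cpu.max)
```

The result applies to the group as a whole. With two equally busy processes in the group, each receives roughly half of it.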

Verification steps

  1. Verify that the applications run in the specified control group:

    # cat /proc/34578/cgroup /proc/34579/cgroup
    0::/Example/tasks
    0::/Example/tasks

    The output shows that the processes of the specified applications run in the Example/tasks child group.

  2. Inspect the current CPU consumption of the throttled applications:

    # top
    top - 11:13:53 up 23:10,  1 user,  load average: 0.26, 1.33, 1.66
    Tasks: 104 total,   3 running, 101 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  3.0 us,  7.0 sy,  0.0 ni, 89.5 id,  0.0 wa,  0.2 hi,  0.2 si,  0.2 st
    MiB Mem :   3737.4 total,   3312.6 free,    133.4 used,    291.4 buff/cache
    MiB Swap:   4060.0 total,   4060.0 free,      0.0 used.   3376.0 avail Mem
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      34578 root      20   0   18720   1756   1468 R  10.0   0.0  37:36.13 sha1sum
      34579 root      20   0   18720   1772   1480 R  10.0   0.0  37:41.22 sha1sum
          1 root      20   0  186192  13940   9500 S   0.0   0.4   0:01.60 systemd
          2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd
          3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
          4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
    ...

    Notice that the CPU consumption for the PID 34578 and PID 34579 has decreased to 10%. The Example/tasks child group regulates its processes to 20% of the CPU time collectively. Since there are 2 processes in the control group, each can utilize 10% of the CPU time.
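
Besides top, the cpu.stat file of a control group with the cpu controller enabled records how often the group was throttled. The following is a minimal sketch that summarizes the nr_periods and nr_throttled counters from such a file. The helper name throttled_ratio is hypothetical, and the numbers in the usage comment are invented sample values, not measurements.

```shell
# Sketch: report how many enforcement periods ended with the group throttled,
# based on the nr_periods and nr_throttled fields of a cpu.stat file.
# throttled_ratio is a hypothetical helper name used for illustration.
# $1: path to a cpu.stat file
throttled_ratio() {
    awk '
        $1 == "nr_periods"   { periods = $2 }
        $1 == "nr_throttled" { throttled = $2 }
        END {
            if (periods > 0)
                printf "%d/%d periods throttled (%.0f%%)\n", throttled, periods, 100 * throttled / periods
            else
                print "no periods recorded"
        }
    ' "$1"
}

# Usage on a live system:
#   throttled_ratio /sys/fs/cgroup/Example/tasks/cpu.stat
# With invented sample values nr_periods 50 and nr_throttled 40, the function
# prints: 40/50 periods throttled (80%)
```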

17.4. Controlling distribution of CPU time for applications by adjusting CPU weight

You need to assign values to the relevant files of the cpu controller to regulate distribution of the CPU time to applications under the specific cgroup tree.

Prerequisites

  • You have root permissions.
  • You have applications for which you want to control distribution of CPU time.
  • You ensured the relevant applications compete for CPU time on the same CPU as described in Preparing the cgroup for distribution of CPU time.
  • You mounted the cgroups-v2 filesystem as described in Mounting cgroups-v2.
  • You created two levels of child control groups inside the /sys/fs/cgroup/ root control group as in the following example:

    …​
      ├── Example
      │   ├── g1
      │   ├── g2
      │   └── g3
    …

  • You enabled the cpu and cpuset controllers in the parent control group and in the child control groups as described in Preparing the cgroup for distribution of CPU time.

Procedure

  1. Configure CPU weights to achieve resource restrictions within the control groups:

    # echo "150" > /sys/fs/cgroup/Example/g1/cpu.weight
    # echo "100" > /sys/fs/cgroup/Example/g2/cpu.weight
    # echo "50" > /sys/fs/cgroup/Example/g3/cpu.weight

  2. Add the applications' PIDs to the g1, g2, and g3 child groups:

    # echo "33373" > /sys/fs/cgroup/Example/g1/cgroup.procs
    # echo "33374" > /sys/fs/cgroup/Example/g2/cgroup.procs
    # echo "33377" > /sys/fs/cgroup/Example/g3/cgroup.procs

    The example commands ensure that the desired applications become members of the Example/g*/ child cgroups and receive CPU time according to the weights configured for those child cgroups.

    The weights of the children cgroups (g1, g2, g3) that have running processes are summed up at the level of the parent cgroup (Example). The CPU resource is then distributed proportionally based on the respective weights.

    As a result, when all processes run at the same time, the kernel allocates to each of them the proportionate CPU time based on their respective cgroup’s cpu.weight file:

    Child cgroup    cpu.weight file    CPU time allocation
    g1              150                ~50% (150/300)
    g2              100                ~33% (100/300)
    g3              50                 ~16% (50/300)

    The value of the cpu.weight controller file is not a percentage.

    If one process stopped running, leaving cgroup g2 with no running processes, the calculation would omit cgroup g2 and account only for the weights of cgroups g1 and g3:

    Child cgroup    cpu.weight file    CPU time allocation
    g1              150                ~75% (150/200)
    g3              50                 ~25% (50/200)

    Important

    If a child cgroup had multiple running processes, the CPU time allocated to the respective cgroup would be distributed equally to the member processes of that cgroup.
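
The proportional calculation described in this procedure can be reproduced for any set of weights. The following is a minimal sketch of that arithmetic only. The helper name weight_shares is hypothetical, and you pass it the cpu.weight values of only those sibling cgroups that currently have running processes.

```shell
# Sketch: print the expected CPU share for each active sibling cgroup weight.
# weight_shares is a hypothetical helper name used for illustration.
# Arguments: the cpu.weight values of the cgroups that have running processes.
weight_shares() {
    awk 'BEGIN {
        # Sum all weights, then print each weight as a share of the total.
        for (i = 1; i < ARGC; i++) total += ARGV[i]
        for (i = 1; i < ARGC; i++)
            printf "%s -> %.1f%%\n", ARGV[i], 100 * ARGV[i] / total
    }' "$@"
}

weight_shares 150 100 50
# 150 -> 50.0%
# 100 -> 33.3%
# 50 -> 16.7%
```

Calling the function with only 150 and 50 gives 75.0% and 25.0%, which matches the case where cgroup g2 has no running processes.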

Verification steps

  1. Verify that the applications run in the specified control groups:

    # cat /proc/33373/cgroup /proc/33374/cgroup /proc/33377/cgroup
    0::/Example/g1
    0::/Example/g2
    0::/Example/g3

    The command output shows that the processes of the specified applications run in the Example/g*/ child cgroups.

  2. Inspect the current CPU consumption of the throttled applications:

    # top
    top - 05:17:18 up 1 day, 18:25,  1 user,  load average: 3.03, 3.03, 3.00
    Tasks:  95 total,   4 running,  91 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 18.1 us, 81.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
    MiB Mem :   3737.0 total,   3233.7 free,    132.8 used,    370.5 buff/cache
    MiB Swap:   4060.0 total,   4060.0 free,      0.0 used.   3373.1 avail Mem
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      33373 root      20   0   18720   1748   1460 R  49.5   0.0 415:05.87 sha1sum
      33374 root      20   0   18720   1756   1464 R  32.9   0.0 412:58.33 sha1sum
      33377 root      20   0   18720   1860   1568 R  16.3   0.0 411:03.12 sha1sum
        760 root      20   0  416620  28540  15296 S   0.3   0.7   0:10.23 tuned
          1 root      20   0  186328  14108   9484 S   0.0   0.4   0:02.00 systemd
          2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd
    ...

    Notice that the CPU resource for PID 33373, PID 33374, and PID 33377 was allocated based on the weights (150, 100, and 50) that you assigned to the respective child cgroups. The weights correspond to around 50%, 33%, and 16% allocation of CPU time for each application.
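
The membership check from step 1 can be reduced to the cgroup path alone. The following is a minimal sketch, assuming the /proc/<PID>/cgroup format shown above, where the cgroups-v2 entry is the line that starts with 0::. The helper name cgroup_v2_path is hypothetical.

```shell
# Sketch: extract the cgroups-v2 path from /proc/<PID>/cgroup-style input
# read from stdin. The v2 entry is the line that starts with "0::".
# cgroup_v2_path is a hypothetical helper name used for illustration.
cgroup_v2_path() {
    sed -n 's/^0:://p'
}

# Usage on a live system (PID taken from the example above):
#   cgroup_v2_path < /proc/33373/cgroup
```

You can compare the printed path with the expected child cgroup, for example /Example/g1, in a script that verifies several PIDs in a loop.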