Chapter 35. Using cgroupfs to manually manage cgroups

You can manage cgroup hierarchies on your system by creating directories on the cgroupfs virtual file system. The file system is mounted by default on the /sys/fs/cgroup/ directory and you can specify desired configurations in dedicated control files.

Important

In general, Red Hat recommends you use systemd for controlling the usage of system resources. You should manually configure the cgroups virtual file system only in special cases. For example, when you need to use cgroup-v1 controllers that have no equivalents in cgroup-v2 hierarchy.

35.1. Creating cgroups and enabling controllers in cgroups-v2 file system

You can manage the control groups (cgroups) by creating or removing directories and by writing to files in the cgroups virtual file system. The file system is by default mounted on the /sys/fs/cgroup/ directory. To use settings from the cgroups controllers, you also need to enable the desired controllers for child cgroups. The root cgroup has, by default, enabled the memory and pids controllers for its child cgroups. Therefore, Red Hat recommends to create at least two levels of child cgroups inside the /sys/fs/cgroup/ root cgroup. This way you optionally remove the memory and pids controllers from the child cgroups and maintain better organizational clarity of cgroup files.

Prerequisites

  • You have root permissions.

Procedure

  1. Create the /sys/fs/cgroup/Example/ directory:

    # mkdir /sys/fs/cgroup/Example/

    The /sys/fs/cgroup/Example/ directory defines a child group. When you create the /sys/fs/cgroup/Example/ directory, some cgroups-v2 interface files are automatically created in the directory. The /sys/fs/cgroup/Example/ directory contains also controller-specific files for the memory and pids controllers.

  2. Optionally, inspect the newly created child control group:

    # ll /sys/fs/cgroup/Example/
    -r—​r—​r--. 1 root root 0 Jun  1 10:33 cgroup.controllers
    -r—​r—​r--. 1 root root 0 Jun  1 10:33 cgroup.events
    -rw-r—​r--. 1 root root 0 Jun  1 10:33 cgroup.freeze
    -rw-r—​r--. 1 root root 0 Jun  1 10:33 cgroup.procs
    …​
    -rw-r—​r--. 1 root root 0 Jun  1 10:33 cgroup.subtree_control
    -r—​r—​r--. 1 root root 0 Jun  1 10:33 memory.events.local
    -rw-r—​r--. 1 root root 0 Jun  1 10:33 memory.high
    -rw-r—​r--. 1 root root 0 Jun  1 10:33 memory.low
    …​
    -r—​r—​r--. 1 root root 0 Jun  1 10:33 pids.current
    -r—​r—​r--. 1 root root 0 Jun  1 10:33 pids.events
    -rw-r—​r--. 1 root root 0 Jun  1 10:33 pids.max

    The example output shows general cgroup control interface files such as cgroup.procs or cgroup.controllers. These files are common to all control groups, regardless of enabled controllers.

    The files such as memory.high and pids.max relate to the memory and pids controllers, which are in the root control group (/sys/fs/cgroup/), and are enabled by default by systemd.

    By default, the newly created child group inherits all settings from the parent cgroup. In this case, there are no limits from the root cgroup.

  3. Verify that the desired controllers are available in the /sys/fs/cgroup/cgroup.controllers file:

    # cat /sys/fs/cgroup/cgroup.controllers
    cpuset cpu io memory hugetlb pids rdma
  4. Enable the desired controllers. In this example it is cpu and cpuset controllers:

    # echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control
    # echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control

    These commands enable the cpu and cpuset controllers for the immediate child groups of the /sys/fs/cgroup/ root control group. Including the newly created Example control group. A child group is where you can specify processes and apply control checks to each of the processes based on your criteria.

    Users can read the contents of the cgroup.subtree_control file at any level to get an idea of what controllers are going to be available for enablement in the immediate child group.

    Note

    By default, the /sys/fs/cgroup/cgroup.subtree_control file in the root control group contains memory and pids controllers.

  5. Enable the desired controllers for child cgroups of the Example control group:

    # echo "+cpu +cpuset" >> /sys/fs/cgroup/Example/cgroup.subtree_control

    This command ensures that the immediate child control group will only have controllers relevant to regulate the CPU time distribution - not to memory or pids controllers.

  6. Create the /sys/fs/cgroup/Example/tasks/ directory:

    # mkdir /sys/fs/cgroup/Example/tasks/

    The /sys/fs/cgroup/Example/tasks/ directory defines a child group with files that relate purely to cpu and cpuset controllers. You can now assign processes to this control group and utilize cpu and cpuset controller options for your processes.

  7. Optionally, inspect the child control group:

    # ll /sys/fs/cgroup/Example/tasks
    -r—​r—​r--. 1 root root 0 Jun  1 11:45 cgroup.controllers
    -r—​r—​r--. 1 root root 0 Jun  1 11:45 cgroup.events
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cgroup.freeze
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cgroup.max.depth
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cgroup.max.descendants
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cgroup.procs
    -r—​r—​r--. 1 root root 0 Jun  1 11:45 cgroup.stat
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cgroup.subtree_control
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cgroup.threads
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cgroup.type
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cpu.max
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cpu.pressure
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cpuset.cpus
    -r—​r—​r--. 1 root root 0 Jun  1 11:45 cpuset.cpus.effective
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cpuset.cpus.partition
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cpuset.mems
    -r—​r—​r--. 1 root root 0 Jun  1 11:45 cpuset.mems.effective
    -r—​r—​r--. 1 root root 0 Jun  1 11:45 cpu.stat
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cpu.weight
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 cpu.weight.nice
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 io.pressure
    -rw-r—​r--. 1 root root 0 Jun  1 11:45 memory.pressure
Important

The cpu controller is only activated if the relevant child control group has at least 2 processes which compete for time on a single CPU.

Verification steps

  • Optional: confirm that you have created a new cgroup with only the desired controllers active:

    # cat /sys/fs/cgroup/Example/tasks/cgroup.controllers
    cpuset cpu

Additional resources

35.2. Controlling distribution of CPU time for applications by adjusting CPU weight

You need to assign values to the relevant files of the cpu controller to regulate distribution of the CPU time to applications under the specific cgroup tree.

Prerequisites

  • You have root permissions.
  • You have applications for which you want to control distribution of CPU time.
  • You created a two level hierarchy of child control groups inside the /sys/fs/cgroup/ root control group as in the following example:

    …​
      ├── Example
      │   ├── g1
      │   ├── g2
      │   └── g3
    …​
  • You enabled the cpu controller in the parent control group and in child control groups similarly as described in Creating cgroups and enabling controllers in cgroups-v2 file system.

Procedure

  1. Configure desired CPU weights to achieve resource restrictions within the control groups:

    # echo "150" > /sys/fs/cgroup/Example/g1/cpu.weight
    # echo "100" > /sys/fs/cgroup/Example/g2/cpu.weight
    # echo "50" > /sys/fs/cgroup/Example/g3/cpu.weight
  2. Add the applications' PIDs to the g1, g2, and g3 child groups:

    # echo "33373" > /sys/fs/cgroup/Example/g1/cgroup.procs
    # echo "33374" > /sys/fs/cgroup/Example/g2/cgroup.procs
    # echo "33377" > /sys/fs/cgroup/Example/g3/cgroup.procs

    The example commands ensure that desired applications become members of the Example/g*/ child cgroups and will get their CPU time distributed as per the configuration of those cgroups.

    The weights of the children cgroups (g1, g2, g3) that have running processes are summed up at the level of the parent cgroup (Example). The CPU resource is then distributed proportionally based on the respective weights.

    As a result, when all processes run at the same time, the kernel allocates to each of them the proportionate CPU time based on their respective cgroup’s cpu.weight file:

    Child cgroupcpu.weight fileCPU time allocation

    g1

    150

    ~50% (150/300)

    g2

    100

    ~33% (100/300)

    g3

    50

    ~16% (50/300)

    The value of the cpu.weight controller file is not a percentage.

    If one process stopped running, leaving cgroup g2 with no running processes, the calculation would omit the cgroup g2 and only account weights of cgroups g1 and g3:

    Child cgroupcpu.weight fileCPU time allocation

    g1

    150

    ~75% (150/200)

    g3

    50

    ~25% (50/200)

    Important

    If a child cgroup had multiple running processes, the CPU time allocated to the respective cgroup would be distributed equally to the member processes of that cgroup.

Verification

  1. Verify that the applications run in the specified control groups:

    # cat /proc/33373/cgroup /proc/33374/cgroup /proc/33377/cgroup
    0::/Example/g1
    0::/Example/g2
    0::/Example/g3

    The command output shows the processes of the specified applications that run in the Example/g*/ child cgroups.

  2. Inspect the current CPU consumption of the throttled applications:

    # top
    top - 05:17:18 up 1 day, 18:25,  1 user,  load average: 3.03, 3.03, 3.00
    Tasks:  95 total,   4 running,  91 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 18.1 us, 81.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
    MiB Mem :   3737.0 total,   3233.7 free,    132.8 used,    370.5 buff/cache
    MiB Swap:   4060.0 total,   4060.0 free,      0.0 used.   3373.1 avail Mem
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      33373 root      20   0   18720   1748   1460 R  49.5   0.0 415:05.87 sha1sum
      33374 root      20   0   18720   1756   1464 R  32.9   0.0 412:58.33 sha1sum
      33377 root      20   0   18720   1860   1568 R  16.3   0.0 411:03.12 sha1sum
        760 root      20   0  416620  28540  15296 S   0.3   0.7   0:10.23 tuned
          1 root      20   0  186328  14108   9484 S   0.0   0.4   0:02.00 systemd
          2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthread
    ...
    Note

    We forced all the example processes to run on a single CPU for clearer illustration. The CPU weight applies the same principles also when used on multiple CPUs.

    Notice that the CPU resource for the PID 33373, PID 33374, and PID 33377 was allocated based on the weights, 150, 100, 50, you assigned to the respective child cgroups. The weights correspond to around 50%, 33%, and 16% allocation of CPU time for each application.

35.3. Mounting cgroups-v1

During the boot process, RHEL 9 mounts the cgroup-v2 virtual filesystem by default. To utilize cgroup-v1 functionality in limiting resources for your applications, manually configure the system.

Note

Both cgroup-v1 and cgroup-v2 are fully enabled in the kernel. There is no default control group version from the kernel point of view, and is decided by systemd to mount at startup.

Prerequisites

  • You have root permissions.

Procedure

  1. Configure the system to mount cgroups-v1 by default during system boot by the systemd system and service manager:

    # grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"

    This adds the necessary kernel command-line parameters to the current boot entry.

    To add the same parameters to all kernel boot entries:

    # grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"
  2. Reboot the system for the changes to take effect.

Verification

  1. Optionally, verify that the cgroups-v1 filesystem was mounted:

    # mount -l | grep cgroup
    tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,size=4096k,nr_inodes=1024,mode=755,inode64)
    cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
    cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
    cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpu,cpuacct)
    cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
    cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
    cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_cls,net_prio)
    cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
    cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
    cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
    cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
    cgroup on /sys/fs/cgroup/misc type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,misc)
    cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
    cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,rdma)

    The cgroups-v1 filesystems that correspond to various cgroup-v1 controllers, were successfully mounted on the /sys/fs/cgroup/ directory.

  2. Optionally, inspect the contents of the /sys/fs/cgroup/ directory:

    # ll /sys/fs/cgroup/
    dr-xr-xr-x. 10 root root  0 Mar 16 09:34 blkio
    lrwxrwxrwx.  1 root root 11 Mar 16 09:34 cpu → cpu,cpuacct
    lrwxrwxrwx.  1 root root 11 Mar 16 09:34 cpuacct → cpu,cpuacct
    dr-xr-xr-x. 10 root root  0 Mar 16 09:34 cpu,cpuacct
    dr-xr-xr-x.  2 root root  0 Mar 16 09:34 cpuset
    dr-xr-xr-x. 10 root root  0 Mar 16 09:34 devices
    dr-xr-xr-x.  2 root root  0 Mar 16 09:34 freezer
    dr-xr-xr-x.  2 root root  0 Mar 16 09:34 hugetlb
    dr-xr-xr-x. 10 root root  0 Mar 16 09:34 memory
    dr-xr-xr-x.  2 root root  0 Mar 16 09:34 misc
    lrwxrwxrwx.  1 root root 16 Mar 16 09:34 net_cls → net_cls,net_prio
    dr-xr-xr-x.  2 root root  0 Mar 16 09:34 net_cls,net_prio
    lrwxrwxrwx.  1 root root 16 Mar 16 09:34 net_prio → net_cls,net_prio
    dr-xr-xr-x.  2 root root  0 Mar 16 09:34 perf_event
    dr-xr-xr-x. 10 root root  0 Mar 16 09:34 pids
    dr-xr-xr-x.  2 root root  0 Mar 16 09:34 rdma
    dr-xr-xr-x. 11 root root  0 Mar 16 09:34 systemd

    The /sys/fs/cgroup/ directory, also called the root control group, by default, contains controller-specific directories such as cpuset. In addition, there are some directories related to systemd.

35.4. Setting CPU limits to applications using cgroups-v1

To configure CPU limits to an application by using control groups version 1 (cgroups-v1), use the /sys/fs/ virtual file system.

Prerequisites

  • You have root permissions.
  • You have an application whose CPU consumption you want to restrict.
  • You configured the system to mount cgroups-v1 by default during system boot by the systemd system and service manager:

    # grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"

    This adds the necessary kernel command-line parameters to the current boot entry.

Procedure

  1. Identify the process ID (PID) of the application that you want to restrict in CPU consumption:

    # top
    top - 11:34:09 up 11 min,  1 user,  load average: 0.51, 0.27, 0.22
    Tasks: 267 total,   3 running, 264 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 49.0 us,  3.3 sy,  0.0 ni, 47.5 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
    MiB Mem :   1826.8 total,    303.4 free,   1046.8 used,    476.5 buff/cache
    MiB Swap:   1536.0 total,   1396.0 free,    140.0 used.    616.4 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     6955 root      20   0  228440   1752   1472 R  99.3   0.1   0:32.71 sha1sum
     5760 jdoe      20   0 3603868 205188  64196 S   3.7  11.0   0:17.19 gnome-shell
     6448 jdoe      20   0  743648  30640  19488 S   0.7   1.6   0:02.73 gnome-terminal-
        1 root      20   0  245300   6568   4116 S   0.3   0.4   0:01.87 systemd
      505 root      20   0       0      0      0 I   0.3   0.0   0:00.75 kworker/u4:4-events_unbound
    ...

    This example output of the top program reveals that illustrative application sha1sum with PID 6955 consumes a lot of CPU resources.

  2. Create a sub-directory in the cpu resource controller directory:

    # mkdir /sys/fs/cgroup/cpu/Example/

    This directory represents a control group, where you can place specific processes and apply certain CPU limits to the processes. At the same time, a number of cgroups-v1 interface files and cpu controller-specific files will be created in the directory.

  3. Optional: Inspect the newly created control group:

    # ll /sys/fs/cgroup/cpu/Example/
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cgroup.clone_children
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cgroup.procs
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.stat
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_all
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu_sys
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu_user
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_sys
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_user
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpu.cfs_period_us
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpu.cfs_quota_us
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpu.rt_period_us
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpu.rt_runtime_us
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpu.shares
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpu.stat
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 notify_on_release
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 tasks

    This example output shows files, such as cpuacct.usage, cpu.cfs._period_us, that represent specific configurations and/or limits, which can be set for processes in the Example control group. Note that the respective file names are prefixed with the name of the control group controller to which they belong.

    By default, the newly created control group inherits access to the system’s entire CPU resources without a limit.

  4. Configure CPU limits for the control group:

    # echo "1000000" > /sys/fs/cgroup/cpu/Example/cpu.cfs_period_us
    # echo "200000" > /sys/fs/cgroup/cpu/Example/cpu.cfs_quota_us
    • The cpu.cfs_period_us file represents a period of time in microseconds (µs, represented here as "us") for how frequently a control group’s access to CPU resources should be reallocated. The upper limit is 1 000 000 microsecond and the lower limit is 1 000 microseconds.
    • The cpu.cfs_quota_us file represents the total amount of time in microseconds for which all processes collectively in a control group can run during one period (as defined by cpu.cfs_period_us). When processes in a control group, during a single period, use up all the time specified by the quota, they are throttled for the remainder of the period and not allowed to run until the next period. The lower limit is 1 000 microseconds.

      The example commands above set the CPU time limits so that all processes collectively in the Example control group will be able to run only for 0.2 seconds (defined by cpu.cfs_quota_us) out of every 1 second (defined by cpu.cfs_period_us).

  5. Optional: Verify the limits:

    # cat /sys/fs/cgroup/cpu/Example/cpu.cfs_period_us /sys/fs/cgroup/cpu/Example/cpu.cfs_quota_us
    1000000
    200000
  6. Add the application’s PID to the Example control group:

    # echo "6955" > /sys/fs/cgroup/cpu/Example/cgroup.procs

    This command ensures that a specific application becomes a member of the Example control group and hence does not exceed the CPU limits configured for the Example control group. The PID should represent an existing process in the system. The PID 6955 here was assigned to process sha1sum /dev/zero &, used to illustrate the use case of the cpu controller.

Verification

  1. Verify that the application runs in the specified control group:

    # cat /proc/6955/cgroup
    12:cpuset:/
    11:hugetlb:/
    10:net_cls,net_prio:/
    9:memory:/user.slice/user-1000.slice/user@1000.service
    8:devices:/user.slice
    7:blkio:/
    6:freezer:/
    5:rdma:/
    4:pids:/user.slice/user-1000.slice/user@1000.service
    3:perf_event:/
    2:cpu,cpuacct:/Example
    1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service

    This example output shows that the process of the desired application runs in the Example control group, which applies CPU limits to the application’s process.

  2. Identify the current CPU consumption of your throttled application:

    # top
    top - 12:28:42 up  1:06,  1 user,  load average: 1.02, 1.02, 1.00
    Tasks: 266 total,   6 running, 260 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 11.0 us,  1.2 sy,  0.0 ni, 87.5 id,  0.0 wa,  0.2 hi,  0.0 si,  0.2 st
    MiB Mem :   1826.8 total,    287.1 free,   1054.4 used,    485.3 buff/cache
    MiB Swap:   1536.0 total,   1396.7 free,    139.2 used.    608.3 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     6955 root      20   0  228440   1752   1472 R  20.6   0.1  47:11.43 sha1sum
     5760 jdoe      20   0 3604956 208832  65316 R   2.3  11.2   0:43.50 gnome-shell
     6448 jdoe      20   0  743836  31736  19488 S   0.7   1.7   0:08.25 gnome-terminal-
      505 root      20   0       0      0      0 I   0.3   0.0   0:03.39 kworker/u4:4-events_unbound
     4217 root      20   0   74192   1612   1320 S   0.3   0.1   0:01.19 spice-vdagentd
    ...

    Note that the CPU consumption of the PID 6955 has decreased from 99% to 20%.

Note

The cgroups-v2 counterpart for cpu.cfs_period_us and cpu.cfs_quota_us is the cpu.max file. The cpu.max file is available through the cpu controller.

Additional resources