Chapter 9. Setting limits for applications

As a system administrator, you can use the control groups kernel functionality to set limits on, prioritize, or isolate the hardware resources available to processes, so that the applications on your system remain stable and do not run out of memory.

9.1. Understanding control groups

Control groups is a Linux kernel feature that enables you to organize processes into hierarchically ordered groups called cgroups. The hierarchy (the control groups tree) is defined by giving structure to the cgroups virtual file system, which is mounted by default on the /sys/fs/cgroup/ directory. You can build the hierarchy manually by creating and removing sub-directories in /sys/fs/cgroup/, or you can let the systemd system and service manager manage it for you.
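
For example, a minimal sketch of managing the hierarchy by hand, assuming the default RHEL 8 layout where the cpu controller is mounted under /sys/fs/cgroup/cpu/ and using a hypothetical group name:

    # mkdir /sys/fs/cgroup/cpu/my_group/
    # rmdir /sys/fs/cgroup/cpu/my_group/

Removing the directory succeeds only when the control group no longer contains any processes or child groups.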

Resource controllers (kernel components) then modify the behavior of processes in cgroups by limiting, prioritizing, or allocating the system resources (such as CPU time, memory, network bandwidth, or combinations of these) that those processes use.

The added value of cgroups is process aggregation, which enables the division of hardware resources among applications and users, and thereby increases the overall efficiency, stability, and security of the users' environment.

Control groups version 1

Control groups version 1 (cgroups-v1) provide a per-resource controller hierarchy. This means that each resource, such as CPU, memory, or I/O, has its own control group hierarchy. It is possible to combine different control group hierarchies in a way that lets one controller coordinate with another in managing their respective resources. However, when the two controllers belong to different process hierarchies, proper coordination between them is not possible.

The cgroups-v1 controllers were developed over a long time span and, as a result, the behavior and naming of their control files are not uniform.

Control groups version 2

The problems with controller coordination, which stemmed from hierarchy flexibility, led to the development of control groups version 2.

Control groups version 2 (cgroups-v2) provides a single control group hierarchy against which all resource controllers are mounted.

The control file behavior and naming is consistent among different controllers.

Warning

RHEL 8 provides cgroups-v2 as a technology preview with a limited number of resource controllers. For more information about the relevant resource controllers, see the cgroups-v2 release note.

This sub-section was based on a Devconf.cz 2019 presentation.[1]

9.2. What kernel resource controllers are

The functionality of control groups is enabled by kernel resource controllers. RHEL 8 supports various controllers for control groups version 1 (cgroups-v1) and control groups version 2 (cgroups-v2).

A resource controller, also called a control group subsystem, is a kernel subsystem that represents a single resource, such as CPU time, memory, network bandwidth or disk I/O. The Linux kernel provides a range of resource controllers that are mounted automatically by the systemd system and service manager. Find a list of currently mounted resource controllers in the /proc/cgroups file.
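
For example, to print that list, together with each controller's hierarchy ID, the number of control groups, and whether the controller is enabled:

    # cat /proc/cgroups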

The following controllers are available for cgroups-v1:

  • blkio - can set limits on input/output access to and from block devices.
  • cpu - can adjust the parameters of the Completely Fair Scheduler (CFS) scheduler for control group’s tasks. It is mounted together with the cpuacct controller on the same mount.
  • cpuacct - creates automatic reports on CPU resources used by tasks in a control group. It is mounted together with the cpu controller on the same mount.
  • cpuset - can be used to restrict control group tasks to run only on a specified subset of CPUs and to direct the tasks to use memory only on specified memory nodes.
  • devices - can control access to devices for tasks in a control group.
  • freezer - can be used to suspend or resume tasks in a control group.
  • memory - can be used to set limits on memory use by tasks in a control group and generates automatic reports on memory resources used by those tasks.
  • net_cls - tags network packets with a class identifier (classid) that enables the Linux traffic controller (the tc command) to identify packets that originate from a particular control group task. A subsystem of net_cls, net_filter (iptables), can also use this tag to perform actions on such packets. net_filter additionally tags network sockets with a firewall identifier (fwid) that allows the Linux firewall (through the iptables command) to identify packets originating from a particular control group task.
  • net_prio - sets the priority of network traffic.
  • pids - can set limits on the number of processes and their children in a control group.
  • perf_event - can group tasks for monitoring by the perf performance monitoring and reporting utility.
  • rdma - can set limits on Remote Direct Memory Access/InfiniBand specific resources in a control group.
  • hugetlb - can be used to limit the use of large virtual memory pages (huge pages) by tasks in a control group.

The following controllers are available for cgroups-v2:

  • io - A follow-up to blkio of cgroups-v1.
  • memory - A follow-up to memory of cgroups-v1.
  • pids - Same as pids in cgroups-v1.
  • rdma - Same as rdma in cgroups-v1.
  • cpu - A follow-up to cpu and cpuacct of cgroups-v1.
  • cpuset - Supports only the core functionality (cpus{,.effective}, mems{,.effective}) with a new partition feature.
  • perf_event - Support is inherent, no explicit control file. You can specify a v2 cgroup as a parameter to the perf command that will profile all the tasks within that cgroup.
Important

A resource controller can be used either in a cgroups-v1 hierarchy or a cgroups-v2 hierarchy, not simultaneously in both.

Additional resources

  • For more information about resource controllers in general, refer to the cgroups(7) manual page.
  • For detailed descriptions of specific resource controllers, see the documentation in the /usr/share/doc/kernel-doc-<kernel_version>/Documentation/cgroups-v1/ directory.
  • For more information about cgroups-v2, refer to the cgroups(7) manual page.

9.3. Using control groups through a virtual file system

You can use control groups (cgroups) to set limits, prioritize, or control access to hardware resources for groups of processes. This gives you granular control over the resource usage of applications, so that system resources are used more efficiently. The following sections provide an overview of tasks related to management of cgroups for both version 1 and version 2 using a virtual file system.

9.3.1. Setting CPU limits to applications using cgroups-v1

Sometimes an application consumes a lot of CPU time, which may negatively impact the overall health of your environment. Use the /sys/fs/cgroup/ virtual file system to configure CPU limits for an application using control groups version 1 (cgroups-v1).

Prerequisites

  • An application whose CPU consumption you want to restrict.
  • Verify that the cgroups-v1 controllers were mounted:

    # mount -l | grep cgroup
    tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
    cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
    cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpu,cpuacct)
    cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
    cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
    ...

Procedure

  1. Identify the process ID (PID) of the application you want to restrict in CPU consumption:

    # top
    top - 11:34:09 up 11 min,  1 user,  load average: 0.51, 0.27, 0.22
    Tasks: 267 total,   3 running, 264 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 49.0 us,  3.3 sy,  0.0 ni, 47.5 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
    MiB Mem :   1826.8 total,    303.4 free,   1046.8 used,    476.5 buff/cache
    MiB Swap:   1536.0 total,   1396.0 free,    140.0 used.    616.4 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     6955 root      20   0  228440   1752   1472 R  99.3   0.1   0:32.71 sha1sum
     5760 jdoe      20   0 3603868 205188  64196 S   3.7  11.0   0:17.19 gnome-shell
     6448 jdoe      20   0  743648  30640  19488 S   0.7   1.6   0:02.73 gnome-terminal-
        1 root      20   0  245300   6568   4116 S   0.3   0.4   0:01.87 systemd
      505 root      20   0       0      0      0 I   0.3   0.0   0:00.75 kworker/u4:4-events_unbound
    ...

    The example output of the top program reveals that PID 6955 (illustrative application sha1sum) consumes a lot of CPU resources.

  2. Create a sub-directory in the cpu resource controller directory:

    # mkdir /sys/fs/cgroup/cpu/Example/

    The directory above represents a control group, where you can place specific processes and apply certain CPU limits to the processes. At the same time, some cgroups-v1 interface files and cpu controller-specific files will be created in the directory.

  3. Optionally, inspect the newly created control group:

    # ll /sys/fs/cgroup/cpu/Example/
    -rw-r--r--. 1 root root 0 Mar 11 11:42 cgroup.clone_children
    -rw-r--r--. 1 root root 0 Mar 11 11:42 cgroup.procs
    -r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.stat
    -rw-r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage
    -r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_all
    -r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu
    -r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu_sys
    -r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu_user
    -r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_sys
    -r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_user
    -rw-r--r--. 1 root root 0 Mar 11 11:42 cpu.cfs_period_us
    -rw-r--r--. 1 root root 0 Mar 11 11:42 cpu.cfs_quota_us
    -rw-r--r--. 1 root root 0 Mar 11 11:42 cpu.rt_period_us
    -rw-r--r--. 1 root root 0 Mar 11 11:42 cpu.rt_runtime_us
    -rw-r--r--. 1 root root 0 Mar 11 11:42 cpu.shares
    -r--r--r--. 1 root root 0 Mar 11 11:42 cpu.stat
    -rw-r--r--. 1 root root 0 Mar 11 11:42 notify_on_release
    -rw-r--r--. 1 root root 0 Mar 11 11:42 tasks

    The example output shows files, such as cpuacct.usage and cpu.cfs_period_us, that represent specific configurations and/or limits, which can be set for processes in the Example control group. Notice that the respective file names are prefixed with the name of the control group controller to which they belong.

    By default, the newly created control group inherits access to the system’s entire CPU resources without a limit.

  4. Configure CPU limits for the control group:

    # echo "1000000" > /sys/fs/cgroup/cpu/Example/cpu.cfs_period_us
    # echo "200000" > /sys/fs/cgroup/cpu/Example/cpu.cfs_quota_us

    The cpu.cfs_period_us file represents a period of time in microseconds (µs, represented here as "us") for how frequently a control group’s access to CPU resources should be reallocated. The upper limit is 1 second and the lower limit is 1000 microseconds.

    The cpu.cfs_quota_us file represents the total amount of time in microseconds for which all processes collectively in a control group can run during one period (as defined by cpu.cfs_period_us). As soon as processes in a control group, during a single period, use up all the time specified by the quota, they are throttled for the remainder of the period and not allowed to run until the next period. The lower limit is 1000 microseconds.

    The example commands above set the CPU time limits so that all processes collectively in the Example control group can run for only 0.2 seconds (defined by cpu.cfs_quota_us) out of every 1 second (defined by cpu.cfs_period_us), that is, 20% of a single CPU.

  5. Optionally, verify the limits:

    # cat /sys/fs/cgroup/cpu/Example/cpu.cfs_period_us /sys/fs/cgroup/cpu/Example/cpu.cfs_quota_us
    1000000
    200000
  6. Add the application’s PID to the Example control group:

    # echo "6955" > /sys/fs/cgroup/cpu/Example/cgroup.procs
    
    or
    
    # echo "6955" > /sys/fs/cgroup/cpu/Example/tasks

    The previous command ensures that a desired application becomes a member of the Example control group and hence does not exceed the CPU limits configured for the Example control group. The PID should represent an existing process in the system. The PID 6955 here was assigned to process sha1sum /dev/zero &, used to illustrate the use-case of the cpu controller.

  7. Verify that the application runs in the specified control group:

    # cat /proc/6955/cgroup
    12:cpuset:/
    11:hugetlb:/
    10:net_cls,net_prio:/
    9:memory:/user.slice/user-1000.slice/user@1000.service
    8:devices:/user.slice
    7:blkio:/
    6:freezer:/
    5:rdma:/
    4:pids:/user.slice/user-1000.slice/user@1000.service
    3:perf_event:/
    2:cpu,cpuacct:/Example
    1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service

    The example output above shows that the process of the desired application runs in the Example control group, which applies CPU limits to the application’s process.

  8. Identify the current CPU consumption of your throttled application:

    # top
    top - 12:28:42 up  1:06,  1 user,  load average: 1.02, 1.02, 1.00
    Tasks: 266 total,   6 running, 260 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 11.0 us,  1.2 sy,  0.0 ni, 87.5 id,  0.0 wa,  0.2 hi,  0.0 si,  0.2 st
    MiB Mem :   1826.8 total,    287.1 free,   1054.4 used,    485.3 buff/cache
    MiB Swap:   1536.0 total,   1396.7 free,    139.2 used.    608.3 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     6955 root      20   0  228440   1752   1472 R  20.6   0.1  47:11.43 sha1sum
     5760 jdoe      20   0 3604956 208832  65316 R   2.3  11.2   0:43.50 gnome-shell
     6448 jdoe      20   0  743836  31736  19488 S   0.7   1.7   0:08.25 gnome-terminal-
      505 root      20   0       0      0      0 I   0.3   0.0   0:03.39 kworker/u4:4-events_unbound
     4217 root      20   0   74192   1612   1320 S   0.3   0.1   0:01.19 spice-vdagentd
    ...

    Notice that the CPU consumption of the PID 6955 has decreased from 99% to 20%.
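
When you no longer need the restriction, you can dissolve the Example control group. A cgroup directory can be removed only when it contains no tasks, so, as a sketch using the illustrative PID from this procedure, first move the process back to the root control group of the hierarchy and then remove the directory:

    # echo "6955" > /sys/fs/cgroup/cpu/cgroup.procs
    # rmdir /sys/fs/cgroup/cpu/Example/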

9.3.2. Setting CPU limits to applications using cgroups-v2

Sometimes an application uses a lot of CPU time, which may negatively impact the overall health of your environment. Use control groups version 2 (cgroups-v2) to configure CPU limits to the application, and restrict its consumption.

Prerequisites

  • You have root permissions.

Procedure

  1. Prevent cgroups-v1 from automatically mounting during the system boot:

    # grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="cgroup_no_v1=all"

    The command adds a kernel command-line parameter to the current boot entry. The cgroup_no_v1=all parameter prevents cgroups-v1 from being automatically mounted.

    Alternatively, use the systemd.unified_cgroup_hierarchy=1 kernel command-line parameter to mount cgroups-v2 during the system boot by default.

    Note

    RHEL 8 supports both cgroups-v1 and cgroups-v2. However, cgroups-v1 is enabled and mounted by default during the booting process.

  2. Reboot the system for the changes to take effect.
  3. Optionally, verify the cgroups-v1 functionality has been disabled:

    # mount -l | grep cgroup
    tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
    cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)

    If cgroups-v1 has been successfully disabled, the output does not show any "type cgroup" references, except for the one that belongs to systemd.

  4. Mount cgroups-v2 anywhere in the filesystem:

    # mount -t cgroup2 none <MOUNT_POINT>
  5. Optionally, verify the cgroups-v2 functionality has been mounted:

    # mount -l | grep cgroup2
    none on /cgroups-v2 type cgroup2 (rw,relatime,seclabel)

    The example output shows that cgroups-v2 has been mounted to the /cgroups-v2/ directory.

  6. Optionally, inspect the contents of the /cgroups-v2/ directory:

    # ll /cgroups-v2/
    -r--r--r--. 1 root root 0 Mar 13 11:57 cgroup.controllers
    -rw-r--r--. 1 root root 0 Mar 13 11:57 cgroup.max.depth
    -rw-r--r--. 1 root root 0 Mar 13 11:57 cgroup.max.descendants
    -rw-r--r--. 1 root root 0 Mar 13 11:57 cgroup.procs
    -r--r--r--. 1 root root 0 Mar 13 11:57 cgroup.stat
    -rw-r--r--. 1 root root 0 Mar 13 11:58 cgroup.subtree_control
    -rw-r--r--. 1 root root 0 Mar 13 11:57 cgroup.threads
    -rw-r--r--. 1 root root 0 Mar 13 11:57 cpu.pressure
    -r--r--r--. 1 root root 0 Mar 13 11:57 cpuset.cpus.effective
    -r--r--r--. 1 root root 0 Mar 13 11:57 cpuset.mems.effective
    -rw-r--r--. 1 root root 0 Mar 13 11:57 io.pressure
    -rw-r--r--. 1 root root 0 Mar 13 11:57 memory.pressure

    The /cgroups-v2/ directory, also called the root control group, contains some interface files (starting with cgroup) and some controller-specific files such as cpuset.cpus.effective.

  7. Identify the process IDs (PIDs) of applications you want to restrict in CPU consumption:

    # top
    top - 15:39:52 up  3:45,  1 user,  load average: 0.79, 0.20, 0.07
    Tasks: 265 total,   3 running, 262 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 74.3 us,  6.1 sy,  0.0 ni, 19.4 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
    MiB Mem :   1826.8 total,    243.8 free,   1102.1 used,    480.9 buff/cache
    MiB Swap:   1536.0 total,   1526.2 free,      9.8 used.    565.6 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     5473 root      20   0  228440   1740   1456 R  99.7   0.1   0:12.11 sha1sum
     5439 root      20   0  222616   3420   3052 R  60.5   0.2   0:27.08 cpu_load_generator
     2170 jdoe      20   0 3600716 209960  67548 S   0.3  11.2   1:18.50 gnome-shell
     3051 root      20   0  274424   3976   3092 R   0.3   0.2   1:01.25 top
        1 root      20   0  245448  10256   5448 S   0.0   0.5   0:02.52 systemd
    ...

    The example output of the top program reveals that PID 5473 and 5439 (illustrative application sha1sum and cpu_load_generator) consume a lot of resources, namely CPU. Both are example applications used to demonstrate managing the cgroups-v2 functionality.

  8. Enable CPU-related controllers:

    # echo "+cpu" > /cgroups-v2/cgroup.subtree_control
    # echo "+cpuset" > /cgroups-v2/cgroup.subtree_control

    The previous commands enable the cpu and cpuset controllers for the immediate sub-control groups of the /cgroups-v2/ root control group.

  9. Create a sub-directory in the previously created /cgroups-v2/ directory:

    # mkdir /cgroups-v2/Example/

    The /cgroups-v2/Example/ directory represents a sub-control group, where you can place specific processes and apply various CPU limits to the processes. Also, the previous step enabled the cpu and cpuset controllers for this sub-control group.

    At the time of creation of /cgroups-v2/Example/, some cgroups-v2 interface files and cpu and cpuset controller-specific files will be created in the directory.

  10. Optionally, inspect the newly created control group:

    # ll /cgroups-v2/Example/
    -r--r--r--. 1 root root 0 Mar 13 14:48 cgroup.controllers
    -r--r--r--. 1 root root 0 Mar 13 14:48 cgroup.events
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cgroup.freeze
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cgroup.max.depth
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cgroup.max.descendants
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cgroup.procs
    -r--r--r--. 1 root root 0 Mar 13 14:48 cgroup.stat
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cgroup.subtree_control
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cgroup.threads
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cgroup.type
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cpu.max
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cpu.pressure
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cpuset.cpus
    -r--r--r--. 1 root root 0 Mar 13 14:48 cpuset.cpus.effective
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cpuset.cpus.partition
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cpuset.mems
    -r--r--r--. 1 root root 0 Mar 13 14:48 cpuset.mems.effective
    -r--r--r--. 1 root root 0 Mar 13 14:48 cpu.stat
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cpu.weight
    -rw-r--r--. 1 root root 0 Mar 13 14:48 cpu.weight.nice
    -rw-r--r--. 1 root root 0 Mar 13 14:48 io.pressure
    -rw-r--r--. 1 root root 0 Mar 13 14:48 memory.pressure

    The example output shows files such as cpuset.cpus and cpu.max. The files are specific to the cpuset and cpu controllers that you enabled for the root’s (/cgroups-v2/) direct child control groups using the /cgroups-v2/cgroup.subtree_control file. Also, there are general cgroup control interface files such as cgroup.procs or cgroup.controllers, which are common to all control groups, regardless of enabled controllers.

    By default, the newly created sub-control group inherits access to the system’s entire CPU resources without a limit.

  11. Ensure the processes that you want to limit compete for CPU time on the same CPU:

    # echo "1" > /cgroups-v2/Example/cpuset.cpus

    The previous command ensures that the processes you place in the Example sub-control group compete for time on the same CPU. This setting is important for the cpu controller to activate.

    Important

    The cpu controller is only activated if the relevant sub-control group has at least 2 processes, which compete for time on a single CPU.

  12. Configure CPU limits of the control group:

    # echo "200000 1000000" > /cgroups-v2/Example/cpu.max

    The first value is the allowed time quota in microseconds for which all processes collectively in a sub-control group can run during one period (specified by the second value). During a single period, when processes in a control group collectively exhaust all the time specified by this quota, they are throttled for the remainder of the period and not allowed to run until the next period.

    The example command sets the CPU time limits so that all processes collectively in the Example sub-control group are able to run on the CPU only for 0.2 seconds out of every 1 second.

  13. Optionally, verify the limits:

    # cat /cgroups-v2/Example/cpu.max
    200000 1000000
  14. Add the applications' PIDs to the Example sub-control group:

    # echo "5473" > /cgroups-v2/Example/cgroup.procs
    # echo "5439" > /cgroups-v2/Example/cgroup.procs

    The example commands ensure that desired applications become members of the Example sub-control group and hence do not exceed the CPU limits configured for the Example sub-control group.

  15. Verify that the applications run in the specified control group:

    # cat /proc/5473/cgroup /proc/5439/cgroup
    1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
    0::/Example
    1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
    0::/Example

    The example output above shows that the processes of the desired applications run in the Example sub-control group.

  16. Inspect the current CPU consumption of your throttled applications:

    # top
    top - 15:56:27 up  4:02,  1 user,  load average: 0.03, 0.41, 0.55
    Tasks: 265 total,   4 running, 261 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  9.6 us,  0.8 sy,  0.0 ni, 89.4 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
    MiB Mem :   1826.8 total,    243.4 free,   1102.1 used,    481.3 buff/cache
    MiB Swap:   1536.0 total,   1526.2 free,      9.8 used.    565.5 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     5439 root      20   0  222616   3420   3052 R  10.0   0.2   6:15.83 cpu_load_generator
     5473 root      20   0  228440   1740   1456 R  10.0   0.1   9:20.65 sha1sum
     2753 jdoe      20   0  743928  35328  20608 S   0.7   1.9   0:20.36 gnome-terminal-
     2170 jdoe      20   0 3599688 208820  67552 S   0.3  11.2   1:33.06 gnome-shell
     5934 root      20   0  274428   5064   4176 R   0.3   0.3   0:00.04 top
     ...

    Notice that the CPU consumption for the PID 5439 and PID 5473 has decreased to 10%. The Example sub-control group limits its processes to 20% of the CPU time collectively. Since there are 2 processes in the control group, each can utilize 10% of the CPU time.
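
Optionally, when the limits are no longer needed, you can dissolve the Example sub-control group. As a sketch using the illustrative PIDs from this procedure, move the processes back to the root control group of the mounted cgroups-v2 hierarchy and remove the now empty directory:

    # echo "5473" > /cgroups-v2/cgroup.procs
    # echo "5439" > /cgroups-v2/cgroup.procs
    # rmdir /cgroups-v2/Example/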

9.4. Role of systemd in control groups version 1

Red Hat Enterprise Linux 8 moves the resource management settings from the process level to the application level by binding the system of cgroup hierarchies with the systemd unit tree. Therefore, you can manage the system resources with the systemctl command, or by modifying the systemd unit files.

By default, the systemd system and service manager makes use of the slice, the scope and the service units to organize and structure processes in the control groups. The systemctl command enables you to further modify this structure by creating custom slices. Also, systemd automatically mounts hierarchies for important kernel resource controllers in the /sys/fs/cgroup/ directory.

Three systemd unit types are used for resource control:

  • Service - A process or a group of processes, which systemd started according to a unit configuration file. Services encapsulate the specified processes so that they can be started and stopped as one set. Services are named in the following way:

    <name>.service
  • Scope - A group of externally created processes. Scopes encapsulate processes that are started and stopped by arbitrary processes through the fork() function and then registered by systemd at runtime. For example, user sessions, containers, and virtual machines are treated as scopes. Scopes are named as follows:

    <name>.scope
  • Slice - A group of hierarchically organized units. Slices organize a hierarchy in which scopes and services are placed. The actual processes are contained in scopes or in services. Every name of a slice unit corresponds to the path to a location in the hierarchy. The dash ("-") character acts as a separator of the path components to a slice from the -.slice root slice. In the following example:

    <parent-name>.slice

    parent-name.slice is a sub-slice of parent.slice, which is a sub-slice of the -.slice root slice. parent-name.slice can have its own sub-slice named parent-name-name2.slice, and so on.

The service, the scope, and the slice units directly map to objects in the control group hierarchy. When these units are activated, they map directly to control group paths built from the unit names.

The following is an abbreviated example of a control group hierarchy:

Control group /:
-.slice
├─user.slice
│ ├─user-42.slice
│ │ ├─session-c1.scope
│ │ │ ├─ 967 gdm-session-worker [pam/gdm-launch-environment]
│ │ │ ├─1035 /usr/libexec/gdm-x-session gnome-session --autostart /usr/share/gdm/greeter/autostart
│ │ │ ├─1054 /usr/libexec/Xorg vt1 -displayfd 3 -auth /run/user/42/gdm/Xauthority -background none -noreset -keeptty -verbose 3
│ │ │ ├─1212 /usr/libexec/gnome-session-binary --autostart /usr/share/gdm/greeter/autostart
│ │ │ ├─1369 /usr/bin/gnome-shell
│ │ │ ├─1732 ibus-daemon --xim --panel disable
│ │ │ ├─1752 /usr/libexec/ibus-dconf
│ │ │ ├─1762 /usr/libexec/ibus-x11 --kill-daemon
│ │ │ ├─1912 /usr/libexec/gsd-xsettings
│ │ │ ├─1917 /usr/libexec/gsd-a11y-settings
│ │ │ ├─1920 /usr/libexec/gsd-clipboard
…​
├─init.scope
│ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
└─system.slice
  ├─rngd.service
  │ └─800 /sbin/rngd -f
  ├─systemd-udevd.service
  │ └─659 /usr/lib/systemd/systemd-udevd
  ├─chronyd.service
  │ └─823 /usr/sbin/chronyd
  ├─auditd.service
  │ ├─761 /sbin/auditd
  │ └─763 /usr/sbin/sedispatch
  ├─accounts-daemon.service
  │ └─876 /usr/libexec/accounts-daemon
  ├─example.service
  │ ├─ 929 /bin/bash /home/jdoe/example.sh
  │ └─4902 sleep 1
  …​

The example above shows that services and scopes contain processes and are placed in slices that do not contain processes of their own.
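
To see this mapping for a particular unit, you can query its ControlGroup property. For example, for the illustrative example.service unit shown above, the command prints the control group path the unit runs in:

    # systemctl show -p ControlGroup example.service
    ControlGroup=/system.slice/example.service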

Additional resources

  • For more information about systemd, unit files, and a complete list of systemd unit types, see the relevant sections in Configuring basic system settings.
  • For more information about resource controllers, see the What are kernel resource controllers section and the systemd.resource-control(5), cgroups(7) manual pages.
  • For more information about fork(), see the fork(2) manual pages.

9.5. Using control groups version 1 with systemd

The following sections provide an overview of tasks related to creation, modification, and removal of control groups (cgroups). The utilities provided by the systemd system and service manager are the preferred way of managing cgroups and will remain supported in the future.

9.5.1. Creating control groups version 1 with systemd

You can use the systemd system and service manager to create transient and persistent control groups (cgroups) to set limits, prioritize, or control access to hardware resources for groups of processes.

9.5.1.1. Creating transient control groups

Transient cgroups set limits on the resources consumed by a unit (service or scope) during its runtime.

Procedure

  • To create a transient control group, use the systemd-run command in the following format:

    # systemd-run --unit=<name> --slice=<name>.slice <command>

    This command creates and starts a transient service or a scope unit and runs a custom command in such a unit.

    • The --unit=<name> option gives a name to the unit. If --unit is not specified, the name is generated automatically.
    • The --slice=<name>.slice option makes your service or scope unit a member of a specified slice. Replace <name>.slice with the name of an existing slice (as shown in the output of systemctl -t slice), or create a new slice by passing a unique name. By default, services and scopes are created as members of the system.slice.
    • Replace <command> with the command you wish to execute in the service or the scope unit.

      The following message is displayed to confirm that you created and started the service or the scope successfully:

      Running as unit <name>.service
  • Optionally, keep the unit running after its processes finished to collect run-time information:

    # systemd-run --unit=<name> --slice=<name>.slice --remain-after-exit <command>

    The command creates and starts a transient service unit and runs a custom command in such a unit. The --remain-after-exit option ensures that the service keeps running after its processes have finished.
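
    For example, the following sketch (using hypothetical unit and slice names) runs the top program in batch mode as a transient toptest.service unit placed in test.slice:

    # systemd-run --unit=toptest --slice=test.slice top -b

    You can then inspect the unit with systemctl status toptest.service and stop it with systemctl stop toptest.service.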

9.5.1.2. Creating persistent control groups

To assign a persistent control group to a service, it is necessary to edit its unit configuration file. The configuration is preserved after the system reboot, so it can be used to manage services that are started automatically.

Procedure

  • To create a persistent control group, execute:

    # systemctl enable <name>.service

    The command above enables the service so that it starts automatically at boot. It relies on the unit configuration file in the /usr/lib/systemd/system/ directory and, by default, assigns <name>.service to the system.slice unit.

9.5.2. Modifying control groups version 1 with systemd

Each persistent unit is supervised by the systemd system and service manager and has a unit configuration file in the /usr/lib/systemd/system/ directory. To change the resource control settings of a persistent unit, modify its unit configuration file either manually in a text editor or from the command-line interface.

9.5.2.1. Configuring memory resource control settings on the command-line

Executing commands in the command-line interface is one of the ways to set limits, prioritize, or control access to hardware resources for groups of processes.

Procedure

  • To limit the memory usage of a service, run the following:

    # systemctl set-property example.service MemoryLimit=1500K

    The command instantly assigns the memory limit of 1,500 kilobytes to processes executed in a control group the example.service service belongs to. The MemoryLimit parameter, in this configuration variant, is defined in the /etc/systemd/system.control/example.service.d/50-MemoryLimit.conf file and controls the value of the /sys/fs/cgroup/memory/system.slice/example.service/memory.limit_in_bytes file.

  • Optionally, to temporarily limit the memory usage of a service, run:

    # systemctl set-property --runtime example.service MemoryLimit=1500K

    The command instantly assigns the memory limit to the example.service service. The MemoryLimit parameter is defined until the next reboot in the /run/systemd/system.control/example.service.d/50-MemoryLimit.conf file. With a reboot, the whole /run/systemd/system.control/ directory and MemoryLimit are removed.

Note

The 50-MemoryLimit.conf file stores the memory limit as a multiple of 4096 bytes - one kernel page size, specific to AMD64 and Intel 64. The actual number of bytes depends on the CPU architecture.
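
You can check the value that systemd applied, for example by querying the unit property; the limit is reported in bytes (1500K corresponds to 1,536,000 bytes):

    # systemctl show --property MemoryLimit example.service
    MemoryLimit=1536000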

9.5.2.2. Configuring memory resource control settings with unit files

Manually modifying unit files is one of the ways to set limits, prioritize, or control access to hardware resources for groups of processes.

Procedure

  1. To limit the memory usage of a service, modify the /usr/lib/systemd/system/example.service file as follows:

    …​
    [Service]
    MemoryLimit=1500K
    …​

    The configuration above places a limit on maximum memory consumption of processes executed in a control group, which example.service is part of.

    Note

    Use suffixes K, M, G, or T to identify Kilobyte, Megabyte, Gigabyte, or Terabyte as a unit of measurement.

  2. Reload all unit configuration files:

    # systemctl daemon-reload
  3. Restart the service:

    # systemctl restart example.service
  4. Reboot the system.
  5. Optionally, check that the changes took effect:

    # cat /sys/fs/cgroup/memory/system.slice/example.service/memory.limit_in_bytes
    1536000

    The example output shows that the memory consumption was limited to approximately 1,500 kilobytes.

    Note

    The memory.limit_in_bytes file stores the memory limit as a multiple of 4096 bytes - one kernel page size, specific to AMD64 and Intel 64. The actual number of bytes depends on the CPU architecture.

9.5.3. Removing control groups version 1 with systemd

You can use the systemd system and service manager to remove transient and persistent control groups (cgroups) if you no longer need to limit, prioritize, or control access to hardware resources for groups of processes.

9.5.3.1. Removing transient control groups

Transient cgroups are released automatically once all the processes that a service or a scope unit contains have finished.

Procedure

  • To stop the service unit with all its processes, execute:

    # systemctl stop <name>.service
  • To terminate one or more of the unit processes, execute:

    # systemctl kill <name>.service --kill-who=<main|control|all> --signal=<signal>

    The command above uses the --kill-who option to select which processes of the control group to terminate: main (only the main process of the unit), control (only the control process of the unit), or all (all processes of the unit). The --signal option determines the type of POSIX signal to be sent to the selected processes. The default signal is SIGTERM.
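
    For example, a sketch that sends the default SIGTERM signal only to the main process of the illustrative example.service unit:

    # systemctl kill example.service --kill-who=main --signal=SIGTERM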

9.5.3.2. Removing persistent control groups

Persistent cgroups are released when a service or a scope unit is stopped or disabled and its configuration file is deleted.

Procedure

  1. Stop the service unit:

    # systemctl stop <name>.service
  2. Disable the service unit:

    # systemctl disable <name>.service
  3. Remove the relevant unit configuration file:

    # rm /usr/lib/systemd/system/<name>.service
  4. Reload all unit configuration files so that changes take effect:

    # systemctl daemon-reload

9.6. Obtaining information about control groups version 1

The following sections describe how to display various information about control groups (cgroups):

  • Listing systemd units and viewing their status
  • Viewing the cgroups hierarchy
  • Monitoring resource consumption in real time

9.6.1. Listing systemd units

The following procedure describes how to use the systemd system and service manager to list its units.

Procedure

  • To list all active units on the system, execute the # systemctl command and the terminal will return an output similar to the following example:

    UNIT                                                LOAD   ACTIVE SUB       DESCRIPTION
    …​
    init.scope                                          loaded active running   System and Service Manager
    session-2.scope                                     loaded active running   Session 2 of user jdoe
    abrt-ccpp.service                                   loaded active exited    Install ABRT coredump hook
    abrt-oops.service                                   loaded active running   ABRT kernel log watcher
    abrt-vmcore.service                                 loaded active exited    Harvest vmcores for ABRT
    abrt-xorg.service                                   loaded active running   ABRT Xorg log watcher
    …​
    -.slice                                             loaded active active    Root Slice
    machine.slice                                       loaded active active    Virtual Machine and Container Slice
    system-getty.slice                                  loaded active active    system-getty.slice
    system-lvm2\x2dpvscan.slice                         loaded active active    system-lvm2\x2dpvscan.slice
    system-sshd\x2dkeygen.slice                         loaded active active    system-sshd\x2dkeygen.slice
    system-systemd\x2dhibernate\x2dresume.slice         loaded active active    system-systemd\x2dhibernate\x2dresume>
    system-user\x2druntime\x2ddir.slice                 loaded active active    system-user\x2druntime\x2ddir.slice
    system.slice                                        loaded active active    System Slice
    user-1000.slice                                     loaded active active    User Slice of UID 1000
    user-42.slice                                       loaded active active    User Slice of UID 42
    user.slice                                          loaded active active    User and Session Slice
    …​
    • UNIT - a name of a unit that also reflects the unit position in a control group hierarchy. The units relevant for resource control are a slice, a scope, and a service.
    • LOAD - indicates whether the unit configuration file was properly loaded. If the unit file failed to load, the field contains the state error instead of loaded. Other unit load states are: stub, merged, and masked.
    • ACTIVE - the high-level unit activation state, which is a generalization of SUB.
    • SUB - the low-level unit activation state. The range of possible values depends on the unit type.
    • DESCRIPTION - the description of the unit content and functionality.
  • To list inactive units, execute:

    # systemctl --all
  • To limit the amount of information in the output, execute:

    # systemctl --type service,masked

    The --type option requires a comma-separated list of unit types, such as service and slice, or unit load states, such as loaded and masked.

9.6.2. Viewing a control group version 1 hierarchy

The following procedure describes how to display control groups (cgroups) hierarchy and processes running in specific cgroups.

Procedure

  • To display the whole cgroups hierarchy on your system, execute # systemd-cgls:

    Control group /:
    -.slice
    ├─user.slice
    │ ├─user-42.slice
    │ │ ├─session-c1.scope
    │ │ │ ├─ 965 gdm-session-worker [pam/gdm-launch-environment]
    │ │ │ ├─1040 /usr/libexec/gdm-x-session gnome-session --autostart /usr/share/gdm/greeter/autostart
    …​
    ├─init.scope
    │ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
    └─system.slice
      …​
      ├─example.service
      │ ├─6882 /bin/bash /home/jdoe/example.sh
      │ └─6902 sleep 1
      ├─systemd-journald.service
        └─629 /usr/lib/systemd/systemd-journald
      …​

    The example output returns the entire cgroups hierarchy, where the highest level is formed by slices.

  • To display the cgroups hierarchy filtered by a resource controller, execute # systemd-cgls <resource_controller>:

    # systemd-cgls memory
    Controller memory; Control group /:
    ├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
    ├─user.slice
    │ ├─user-42.slice
    │ │ ├─session-c1.scope
    │ │ │ ├─ 965 gdm-session-worker [pam/gdm-launch-environment]
    …​
    └─system.slice
      |
      …​
      ├─chronyd.service
      │ └─844 /usr/sbin/chronyd
      ├─example.service
      │ ├─8914 /bin/bash /home/jdoe/example.sh
      │ └─8916 sleep 1
      …​

    The example output of the above command lists the services that interact with the selected controller.

  • To display detailed information about a certain unit and its part of the cgroups hierarchy, execute # systemctl status <system_unit>:

    # systemctl status example.service
    ● example.service - My example service
       Loaded: loaded (/usr/lib/systemd/system/example.service; enabled; vendor preset: disabled)
       Active: active (running) since Tue 2019-04-16 12:12:39 CEST; 3s ago
     Main PID: 17737 (bash)
        Tasks: 2 (limit: 11522)
       Memory: 496.0K (limit: 1.5M)
       CGroup: /system.slice/example.service
               ├─17737 /bin/bash /home/jdoe/example.sh
               └─17743 sleep 1
    Apr 16 12:12:39 redhat systemd[1]: Started My example service.
    Apr 16 12:12:39 redhat bash[17737]: The current time is Tue Apr 16 12:12:39 CEST 2019
    Apr 16 12:12:40 redhat bash[17737]: The current time is Tue Apr 16 12:12:40 CEST 2019

9.6.3. Viewing resource controllers

The following procedure describes how to learn which processes use which resource controllers.

Procedure

  1. To view which resource controllers a process interacts with, execute the # cat /proc/<PID>/cgroup command:

    # cat /proc/11269/cgroup
    12:freezer:/
    11:cpuset:/
    10:devices:/system.slice
    9:memory:/system.slice/example.service
    8:pids:/system.slice/example.service
    7:hugetlb:/
    6:rdma:/
    5:perf_event:/
    4:cpu,cpuacct:/
    3:net_cls,net_prio:/
    2:blkio:/
    1:name=systemd:/system.slice/example.service

    The example output relates to a process of interest. In this case, it is a process identified by PID 11269, which belongs to the example.service unit. You can determine whether the process was placed in a correct control group as defined by the systemd unit file specifications.

    Note

    By default, the items and their ordering in the list of resource controllers is the same for all units started by systemd, since it automatically mounts all the default resource controllers.

Additional resources

  • For more information about resource controllers in general refer to the cgroups(7) manual pages.
  • For a detailed description of specific resource controllers, see the documentation in the /usr/share/doc/kernel-doc-<kernel_version>/Documentation/cgroups-v1/ directory.

9.6.4. Monitoring resource consumption

The following procedure describes how to view a list of currently running control groups (cgroups) and their resource consumption in real-time.

Procedure

  1. To see a dynamic account of currently running cgroups, execute the # systemd-cgtop command:

    Control Group                            Tasks   %CPU   Memory  Input/s Output/s
    /                                          607   29.8     1.5G        -        -
    /system.slice                              125      -   428.7M        -        -
    /system.slice/ModemManager.service           3      -     8.6M        -        -
    /system.slice/NetworkManager.service         3      -    12.8M        -        -
    /system.slice/accounts-daemon.service        3      -     1.8M        -        -
    /system.slice/boot.mount                     -      -    48.0K        -        -
    /system.slice/chronyd.service                1      -     2.0M        -        -
    /system.slice/cockpit.socket                 -      -     1.3M        -        -
    /system.slice/colord.service                 3      -     3.5M        -        -
    /system.slice/crond.service                  1      -     1.8M        -        -
    /system.slice/cups.service                   1      -     3.1M        -        -
    /system.slice/dev-hugepages.mount            -      -   244.0K        -        -
    /system.slice/dev-mapper-rhel\x2dswap.swap   -      -   912.0K        -        -
    /system.slice/dev-mqueue.mount               -      -    48.0K        -        -
    /system.slice/example.service                2      -     2.0M        -        -
    /system.slice/firewalld.service              2      -    28.8M        -        -
    ...

    The example output displays currently running cgroups ordered by their resource usage (CPU, memory, disk I/O load). The list refreshes every 1 second by default. Therefore, it offers a dynamic insight into the actual resource usage of each control group.
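
    For example, to make the output easier to read on a busy system, you can limit the depth of the displayed control group tree and slow down the refresh interval (both options are part of systemd-cgtop):

    # systemd-cgtop --depth=2 --delay=5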

Additional resources

  • For more information about dynamic monitoring of resource usage, see the systemd-cgtop(1) manual pages.

9.7. What namespaces are

Namespaces are one of the most important methods for organizing and identifying software objects.

A namespace wraps a global system resource (for example a mount point, a network device, or a hostname) in an abstraction that makes it appear to processes within the namespace that they have their own isolated instance of the global resource. One of the most common technologies that utilize namespaces are containers.

Changes to a particular global resource are visible only to processes in that namespace and do not affect the rest of the system or other namespaces.

To inspect which namespaces a process is a member of, you can check the symbolic links in the /proc/<PID>/ns/ directory.
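
For example, to list the namespaces of your current shell (each entry is a symbolic link whose target identifies the namespace type and its inode number):

    # ls -l /proc/$$/ns/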

The following table shows supported namespaces and resources which they isolate:

Namespace         Isolates
Mount             Mount points
UTS               Hostname and NIS domain name
IPC               System V IPC, POSIX message queues
PID               Process IDs
Network           Network devices, stacks, ports, etc.
User              User and group IDs
Control groups    Control group root directory



[1] Linux Control Group v2 - An Introduction, Devconf.cz 2019 presentation by Waiman Long