Chapter 16. Setting limits for applications

You can use the control groups (cgroups) kernel functionality to set limits, prioritize or isolate the hardware resources of processes. This allows you to granularly control resource usage of applications to utilize them more efficiently.

16.1. Understanding control groups

Control groups is a Linux kernel feature that enables you to organize processes into hierarchically ordered groups - cgroups. The hierarchy (control groups tree) is defined by providing structure to cgroups virtual file system, mounted by default on the /sys/fs/cgroup/ directory. It is done manually by creating and removing sub-directories in /sys/fs/cgroup/. Alternatively, by using the systemd system and service manager.

The resource controllers (a kernel component) then modify the behavior of processes in cgroups by limiting, prioritizing or allocating system resources, (such as CPU time, memory, network bandwidth, or various combinations) of those processes.

The added value of cgroups is process aggregation which enables division of hardware resources among applications and users. Thereby an increase in overall efficiency, stability and security of users' environment can be achieved.

Control groups version 1

Control groups version 1 (cgroups-v1) provide a per-resource controller hierarchy. It means that each resource, such as CPU, memory, I/O, and so on, has its own control group hierarchy. It is possible to combine different control group hierarchies in a way that one controller can coordinate with another one in managing their respective resources. However, the two controllers may belong to different process hierarchies, which does not permit their proper coordination.

The cgroups-v1 controllers were developed across a large time span and as a result, the behavior and naming of their control files is not uniform.

Control groups version 2

The problems with controller coordination, which stemmed from hierarchy flexibility, led to the development of control groups version 2.

Control groups version 2 (cgroups-v2) provides a single control group hierarchy against which all resource controllers are mounted.

The control file behavior and naming is consistent among different controllers.

This sub-section was based on a Devconf.cz 2019 presentation.[1]

Additional resources

16.2. What kernel resource controllers are

The functionality of control groups is enabled by kernel resource controllers. RHEL 9 supports various controllers for control groups version 1 (cgroups-v1) and control groups version 2 (cgroups-v2).

A resource controller, also called a control group subsystem, is a kernel subsystem that represents a single resource, such as CPU time, memory, network bandwidth or disk I/O. The Linux kernel provides a range of resource controllers that are mounted automatically by the systemd system and service manager. Find a list of currently mounted resource controllers in the /proc/cgroups file.

The following controllers are available for cgroups-v1:

  • blkio - can set limits on input/output access to and from block devices.
  • cpu - can adjust the parameters of the Completely Fair Scheduler (CFS) scheduler for control group’s tasks. It is mounted together with the cpuacct controller on the same mount.
  • cpuacct - creates automatic reports on CPU resources used by tasks in a control group. It is mounted together with the cpu controller on the same mount.
  • cpuset - can be used to restrict control group tasks to run only on a specified subset of CPUs and to direct the tasks to use memory only on specified memory nodes.
  • devices - can control access to devices for tasks in a control group.
  • freezer - can be used to suspend or resume tasks in a control group.
  • memory - can be used to set limits on memory use by tasks in a control group and generates automatic reports on memory resources used by those tasks.
  • net_cls - tags network packets with a class identifier (classid) that enables the Linux traffic controller (the tc command) to identify packets that originate from a particular control group task. A subsystem of net_cls, the net_filter (iptables), can also use this tag to perform actions on such packets. The net_filter tags network sockets with a firewall identifier (fwid) that allows the Linux firewall (through iptables command) to identify packets originating from a particular control group task.
  • net_prio - sets the priority of network traffic.
  • pids - can set limits for a number of processes and their children in a control group.
  • perf_event - can group tasks for monitoring by the perf performance monitoring and reporting utility.
  • rdma - can set limits on Remote Direct Memory Access/InfiniBand specific resources in a control group.
  • hugetlb - can be used to limit the usage of large size virtual memory pages by tasks in a control group.

The following controllers are available for cgroups-v2:

  • io - A follow-up to blkio of cgroups-v1.
  • memory - A follow-up to memory of cgroups-v1.
  • pids - Same as pids in cgroups-v1.
  • rdma - Same as rdma in cgroups-v1.
  • cpu - A follow-up to cpu and cpuacct of cgroups-v1.
  • cpuset - Supports only the core functionality (cpus{,.effective}, mems{,.effective}) with a new partition feature.
  • perf_event - Support is inherent, no explicit control file. You can specify a v2 cgroup as a parameter to the perf command that will profile all the tasks within that cgroup.
Important

A resource controller can be used either in a cgroups-v1 hierarchy or a cgroups-v2 hierarchy, not simultaneously in both.

Additional resources

  • cgroups(7) manual page
  • Documentation in /usr/share/doc/kernel-doc-<kernel_version>/Documentation/cgroups-v1/ directory

16.3. What namespaces are

Namespaces are one of the most important methods for organizing and identifying software objects.

A namespace wraps a global system resource (for example a mount point, a network device, or a hostname) in an abstraction that makes it appear to processes within the namespace that they have their own isolated instance of the global resource. One of the most common technologies that utilize namespaces are containers.

Changes to a particular global resource are visible only to processes in that namespace and do not affect the rest of the system or other namespaces.

To inspect which namespaces a process is a member of, you can check the symbolic links in the /proc/<PID>/ns/ directory.

The following table shows supported namespaces and resources which they isolate:

NamespaceIsolates

Mount

Mount points

UTS

Hostname and NIS domain name

IPC

System V IPC, POSIX message queues

PID

Process IDs

Network

Network devices, stacks, ports, etc

User

User and group IDs

Control groups

Control group root directory

Additional resources

16.4. Setting CPU limits to applications using cgroups-v1

Sometimes an application consumes a lot of CPU time, which may negatively impact the overall health of your environment. Use the /sys/fs/ virtual file system to configure CPU limits to an application using control groups version 1 (cgroups-v1).

Prerequisites

  • You have root permissions.
  • You have an application whose CPU consumption you want to restrict.
  • You configured the system to mount cgroups-v1 by default during system boot by the systemd system and service manager:

    # grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"

    This adds the necessary kernel command-line parameters to the current boot entry.

Procedure

  1. Identify the process ID (PID) of the application you want to restrict in CPU consumption:

    # top
    top - 11:34:09 up 11 min,  1 user,  load average: 0.51, 0.27, 0.22
    Tasks: 267 total,   3 running, 264 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 49.0 us,  3.3 sy,  0.0 ni, 47.5 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
    MiB Mem :   1826.8 total,    303.4 free,   1046.8 used,    476.5 buff/cache
    MiB Swap:   1536.0 total,   1396.0 free,    140.0 used.    616.4 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     6955 root      20   0  228440   1752   1472 R  99.3   0.1   0:32.71 sha1sum
     5760 jdoe      20   0 3603868 205188  64196 S   3.7  11.0   0:17.19 gnome-shell
     6448 jdoe      20   0  743648  30640  19488 S   0.7   1.6   0:02.73 gnome-terminal-
        1 root      20   0  245300   6568   4116 S   0.3   0.4   0:01.87 systemd
      505 root      20   0       0      0      0 I   0.3   0.0   0:00.75 kworker/u4:4-events_unbound
    ...

    The example output of the top program reveals that PID 6955 (illustrative application sha1sum) consumes a lot of CPU resources.

  2. Create a sub-directory in the cpu resource controller directory:

    # mkdir /sys/fs/cgroup/cpu/Example/

    The directory above represents a control group, where you can place specific processes and apply certain CPU limits to the processes. At the same time, some cgroups-v1 interface files and cpu controller-specific files will be created in the directory.

  3. Optionally, inspect the newly created control group:

    # ll /sys/fs/cgroup/cpu/Example/
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cgroup.clone_children
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cgroup.procs
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.stat
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_all
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu_sys
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu_user
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_sys
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_user
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpu.cfs_period_us
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpu.cfs_quota_us
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpu.rt_period_us
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpu.rt_runtime_us
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 cpu.shares
    -r—​r—​r--. 1 root root 0 Mar 11 11:42 cpu.stat
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 notify_on_release
    -rw-r—​r--. 1 root root 0 Mar 11 11:42 tasks

    The example output shows files, such as cpuacct.usage, cpu.cfs._period_us, that represent specific configurations and/or limits, which can be set for processes in the Example control group. Notice that the respective file names are prefixed with the name of the control group controller to which they belong.

    By default, the newly created control group inherits access to the system’s entire CPU resources without a limit.

  4. Configure CPU limits for the control group:

    # echo "1000000" > /sys/fs/cgroup/cpu/Example/cpu.cfs_period_us
    # echo "200000" > /sys/fs/cgroup/cpu/Example/cpu.cfs_quota_us

    The cpu.cfs_period_us file represents a period of time in microseconds (µs, represented here as "us") for how frequently a control group’s access to CPU resources should be reallocated. The upper limit is 1 second and the lower limit is 1000 microseconds.

    The cpu.cfs_quota_us file represents the total amount of time in microseconds for which all processes collectively in a control group can run during one period (as defined by cpu.cfs_period_us). As soon as processes in a control group, during a single period, use up all the time specified by the quota, they are throttled for the remainder of the period and not allowed to run until the next period. The lower limit is 1000 microseconds.

    The example commands above set the CPU time limits so that all processes collectively in the Example control group will be able to run only for 0.2 seconds (defined by cpu.cfs_quota_us) out of every 1 second (defined by cpu.cfs_period_us).

  5. Optionally, verify the limits:

    # cat /sys/fs/cgroup/cpu/Example/cpu.cfs_period_us /sys/fs/cgroup/cpu/Example/cpu.cfs_quota_us
    1000000
    200000
  6. Add the application’s PID to the Example control group:

    # echo "6955" > /sys/fs/cgroup/cpu/Example/cgroup.procs
    
    or
    
    # echo "6955" > /sys/fs/cgroup/cpu/Example/tasks

    The previous command ensures that a desired application becomes a member of the Example control group and hence does not exceed the CPU limits configured for the Example control group. The PID should represent an existing process in the system. The PID 6955 here was assigned to process sha1sum /dev/zero &, used to illustrate the use-case of the cpu controller.

  7. Verify that the application runs in the specified control group:

    # cat /proc/6955/cgroup
    12:cpuset:/
    11:hugetlb:/
    10:net_cls,net_prio:/
    9:memory:/user.slice/user-1000.slice/user@1000.service
    8:devices:/user.slice
    7:blkio:/
    6:freezer:/
    5:rdma:/
    4:pids:/user.slice/user-1000.slice/user@1000.service
    3:perf_event:/
    2:cpu,cpuacct:/Example
    1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service

    The example output above shows that the process of the desired application runs in the Example control group, which applies CPU limits to the application’s process.

  8. Identify the current CPU consumption of your throttled application:

    # top
    top - 12:28:42 up  1:06,  1 user,  load average: 1.02, 1.02, 1.00
    Tasks: 266 total,   6 running, 260 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 11.0 us,  1.2 sy,  0.0 ni, 87.5 id,  0.0 wa,  0.2 hi,  0.0 si,  0.2 st
    MiB Mem :   1826.8 total,    287.1 free,   1054.4 used,    485.3 buff/cache
    MiB Swap:   1536.0 total,   1396.7 free,    139.2 used.    608.3 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     6955 root      20   0  228440   1752   1472 R  20.6   0.1  47:11.43 sha1sum
     5760 jdoe      20   0 3604956 208832  65316 R   2.3  11.2   0:43.50 gnome-shell
     6448 jdoe      20   0  743836  31736  19488 S   0.7   1.7   0:08.25 gnome-terminal-
      505 root      20   0       0      0      0 I   0.3   0.0   0:03.39 kworker/u4:4-events_unbound
     4217 root      20   0   74192   1612   1320 S   0.3   0.1   0:01.19 spice-vdagentd
    ...

    Notice that the CPU consumption of the PID 6955 has decreased from 99% to 20%.

Additional resources



[1] Linux Control Group v2 - An Introduction, Devconf.cz 2019 presentation by Waiman Long