Chapter 9. Setting limits for applications

As a system administrator, use the control groups kernel functionality to set limits, prioritize or isolate the hardware resources of processes so that applications on your system are stable and do not run out of memory.

9.1. What are control groups

Control groups is a Linux kernel feature that enables you to organize processes into hierarchically ordered groups - cgroups. The hierarchy (control groups tree) is defined by providing structure to the cgroups virtual file system, mounted by default on the /sys/fs/cgroup/ directory. You can define this structure manually by creating and removing sub-directories in /sys/fs/cgroup/, or you can let the systemd system and service manager do it.

The resource controllers (a kernel component) then modify the behavior of processes in cgroups by limiting, prioritizing or allocating the system resources (such as CPU time, memory, network bandwidth, or various combinations) of those processes.

The added value of cgroups is process aggregation, which enables division of hardware resources among applications and users. This can increase the overall efficiency, stability and security of users' environment.

9.1.1. Control groups version 1

Control groups version 1 (cgroups-v1) provides a per-resource controller hierarchy. This means that each resource, such as CPU, memory, I/O, and so on, has its own control group hierarchy. It is possible to combine different control group hierarchies so that one controller can coordinate with another in managing their respective resources. However, the two controllers may belong to different process hierarchies, which prevents them from coordinating properly.

The cgroups-v1 controllers were developed across a large time span and, as a result, the behavior and naming of their control files are not uniform.
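
One way to observe the per-controller hierarchies is the /proc/<PID>/cgroup file, which holds one hierarchy-ID:controller-list:path line for each hierarchy a process belongs to. The following is a minimal sketch only; the function name cgroup_path_for is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print the cgroup path of one controller from a
# file in /proc/<PID>/cgroup format (hierarchy-ID:controller-list:path).
cgroup_path_for() {
    controller=$1
    file=${2:-/proc/self/cgroup}
    awk -F: -v want="$controller" '
        {
            # The second field may list several co-mounted controllers,
            # for example "cpu,cpuacct".
            n = split($2, names, ",")
            for (i = 1; i <= n; i++) {
                if (names[i] == want) {
                    # Strip the first two fields and keep the rest, so a
                    # path that itself contains ":" is printed whole.
                    path = $0
                    sub(/^[^:]*:[^:]*:/, "", path)
                    print path
                    exit
                }
            }
        }' "$file"
}
```

For example, cgroup_path_for memory /proc/23453/cgroup would print the memory cgroup of the fictional PID 23453. The function prints nothing if the controller is not listed.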

This sub-section was based on a Devconf.cz 2019 presentation.[1]

9.1.2. Control groups version 2

The problems with controller coordination, which stemmed from hierarchy flexibility, led to the development of control groups version 2.

Control groups version 2 (cgroups-v2) provides a single control group hierarchy against which all resource controllers are mounted.

The control file behavior and naming are consistent among different controllers.
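
To check which layout a system uses, you can look for the cgroup.controllers file, which exists only in a cgroups-v2 hierarchy. The following is a rough sketch under that assumption; the function name cgroup_layout is hypothetical, and a hybrid setup is reported as v1.

```shell
#!/bin/sh
# Hypothetical helper: guess the cgroup layout below a mount point.
# Assumption: cgroup.controllers exists only in a cgroups-v2 hierarchy,
# while cgroups-v1 has one sub-directory per mounted controller, each
# with its own cgroup.procs file.
cgroup_layout() {
    root=${1:-/sys/fs/cgroup}
    if [ -f "$root/cgroup.controllers" ]; then
        echo v2
        return 0
    fi
    for dir in "$root"/*/; do
        if [ -f "${dir}cgroup.procs" ]; then
            echo v1
            return 0
        fi
    done
    echo none
}
```

Running cgroup_layout without arguments inspects /sys/fs/cgroup. Passing another directory is useful mainly for trying the sketch on test data.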

This sub-section was based on a Devconf.cz 2019 presentation.[2]

Warning

Red Hat Enterprise Linux 8 provides cgroups-v2 as a technology preview with a limited number of resource controllers. For more information about the relevant resource controllers, see the cgroups-v2 release note.

Additional resources

  • For more information about resource controllers, see What are kernel resource controllers section and cgroups(7) manual pages.
  • For more information about cgroups hierarchies and cgroups versions, refer to cgroups(7) manual pages.

9.2. What are kernel resource controllers

This section explains the concept of resource controllers in the Linux kernel and also lists supported controllers for control groups version 1 (cgroups-v1) and control groups version 2 (cgroups-v2) in Red Hat Enterprise Linux 8.

A resource controller, also called a cgroup subsystem, represents a single resource, such as CPU time, memory, network bandwidth or disk I/O. The Linux kernel provides a range of resource controllers that are mounted automatically by the systemd system and service manager. Find a list of currently mounted resource controllers in the /proc/cgroups entry.
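
The /proc/cgroups entry is a small table with the columns subsys_name, hierarchy, num_cgroups and enabled. As an illustration only, the following sketch filters it for enabled controllers; the function name enabled_controllers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list controllers whose "enabled" column is 1.
# Columns in /proc/cgroups: subsys_name hierarchy num_cgroups enabled
enabled_controllers() {
    file=${1:-/proc/cgroups}
    # Skip the "#subsys_name ..." header line and print the first column
    # of every row whose fourth column is 1.
    awk '!/^#/ && $4 == 1 { print $1 }' "$file"
}
```

Called without arguments, the function reads /proc/cgroups on the running system and prints one controller name per line.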

The following controllers are available for cgroups-v1:

  • blkio - sets limits on input/output access to and from block devices.
  • cpu - uses the CPU scheduler to provide the control group tasks with access to the CPU. It is mounted together with the cpuacct controller on the same mount.
  • cpuacct - creates automatic reports on CPU resources used by tasks in a control group. It is mounted together with the cpu controller on the same mount.
  • cpuset - assigns individual CPUs on a multicore system and memory nodes to tasks in a control group.
  • devices - grants or denies access to devices for tasks in a control group.
  • freezer - suspends or resumes tasks in a control group.
  • memory - sets limits on memory use by tasks in a control group and generates automatic reports on memory resources used by those tasks.
  • net_cls - tags network packets with a class identifier (classid) that enables the Linux traffic controller (the tc command) to identify packets originating from a particular control group task. A subsystem of net_cls, the net_filter (iptables), can also use this tag to perform actions on such packets. The net_filter tags network sockets with a firewall identifier (fwid) that allows the Linux firewall (the iptables command) to identify packets originating from a particular control group task.
  • net_prio - sets the priority of network traffic.
  • pids - sets limits on the number of processes and their children in a control group.
  • perf_event - enables monitoring cgroups with the perf tool.
  • rdma - sets limits on Remote Direct Memory Access/InfiniBand specific resources in a control group.
  • hugetlb - enables the use of large virtual memory pages and enforces resource limits on these pages.

The following controllers are available for cgroups-v2:

  • io - follow-up to blkio of cgroups-v1
  • memory - follow-up to memory of cgroups-v1
  • pids - same as pids in cgroups-v1
  • rdma - same as rdma in cgroups-v1
  • cpu - follow-up to cpu and cpuacct of cgroups-v1

Important

A given resource controller can be employed either in a cgroups-v1 hierarchy or a cgroups-v2 hierarchy, not simultaneously in both.

Additional resources

  • For more information about resource controllers in general, refer to the cgroups(7) manual page.
  • For detailed descriptions of specific resource controllers, see the documentation in the /usr/share/doc/kernel-doc-<kernel_version>/Documentation/cgroups-v1/ directory.
  • For more information about cgroups-v2, refer to the cgroups(7) manual page.

9.3. What are namespaces

This section explains the concept of namespaces, their connection to control groups and resource management.

Namespaces are a kernel feature that enables a virtual view of isolated system resources through the /proc/self/ns/cgroup interface. By isolating a process from system resources, you can specify and control what a process is able to interact with.

The purpose is to prevent leakage of privileged data from the global namespaces to cgroup and to enable other features, such as container migration.

The following namespaces are supported:

  • Mount

    • The mount namespace isolates file system mount points, enabling each process to have a distinct file system space within which to operate.
  • UTS

    • Hostname and NIS domain name
  • IPC

    • System V IPC, POSIX message queues
  • PID

    • Process IDs
  • Network

    • Network devices, stacks, ports, etc.
  • User

    • User and group IDs
  • Control groups

    • Isolates cgroups
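
Each process exposes its namespaces as symbolic links in /proc/<PID>/ns/, and two processes are in the same namespace when the link targets match. The following is a minimal sketch of such a comparison; the function name same_namespace and its optional fourth parameter are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed when two processes share a namespace.
# Each /proc/<PID>/ns/<type> entry is a symbolic link whose target, such
# as "cgroup:[4026531835]", identifies the namespace instance.
same_namespace() {
    ns_type=$1
    pid_a=$2
    pid_b=$3
    proc_root=${4:-/proc}   # overridable so the sketch can run on test data
    # Return 2 if either link cannot be read (no such PID or no permission).
    a=$(readlink "$proc_root/$pid_a/ns/$ns_type") || return 2
    b=$(readlink "$proc_root/$pid_b/ns/$ns_type") || return 2
    [ "$a" = "$b" ]
}
```

For example, same_namespace cgroup 1 23453 compares the cgroup namespace of PID 1 with that of the fictional PID 23453. Reading the links of processes owned by other users typically requires root permissions.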

Additional resources

  • For more information about namespaces, see the namespaces(7) and cgroup_namespaces(7) manual pages.
  • For more information about cgroups, see What are control groups.

9.4. Using control groups through a virtual file system

The following sections provide an overview of tasks related to creation, modification and removal of control groups (cgroups) using the /sys/fs/ virtual file system.

9.4.1. Setting memory limits to applications through cgroups-v1

This procedure describes how to use the /sys/fs/ virtual file system to configure a memory limit to an application through control groups version 1 (cgroups-v1).

Prerequisites

Procedure

  1. Create a sub-directory in the memory resource controller directory:

    # mkdir /sys/fs/cgroup/memory/example/

    The directory above represents a control group, where you can place specific processes and apply certain memory limits to the processes.

  2. Optionally, investigate the newly created control group:

    # ll /sys/fs/cgroup/memory/example/
    -rw-r--r--. 1 root root 0 Apr 25 16:34 cgroup.clone_children
    --w--w--w-. 1 root root 0 Apr 25 16:34 cgroup.event_control
    -rw-r--r--. 1 root root 0 Apr 25 16:42 cgroup.procs
    …​

    The example output shows files that the example control group inherited from its parent resource controller. By default, the newly created control group inherits access to the system’s entire memory without a limit.

  3. Configure a memory limit of the control group:

    # echo 700000 > /sys/fs/cgroup/memory/example/memory.limit_in_bytes

    The example command sets the memory limit to 700000 bytes (700 kilobytes).

  4. Verify the limit:

    # cat /sys/fs/cgroup/memory/example/memory.limit_in_bytes
    696320

    The example output displays the memory limit rounded down to a multiple of 4096 bytes, the size of one kernel page: 170 pages of 4096 bytes is 696320 bytes.

  5. Add the application’s PID to the control group:

    # echo 23453 > /sys/fs/cgroup/memory/example/cgroup.procs

    The example command ensures that the desired application does not exceed the memory limit configured in the control group. Use the PID of an existing process on your system. PID 23453 here is fictional.

  6. Verify that the application runs in the specified control group:

    # ps -o cgroup 23453
    CGROUP
    11:memory:/example,5:devices:/system.slice/example.service,4:pids:/system.slice/example.service,1:name=systemd:/system.slice/example.service

    The example output above shows that the process of the desired application runs in the example control group, which applies a memory limit to the application’s process.
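
The steps above can be collected into a small script. The following is a sketch only, not a supported tool: the function names round_down_to_page and limit_memory are hypothetical, and the rounding helper assumes that the kernel rounds the limit down to whole pages, as the example output in step 4 suggests.

```shell
#!/bin/sh
# Hypothetical helper: predict the value reported after a write to
# memory.limit_in_bytes, assuming rounding down to whole pages.
round_down_to_page() {
    bytes=$1
    page_size=${2:-4096}
    echo $(( $bytes / $page_size * $page_size ))
}

# Hypothetical helper: steps 1, 3 and 5 of the procedure in one place.
# The controller directory is a parameter so that the sketch can be
# tried against a scratch directory before it is used on
# /sys/fs/cgroup/memory.
limit_memory() {
    cg_root=$1
    name=$2
    limit=$3
    pid=$4
    # Step 1: create the control group directory.
    mkdir -p "$cg_root/$name" || return 1
    # Step 3: configure the memory limit.
    echo "$limit" > "$cg_root/$name/memory.limit_in_bytes" || return 1
    # Step 5: move the process into the control group.
    echo "$pid" > "$cg_root/$name/cgroup.procs"
}
```

As root, limit_memory /sys/fs/cgroup/memory example 700000 23453 repeats the example from this procedure, and round_down_to_page 700000 prints 696320, the value shown in step 4.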

Additional resources

  • For more information about resource controllers, see the What are kernel resource controllers section and the cgroups(7) manual page.
  • For more information about /sys/fs/, see the sysfs(5) manual page.


[1] Linux Control Group v2 - An Introduction, Devconf.cz 2019 presentation by Waiman Long
[2] Linux Control Group v2 - An Introduction, Devconf.cz 2019 presentation by Waiman Long