Red Hat Training

A Red Hat training course is available for RHEL 8

Chapter 25. Using control groups version 1 with systemd

You can manage cgroups with the systemd system and service manager and the utilities they provide. This is also the preferred way of the cgroups management.

25.1. Role of systemd in control groups version 1

RHEL 8 moves the resource management settings from the process level to the application level by binding the system of cgroup hierarchies with the systemd unit tree. Therefore, you can manage the system resources with the systemctl command, or by modifying the systemd unit files.

By default, the systemd system and service manager makes use of the slice, the scope and the service units to organize and structure processes in the control groups. The systemctl command enables you to further modify this structure by creating custom slices. Also, systemd automatically mounts hierarchies for important kernel resource controllers in the /sys/fs/cgroup/ directory.

Three systemd unit types are used for resource control:

  • Service - A process or a group of processes, which systemd started according to a unit configuration file. Services encapsulate the specified processes so that they can be started and stopped as one set. Services are named in the following way:

    <name>.service
  • Scope - A group of externally created processes. Scopes encapsulate processes that are started and stopped by the arbitrary processes through the fork() function and then registered by systemd at runtime. For example, user sessions, containers, and virtual machines are treated as scopes. Scopes are named as follows:

    <name>.scope
  • Slice - A group of hierarchically organized units. Slices organize a hierarchy in which scopes and services are placed. The actual processes are contained in scopes or in services. Every name of a slice unit corresponds to the path to a location in the hierarchy. The dash ("-") character acts as a separator of the path components to a slice from the -.slice root slice. In the following example:

    <parent-name>.slice

    parent-name.slice is a sub-slice of parent.slice, which is a sub-slice of the -.slice root slice. parent-name.slice can have its own sub-slice named parent-name-name2.slice, and so on.

The service, the scope, and the slice units directly map to objects in the control group hierarchy. When these units are activated, they map directly to control group paths built from the unit names.

The following is an abbreviated example of a control group hierarchy:

Control group /:
-.slice
├─user.slice
│ ├─user-42.slice
│ │ ├─session-c1.scope
│ │ │ ├─ 967 gdm-session-worker [pam/gdm-launch-environment]
│ │ │ ├─1035 /usr/libexec/gdm-x-session gnome-session --autostart /usr/share/gdm/greeter/autostart
│ │ │ ├─1054 /usr/libexec/Xorg vt1 -displayfd 3 -auth /run/user/42/gdm/Xauthority -background none -noreset -keeptty -verbose 3
│ │ │ ├─1212 /usr/libexec/gnome-session-binary --autostart /usr/share/gdm/greeter/autostart
│ │ │ ├─1369 /usr/bin/gnome-shell
│ │ │ ├─1732 ibus-daemon --xim --panel disable
│ │ │ ├─1752 /usr/libexec/ibus-dconf
│ │ │ ├─1762 /usr/libexec/ibus-x11 --kill-daemon
│ │ │ ├─1912 /usr/libexec/gsd-xsettings
│ │ │ ├─1917 /usr/libexec/gsd-a11y-settings
│ │ │ ├─1920 /usr/libexec/gsd-clipboard
…​
├─init.scope
│ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
└─system.slice
  ├─rngd.service
  │ └─800 /sbin/rngd -f
  ├─systemd-udevd.service
  │ └─659 /usr/lib/systemd/systemd-udevd
  ├─chronyd.service
  │ └─823 /usr/sbin/chronyd
  ├─auditd.service
  │ ├─761 /sbin/auditd
  │ └─763 /usr/sbin/sedispatch
  ├─accounts-daemon.service
  │ └─876 /usr/libexec/accounts-daemon
  ├─example.service
  │ ├─ 929 /bin/bash /home/jdoe/example.sh
  │ └─4902 sleep 1
  …​

The example above shows that services and scopes contain processes and are placed in slices that do not contain processes of their own.

Additional resources

25.2. Creating transient control groups

The transient cgroups set limits on resources consumed by a unit (service or scope) during its runtime.

Procedure

  • To create a transient control group, use the systemd-run command in the following format:

    # systemd-run --unit=<name> --slice=<name>.slice <command>

    This command creates and starts a transient service or a scope unit and runs a custom command in such a unit.

    • The --unit=<name> option gives a name to the unit. If --unit is not specified, the name is generated automatically.
    • The --slice=<name>.slice option makes your service or scope unit a member of a specified slice. Replace <name>.slice with the name of an existing slice (as shown in the output of systemctl -t slice), or create a new slice by passing a unique name. By default, services and scopes are created as members of the system.slice.
    • Replace <command> with the command you want to enter in the service or the scope unit.

      The following message is displayed to confirm that you created and started the service or the scope successfully:

      # Running as unit <name>.service
  • Optional: Keep the unit running after its processes finished to collect run-time information:

    # systemd-run --unit=<name> --slice=<name>.slice --remain-after-exit <command>

    The command creates and starts a transient service unit and runs a custom command in the unit. The --remain-after-exit option ensures that the service keeps running after its processes have finished.

Additional resources

25.3. Creating persistent control groups

To assign a persistent control group to a service, it is necessary to edit its unit configuration file. The configuration is preserved after the system reboot, so it can be used to manage services that are started automatically.

Procedure

  • To create a persistent control group, enter:

    # systemctl enable <name>.service

    The command above automatically creates a unit configuration file into the /usr/lib/systemd/system/ directory and by default, it assigns <name>.service to the system.slice unit.

25.4. Configuring memory resource control settings on the command-line

Executing commands in the command-line interface is one of the ways how to set limits, prioritize, or control access to hardware resources for groups of processes.

Procedure

  • To limit the memory usage of a service, run the following:

    # systemctl set-property example.service MemoryMax=1500K

    The command instantly assigns the memory limit of 1,500 KB to processes executed in a control group the example.service service belongs to. The MemoryMax parameter, in this configuration variant, is defined in the /etc/systemd/system.control/example.service.d/50-MemoryMax.conf file and controls the value of the /sys/fs/cgroup/memory/system.slice/example.service/memory.limit_in_bytes file.

  • Optionally, to temporarily limit the memory usage of a service, run:

    # systemctl set-property --runtime example.service MemoryMax=1500K

    The command instantly assigns the memory limit to the example.service service. The MemoryMax parameter is defined until the next reboot in the /run/systemd/system.control/example.service.d/50-MemoryMax.conf file. With a reboot, the whole /run/systemd/system.control/ directory and MemoryMax are removed.

Note

The 50-MemoryMax.conf file stores the memory limit as a multiple of 4096 bytes - one kernel page size specific for AMD64 and Intel 64. The actual number of bytes depends on a CPU architecture.

Additional resources

25.5. Configuring memory resource control settings with unit files

Each persistent unit is supervised by the systemd system and service manager, and has a unit configuration file in the /usr/lib/systemd/system/ directory. To change the resource control settings of the persistent units, modify its unit configuration file either manually in a text editor or from the command-line interface.

Manually modifying unit files is one of the ways how to set limits, prioritize, or control access to hardware resources for groups of processes.

Procedure

  1. To limit the memory usage of a service, modify the /usr/lib/systemd/system/example.service file as follows:

    …​
    [Service]
    MemoryMax=1500K
    …​

    The configuration above places a limit on maximum memory consumption of processes executed in a control group, which example.service is part of.

    Note

    Use suffixes K, M, G, or T to identify Kilobyte, Megabyte, Gigabyte, or Terabyte as a unit of measurement.

  2. Reload all unit configuration files:

    # systemctl daemon-reload
  3. Restart the service:

    # systemctl restart example.service
  4. Reboot the system.
  5. Optionally, check that the changes took effect:

    # cat /sys/fs/cgroup/memory/system.slice/example.service/memory.limit_in_bytes
    1536000

    The example output shows that the memory consumption was limited at around 1,500 KB.

    Note

    The memory.limit_in_bytes file stores the memory limit as a multiple of 4096 bytes - one kernel page size specific for AMD64 and Intel 64. The actual number of bytes depends on a CPU architecture.

25.6. Removing transient control groups

You can use the systemd system and service manager to remove transient control groups (cgroups) if you no longer need to limit, prioritize, or control access to hardware resources for groups of processes.

Transient cgroups are automatically released once all the processes that a service or a scope unit contains finish.

Procedure

  • To stop the service unit with all its processes, enter:

    # systemctl stop <name>.service
  • To terminate one or more of the unit processes, enter:

    # systemctl kill <name>.service --kill-who=PID,…​ --signal=<signal>

    The command uses the --kill-who option to select process(es) from the control group you want to terminate. To kill multiple processes at the same time, pass a comma-separated list of PIDs. The --signal option determines the type of POSIX signal to be sent to the specified processes. The default signal is SIGTERM.

Additional resources

25.7. Removing persistent control groups

You can use the systemd system and service manager to remove persistent control groups (cgroups) if you no longer need to limit, prioritize, or control access to hardware resources for groups of processes.

Persistent cgroups are released when a service or a scope unit is stopped or disabled and its configuration file is deleted.

Procedure

  1. Stop the service unit:

    # systemctl stop <name>.service
  2. Disable the service unit:

    # systemctl disable <name>.service
  3. Remove the relevant unit configuration file:

    # rm /usr/lib/systemd/system/<name>.service
  4. Reload all unit configuration files so that changes take effect:

    # systemctl daemon-reload

Additional resources

25.8. Listing systemd units

Use the systemd system and service manager to list its units.

Procedure

  • List all active units on the system with the systemctl utility. The terminal returns an output similar to the following example:

    # systemctl
    UNIT                                                LOAD   ACTIVE SUB       DESCRIPTION
    …​
    init.scope                                          loaded active running   System and Service Manager
    session-2.scope                                     loaded active running   Session 2 of user jdoe
    abrt-ccpp.service                                   loaded active exited    Install ABRT coredump hook
    abrt-oops.service                                   loaded active running   ABRT kernel log watcher
    abrt-vmcore.service                                 loaded active exited    Harvest vmcores for ABRT
    abrt-xorg.service                                   loaded active running   ABRT Xorg log watcher
    …​
    -.slice                                             loaded active active    Root Slice
    machine.slice                                       loaded active active    Virtual Machine and Container Slice system-getty.slice                                                                       loaded active active    system-getty.slice
    system-lvm2\x2dpvscan.slice                         loaded active active    system-lvm2\x2dpvscan.slice
    system-sshd\x2dkeygen.slice                         loaded active active    system-sshd\x2dkeygen.slice
    system-systemd\x2dhibernate\x2dresume.slice         loaded active active    system-systemd\x2dhibernate\x2dresume>
    system-user\x2druntime\x2ddir.slice                 loaded active active    system-user\x2druntime\x2ddir.slice
    system.slice                                        loaded active active    System Slice
    user-1000.slice                                     loaded active active    User Slice of UID 1000
    user-42.slice                                       loaded active active    User Slice of UID 42
    user.slice                                          loaded active active    User and Session Slice
    …​
    UNIT
    A name of a unit that also reflects the unit position in a control group hierarchy. The units relevant for resource control are a slice, a scope, and a service.
    LOAD
    Indicates whether the unit configuration file was properly loaded. If the unit file failed to load, the field contains the state error instead of loaded. Other unit load states are: stub, merged, and masked.
    ACTIVE
    The high-level unit activation state, which is a generalization of SUB.
    SUB
    The low-level unit activation state. The range of possible values depends on the unit type.
    DESCRIPTION
    The description of the unit content and functionality.
  • List all active and inactive units:

    # systemctl --all
  • Limit the amount of information in the output:

    # systemctl --type service,masked

    The --type option requires a comma-separated list of unit types such as a service and a slice, or unit load states such as loaded and masked.

Additional resources

25.9. Viewing systemd cgroups hierarchy

Display control groups (cgroups) hierarchy and processes running in specific cgroups.

Procedure

  • Display the whole cgroups hierarchy on your system with the systemd-cgls command.

    # systemd-cgls
    Control group /:
    -.slice
    ├─user.slice
    │ ├─user-42.slice
    │ │ ├─session-c1.scope
    │ │ │ ├─ 965 gdm-session-worker [pam/gdm-launch-environment]
    │ │ │ ├─1040 /usr/libexec/gdm-x-session gnome-session --autostart /usr/share/gdm/greeter/autostart
    …​
    ├─init.scope
    │ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
    └─system.slice
      …​
      ├─example.service
      │ ├─6882 /bin/bash /home/jdoe/example.sh
      │ └─6902 sleep 1
      ├─systemd-journald.service
        └─629 /usr/lib/systemd/systemd-journald
      …​

    The example output returns the entire cgroups hierarchy, where the highest level is formed by slices.

  • Display the cgroups hierarchy filtered by a resource controller with the systemd-cgls <resource_controller> command.

    # systemd-cgls memory
    Controller memory; Control group /:
    ├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
    ├─user.slice
    │ ├─user-42.slice
    │ │ ├─session-c1.scope
    │ │ │ ├─ 965 gdm-session-worker [pam/gdm-launch-environment]
    …​
    └─system.slice
      |
      …​
      ├─chronyd.service
      │ └─844 /usr/sbin/chronyd
      ├─example.service
      │ ├─8914 /bin/bash /home/jdoe/example.sh
      │ └─8916 sleep 1
      …​

    The example output lists the services that interact with the selected controller.

  • Display detailed information about a certain unit and its part of the cgroups hierarchy with the systemctl status <system_unit> command.

    # systemctl status example.service
    ● example.service - My example service
       Loaded: loaded (/usr/lib/systemd/system/example.service; enabled; vendor preset: disabled)
       Active: active (running) since Tue 2019-04-16 12:12:39 CEST; 3s ago
     Main PID: 17737 (bash)
        Tasks: 2 (limit: 11522)
       Memory: 496.0K (limit: 1.5M)
       CGroup: /system.slice/example.service
               ├─17737 /bin/bash /home/jdoe/example.sh
               └─17743 sleep 1
    Apr 16 12:12:39 redhat systemd[1]: Started My example service.
    Apr 16 12:12:39 redhat bash[17737]: The current time is Tue Apr 16 12:12:39 CEST 2019
    Apr 16 12:12:40 redhat bash[17737]: The current time is Tue Apr 16 12:12:40 CEST 2019

Additional resources

25.10. Viewing resource controllers

Find out which processes use which resource controllers.

Procedure

  1. View which resource controllers a process interacts with, enter the cat proc/<PID>/cgroup command.

    # cat /proc/11269/cgroup
    12:freezer:/
    11:cpuset:/
    10:devices:/system.slice
    9:memory:/system.slice/example.service
    8:pids:/system.slice/example.service
    7:hugetlb:/
    6:rdma:/
    5:perf_event:/
    4:cpu,cpuacct:/
    3:net_cls,net_prio:/
    2:blkio:/
    1:name=systemd:/system.slice/example.service

    The example output relates to a process of interest. In this case, it is a process identified by PID 11269, which belongs to the example.service unit. You can determine whether the process was placed in a correct control group as defined by the systemd unit file specifications.

    Note

    By default, the items and their ordering in the list of resource controllers is the same for all units started by systemd, since it automatically mounts all the default resource controllers.

Additional resources

  • The cgroups(7) manual page
  • Documentation in the /usr/share/doc/kernel-doc-<kernel_version>/Documentation/cgroups-v1/ directory

25.11. Monitoring resource consumption

View a list of currently running control groups (cgroups) and their resource consumption in real-time.

Procedure

  1. Display a dynamic account of currently running cgroups with the systemd-cgtop command.

    # systemd-cgtop
    Control Group                            Tasks   %CPU   Memory  Input/s Output/s
    /                                          607   29.8     1.5G        -        -
    /system.slice                              125      -   428.7M        -        -
    /system.slice/ModemManager.service           3      -     8.6M        -        -
    /system.slice/NetworkManager.service         3      -    12.8M        -        -
    /system.slice/accounts-daemon.service        3      -     1.8M        -        -
    /system.slice/boot.mount                     -      -    48.0K        -        -
    /system.slice/chronyd.service                1      -     2.0M        -        -
    /system.slice/cockpit.socket                 -      -     1.3M        -        -
    /system.slice/colord.service                 3      -     3.5M        -        -
    /system.slice/crond.service                  1      -     1.8M        -        -
    /system.slice/cups.service                   1      -     3.1M        -        -
    /system.slice/dev-hugepages.mount            -      -   244.0K        -        -
    /system.slice/dev-mapper-rhel\x2dswap.swap   -      -   912.0K        -        -
    /system.slice/dev-mqueue.mount               -      -    48.0K        -        -
    /system.slice/example.service                2      -     2.0M        -        -
    /system.slice/firewalld.service              2      -    28.8M        -        -
    ...

    The example output displays currently running cgroups ordered by their resource usage (CPU, memory, disk I/O load). The list refreshes every 1 second by default. Therefore, it offers a dynamic insight into the actual resource usage of each control group.

Additional resources

  • The systemd-cgtop(1) manual page