Chapter 10. Kernel

10.1. Resource control

10.1.1. Control group v2 available as a Technology Preview in RHEL 8

Control group v2 is a unified-hierarchy control group mechanism. It organizes processes hierarchically and distributes system resources along the hierarchy in a controlled and configurable manner.

Unlike the previous version, control group v2 has only a single hierarchy. This single hierarchy enables the Linux kernel to:

  • Categorize processes based on the role of their owner.
  • Eliminate issues with conflicting policies of multiple hierarchies.

Control group v2 supports numerous controllers:

  • CPU controller regulates the distribution of CPU cycles. This controller implements:

    • Weight and absolute bandwidth limit models for the normal scheduling policy.
    • Absolute bandwidth allocation model for the real-time scheduling policy.
  • Memory controller regulates memory distribution. Currently, the following types of memory usage are tracked:

    • Userland memory - page cache and anonymous memory.
    • Kernel data structures such as dentries and inodes.
    • TCP socket buffers.
  • I/O controller regulates the distribution of I/O resources.
  • Remote Direct Memory Access (RDMA) controller limits RDMA/IB specific resources that certain processes can use. These processes are grouped through the RDMA controller.
  • Process number controller enables the control group to stop any new tasks from being fork()’d or clone()’d after a certain limit.
  • Writeback controller acts as a mechanism that balances conflicts between the I/O and memory controllers.

The information above is based on the cgroups-v2 online documentation. Refer to that documentation for more information about particular control group v2 controllers.
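
For illustration, the following minimal sketch creates a child control group in the unified hierarchy, enables the memory controller for it, sets a memory limit, and moves the current process into the group. The interface file names (cgroup.subtree_control, memory.max, cgroup.procs) come from the cgroups-v2 documentation referenced above; the mount point /sys/fs/cgroup and the group name demo are assumptions for this example, and the script must run as root on a system booted with the unified hierarchy.

```
#!/usr/bin/env python3
"""Minimal cgroup-v2 sketch: create a child group, enable the memory
controller for it, set a limit, and move the current process into it.
Assumes the v2 hierarchy is mounted at /sys/fs/cgroup and root privileges."""
import os

CGROUP_ROOT = "/sys/fs/cgroup"              # assumed unified-hierarchy mount point
CHILD = os.path.join(CGROUP_ROOT, "demo")   # hypothetical group name

os.makedirs(CHILD, exist_ok=True)

# Delegate the memory controller from the root group to its children.
with open(os.path.join(CGROUP_ROOT, "cgroup.subtree_control"), "w") as f:
    f.write("+memory")

# Cap the group's memory usage at 256 MiB through the memory controller.
with open(os.path.join(CHILD, "memory.max"), "w") as f:
    f.write(str(256 * 1024 * 1024))

# Move the current process into the new group.
with open(os.path.join(CHILD, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))
```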

10.2. Memory management

10.2.1. 52-bit PA for 64-bit ARM available

With this update, support for 52-bit physical addressing (PA) for the 64-bit ARM architecture is available. This provides a larger physical address space than the previous 48-bit PA.

10.2.2. 5-level page tables x86_64

In RHEL 7, the memory bus had 48/46 bits of virtual/physical memory addressing capacity, and the Linux kernel implemented 4 levels of page tables to map virtual addresses to physical addresses. The physical address bus put the upper limit of physical memory capacity at 64 TiB.

These limits have been extended to 57/52 bits of virtual/physical memory addressing, providing 128 PiB of virtual address space (64 PiB user / 64 PiB kernel) and 4 PiB of physical memory capacity.

To handle the extended address range, memory management in RHEL 8 adds support for a 5-level page table implementation, which can address up to 128 PiB of virtual address space and 4 PiB of physical address space.

The 5-level page tables are enabled by default on hardware that supports this feature, even if the installed physical memory is less than 64 TiB. For systems with less than 64 TiB of memory, there is a small overhead increase in walking the 5-level page tables. To avoid this overhead, users can disable the 5-level page tables by using the no5lvl kernel command-line parameter, which forces the use of 4-level page tables.
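
To make the numbers concrete, the following sketch computes the address-space sizes implied by the bit widths above and checks whether the running x86_64 system advertises 5-level paging support. The la57 CPU flag and the /proc/cpuinfo and /proc/cmdline files are standard Linux interfaces rather than anything taken from this note; treat the script as an illustration only.

```
#!/usr/bin/env python3
"""Illustrative sketch: relate the address-bit counts in this note to
actual sizes and check for 5-level paging support on x86_64."""

PIB = 2 ** 50
TIB = 2 ** 40

# 4-level page tables: 48-bit virtual / 46-bit physical addressing.
print("4-level physical limit:", 2 ** 46 // TIB, "TiB")    # 64 TiB
# 5-level page tables: 57-bit virtual / 52-bit physical addressing.
print("5-level virtual space :", 2 ** 57 // PIB, "PiB")    # 128 PiB
print("5-level physical limit:", 2 ** 52 // PIB, "PiB")    # 4 PiB

# The 'la57' CPU flag indicates hardware support for 5-level paging.
with open("/proc/cpuinfo") as f:
    la57 = any("la57" in line for line in f if line.startswith("flags"))

# The no5lvl parameter forces the kernel back to 4-level page tables.
with open("/proc/cmdline") as f:
    no5lvl = "no5lvl" in f.read().split()

print("CPU supports 5-level paging   :", la57)
print("no5lvl on kernel command line :", no5lvl)
```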

10.3. Performance analysis and observability tools

10.3.1. bpftool added to kernel

The bpftool utility, which serves for inspection and simple manipulation of programs and maps based on the extended Berkeley Packet Filter (eBPF), has been added to the Linux kernel. bpftool is a part of the kernel source tree, and is provided by the bpftool package, which is included as a sub-package of the kernel package.
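
As a usage illustration, the sketch below calls the packaged bpftool binary to list the eBPF programs currently loaded in the kernel. The prog show subcommand and the -j (JSON) option are standard bpftool features rather than something taken from this note, and the command needs root privileges.

```
#!/usr/bin/env python3
"""Sketch: list loaded eBPF programs with bpftool (requires root)."""
import json
import subprocess

# 'bpftool prog show' prints the loaded programs; -j selects JSON output.
out = subprocess.run(["bpftool", "-j", "prog", "show"],
                     check=True, capture_output=True, text=True).stdout

for prog in json.loads(out):
    print(prog.get("id"), prog.get("type"), prog.get("name", "<unnamed>"))
```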

10.3.2. eBPF available as a Technology Preview

The extended Berkeley Packet Filter (eBPF) feature is available as a Technology Preview for both networking and tracing. eBPF enables user space to attach custom programs to a variety of points (sockets, trace points, packet reception) to receive and process data. The feature includes a new system call, bpf(), which supports creating various types of maps and inserting various types of programs into the kernel. Note that the bpf() system call can be successfully used only by a user with the CAP_SYS_ADMIN capability, such as the root user. See the bpf(2) man page for more information.
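
To sketch what the bpf() system call looks like from user space, the snippet below creates a small hash map directly through the raw syscall. The syscall number 321 applies to x86_64 only, and the constants and attribute layout mirror the leading fields of union bpf_attr from the kernel UAPI headers; this is an illustration under those assumptions, not a supported API, and it must run with CAP_SYS_ADMIN.

```
#!/usr/bin/env python3
"""Illustrative sketch: create an eBPF hash map via the raw bpf() syscall.
Assumptions: x86_64 (syscall number 321) and CAP_SYS_ADMIN (root)."""
import ctypes

NR_BPF = 321            # __NR_bpf on x86_64; other architectures differ
BPF_MAP_CREATE = 0      # bpf() command number
BPF_MAP_TYPE_HASH = 1   # map type

class BpfMapCreateAttr(ctypes.Structure):
    # Leading fields of union bpf_attr used by BPF_MAP_CREATE.
    _fields_ = [
        ("map_type", ctypes.c_uint32),
        ("key_size", ctypes.c_uint32),
        ("value_size", ctypes.c_uint32),
        ("max_entries", ctypes.c_uint32),
        ("map_flags", ctypes.c_uint32),
    ]

libc = ctypes.CDLL(None, use_errno=True)
attr = BpfMapCreateAttr(map_type=BPF_MAP_TYPE_HASH, key_size=4,
                        value_size=8, max_entries=64, map_flags=0)

fd = libc.syscall(NR_BPF, BPF_MAP_CREATE, ctypes.byref(attr),
                  ctypes.sizeof(attr))
if fd < 0:
    raise OSError(ctypes.get_errno(), "bpf(BPF_MAP_CREATE) failed")
print("created eBPF hash map, fd =", fd)
```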

10.3.3. BCC is available as a Technology Preview

BPF Compiler Collection (BCC) is a user-space toolkit for creating efficient kernel tracing and manipulation programs, available as a Technology Preview in RHEL 8. BCC provides tools for I/O analysis, networking, and monitoring of Linux operating systems using the extended Berkeley Packet Filter (eBPF).
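
As an example of the toolkit in action, the following minimal sketch (modelled on the upstream BCC "hello world" pattern) attaches a small eBPF program to the clone() syscall entry and prints a line each time the syscall fires. The BPF class, get_syscall_fnname(), attach_kprobe(), and trace_print() come from the upstream bcc Python bindings, not from this note; root privileges are required.

```
#!/usr/bin/env python3
"""Minimal BCC tracing sketch: print a message on every clone() syscall.
Requires the bcc Python bindings and root privileges."""
from bcc import BPF

# The eBPF program is written in restricted C and compiled by BCC at runtime.
program = r"""
int hello(void *ctx) {
    bpf_trace_printk("clone() called\n");
    return 0;
}
"""

b = BPF(text=program)
# Attach the program as a kprobe on the kernel's clone() syscall handler.
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")

print("Tracing clone() ... hit Ctrl-C to exit")
b.trace_print()
```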

10.4. Booting process

10.4.1. How to install and boot custom kernels in RHEL

The Boot Loader Specification (BLS) defines a scheme and file format for managing the bootloader configuration of each boot option in a drop-in directory, without the need to manipulate the main bootloader configuration files. This is particularly relevant in RHEL 8 because not all architectures use the same bootloader:

  • x86_64, aarch64, and ppc64le with Open Firmware use GRUB2
  • ppc64le with the OpenPOWER Abstraction Layer (OPAL) uses Petitboot
  • s390x uses zipl

Each bootloader has a different configuration file and format that must be modified when a new kernel is installed or removed. In previous versions of RHEL, the component that permitted this work was the grubby utility. For RHEL 8, the bootloader configuration was standardized by implementing the BLS file format, and grubby now works as a thin wrapper around the BLS operations.
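
The sketch below illustrates the drop-in approach by reading the BLS entry files that the bootloader consumes. The /boot/loader/entries directory and the key-value format (title, version, linux, options, and so on) come from the Boot Loader Specification rather than this note; the script only prints what it finds.

```
#!/usr/bin/env python3
"""Sketch: list the BLS drop-in boot entries consumed by the bootloader.
Each entry is a simple 'key value' text file under /boot/loader/entries."""
import glob

for path in sorted(glob.glob("/boot/loader/entries/*.conf")):
    entry = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                key, _, value = line.partition(" ")
                entry[key] = value
    print(path)
    print("  title :", entry.get("title", "?"))
    print("  kernel:", entry.get("linux", "?"))
```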

10.4.2. Early kdump support in RHEL

Previously, the kdump service started too late to register kernel crashes that occurred in the early stages of the boot process. As a result, the crash information, together with the chance for troubleshooting, was lost.

To address this problem, RHEL 8 introduces early kdump support. To learn more about this mechanism, see the /usr/share/doc/kexec-tools/early-kdump-howto.txt file. See also What is early kdump support and how do I configure it?.
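
As a quick, illustrative check, the sketch below looks at the kernel command line for the rd.earlykdump parameter used to request early kdump; the parameter name comes from the kexec-tools early kdump documentation, not from this note, so see the howto file above for the authoritative procedure.

```
#!/usr/bin/env python3
"""Illustrative check: is early kdump requested on the kernel command line?
Assumes the rd.earlykdump parameter described in the kexec-tools
early-kdump-howto.txt documentation."""

with open("/proc/cmdline") as f:
    params = f.read().split()

print("early kdump requested:", "rd.earlykdump" in params)
```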