Chapter 36. Configuring huge pages

Physical memory is managed in fixed-size chunks called pages. On the x86_64 architecture, supported by Red Hat Enterprise Linux 8, the default size of a memory page is 4 KB. This default page size has proved to be suitable for general-purpose operating systems, such as Red Hat Enterprise Linux, which supports many different kinds of workloads.

However, specific applications can benefit from using larger page sizes in certain cases. For example, an application that works with a large and relatively fixed data set of hundreds of megabytes or even dozens of gigabytes can have performance issues when using 4 KB pages. Such data sets can require a huge number of 4 KB pages, which can lead to overhead in the operating system and the CPU.
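To make the scale concrete, the following minimal sketch computes how many pages are needed to map a data set at each page size. The function name pages_needed and the 16 GiB data set are assumptions chosen for illustration only.

```shell
#!/bin/sh
# pages_needed DATA_BYTES PAGE_BYTES
# Print how many pages of PAGE_BYTES are needed to map DATA_BYTES,
# rounding up to a whole page. (Hypothetical helper, for illustration.)
pages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

data=$((16 * 1024 * 1024 * 1024))              # hypothetical 16 GiB data set
pages_needed "$data" $((4 * 1024))             # with 4 KB pages
pages_needed "$data" $((2 * 1024 * 1024))      # with 2 MB pages
pages_needed "$data" $((1024 * 1024 * 1024))   # with 1 GB pages
```

For this example the three calls print 4194304, 8192, and 16, which shows how sharply the number of mappings that the kernel and CPU must track drops as the page size grows.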

This section provides information about huge pages available in RHEL 8 and how you can configure them.

36.1. Available huge page features

With Red Hat Enterprise Linux 8, you can use huge pages for applications that work with big data sets, and improve the performance of such applications.

RHEL 8 supports the following huge page methods:

HugeTLB pages

HugeTLB pages are also called static huge pages. There are two ways of reserving HugeTLB pages:

  • At boot time: Reserving pages at boot time increases the possibility of success because the memory has not yet been significantly fragmented. However, on NUMA machines, the number of pages is automatically split among the NUMA nodes.

For more information about parameters that influence HugeTLB page behavior at boot time, see Parameters for reserving HugeTLB pages at boot time. For information about how to use these parameters to configure HugeTLB pages at boot time, see Configuring HugeTLB at boot time.

  • At run time: Reserving pages at run time allows you to reserve the huge pages per NUMA node. If the run-time reservation is done as early as possible in the boot process, the probability of memory fragmentation is lower.

For more information about parameters that influence HugeTLB page behavior at run time, see Parameters for reserving HugeTLB pages at run time. For information about how to use these parameters to configure HugeTLB pages at run time, see Configuring HugeTLB at run time.

Transparent HugePages (THP)

With THP, the kernel automatically assigns huge pages to processes, and therefore there is no need to manually reserve the static huge pages. The following are the two modes of operation in THP:

  • system-wide: Here, the kernel tries to assign huge pages to a process whenever it is possible to allocate the huge pages and the process is using a large contiguous virtual memory area.
  • per-process: Here, the kernel only assigns huge pages to the memory areas of individual processes which you can specify using the madvise() system call.

    Note

    The THP feature only supports 2 MB pages.

For more information about enabling and disabling THP, see Enabling transparent hugepages and Disabling transparent hugepages.

36.2. Parameters for reserving HugeTLB pages at boot time

Use the following parameters to influence HugeTLB page behavior at boot time.

For more information about how to use these parameters to configure HugeTLB pages at boot time, see Configuring HugeTLB at boot time.

Table 36.1. Parameters used to configure HugeTLB pages at boot time

Parameter | Description | Default value

hugepages

Defines the number of persistent huge pages configured in the kernel at boot time.

In a NUMA system, the huge pages reserved with this parameter are divided equally between nodes.

You can assign huge pages to specific nodes at run time by changing the value in the /sys/devices/system/node/node_id/hugepages/hugepages-size/nr_hugepages file of the node.

The default value is 0.

To update this value after boot, change it in the /proc/sys/vm/nr_hugepages file.

hugepagesz

Defines the size of persistent huge pages configured in the kernel at boot time.

Valid values are 2 MB and 1 GB. The default value is 2 MB.

default_hugepagesz

Defines the default size of persistent huge pages configured in the kernel at boot time.

Valid values are 2 MB and 1 GB. The default value is 2 MB.
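As a minimal sketch, the following helper shows one way to check which of these parameters the running kernel was booted with by scanning the kernel command line. The function name cmdline_value is hypothetical. The sketch assumes that the command line is available in /proc/cmdline, and it accepts an explicit string as a second argument so that it can be tried without rebooting.

```shell
#!/bin/sh
# cmdline_value KEY [CMDLINE]
# Print the value of every KEY=value argument found in CMDLINE.
# CMDLINE defaults to the contents of /proc/cmdline.
# (Hypothetical helper, for illustration.)
cmdline_value() {
    key=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    for arg in $cmdline; do
        case $arg in
            "$key"=*) printf '%s\n' "${arg#*=}" ;;
        esac
    done
}

# Report the boot-time huge page parameters of the running kernel, if any.
if [ -r /proc/cmdline ]; then
    for key in hugepages hugepagesz default_hugepagesz; do
        echo "$key: $(cmdline_value "$key")"
    done
fi
```

A parameter that is absent from the command line prints an empty value, which means that the kernel uses its default.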

36.3. Configuring HugeTLB at boot time

The page sizes that the HugeTLB subsystem supports depend on the architecture. The x86_64 architecture supports 2 MB huge pages and 1 GB gigantic pages.
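To see which sizes a particular system offers, you can list the per-size directories that the kernel exposes under /sys/kernel/mm/hugepages. The following is a minimal sketch. The function name list_hugetlb_sizes is hypothetical, and the optional argument exists only so that the function can be pointed at a different directory.

```shell
#!/bin/sh
# list_hugetlb_sizes [BASE_DIR]
# Print the supported HugeTLB page sizes in kB, one per line, based on
# the hugepages-<size>kB directories found in BASE_DIR.
# (Hypothetical helper, for illustration.)
list_hugetlb_sizes() {
    base=${1:-/sys/kernel/mm/hugepages}
    for dir in "$base"/hugepages-*kB; do
        [ -d "$dir" ] || continue
        name=${dir##*/}            # for example, hugepages-2048kB
        size=${name#hugepages-}    # for example, 2048kB
        echo "${size%kB}"          # for example, 2048
    done
}

list_hugetlb_sizes
```

On an x86_64 system with 1 GB page support, the output is expected to include 2048 and 1048576.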

This procedure describes how to reserve a 1 GB page at boot time.

Procedure

  1. To create a HugeTLB pool for 1 GB pages, enable the default_hugepagesz=1G and hugepagesz=1G kernel options:

    # grubby --update-kernel=ALL --args="default_hugepagesz=1G hugepagesz=1G"
  2. Create a new file called hugetlb-gigantic-pages.service in the /usr/lib/systemd/system/ directory and add the following content:

    [Unit]
    Description=HugeTLB Gigantic Pages Reservation
    DefaultDependencies=no
    Before=dev-hugepages.mount
    ConditionPathExists=/sys/devices/system/node
    ConditionKernelCommandLine=hugepagesz=1G
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/lib/systemd/hugetlb-reserve-pages.sh
    
    [Install]
    WantedBy=sysinit.target
  3. Create a new file called hugetlb-reserve-pages.sh in the /usr/lib/systemd/ directory and add the following content:

    While adding the following content, replace number_of_pages with the number of 1 GB pages you want to reserve, and node with the name of the node on which to reserve these pages.

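    #!/bin/sh
    
    # Exit early if the kernel does not expose NUMA node information in sysfs.
    nodes_path=/sys/devices/system/node
    if [ ! -d "$nodes_path" ]; then
        echo "ERROR: $nodes_path does not exist"
        exit 1
    fi
    
    # reserve_pages <number_of_pages> <node>
    # Writes the requested number of 1 GB pages to the pool of the given node.
    reserve_pages()
    {
        echo "$1" > "$nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages"
    }
    
    reserve_pages number_of_pages node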

    For example, to reserve two 1 GB pages on node0 and one 1 GB page on node1, replace number_of_pages with 2 for node0 and 1 for node1:

    reserve_pages 2 node0
    reserve_pages 1 node1
  4. Make the script executable:

    # chmod +x /usr/lib/systemd/hugetlb-reserve-pages.sh
  5. Enable early boot reservation:

    # systemctl enable hugetlb-gigantic-pages
Note
  • You can try reserving more 1 GB pages at runtime by writing to nr_hugepages at any time. However, to prevent failures due to memory fragmentation, reserve 1 GB pages early during the boot process.
  • Reserving static huge pages can effectively reduce the amount of memory available to the system, and prevents it from properly utilizing its full memory capacity. Although a properly sized pool of reserved huge pages can be beneficial to applications that utilize it, an oversized or unused pool of reserved huge pages will eventually be detrimental to overall system performance. When setting a reserved huge page pool, ensure that the system can properly utilize its full memory capacity.
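After the next reboot, one way to confirm that the service reserved the pages is to read back the nr_hugepages file of each node. The following is a minimal sketch. The function name show_node_pools is hypothetical, and the optional second argument exists only so that the function can be pointed at a different directory tree.

```shell
#!/bin/sh
# show_node_pools SIZE_KB [NODES_PATH]
# Print "<node> <count>" for every NUMA node that has a pool of
# SIZE_KB huge pages. (Hypothetical helper, for illustration.)
show_node_pools() {
    size_kb=$1
    nodes_path=${2:-/sys/devices/system/node}
    for file in "$nodes_path"/node*/hugepages/hugepages-"${size_kb}"kB/nr_hugepages; do
        [ -r "$file" ] || continue
        node=${file#"$nodes_path"/}   # strip the leading directory
        node=${node%%/*}              # keep only the node name
        echo "$node $(cat "$file")"
    done
}

# Show the 1 GB pools; prints nothing if the system has none.
show_node_pools 1048576
```

With the example reservation from the procedure, the expected output is one line for node0 with the value 2 and one line for node1 with the value 1. A lower value than requested suggests that the kernel could not find enough contiguous memory.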

Additional resources

  • systemd.service(5) man page
  • /usr/share/doc/kernel-doc-kernel_version/Documentation/vm/hugetlbpage.txt file

36.4. Parameters for reserving HugeTLB pages at run time

Use the following parameters to influence HugeTLB page behavior at run time.

For more information about how to use these parameters to configure HugeTLB pages at run time, see Configuring HugeTLB at run time.

Table 36.2. Parameters used to configure HugeTLB pages at run time

Parameter | Description | File name

nr_hugepages

Defines the number of huge pages of a specified size assigned to a specified NUMA node.

/sys/devices/system/node/node_id/hugepages/hugepages-size/nr_hugepages

nr_overcommit_hugepages

Defines the maximum number of additional huge pages that can be created and used by the system through overcommitting memory.

Writing any non-zero value into this file indicates that the system obtains that number of huge pages from the kernel’s normal page pool if the persistent huge page pool is exhausted. As these surplus huge pages become unused, they are then freed and returned to the kernel’s normal page pool.

/proc/sys/vm/nr_overcommit_hugepages
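To observe the effect of these parameters for the default huge page size, you can read the HugePages counters in /proc/meminfo, where surplus pages obtained through overcommitting are reported as HugePages_Surp. The following is a minimal sketch. The function name hugepage_counter is hypothetical, and the optional second argument exists only so that the function can read a different file.

```shell
#!/bin/sh
# hugepage_counter NAME [FILE]
# Print the value of the NAME counter from a meminfo-style FILE.
# FILE defaults to /proc/meminfo. (Hypothetical helper, for illustration.)
hugepage_counter() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Print the pool counters of the running system.
if [ -r /proc/meminfo ]; then
    for name in HugePages_Total HugePages_Free HugePages_Surp; do
        echo "$name: $(hugepage_counter "$name")"
    done
fi
```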

36.5. Configuring HugeTLB at run time

This procedure describes how to add 20 huge pages of 2048 kB each to node2.

To reserve pages based on your requirements, replace:

  • 20 with the number of huge pages you wish to reserve,
  • 2048kB with the size of the huge pages,
  • node2 with the node on which you wish to reserve the pages.

Procedure

  1. Display the memory statistics:

    # numastat -cm | egrep 'Node|Huge'
                     Node 0 Node 1 Node 2 Node 3  Total
    AnonHugePages         0      2      0      8     10
    HugePages_Total       0      0      0      0      0
    HugePages_Free        0      0      0      0      0
    HugePages_Surp        0      0      0      0      0
  2. Add the number of huge pages of a specified size to the node:

    # echo 20 > /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages

Verification steps

  • Ensure that the huge pages are added. Because numastat reports memory in megabytes, 20 pages of 2048 kB each appear as 40:

    # numastat -cm | egrep 'Node|Huge'
                     Node 0 Node 1 Node 2 Node 3  Total
    AnonHugePages         0      2      0      8     10
    HugePages_Total       0      0     40      0     40
    HugePages_Free        0      0     40      0     40
    HugePages_Surp        0      0      0      0      0
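Because the kernel can reserve fewer pages than requested when memory is fragmented, it can also be useful to compare the value in nr_hugepages with the number that you wrote. The following is a minimal sketch. The function name check_reserved is hypothetical.

```shell
#!/bin/sh
# check_reserved EXPECTED FILE
# Compare the page count stored in FILE with EXPECTED.
# Returns 0 if at least EXPECTED pages are reserved, 1 if fewer are
# reserved, and 2 if FILE cannot be read.
# (Hypothetical helper, for illustration.)
check_reserved() {
    actual=$(cat "$2") || return 2
    if [ "$actual" -ge "$1" ]; then
        echo "ok: $actual pages reserved"
    else
        echo "short: requested $1, got $actual"
        return 1
    fi
}

# Example call for the procedure above (run as root on the target system):
# check_reserved 20 /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
```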

Additional resources

  • numastat(8) man page

36.6. Enabling transparent hugepages

THP is enabled by default in Red Hat Enterprise Linux 8. However, you can enable or disable THP.

This procedure describes how to enable THP.

Procedure

  1. Check the current status of THP:

    # cat /sys/kernel/mm/transparent_hugepage/enabled
  2. Enable THP:

    # echo always > /sys/kernel/mm/transparent_hugepage/enabled
  3. To prevent applications from allocating more memory resources than necessary, disable system-wide transparent huge pages and enable them only for applications that explicitly request them through the madvise() system call:

    # echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
Note

Sometimes, providing low latency to short-lived allocations has higher priority than immediately achieving the best performance with long-lived allocations. In such cases, you can disable direct compaction while leaving THP enabled.

Direct compaction is synchronous memory compaction performed during huge page allocation. Disabling direct compaction provides no guarantee of saving memory, but can decrease the risk of higher latencies during frequent page faults. Note that if the workload benefits significantly from THP, disabling direct compaction decreases performance. To disable direct compaction for all memory regions except those that request huge pages through madvise():

# echo madvise > /sys/kernel/mm/transparent_hugepage/defrag
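The enabled and defrag files list all accepted values and mark the active one with square brackets, for example always [madvise] never. The following minimal sketch extracts the active value. The function name active_mode is hypothetical.

```shell
#!/bin/sh
# active_mode "LINE"
# Print the bracketed (active) value from a line such as
# "always [madvise] never". (Hypothetical helper, for illustration.)
active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Report the active THP settings of the running system, if available.
thp=/sys/kernel/mm/transparent_hugepage
for knob in enabled defrag; do
    if [ -r "$thp/$knob" ]; then
        echo "$knob: $(active_mode "$(cat "$thp/$knob")")"
    fi
done
```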


36.7. Disabling transparent hugepages

THP is enabled by default in Red Hat Enterprise Linux 8. However, you can enable or disable THP.

This procedure describes how to disable THP.

Procedure

  1. Check the current status of THP:

    # cat /sys/kernel/mm/transparent_hugepage/enabled
  2. Disable THP:

    # echo never > /sys/kernel/mm/transparent_hugepage/enabled
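A value written to the sysfs file does not survive a reboot. One possible way to make the setting persistent, sketched here as an assumption and not as part of the procedure above, is the transparent_hugepage= kernel command-line parameter, which accepts the same always, madvise, and never values. The hypothetical helper thp_boot_command only prints a grubby command for review and does not change the system.

```shell
#!/bin/sh
# thp_boot_command MODE
# Print (do not run) a grubby command that would set the THP mode on
# the kernel command line. MODE must be always, madvise, or never.
# (Hypothetical helper, for illustration.)
thp_boot_command() {
    case $1 in
        always|madvise|never)
            printf 'grubby --update-kernel=ALL --args="transparent_hugepage=%s"\n' "$1"
            ;;
        *)
            echo "unsupported THP mode: $1" >&2
            return 1
            ;;
    esac
}

thp_boot_command never
```

Review the printed command and run it as root only if it matches your intent. The change takes effect after the next reboot.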

36.8. Impact of page size on translation lookaside buffer size

Reading address mappings from the page table is time-consuming and resource-expensive, so CPUs are built with a cache for recently-used addresses, called the Translation Lookaside Buffer (TLB). However, the default TLB can only cache a certain number of address mappings.

If a requested address mapping is not in the TLB, which is called a TLB miss, the system still needs to read the page table to determine the virtual-to-physical address mapping. Because of the relationship between application memory requirements and the size of pages used to cache address mappings, applications with large memory requirements are more likely to suffer performance degradation from TLB misses than applications with minimal memory requirements. It is therefore important to avoid TLB misses wherever possible.

Both HugeTLB and Transparent Huge Page features allow applications to use pages larger than 4 KB. This allows addresses stored in the TLB to reference more memory, which reduces TLB misses and improves application performance.
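The effect can be estimated as the TLB reach, that is, the number of TLB entries multiplied by the page size. The following minimal sketch uses a hypothetical TLB with 1536 entries. Real CPUs often provide a different number of entries for each page size, so treat the numbers as an illustration only. The function name tlb_reach_kb is also hypothetical.

```shell
#!/bin/sh
# tlb_reach_kb ENTRIES PAGE_KB
# Print the amount of memory, in kB, that ENTRIES address mappings
# cover when each page is PAGE_KB in size.
# (Hypothetical helper, for illustration.)
tlb_reach_kb() {
    echo $(( $1 * $2 ))
}

entries=1536                       # hypothetical number of TLB entries
tlb_reach_kb "$entries" 4          # reach with 4 KB pages
tlb_reach_kb "$entries" 2048       # reach with 2 MB pages
```

With these assumed numbers, the reach grows from 6144 kB (6 MB) with 4 KB pages to 3145728 kB (3 GB) with 2 MB pages.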