7.3. Configuring HugeTLB Huge Pages

Starting with Red Hat Enterprise Linux 7.1, there are two ways of reserving huge pages: at boot time and at run time. Reserving at boot time increases the possibility of success because the memory has not yet been significantly fragmented. However, on NUMA machines, the number of pages is automatically split among NUMA nodes. The run-time method allows you to reserve huge pages per NUMA node. If the run-time reservation is done as early as possible in the boot process, the probability of memory fragmentation is lower.

7.3.1. Configuring Huge Pages at Boot Time

To configure huge pages at boot time, add the following parameters to the kernel boot command line:
hugepages
Defines the number of persistent huge pages configured in the kernel at boot time. The default value is 0. It is only possible to allocate huge pages if there are sufficient physically contiguous free pages in the system. Pages reserved by this parameter cannot be used for other purposes.
This value can be adjusted after boot by changing the value of the /proc/sys/vm/nr_hugepages file.
In a NUMA system, huge pages assigned with this parameter are divided equally between nodes. You can assign huge pages to specific nodes at runtime by changing the value of the node's /sys/devices/system/node/node_id/hugepages/hugepages-1048576kB/nr_hugepages file.
For more information, read the relevant kernel documentation, which is installed in /usr/share/doc/kernel-doc-kernel_version/Documentation/vm/hugetlbpage.txt by default.
hugepagesz
Defines the size of persistent huge pages configured in the kernel at boot time. Valid values are 2 MB and 1 GB. The default value is 2 MB.
default_hugepagesz
Defines the default size of persistent huge pages configured in the kernel at boot time. Valid values are 2 MB and 1 GB. The default value is 2 MB.
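
To confirm which of these parameters the running kernel was actually booted with, the command line can be inspected. The following is a minimal sketch, not part of any package: the function name show_hugepage_args is hypothetical, and it assumes the parameters are separated by single spaces, as they are in /proc/cmdline.

```shell
#!/bin/sh
# Print the huge page parameters found on a kernel command line.
# show_hugepage_args is a hypothetical helper used for illustration only.
# $1: file holding the command line (defaults to /proc/cmdline)
show_hugepage_args()
{
	cmdline_file=${1:-/proc/cmdline}
	# Put each parameter on its own line, then keep only the three
	# huge page parameters described above.
	tr ' ' '\n' < "$cmdline_file" |
		grep -E '^(hugepages|hugepagesz|default_hugepagesz)='
}
```

Called without arguments, the function reads /proc/cmdline and prints each matching parameter on its own line, in the order in which it appears on the command line. It prints nothing if none of the parameters is set.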

Procedure 7.1. Reserving 1 GB Pages During Early Boot

The page sizes that the HugeTLB subsystem supports depend on the architecture. On the AMD64 and Intel 64 architectures, 2 MB huge pages and 1 GB gigantic pages are supported.
  1. Create a HugeTLB pool for 1 GB pages by appending the following line to the kernel command-line options:
    default_hugepagesz=1G hugepagesz=1G
    
  2. Create a file named /usr/lib/systemd/system/hugetlb-gigantic-pages.service with the following content:
    [Unit]
    Description=HugeTLB Gigantic Pages Reservation
    DefaultDependencies=no
    Before=dev-hugepages.mount
    ConditionPathExists=/sys/devices/system/node
    ConditionKernelCommandLine=hugepagesz=1G
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/lib/systemd/hugetlb-reserve-pages.sh
    
    [Install]
    WantedBy=sysinit.target
    
  3. Create a file named /usr/lib/systemd/hugetlb-reserve-pages.sh with the following content:
    #!/bin/sh
    
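    # Base directory of the per-node sysfs entries.
    nodes_path=/sys/devices/system/node
    if [ ! -d "$nodes_path" ]; then
    	echo "ERROR: $nodes_path does not exist"
    	exit 1
    fi
    
    # Reserve $1 huge pages of 1 GB each on NUMA node $2 (for example, node0).
    reserve_pages()
    {
    	echo "$1" > "$nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages"
    }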
    
    reserve_pages number_of_pages node
    On the last line, replace number_of_pages with the number of 1 GB pages to reserve and node with the name of the node on which to reserve these pages.

    Example 7.1. Reserving Pages on node0 and node1

    For example, to reserve two 1 GB pages on node0 and one 1 GB page on node1, replace the last line with the following code:
    reserve_pages 2 node0
    reserve_pages 1 node1
    
    Modify these lines to suit your needs, or add more lines to reserve pages on other nodes.
  4. Make the script executable:
    # chmod +x /usr/lib/systemd/hugetlb-reserve-pages.sh
    
  5. Enable early boot reservation:
    # systemctl enable hugetlb-gigantic-pages
    

Note

You can try to reserve more 1 GB pages at run time by writing to nr_hugepages at any time. However, such reservations can fail due to memory fragmentation. The most reliable way to reserve 1 GB pages is to use this script, which runs during early boot.
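
Because a reservation can be only partially satisfied, it is worth reading nr_hugepages back after writing to it. The following is a minimal sketch of such a check. The function name check_reservation and its messages are hypothetical; the only assumption about the system is that the file passed as the first argument contains the current page count.

```shell
#!/bin/sh
# Compare the number of huge pages requested with the number that the
# kernel actually reserved. check_reservation is a hypothetical helper.
# $1: path to an nr_hugepages file
# $2: number of pages that was requested
check_reservation()
{
	# Return 2 if the file cannot be read (for example, no such node).
	actual=$(cat "$1") || return 2
	if [ "$actual" -lt "$2" ]; then
		echo "WARNING: requested $2 pages, kernel reserved $actual"
		return 1
	fi
	echo "OK: $actual pages reserved"
}
```

For example, after reserving two 1 GB pages on node0, the check could be run as check_reservation /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages 2.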

7.3.2. Configuring Huge Pages at Run Time

Use the following parameters to influence huge page behavior at run time:
/sys/devices/system/node/node_id/hugepages/hugepages-size/nr_hugepages
Defines the number of huge pages of the specified size assigned to the specified NUMA node. This is supported as of Red Hat Enterprise Linux 7.1. The following example adds twenty 2048 kB huge pages to node2. Because numastat reports memory in megabytes, the twenty 2 MB pages appear as 40 in the output.
# numastat -cm | egrep 'Node|Huge'
                 Node 0 Node 1 Node 2 Node 3  Total
AnonHugePages         0      2      0      8     10
HugePages_Total       0      0      0      0      0
HugePages_Free        0      0      0      0      0
HugePages_Surp        0      0      0      0      0
# echo 20 > /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages 
# numastat -cm | egrep 'Node|Huge'
                 Node 0 Node 1 Node 2 Node 3  Total
AnonHugePages         0      2      0      8     10 
HugePages_Total       0      0     40      0     40
HugePages_Free        0      0     40      0     40
HugePages_Surp        0      0      0      0      0
/proc/sys/vm/nr_overcommit_hugepages
Defines the maximum number of additional huge pages that can be created and used by the system through overcommitting memory. Writing a non-zero value into this file allows the system to obtain up to that number of huge pages from the kernel's normal page pool when the persistent huge page pool is exhausted. As these surplus huge pages become unused, they are freed and returned to the kernel's normal page pool.
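
Surplus pages obtained through overcommitting are counted in the HugePages_Surp field of /proc/meminfo. The following is a minimal sketch for watching the system-wide counters. The function name hugepage_counters is hypothetical, and the sketch assumes the usual "Name: value" layout of /proc/meminfo.

```shell
#!/bin/sh
# Print the system-wide huge page counters as name=value pairs.
# hugepage_counters is a hypothetical helper used for illustration only.
# $1: meminfo-style file to read (defaults to /proc/meminfo)
hugepage_counters()
{
	# Keep the four HugePages_* lines and strip the colon after the name.
	awk '/^HugePages_(Total|Free|Rsvd|Surp):/ { sub(":", "", $1); print $1 "=" $2 }' \
		"${1:-/proc/meminfo}"
}
```

Running the function before and after an application exhausts the persistent pool shows whether HugePages_Surp rises above 0, that is, whether the limit set in /proc/sys/vm/nr_overcommit_hugepages is being used.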