Chapter 35. Configuring an operating system to optimize memory access
You can configure the operating system to optimize memory access across workloads with the tools that are included in RHEL.
35.1. Tools for monitoring and diagnosing system memory issues
The following tools are available in Red Hat Enterprise Linux 8 for monitoring system performance and diagnosing performance problems related to system memory:
The vmstat tool, provided by the procps-ng package, displays reports of a system’s processes, memory, paging, block I/O, traps, disks, and CPU activity. The first report shows averages since the machine was last turned on, and each later report covers the interval since the previous report.
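As a minimal sketch, assuming a POSIX shell with awk, the following hypothetical helper vmstat_mem_columns extracts the free, si (swap in), and so (swap out) columns from vmstat output. It locates the columns by name in the header line so that it does not depend on exact column positions:

```sh
#!/bin/sh
# Hypothetical helper: print the free, si (swap-in), and so (swap-out) columns
# of vmstat output read from standard input. Column positions are looked up by
# name in the header line instead of being hard-coded.
vmstat_mem_columns() {
  awk '
    $1 == "procs" { next }                                        # group header line
    $1 == "r"     { for (i = 1; i <= NF; i++) col[$i] = i; next } # column-name header
    { print $(col["free"]), $(col["si"]), $(col["so"]) }           # one data row per report
  '
}

# Usage (requires the procps-ng vmstat binary): five reports, one second apart.
#   vmstat 1 5 | vmstat_mem_columns
```

In the first row, the process and memory columns such as free are instantaneous, while rate columns such as si and so are averages since boot.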
The valgrind framework provides instrumentation to user-space binaries. Install this tool by using the yum install valgrind command. It includes a number of tools that you can use to profile and analyze program performance, such as:
The memcheck option is the default valgrind tool. It detects and reports on a number of memory errors that can be difficult to detect and diagnose, such as:
- Memory access that should not occur
- Undefined or uninitialized value use
- Incorrectly freed heap memory
- Pointer overlap
Memcheck can only report these errors; it cannot prevent them from occurring. However, memcheck logs an error message immediately before the error occurs.
The cachegrind option simulates application interaction with a system’s cache hierarchy and branch predictor. It gathers statistics for the duration of the application’s execution and outputs a summary to the console.
The massif option measures the heap space used by a specified application. It measures both useful space and any additional space allocated for bookkeeping and alignment purposes.
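Each valgrind tool is selected with the --tool option. The following hypothetical wrapper profile_with_valgrind is a sketch that accepts only the three tools described above and reports a missing installation; ./myapp in the usage comments is a placeholder for your own binary:

```sh
#!/bin/sh
# Hypothetical wrapper: run a program under one of the valgrind tools
# described above. Only memcheck, cachegrind, and massif are accepted here.
profile_with_valgrind() {
  if [ "$#" -lt 2 ]; then
    echo "usage: profile_with_valgrind TOOL [VALGRIND-OPTIONS] PROGRAM [ARGS]" >&2
    return 2
  fi
  local tool=$1
  shift
  case "$tool" in
    memcheck|cachegrind|massif) ;;
    *)
      echo "unsupported tool: $tool (use memcheck, cachegrind, or massif)" >&2
      return 2
      ;;
  esac
  if ! command -v valgrind >/dev/null 2>&1; then
    echo "valgrind is not installed; install it with: yum install valgrind" >&2
    return 127
  fi
  valgrind --tool="$tool" "$@"
}

# Usage, where ./myapp is a placeholder:
#   profile_with_valgrind memcheck --leak-check=full ./myapp
#   profile_with_valgrind cachegrind ./myapp   # writes cachegrind.out.<pid>; view with cg_annotate
#   profile_with_valgrind massif ./myapp       # writes massif.out.<pid>; view with ms_print
```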
35.2. Overview of a system’s memory
The Linux Kernel is designed to maximize the utilization of a system’s memory resources (RAM). Due to these design characteristics, and depending on the memory requirements of the workload, part of the system’s memory is in use within the kernel on behalf of the workload, while a small part of the memory is free. This free memory is reserved for special system allocations, and for other low or high priority system services.
The rest of the system’s memory is dedicated to the workload itself, and divided into the following two categories:
File memory
Pages added in this category represent parts of files in permanent storage. These pages, from the page cache, can be mapped or unmapped in an application’s address spaces. Applications can map files into their address space by using the mmap system call, or operate on files through the buffered I/O read or write system calls.
Buffered I/O system calls, as well as applications that map pages directly, can reuse unmapped pages. As a result, the kernel keeps these pages in the cache, especially when the system is not running any memory-intensive tasks, to avoid re-issuing costly I/O operations over the same set of pages.
Anonymous memory
Pages in this category are dynamically allocated by a process, or are not related to files in permanent storage. This set of pages backs the in-memory control structures of each task, such as the application stack and heap areas.
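The kernel reports the size of both categories in /proc/meminfo. A minimal sketch, assuming the Active(file), Inactive(file), Active(anon), and Inactive(anon) fields of current kernels, sums the page cache lists and the anonymous lists; the helper name meminfo_split is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: summarize /proc/meminfo-style input (values in kB) into
# free memory, file memory (page cache LRU lists), and anonymous memory
# (anonymous LRU lists, which also hold shared memory and tmpfs pages).
meminfo_split() {
  awk '
    $1 == "MemTotal:"       { total = $2 }
    $1 == "MemFree:"        { free  = $2 }
    $1 == "Active(file):"   { file += $2 }
    $1 == "Inactive(file):" { file += $2 }
    $1 == "Active(anon):"   { anon += $2 }
    $1 == "Inactive(anon):" { anon += $2 }
    END { printf "total=%d free=%d file=%d anon=%d\n", total, free, file, anon }
  '
}

# Usage on a live system:
#   meminfo_split < /proc/meminfo
```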
Figure 35.1. Memory usage patterns
35.3. Virtual memory parameters
The virtual memory parameters are listed in the /proc/sys/vm directory.
The following are the available virtual memory parameters:
vm.dirty_ratio
A percentage value. When this percentage of the total system memory is modified, the system begins writing the modifications to the disk with the pdflush operation. The default value is 20 percent.
vm.dirty_background_ratio
A percentage value. When this percentage of total system memory is modified, the system begins writing the modifications to the disk in the background. The default value is 10 percent.
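To see what these percentages mean in absolute terms, a minimal sketch can convert them to kilobytes. It assumes total memory as the base, as the descriptions above do; the kernel actually applies the ratios to dirtyable memory (free plus reclaimable pages), so the real thresholds are somewhat lower. The helper name dirty_thresholds_kb is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: approximate the amount of dirty page cache (in kB) at
# which background writeback (vm.dirty_background_ratio) and writeback by the
# writing processes (vm.dirty_ratio) begin. Uses total memory as the base,
# which overestimates the kernel's dirtyable-memory base.
dirty_thresholds_kb() {
  local mem_kb=$1 background_ratio=$2 ratio=$3
  echo "dirty_background_kb=$(( mem_kb * background_ratio / 100 )) dirty_kb=$(( mem_kb * ratio / 100 ))"
}

# Usage on a live system:
#   mem_kb=$(awk '$1 == "MemTotal:" { print $2 }' /proc/meminfo)
#   dirty_thresholds_kb "$mem_kb" "$(sysctl -n vm.dirty_background_ratio)" "$(sysctl -n vm.dirty_ratio)"
#   grep -E '^(Dirty|Writeback):' /proc/meminfo   # current dirty and in-flight pages
```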
vm.overcommit_memory
Defines the conditions that determine whether a large memory request is accepted or denied. The default value is 0.
By default, the kernel checks whether a virtual memory allocation request fits into the present amount of memory (total RAM plus swap) and rejects only large requests. Other virtual memory allocations are granted, which means that memory can be overcommitted.
When this parameter is set to 1, the kernel performs no memory overcommit handling. This increases the possibility of memory overload, but improves performance for memory-intensive tasks.
When this parameter is set to 2, the kernel denies requests for memory equal to or larger than the sum of the total available swap space and the percentage of physical RAM specified by vm.overcommit_ratio. This reduces the risk of overcommitting memory, but is recommended only for systems with swap areas larger than their physical memory.
vm.overcommit_ratio
Specifies the percentage of physical RAM considered when overcommit_memory is set to 2. The default value is 50.
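When vm.overcommit_memory is set to 2, the kernel enforces a commit limit that /proc/meminfo reports as CommitLimit. A minimal sketch of that arithmetic, assuming no huge pages are reserved (the kernel subtracts reserved huge pages from RAM first) and using the hypothetical helper name commit_limit_kb:

```sh
#!/bin/sh
# Hypothetical helper: compute the commit limit (in kB) that applies when
# vm.overcommit_memory=2, assuming no huge pages are reserved:
#   CommitLimit = RAM * overcommit_ratio / 100 + swap
commit_limit_kb() {
  local ram_kb=$1 ratio=$2 swap_kb=$3
  echo $(( ram_kb * ratio / 100 + swap_kb ))
}

# Usage on a live system; compare the result with CommitLimit and with
# Committed_AS (memory already committed) in /proc/meminfo:
#   ram_kb=$(awk '$1 == "MemTotal:" { print $2 }' /proc/meminfo)
#   swap_kb=$(awk '$1 == "SwapTotal:" { print $2 }' /proc/meminfo)
#   commit_limit_kb "$ram_kb" "$(sysctl -n vm.overcommit_ratio)" "$swap_kb"
#   grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo
```

For example, 16 GB of RAM with the default ratio of 50 and 4 GB of swap gives a commit limit of about 12 GB.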
vm.max_map_count
Defines the maximum number of memory map areas that a process can use. The default value is 65530. Increase this value if your application needs more memory map areas.
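Each line of /proc/PID/maps describes one memory map area, so counting the lines shows how close a process is to the limit. A minimal sketch with the hypothetical helper name map_usage:

```sh
#!/bin/sh
# Hypothetical helper: print "used/limit" for the memory map areas of a
# process, where used is the number of lines in /proc/PID/maps and limit is
# vm.max_map_count.
map_usage() {
  local pid=$1
  local used limit
  used=$(wc -l < "/proc/$pid/maps") || return 1
  limit=$(cat /proc/sys/vm/max_map_count) || return 1
  echo "$(( used ))/$limit"
}

# Usage, where 1234 is a placeholder PID (reading another user's maps file
# requires sufficient privileges):
#   map_usage 1234
```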
vm.min_free_kbytes
Sets the size of the reserved free pages pool. It is also responsible for setting the min_page, low_page, and high_page thresholds that govern the behavior of the Linux kernel’s page reclaim algorithms. It also specifies the minimum number of kilobytes to keep free across the system. The kernel derives a specific value from it for each low memory zone, assigning each zone a number of reserved free pages in proportion to its size.
- Increasing the parameter value effectively reduces the memory usable by the application working set. Therefore, you might want to use it only for kernel-driven workloads, where driver buffers need to be allocated in atomic contexts.
- Decreasing the parameter value might render the kernel unable to service system requests if memory becomes heavily contended in the system.
Warning
Extreme values can be detrimental to the system’s performance. Setting vm.min_free_kbytes to an extremely low value prevents the system from reclaiming memory effectively, which can result in system crashes and failure to service interrupts or other kernel services. However, setting vm.min_free_kbytes too high considerably increases system reclaim activity, causing allocation latency due to a false direct reclaim state. This might cause the system to enter an out-of-memory state immediately.
The vm.min_free_kbytes parameter also sets a page reclaim watermark, called min_pages. This watermark is used as a factor when determining the two other memory watermarks, low_pages and high_pages, that govern page reclaim algorithms.
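The per-zone watermarks derived from vm.min_free_kbytes are visible in /proc/zoneinfo as the min, low, and high lines of each zone, in pages. A minimal sketch, assuming the zoneinfo layout of current kernels and using the hypothetical helper name zone_watermarks:

```sh
#!/bin/sh
# Hypothetical helper: list the min, low, and high watermarks (in pages) of
# each memory zone from /proc/zoneinfo-style input. Per-CPU pageset lines such
# as "high:" carry a trailing colon and are therefore not matched.
zone_watermarks() {
  awk '
    $1 == "Node" { node = $2; sub(/,$/, "", node); zone = "node" node "/" $4; next }
    $1 == "min" || $1 == "low" || $1 == "high" { print zone, $1, $2 }
  '
}

# Usage on a live system:
#   sysctl vm.min_free_kbytes
#   zone_watermarks < /proc/zoneinfo
```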
/proc/PID/oom_adj
If a system runs out of memory and the vm.panic_on_oom parameter is set to 0, the oom_killer function kills processes, starting with the process that has the highest oom_score, until the system recovers.
The oom_adj parameter determines the oom_score of a process. This parameter is set per process identifier. A value of -17 disables the oom_killer for that process. Other valid values range from -16 to 15.
Processes created by an adjusted process inherit the oom_adj value of that process.
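A minimal sketch of adjusting a process with the hypothetical helper set_oom_adj, which rejects values outside the -17 to 15 range before writing to /proc/PID/oom_adj. Lowering the value normally requires root privileges (CAP_SYS_RESOURCE). Current kernels also provide /proc/PID/oom_score_adj (range -1000 to 1000), which supersedes the deprecated oom_adj interface:

```sh
#!/bin/sh
# Hypothetical helper: validate and apply an oom_adj value for a process.
# -17 disables the oom_killer for the process; -16..15 adjust its oom_score.
set_oom_adj() {
  local pid=$1 value=$2
  # Accept an optional leading minus sign followed by one or two digits.
  case ${value#-} in
    ''|*[!0-9]*|???*)
      echo "oom_adj must be an integer from -17 to 15" >&2
      return 2 ;;
  esac
  if [ "$value" -lt -17 ] || [ "$value" -gt 15 ]; then
    echo "oom_adj must be an integer from -17 to 15" >&2
    return 2
  fi
  echo "$value" > "/proc/$pid/oom_adj" || return 1
  echo "pid $pid: oom_adj=$(cat "/proc/$pid/oom_adj") oom_score=$(cat "/proc/$pid/oom_score")"
}

# Usage, where 1234 is a placeholder PID (run as root):
#   set_oom_adj 1234 -17
```

The value read back can differ slightly from the one written, because the kernel stores it internally on the oom_score_adj scale.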
vm.swappiness
The swappiness value, ranging from 0 to 200, controls the degree to which the system favors reclaiming memory from the anonymous memory pool or from the page cache memory pool.
- Higher values favor file-mapped workloads by swapping out the anonymous memory of less actively accessed processes. This is useful for file servers or streaming applications that depend on data from files in storage residing in memory to reduce I/O latency for service requests.
- Lower values favor anonymous-mapped workloads by reclaiming the page cache (file-mapped memory). This setting is useful for applications that do not depend heavily on file system information and make heavy use of dynamically allocated and private memory, such as mathematical and number-crunching applications, and some hardware virtualization supervisors such as QEMU.
The default value of the swappiness parameter is 60.
Warning
Setting swappiness to 0 aggressively avoids swapping anonymous memory out to a disk, which increases the risk of processes being killed by the oom_killer function under memory- or I/O-intensive workloads.
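A minimal sketch of changing swappiness for the running system only, using the hypothetical helper set_swappiness. It assumes the 0 to 200 range described above and a hypothetical DRY_RUN variable that prints the sysctl command instead of running it:

```sh
#!/bin/sh
# Hypothetical helper: validate a vm.swappiness value (0-200) and apply it
# with sysctl for the running system only. With DRY_RUN=1 the command is
# printed instead of executed.
set_swappiness() {
  local value=$1
  case $value in
    ''|*[!0-9]*|????*)
      echo "vm.swappiness must be an integer from 0 to 200" >&2
      return 2 ;;
  esac
  if [ "$value" -gt 200 ]; then
    echo "vm.swappiness must be an integer from 0 to 200" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "sysctl -w vm.swappiness=$value"
  else
    sysctl -w "vm.swappiness=$value"
  fi
}

# Usage:
#   sysctl vm.swappiness           # show the current value
#   DRY_RUN=1 set_swappiness 10    # preview the command
#   set_swappiness 10              # apply as root; not persistent across reboots
```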
force_cgroup_v2_swappiness
This control is used to deprecate the per-cgroup swappiness value available only in cgroups v1. Most system and user processes run within a cgroup, and cgroup swappiness values default to 60. As a result, the system-wide swappiness value can have little effect on the swap behavior of the system. If you do not need the per-cgroup swappiness feature, you can configure the system with force_cgroup_v2_swappiness=1 to get more consistent swappiness behavior across the whole system.
For more information, see Setting memory-related kernel parameters.
35.4. File system parameters
The file system parameters are listed in the /proc/sys/fs directory. The following are the available file system parameters:
fs.aio-max-nr
Defines the maximum allowed number of events in all active asynchronous input/output contexts. The default value is 65536, and modifying this value does not pre-allocate or resize any kernel data structures.
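The number of asynchronous I/O events currently allocated is reported in /proc/sys/fs/aio-nr, so comparing it with fs.aio-max-nr shows the remaining headroom. A minimal sketch with the hypothetical helper aio_usage, whose optional file arguments exist only to make it testable:

```sh
#!/bin/sh
# Hypothetical helper: compare the allocated asynchronous I/O events
# (fs.aio-nr) with the limit (fs.aio-max-nr). The file paths can be
# overridden, for example for testing.
aio_usage() {
  local nr_file=${1:-/proc/sys/fs/aio-nr} max_file=${2:-/proc/sys/fs/aio-max-nr}
  local nr max
  nr=$(cat "$nr_file") || return 1
  max=$(cat "$max_file") || return 1
  echo "aio-nr=$nr aio-max-nr=$max free=$(( max - nr ))"
}

# Usage on a live system:
#   aio_usage
```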
fs.file-max
Determines the maximum number of file handles for the entire system. The default value on Red Hat Enterprise Linux 8 is either 8192 or one tenth of the free memory pages available at the time the kernel starts, whichever is higher.
Raising this value can resolve errors caused by a lack of available file handles.
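Before raising fs.file-max, you can check how close the system is to the limit. The /proc/sys/fs/file-nr file reports three numbers: allocated file handles, allocated but unused handles (normally 0 on current kernels), and the maximum, which equals fs.file-max. A minimal sketch with the hypothetical helper file_handle_usage:

```sh
#!/bin/sh
# Hypothetical helper: report allocated file handles against the system-wide
# maximum, read from /proc/sys/fs/file-nr (or another file, for testing).
file_handle_usage() {
  local file=${1:-/proc/sys/fs/file-nr}
  local allocated unused max
  read -r allocated unused max < "$file" || return 1
  echo "allocated=$allocated max=$max used_pct=$(( allocated * 100 / max ))"
}

# Usage on a live system:
#   file_handle_usage
```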
35.5. Kernel parameters
The default values for the kernel parameters are located in the /proc/sys/kernel/ directory. These are either default values provided by the kernel or values that a user specified by using sysctl.
The following are the available kernel parameters used to set up limits for the msg* and shm* System V IPC (sysvipc) system calls:
kernel.msgmax
Defines the maximum allowed size in bytes of any single message in a message queue. This value must not exceed the size of the queue (msgmnb). Use the sysctl kernel.msgmax command to determine the current msgmax value on your system.
kernel.msgmnb
Defines the maximum size in bytes of a single message queue. Use the sysctl kernel.msgmnb command to determine the current msgmnb value on your system.
kernel.msgmni
Defines the maximum number of message queue identifiers, and therefore the maximum number of queues. Use the sysctl kernel.msgmni command to determine the current msgmni value on your system.
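Because kernel.msgmax must not exceed kernel.msgmnb, it can be worth checking the pair after changing either value. A minimal sketch with the hypothetical helper check_msg_limits:

```sh
#!/bin/sh
# Hypothetical helper: verify that the maximum message size (kernel.msgmax)
# does not exceed the maximum queue size (kernel.msgmnb).
check_msg_limits() {
  local msgmax=$1 msgmnb=$2
  if [ "$msgmax" -gt "$msgmnb" ]; then
    echo "invalid: kernel.msgmax ($msgmax) exceeds kernel.msgmnb ($msgmnb)" >&2
    return 1
  fi
  echo "ok: msgmax=$msgmax msgmnb=$msgmnb"
}

# Usage on a live system:
#   check_msg_limits "$(sysctl -n kernel.msgmax)" "$(sysctl -n kernel.msgmnb)"
#   ipcs -q -l    # message queue limits in human-readable form
```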
kernel.shmall
Defines the total number of shared memory pages that can be used on the system at one time. For example, a page is 4096 bytes on the AMD64 and Intel 64 architectures. Use the sysctl kernel.shmall command to determine the current shmall value on your system.
kernel.shmmax
Defines the maximum size in bytes of a single shared memory segment allowed by the kernel. Shared memory segments up to 1 GB are now supported in the kernel. Use the sysctl kernel.shmmax command to determine the current shmmax value on your system.
kernel.shmmni
Defines the system-wide maximum number of shared memory segments. The default value is 4096 on all systems.
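Since kernel.shmall is counted in pages while kernel.shmmax is in bytes, comparing them requires multiplying shmall by the page size. A minimal sketch with the hypothetical helper shm_limits; it does the arithmetic in awk because default shmall values on 64-bit systems can exceed the shell's signed 64-bit integer range:

```sh
#!/bin/sh
# Hypothetical helper: convert kernel.shmall (pages) to bytes and warn if a
# single segment of kernel.shmmax bytes could not fit within that total.
# awk uses floating point, so very large defaults do not overflow (very large
# results are approximate).
shm_limits() {
  awk -v pages="$1" -v shmmax="$2" -v psize="$3" 'BEGIN {
    total = pages * psize
    printf "shmall_bytes=%.0f\n", total
    if (shmmax + 0 > total)
      print "warning: shmmax exceeds the total allowed by shmall" > "/dev/stderr"
  }'
}

# Usage on a live system:
#   shm_limits "$(sysctl -n kernel.shmall)" "$(sysctl -n kernel.shmmax)" "$(getconf PAGE_SIZE)"
#   ipcs -m -l    # shared memory limits in human-readable form
```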
35.6. Setting memory-related kernel parameters
Setting a parameter temporarily is useful for determining the effect the parameter has on a system. You can later set the parameter persistently when you are sure that the parameter value has the desired effect.
This procedure describes how to set a memory-related kernel parameter temporarily and persistently.
To temporarily set the memory-related kernel parameters, edit the respective files in the /proc file system or use the sysctl tool.
For example, to temporarily set the vm.overcommit_memory parameter to 1, run either of the following commands:
# echo 1 > /proc/sys/vm/overcommit_memory
# sysctl -w vm.overcommit_memory=1
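When testing a value temporarily, it helps to record the previous value so that you can restore it. A minimal sketch with the hypothetical helper set_vm_param_temp, which writes directly to /proc/sys/vm and prints the old and new values:

```sh
#!/bin/sh
# Hypothetical helper: set a /proc/sys/vm parameter for the running system
# only and print the previous value so that it can be restored after testing.
set_vm_param_temp() {
  local name=$1 value=$2
  local path="/proc/sys/vm/$name"
  local old
  case $name in
    ''|*/*) echo "invalid parameter name: $name" >&2; return 1 ;;
  esac
  if [ ! -e "$path" ]; then
    echo "unknown parameter: vm.$name" >&2
    return 1
  fi
  old=$(cat "$path")
  echo "$value" > "$path" || return 1
  echo "vm.$name: $old -> $(cat "$path")"
}

# Usage (as root); the change is lost at the next reboot:
#   set_vm_param_temp overcommit_memory 1
#   set_vm_param_temp overcommit_memory 0   # restore the default
```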
To persistently set the memory-related kernel parameter, edit the /etc/sysctl.conf file and reload the settings.
For example, to persistently set the vm.overcommit_memory parameter to 1:
Add the following content in the /etc/sysctl.conf file:
vm.overcommit_memory=1
Reload the sysctl settings from the /etc/sysctl.conf file:
# sysctl -p
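Editing /etc/sysctl.conf by hand can leave duplicate assignments of the same key. A minimal sketch of an idempotent edit, using the hypothetical helper persist_sysctl, which replaces any existing assignment of the key before appending the new one:

```sh
#!/bin/sh
# Hypothetical helper: persistently set a kernel parameter by writing
# KEY=VALUE to a sysctl configuration file (default /etc/sysctl.conf).
# Existing assignments of the same key are removed first, so running it
# twice does not create duplicates. Apply the result afterwards with sysctl -p.
persist_sysctl() {
  local key=$1 value=$2 conf=${3:-/etc/sysctl.conf}
  local tmp status
  tmp=$(mktemp) || return 1
  if [ -e "$conf" ]; then
    # Copy every line except existing assignments of the key (sysctl.conf
    # allows leading blanks and blanks around "=").
    awk -v key="$key" '
      { line = $0; sub(/^[ \t]+/, "", line) }
      index(line, key) == 1 && substr(line, length(key) + 1) ~ /^[ \t]*=/ { next }
      { print }
    ' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$conf"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (as root):
#   persist_sysctl vm.overcommit_memory 1
#   sysctl -p
```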