Premature swapping while there is still plenty of pagecache to be reclaimed
Environment
- Red Hat Enterprise Linux 8
Issue
- Experiencing high Swap utilization while there is still free memory, or even available memory as reclaimable cache pages
- After upgrading from RHEL 7 to RHEL 8, noticed higher Swap utilization than before
- Even after lowering the value of /proc/sys/vm/swappiness, swap utilization is still high
Resolution
To address this issue, Red Hat Enterprise Linux Engineering created a new sysctl option: vm.force_cgroup_v2_swappiness.
When set to 1, each cgroup's memory.swappiness value is ignored, and all per-cgroup swappiness values mirror the system-wide vm.swappiness sysctl value (i.e., the /proc/sys/vm/swappiness file).
After implementing this resolution, the memory swapping behavior of cgroups is more consistent. This is the recommended solution when using cgroups v1.
The kernel may need to be updated to a version patched with the new sysctl:
- Red Hat Enterprise Linux 8.7 - update to kernel-4.18.0-425.3.1.el8.x86_64 or later, as per Errata: RHSA-2022:7683
- Red Hat Enterprise Linux 8.6.z (EUS) - update to kernel-4.18.0-372.36.1.el8_6.x86_64 or later, as per Errata: RHSA-2022:8809
- Red Hat Enterprise Linux 8.4.z (EUS) - update to kernel-4.18.0-305.76.1.el8_4.x86_64 or later, as per Errata: RHSA-2023:0496
Setting sysctl variables in a persistent manner is described in How to set sysctl variables on Red Hat Enterprise Linux.
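For example, the new sysctl can be persisted with a drop-in file under /etc/sysctl.d (the filename 99-force-cgroup-v2-swappiness.conf below is an arbitrary choice):

```
# /etc/sysctl.d/99-force-cgroup-v2-swappiness.conf
vm.force_cgroup_v2_swappiness = 1
```

Running sysctl --system (or sysctl -p /etc/sysctl.d/99-force-cgroup-v2-swappiness.conf) applies the setting without a reboot, provided the patched kernel is already running.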
Workarounds and setting adjustments
The following workarounds are also available in case the recommended solution cannot be used.
Workaround #1: Switch to cgroup v2
- A possible workaround to mitigate the issue is to switch to cgroup v2:
# grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
(requires reboot)
- Note: Please refer to Migrating from CGroups V1 in Red Hat Enterprise Linux 7 and below to CGroups V2 in Red Hat Enterprise Linux 8 for more details of cgroup v2 migration.
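After rebooting, one way to confirm which hierarchy is active is to check the filesystem type mounted on /sys/fs/cgroup (no root required):

```shell
# "cgroup2fs" means the unified cgroup v2 hierarchy is in use;
# "tmpfs" means the per-controller cgroup v1 layout.
stat -fc %T /sys/fs/cgroup
```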
Workaround #2: Adjust swappiness value of all existing cgroups
- If it's not feasible to migrate to cgroups v2 for any reason, another workaround would be to adjust all memory cgroups' swappiness to the desired value, preferably in a pre-order fashion with regards to the filesystem structure.
- For example, the following command can be sufficient:
# for cgfile in $(find /sys/fs/cgroup -name '*swappiness'); do cat /proc/sys/vm/swappiness > $cgfile; done
- NOTE: The find command unfortunately doesn't return the files in a pre-order fashion, hence it is prone to a possible race condition with cgroup creation. It is recommended to confirm afterwards that all existing cgroups have correctly set memory.swappiness values.
- Ideally, a service can be crafted which would do this at boot time, specifically ordered with After=systemd-sysctl.service.
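The propagation loop and the follow-up verification can be sketched together. The snippet below runs against a mock directory tree (a stand-in for /sys/fs/cgroup/memory, so it can be exercised without root); the null-delimited find keeps paths with unusual characters intact:

```shell
# Mock cgroup v1 memory hierarchy; on a real system this is /sys/fs/cgroup/memory.
root=$(mktemp -d)
mkdir -p "$root/system.slice" "$root/user.slice/user-1000.slice"
# Seed every mock cgroup with the kernel default of 60:
for d in "$root" "$root/system.slice" "$root/user.slice" "$root/user.slice/user-1000.slice"; do
    echo 60 > "$d/memory.swappiness"
done

desired=1   # stand-in for the value read from /proc/sys/vm/swappiness
# Propagate the desired value into every memory.swappiness file:
find "$root" -name 'memory.swappiness' -print0 |
while IFS= read -r -d '' f; do
    echo "$desired" > "$f"
done

# Verify afterwards: a single distinct value should remain.
find "$root" -name 'memory.swappiness' -exec cat {} + | sort -u
```

On a real system the same verification (find /sys/fs/cgroup/memory -name memory.swappiness -exec cat {} + | sort -u) shows at a glance whether any cgroup still carries the default of 60.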
Workaround #3: Push the desired global swappiness value into initramfs
- To use per-cgroup swappiness and to change the default value from 60, the desired swappiness value can be specified in the /etc/sysctl.conf file:
vm.swappiness=##
- After setting this value, the initramfs will need to be refreshed and a reboot of the system will be required. This can be done with the command dracut -f.
- Note: This will change the default swappiness for the user.slice, init.scope, and machine.slice cgroups; however, this will have no effect on the system.slice cgroup, and may still lead to unexpected swap behavior.
Considerations for virtual guests
- If force_cgroup_v2_swappiness=1 cannot be set and host-side swapping is occurring from memory pressure inside a guest, the swappiness value can be controlled for each guest or for all guests.
- To have all virtual machine guests inherit the same swappiness value, the following command can be run before starting the virtual machines with libvirtd:
# echo [value] > /sys/fs/cgroup/memory/machine.slice/memory.swappiness
- To change the value after booting the guest, the following command can be run while ensuring to specify the guest name in the appropriate location:
# echo [value] > /sys/fs/cgroup/memory/machine.slice/<GUEST_NAME>/memory.swappiness
- Note: It is recommended to set the swappiness value for every cgroup created on the system if the desired result is for the system to honor the specified swappiness value across all cgroups.
Root Cause
- In cases where there is high memory pressure and page reclamation is needed, users may experience swapping earlier or more aggressively than expected with regards to the swappiness value. This issue is due to the fact that systemd runs its processes within cgroups, and the root swappiness value has little-to-no effect on swap heuristics.
- In cgroups v1 there is the per-cgroup swappiness value memory.swappiness, which controls the swap behavior of the given cgroup. The cgroups initialized at boot are created before the sysctl service is able to run and properly set the desired swappiness value. This leads to a default swappiness value of 60 for the processes running on the system.
- This is not an issue in cgroups v2, as there is no swappiness parameter available to the memory controller in cgroups v2; as such, cgroups v2 will utilize the sysctl value.
Diagnostic Steps
- All the per-cgroup memory.swappiness values are set to 60 (the default value) while the system-wide vm.swappiness is set to a lower value:
$ find /sys/fs/cgroup/memory/ -name memory.swappiness -exec cat {} \; | sort | uniq -c
1 1 <-- /sys/fs/cgroup/memory/memory.swappiness
117 60 <-- memory.swappiness under all the .slice/scope
$ sysctl -a | grep vm.swappiness
vm.swappiness = 1
- To list the Memory cgroups that are utilized by your system you can run:
# grep . /sys/fs/cgroup/memory/*/memory.swappiness