Premature swapping while there is still plenty of pagecache to be reclaimed


Environment

  • Red Hat Enterprise Linux 8

Issue

  • Experiencing high Swap utilization while there is still free memory, or even available memory as reclaimable cache pages
  • After upgrading from RHEL 7 to RHEL 8, noticed higher Swap utilization than before
  • Even after lowering the value of /proc/sys/vm/swappiness swap utilization is still high

Resolution

To address this issue, Red Hat Enterprise Linux Engineering created a new sysctl option: vm.force_cgroup_v2_swappiness.
When set to 1, the cgroups' memory.swappiness values are ignored, and all per-cgroup swappiness values mirror the system-wide vm.swappiness sysctl value (i.e., the /proc/sys/vm/swappiness file).
After implementing this resolution, the memory swapping behavior of cgroups is more consistent. This is the recommended solution when using cgroups v1.

The kernel may need to be updated to a version patched with the new sysctl:

  • Red Hat Enterprise Linux 8.7 - update to kernel-4.18.0-425.3.1.el8.x86_64 or later, as per Errata: RHSA-2022:7683
  • Red Hat Enterprise Linux 8.6.z(EUS) - update to kernel-4.18.0-372.36.1.el8_6.x86_64 or later, as per Errata: RHSA-2022:8809
  • Red Hat Enterprise Linux 8.4.z(EUS) - update to kernel-4.18.0-305.76.1.el8_4.x86_64 or later, as per Errata: RHSA-2023:0496

Setting sysctl variables in a persistent manner is described in How to set sysctl variables on Red Hat Enterprise Linux.
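As a sketch, the setting can be persisted with a drop-in file under /etc/sysctl.d (the drop-in file name below is arbitrary). The helper takes the target directory as a parameter so the write step can be exercised outside of /etc:

```shell
# persist_force_swappiness DIR: write a sysctl drop-in enabling
# vm.force_cgroup_v2_swappiness into DIR (on a real system: /etc/sysctl.d).
persist_force_swappiness() {
    dir="$1"
    conf="$dir/99-force-cgroup-v2-swappiness.conf"   # arbitrary file name
    printf 'vm.force_cgroup_v2_swappiness = 1\n' > "$conf"
    echo "$conf"
}

# On a real system (requires root), apply it immediately with:
#   conf=$(persist_force_swappiness /etc/sysctl.d)
#   sysctl -p "$conf"
```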

Workarounds and setting adjustments

The following workarounds are also available in case the recommended solution cannot be used.

Workaround #1: Switch to cgroup v2

  • A possible workaround to mitigate the issue is to switch to cgroup v2:
        # grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
        (requires reboot)
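After the reboot, the active hierarchy can be verified by checking the filesystem type mounted at /sys/fs/cgroup:

```shell
# Print the filesystem type mounted at /sys/fs/cgroup:
# "cgroup2fs" indicates the unified (v2) hierarchy is active,
# while "tmpfs" indicates the legacy (v1) layout.
stat -fc %T /sys/fs/cgroup
```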

Workaround #2: Adjust swappiness value of all existing cgroups

  • If migrating to cgroups v2 is not feasible for any reason, another workaround is to adjust all memory cgroups' swappiness to the desired value, preferably in a pre-order fashion with respect to the filesystem structure.

    • For example, the following command can be sufficient:
        # for cgfile in $(find /sys/fs/cgroup -name '*swappiness'); do cat /proc/sys/vm/swappiness > "$cgfile"; done
    

    NOTE: The find command unfortunately does not return the files in a pre-order fashion, so it is prone to a possible race condition with cgroup creation. It is recommended to confirm afterwards that all existing cgroups have correctly set memory.swappiness values.

    • Ideally, a service can be crafted to do this at boot time, ordered specifically After=systemd-sysctl.service.
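One possible shape for such a boot-time service (all names below are hypothetical, not shipped by Red Hat) is a oneshot unit ordered After=systemd-sysctl.service whose ExecStart runs a small script like the following. The cgroup root and target value are parameterized so the logic can be exercised against a scratch directory:

```shell
#!/bin/sh
# Helper for a hypothetical oneshot unit, e.g.
# /etc/systemd/system/cgroup-swappiness.service containing:
#   [Unit]
#   After=systemd-sysctl.service
#   [Service]
#   Type=oneshot
#   ExecStart=/usr/local/sbin/sync-cgroup-swappiness.sh
#   [Install]
#   WantedBy=multi-user.target
#
# sync_cgroup_swappiness [ROOT] [VALUE]: copy VALUE (default: the global
# vm.swappiness) into every memory.swappiness file under ROOT.
sync_cgroup_swappiness() {
    root="${1:-/sys/fs/cgroup/memory}"
    value="${2:-$(cat /proc/sys/vm/swappiness)}"
    find "$root" -name memory.swappiness | while read -r f; do
        echo "$value" > "$f"
    done
}
```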

Workaround #3: Push the desired global swappiness value into initramfs

  • To use per-cgroup swappiness and to change the default value from 60, the following change can be done to specify the desired swappiness value in the /etc/sysctl.conf file:
        vm.swappiness=##
  • After setting this value, refresh the initramfs with the command dracut -f, then reboot the system.

  • Note: This will change the default swappiness for the user.slice, init.scope, and machine.slice cgroups; however, this will have no effect on the system.slice cgroup, and may still lead to unexpected swap behavior.

Considerations for virtual guests

  • If force_cgroup_v2_swappiness=1 cannot be set and host-side swapping is occurring from memory pressure inside a guest, the swappiness value can be controlled for each guest or all guests.

  • To have all virtual machine guests inherit the same swappiness value, the following command can be run before starting the virtual machines with libvirtd:

        # echo [value] > /sys/fs/cgroup/memory/machine.slice/memory.swappiness
  • To change the value after booting the guest, the following command can be run while ensuring to specify the guest name in the appropriate location:
        # echo [value] > /sys/fs/cgroup/memory/machine.slice/<GUEST_NAME>/memory.swappiness
  • Note: It is recommended to set the swappiness value for every cgroup created on the system if the desired result is for the system to honor the specified swappiness value across all cgroups.

Root Cause

  • In cases where there is high memory pressure and page reclamation is needed, users may experience swapping earlier or more aggressively than expected with regard to the swappiness value. This issue is due to the fact that systemd runs its processes within cgroups, and the root swappiness value has little-to-no effect on swap heuristics.
  • In cgroups v1 there is the per-cgroup swappiness value memory.swappiness. This value controls the swap behavior of the given cgroup. These cgroups, which are initialized at boot, get created before the sysctl service is able to run and properly set the desired swappiness value. This leads to a default swappiness value of 60 for the processes running on the system.
  • This is not an issue in cgroups v2 as there is no swappiness parameter available to the memory controller in cgroups v2, and as such, cgroups v2 will utilize the sysctl value.

Diagnostic Steps

  • All the per-cgroup memory.swappiness values are set to 60 (the default) while the system-wide vm.swappiness is set to a lower value.
        $ find /sys/fs/cgroup/memory/ -name memory.swappiness -exec cat {} \; | sort | uniq -c
              1 1   <-- /sys/fs/cgroup/memory/memory.swappiness
            117 60  <-- memory.swappiness under all the .slice/.scope cgroups

        $ sysctl -a | grep vm.swappiness
        vm.swappiness = 1
  • To list the memory cgroups in use on your system, run:
        # grep . /sys/fs/cgroup/memory/*/memory.swappiness
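As a quick check, the following sketch lists every memory cgroup whose swappiness diverges from the global value (the function name is ours; the root directory and expected value are parameters so the logic can be tested against a scratch tree):

```shell
# check_swappiness_drift [ROOT] [GLOBAL]: print each memory.swappiness
# file under ROOT whose value differs from GLOBAL (default: the
# system-wide vm.swappiness). No output means no drift.
check_swappiness_drift() {
    root="${1:-/sys/fs/cgroup/memory}"
    global="${2:-$(cat /proc/sys/vm/swappiness)}"
    find "$root" -name memory.swappiness | while read -r f; do
        v=$(cat "$f")
        if [ "$v" != "$global" ]; then
            echo "$f: $v (expected $global)"
        fi
    done
}
```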

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
