Premature swapping while there is still plenty of pagecache to be reclaimed

Solution Verified - Updated -

Red Hat Insights can detect this issue

Environment

  • Red Hat Enterprise Linux 8
  • Red Hat Enterprise Linux 9 with cgroups v1 enabled

Issue

  • The system-wide swappiness value set in /proc/sys/vm/swappiness has little to no effect on the swap behavior of a system using cgroups v1. This can lead to unexpected and inconsistent swapping, for example swapping while plenty of page cache could still be reclaimed.

Resolution

In the following, various settings for /proc, /sys and sysctl are discussed. These can be applied by various means, e.g. via custom systemd services or /etc/rc.local.

Red Hat Enterprise Linux 8

To address this issue, Red Hat Enterprise Linux Engineering created a new sysctl option: vm.force_cgroup_v2_swappiness. When it is set to 1, the per-cgroup memory.swappiness values are deprecated (no longer honored), and every cgroup uses the system-wide vm.swappiness value (i.e. /proc/sys/vm/swappiness), as with cgroups v2. As a result, the swapping behavior of cgroups is more consistent. This is the recommended solution when using cgroups v1.

The kernel may need to be updated to a version patched with the new sysctl:

  • Red Hat Enterprise Linux 8.7 - update to kernel-4.18.0-425.3.1.el8.x86_64 or later, as per Errata: RHSA-2022:7683
  • Red Hat Enterprise Linux 8.6.z(EUS) - update to kernel-4.18.0-372.36.1.el8_6.x86_64 or later, as per Errata: RHSA-2022:8809
  • Red Hat Enterprise Linux 8.4.z(EUS) - update to kernel-4.18.0-305.76.1.el8_4.x86_64 or later, as per Errata: RHSA-2023:0496
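Before changing any settings, it may help to confirm that the running kernel actually provides the new sysctl. The following is a minimal sketch, assuming the option is exposed at /proc/sys/vm/force_cgroup_v2_swappiness (the usual /proc/sys mapping of vm.force_cgroup_v2_swappiness). The function name force_v2_swappiness_status is hypothetical, and the directory argument exists only so the check can be pointed at a test directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the current value of
# vm.force_cgroup_v2_swappiness, or "unsupported" if the running kernel
# does not expose it (for example, a kernel older than those listed above).
# Assumes the sysctl lives at <vmdir>/force_cgroup_v2_swappiness.
force_v2_swappiness_status() {
    local vmdir="${1:-/proc/sys/vm}"
    if [ -r "$vmdir/force_cgroup_v2_swappiness" ]; then
        cat "$vmdir/force_cgroup_v2_swappiness"
    else
        echo unsupported
    fi
}
```

For example, run uname -r and then force_v2_swappiness_status: a value of 0 or 1 means the kernel provides the option, while unsupported suggests the kernel still needs to be updated.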

Setting sysctl variables in a persistent manner is described in How to set sysctl variables on Red Hat Enterprise Linux.
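As a sketch of one persistent configuration, the setting can be placed in a drop-in file under /etc/sysctl.d/. The file name 99-force-cgroup-v2-swappiness.conf and the function name enable_force_v2_swappiness are hypothetical; any file ending in .conf in that directory is read by sysctl --system and at boot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a sysctl drop-in that enables
# vm.force_cgroup_v2_swappiness. The target path is a parameter so the
# function can be exercised against a scratch file; the default assumes
# the standard /etc/sysctl.d directory.
enable_force_v2_swappiness() {
    local dropin="${1:-/etc/sysctl.d/99-force-cgroup-v2-swappiness.conf}"
    # Overwrite (not append) so repeated runs leave exactly one setting.
    printf '%s\n' 'vm.force_cgroup_v2_swappiness = 1' > "$dropin"
}
```

Run it as root, apply the drop-in without a reboot with sysctl --system, and confirm the result with sysctl vm.force_cgroup_v2_swappiness.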

Red Hat Enterprise Linux 9

Red Hat Enterprise Linux 9 uses cgroups v2 by default, which has no per-cgroup swappiness value. If the system is switched to cgroups v1, check whether it is affected by this issue (see Diagnostic Steps).
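One quick way to tell which hierarchy is in use is the file system type mounted at /sys/fs/cgroup: cgroup2fs indicates cgroups v2, while tmpfs indicates the legacy v1 (or hybrid) layout, in which the memory controller, and therefore memory.swappiness, is on v1. The sketch below assumes GNU coreutils stat; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a file system type name to a cgroup layout.
cgroup_layout_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy, no per-cgroup swappiness
        tmpfs)     echo "v1" ;;       # legacy or hybrid layout, may be affected
        *)         echo "unknown" ;;
    esac
}

# Reads the file system type of the cgroup mount point with GNU stat
# (-f queries the file system, %T prints its type in readable form).
detect_cgroup_layout() {
    cgroup_layout_from_fstype "$(stat -fc %T "${1:-/sys/fs/cgroup}")"
}
```

On a system that may be affected, detect_cgroup_layout prints v1.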

Workarounds and setting adjustments

The following workarounds are also available in case the recommended solution cannot be used. See also: Change in swap behavior between RHEL 7 and RHEL 8 kernels.

Workaround #1: Switch to cgroup v2
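On Red Hat Enterprise Linux 8, cgroups v2 is typically enabled by adding systemd.unified_cgroup_hierarchy=1 to the kernel command line and rebooting. The sketch below assumes that this parameter alone is sufficient for the installed minor release (verify against the RHEL 8 documentation on mounting cgroups-v2) and that grubby manages the boot entries; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "yes" if the given kernel command line
# (default: the running kernel's /proc/cmdline) requests the unified
# cgroup v2 hierarchy, "no" otherwise.
cmdline_requests_cgroup_v2() {
    local cmdline_file="${1:-/proc/cmdline}"
    # Split the command line into one parameter per line and match exactly.
    if tr ' ' '\n' < "$cmdline_file" | grep -qx 'systemd.unified_cgroup_hierarchy=1'; then
        echo yes
    else
        echo no
    fi
}

# Hypothetical helper: adds the parameter to all installed kernel entries.
# Must run as root; takes effect only after a reboot.
enable_cgroup_v2_at_boot() {
    grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
}
```

After enable_cgroup_v2_at_boot and a reboot, cmdline_requests_cgroup_v2 should print yes, and stat -fc %T /sys/fs/cgroup should print cgroup2fs.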

Workaround #2: Adjust swappiness value of all existing cgroups

  • If migrating to cgroups v2 is not feasible, set every memory cgroup's memory.swappiness to the desired value, preferably in pre-order (each parent cgroup before its children), because a newly created cgroup inherits its swappiness from its parent.
    For example, the following command can be sufficient:

    # for cgfile in $(find /sys/fs/cgroup -name 'memory.swappiness'); do cat /proc/sys/vm/swappiness > "$cgfile"; done
    

    NOTE: find does not guarantee that a parent cgroup's memory.swappiness file is listed before those of its children, so the command is prone to a race with cgroup creation: a cgroup created while the loop runs can inherit the old value from a parent that has not been updated yet. Confirm afterwards that all existing cgroups have the expected memory.swappiness value.

    Ideally, a systemd service can run this at boot time, ordered with After=systemd-sysctl.service so that vm.swappiness has already been applied.
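A minimal sketch of such a service follows, written as a shell function that generates a oneshot unit. The unit name cgroup-swappiness.service and the function name write_swappiness_unit are hypothetical. The unit walks the memory hierarchy with find -type d, which lists each directory before its subdirectories, so parent cgroups are updated before their children; ConditionPathExists= skips it on systems without a v1 memory controller. Inside ExecStart=, a literal $ is written as $$, because systemd otherwise treats it as a variable reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a oneshot unit that copies vm.swappiness into
# every cgroup v1 memory.swappiness file after systemd-sysctl has run.
# The quoted heredoc ('EOF') keeps the shell from expanding anything, so the
# $$ sequences reach systemd, which turns them into literal $ for /bin/sh.
write_swappiness_unit() {
    local unit_dir="${1:-/etc/systemd/system}"
    cat > "$unit_dir/cgroup-swappiness.service" <<'EOF'
[Unit]
Description=Copy vm.swappiness into every cgroup v1 memory.swappiness file
After=systemd-sysctl.service
ConditionPathExists=/sys/fs/cgroup/memory/memory.swappiness

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'v=$$(cat /proc/sys/vm/swappiness); find /sys/fs/cgroup/memory -type d | while IFS= read -r d; do echo "$$v" > "$$d/memory.swappiness" || true; done'

[Install]
WantedBy=multi-user.target
EOF
}
```

Run write_swappiness_unit as root, then systemctl daemon-reload and systemctl enable cgroup-swappiness.service.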

Workaround #3: Push the desired global swappiness value into initramfs

  • To keep using per-cgroup swappiness but change its default from 60, specify the desired swappiness value in the /etc/sysctl.conf file:

    vm.swappiness=##
    
  • After setting this value, rebuild the initramfs with dracut -f and reboot the system.

  • Note: This will change the default swappiness for the user.slice, init.scope, and machine.slice cgroups; however, this will have no effect on the system.slice cgroup, and may still lead to unexpected swap behavior.
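Editing the file by hand works; where this needs to be scripted, the sketch below sets vm.swappiness in /etc/sysctl.conf idempotently, replacing an existing entry or appending one. The function name set_conf_swappiness is hypothetical, and the in-place edit assumes GNU sed. Rebuilding the initramfs and rebooting are still required afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set (or add) vm.swappiness in a sysctl config file.
# Usage: set_conf_swappiness VALUE [FILE]
set_conf_swappiness() {
    local value="$1" conf="${2:-/etc/sysctl.conf}"
    # Accept only a non-negative integer; the kernel rejects out-of-range
    # values when the setting is applied.
    case "$value" in
        ''|*[!0-9]*) echo "invalid swappiness: $value" >&2; return 1 ;;
    esac
    if grep -Eq '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf" 2>/dev/null; then
        # Replace every existing (uncommented) vm.swappiness line in place.
        sed -i -E "s/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness=${value}/" "$conf"
    else
        printf 'vm.swappiness=%s\n' "$value" >> "$conf"
    fi
}
```

For example, set_conf_swappiness 10, followed by dracut -f and a reboot.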

Considerations for virtual guests

  • If vm.force_cgroup_v2_swappiness=1 cannot be set and the host is swapping because of memory pressure inside a guest, the swappiness value can be controlled for each guest or for all guests.

  • To have all virtual machine guests inherit the same swappiness value, the following command can be run before starting the virtual machines with libvirtd:

    # echo [value] > /sys/fs/cgroup/memory/machine.slice/memory.swappiness
    
  • To change the value for a guest that is already running, run the following command, replacing <GUEST_NAME> with the name of the guest's cgroup directory under machine.slice:

    # echo [value] > /sys/fs/cgroup/memory/machine.slice/<GUEST_NAME>/memory.swappiness
    
  • Note: If the system should honor the specified swappiness value across all cgroups, it is recommended to set the value for every cgroup created on the system.
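Because a newly created v1 memory cgroup inherits memory.swappiness from its parent, setting machine.slice alone covers guests started afterwards, but guests that are already running keep their old values. The sketch below applies a value to a subtree such as machine.slice and to every cgroup beneath it, parents first. The function name set_subtree_swappiness is hypothetical; the default path assumes the cgroup v1 mount point used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write VALUE to memory.swappiness in ROOT and in every
# cgroup below it. find lists each directory before its subdirectories, so a
# parent is always updated before its children.
# Usage: set_subtree_swappiness VALUE [ROOT]
set_subtree_swappiness() {
    local value="$1" root="${2:-/sys/fs/cgroup/memory/machine.slice}"
    local dir
    # read -r keeps the backslashes systemd uses to escape scope names.
    find "$root" -type d | while IFS= read -r dir; do
        # Skip directories without the control file (e.g. a vanished cgroup).
        if [ -e "$dir/memory.swappiness" ]; then
            printf '%s\n' "$value" > "$dir/memory.swappiness"
        fi
    done
}
```

For example, set_subtree_swappiness 10 updates machine.slice and the cgroups of every running guest.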

Root Cause

  • Under high memory pressure, when pages must be reclaimed, users may see swapping start earlier or proceed more aggressively than the configured swappiness value suggests. This is because systemd runs its processes within cgroups, and the system-wide (root) swappiness value has little to no effect on their swap heuristics.
  • In cgroups v1, each memory cgroup has its own swappiness value, memory.swappiness, which controls the swap behavior of that cgroup. The cgroups initialized at boot are created before the sysctl service runs and sets the desired swappiness value, so they keep the default value of 60, and so do the processes running in them.
  • This is not an issue in cgroups v2, because its memory controller has no swappiness parameter; cgroups v2 uses the system-wide vm.swappiness value (/proc/sys/vm/swappiness).

Diagnostic Steps

  • In this example, all per-cgroup memory.swappiness values are set to 60 (the default), while the system-wide vm.swappiness is set to 1:

    $ find /sys/fs/cgroup/memory/ -name memory.swappiness -exec cat {} \; | sort | uniq -c
          1 1  <-- /sys/fs/cgroup/memory/memory.swappiness
        117 60  <-- memory.swappiness in all .slice/.scope cgroups
    
    $ sysctl vm.swappiness
    vm.swappiness = 1
    
  • To list the top-level memory cgroups on the system together with their swappiness values, run:

    # grep . /sys/fs/cgroup/memory/*/memory.swappiness
    
  • If you have already addressed this through one of the resolutions and are still seeing unexpected swap behavior, please refer to Change in swap behavior between RHEL 7 and RHEL 8 kernels
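The glob above only covers the first level below /sys/fs/cgroup/memory. To check the whole hierarchy, for example after applying Workaround #2, a sketch like the following lists every memory.swappiness file whose value differs from the system-wide vm.swappiness. The function name report_swappiness_mismatches is hypothetical; empty output means all cgroups match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "path: value" for each memory.swappiness file
# under ROOT whose value differs from EXPECTED (default: current vm.swappiness).
# Usage: report_swappiness_mismatches [ROOT] [EXPECTED]
report_swappiness_mismatches() {
    local root="${1:-/sys/fs/cgroup/memory}"
    local expected="${2:-$(cat /proc/sys/vm/swappiness)}"
    local file value
    find "$root" -name memory.swappiness | while IFS= read -r file; do
        value=$(cat "$file")
        if [ "$value" != "$expected" ]; then
            printf '%s: %s\n' "$file" "$value"
        fi
    done
}
```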

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
