How much CPU and memory should be reserved for the system on OpenShift 4 nodes?

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Red Hat Enterprise Linux CoreOS (RHCOS)
    • 4

Issue

  • Recommended systemReserved values for OpenShift 4 clusters.
  • Recommended kubeReserved values for OpenShift 4 clusters.

Resolution

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

The systemReserved parameter reserves CPU and memory for system processes, so that scheduled pods cannot consume all the resources of a node and cause the node to hang. The recommended values depend on the resources of each node.
The kubeReserved parameter is not supported in OpenShift 4, as explained in kube-reserved setting is not being set in OpenShift.

IMPORTANT NOTES

Recommended reserved memory (for the default configuration)

At the time of writing, recommended values for the reserved memory based on the default configuration are:

  • For nodes with 1 GiB of memory or less:

    • 255 MiB.
  • For nodes with more than 1 GiB of memory:

    • 25% of the first 4 GiB of memory.
    • 20% of the next 4 GiB of memory (between 4 GiB and 8 GiB).
    • 10% of the next 8 GiB of memory (between 8 GiB and 16 GiB).
    • 6% of the next 112 GiB of memory (up to 128 GiB).
    • 2% of the remaining memory.
  • Starting with OpenShift 4.21, the automatic memory reservation is less aggressive for lower-memory nodes, a change introduced by kubelet: Less aggressive low memory reservation:

    • 1 GiB for the first 8 GiB
    • 6% of the next 120 GiB of memory (up to 128 GiB).
    • 2% of any memory above 128 GiB.
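
The tiers above can be sketched as a small shell helper. This is an illustrative approximation, not the script shipped with OpenShift: the function names are hypothetical, the input is the node memory in GiB, and the 4.21 variant assumes that any node with 8 GiB or less reserves a flat 1 GiB, which the list above does not state explicitly. The shipped calculation may round differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OpenShift). Input: total node memory in GiB.
# Output: recommended systemReserved memory in GiB, with two decimals.

# Tiers for the default configuration before OpenShift 4.21.
reserved_memory_gib() {
    LC_ALL=C awk -v total="$1" 'BEGIN {
        # Nodes with 1 GiB of memory or less get a flat 255 MiB.
        if (total <= 1) { printf "%.2f\n", 255 / 1024; exit }
        split("4 4 8 112", size, " ")            # tier sizes in GiB
        split("0.25 0.20 0.10 0.06", rate, " ")  # fraction reserved in each tier
        reserved = 0; left = total
        for (i = 1; i <= 4 && left > 0; i++) {
            chunk = (left < size[i]) ? left : size[i]
            reserved += chunk * rate[i]
            left -= chunk
        }
        reserved += left * 0.02                  # 2% of the memory above 128 GiB
        printf "%.2f\n", reserved
    }'
}

# Tiers starting with OpenShift 4.21.
reserved_memory_gib_421() {
    LC_ALL=C awk -v total="$1" 'BEGIN {
        reserved = 1                             # ASSUMPTION: flat 1 GiB up to 8 GiB
        if (total > 8)   reserved += ((total < 128 ? total : 128) - 8) * 0.06
        if (total > 128) reserved += (total - 128) * 0.02
        printf "%.2f\n", reserved
    }'
}
```

For a node with 32 GiB, `reserved_memory_gib 32` prints 3.56 (1 + 0.8 + 0.8 + 0.96 GiB), while `reserved_memory_gib_421 32` prints 2.44 (1 + 1.44 GiB).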

Recommended reserved CPU (for the default configuration)

IMPORTANT NOTE: the minimum value to configure for the reserved CPU is 0.5 (500m). If the calculated value is below 0.5 (500m) CPUs (when the node has 128 CPUs or fewer in 4.16 and older, or 64 CPUs or fewer in 4.17 and newer), configure 0.5 (500m).

At the time of writing, recommended values for the reserved CPU based on the default configuration are:

  • The recommended values up to and including OpenShift 4.16 are as follows:

    • 6% of the first core.
    • 1% of the second core.
    • 0.5% of the next 2 cores.
    • 0.25% of any remaining core.
  • Starting with OpenShift 4.17, new recommendations for the CPU reservation are in place, introduced by modify auto tuned system reserved cpu recommendation:

    # Base allocation for 1 CPU in fractions of a core (60 millicores = 0.06 CPU core)
    base_allocation_fraction=0.06
    # Increment per additional CPU in fractions of a core (12 millicores = 0.012 CPU core)
    increment_per_cpu_fraction=0.012
    if ((total_cpu > 1)); then
        # Calculate the total system-reserved CPU in fractions, starting with the base allocation
        # and adding the incremental fraction for each additional CPU
        recommended_systemreserved_cpu=$(awk -v base="$base_allocation_fraction" -v increment="$increment_per_cpu_fraction" -v cpus="$total_cpu" 'BEGIN {printf "%.2f\n", base + increment * (cpus - 1)}')
    else
        # For a single CPU, use the base allocation
        recommended_systemreserved_cpu=$base_allocation_fraction
    fi
    

    It is also possible to use a script to obtain the memory and CPU values. Refer to the "Diagnostic Steps" section.
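
The two CPU schemes and the 0.5 (500m) minimum described above can be combined in one sketch. The function name and its second argument are hypothetical and are not part of OpenShift; the input is assumed to be a whole number of CPUs, and the rounding of the shipped script may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenShift).
# $1: number of CPUs (integer >= 1)
# $2: "legacy" for OpenShift 4.16 and older; anything else selects the 4.17+ formula
# Prints the systemReserved CPU in cores after applying the 0.5 (500m) minimum.
reserved_cpu_cores() {
    LC_ALL=C awk -v cpus="$1" -v scheme="${2:-current}" 'BEGIN {
        if (scheme == "legacy") {
            r = 0.06                                               # 6% of the first core
            if (cpus > 1) r += 0.01                                # 1% of the second core
            if (cpus > 2) r += 0.005 * (cpus >= 4 ? 2 : cpus - 2)  # 0.5% of the next 2 cores
            if (cpus > 4) r += 0.0025 * (cpus - 4)                 # 0.25% of any remaining core
        } else {
            r = 0.06 + 0.012 * (cpus - 1)                          # base plus increment per extra CPU
        }
        if (r < 0.5) r = 0.5                                       # enforce the 500m minimum
        printf "%.2f\n", r
    }'
}
```

For example, `reserved_cpu_cores 128` prints 1.58, and `reserved_cpu_cores 4` prints 0.50 because the calculated 0.096 falls below the minimum.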

Configuring the system resource reservation

The default system reservation values are explained in what is the default setting value of reserved and eviction in OpenShift, but those defaults are not the recommended values for all node sizes. To configure the system resource reservation:

  • Starting with OpenShift 4.21, the automatic reservation is configured by default for newly installed clusters as explained in allocating resources for nodes in an OpenShift Container Platform cluster. From that documentation:

    If you updated your cluster from a version earlier than 4.21, automatic allocation of system resources is disabled by default. To enable the feature, delete the 50-worker-auto-sizing-disabled machine config.

  • Starting with OpenShift 4.8, the reservation can be configured to be calculated automatically based on the resources in each node. See automatically allocating resources for nodes for more information. This feature is disabled by default.

    Note: the automatic reservation had a bug, already fixed in recent versions, that reserved less CPU than the recommended values when the node had fewer than 64 cores. It is explained in high system‑reserved cpu usage when autoSizingReserved: true is enabled. Use the manual calculation if upgrading to a version with the fix is not currently possible.

  • It is also possible to manually allocate resources for nodes. This is required when the automatic reservation is not enough for specific cluster configurations.

IMPORTANT NOTE: the values recommended above are the ones applied when the calculation is automatic, as per the source code. They might not be enough for specific cluster configurations, in which case higher values need to be tested via the manual allocation.
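
As an illustration of the manual allocation, the following sketch renders a KubeletConfig with explicit systemReserved values. The function name, the object name set-system-reserved and the example values are hypothetical, and the selector assumes the default label of the worker MachineConfigPool. Applying a KubeletConfig triggers a Machine Config rollout on the nodes of the selected pool, so review the rendered manifest first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a KubeletConfig manifest with manual reservations.
# $1: reserved CPU (for example 500m); $2: reserved memory (for example 3Gi).
render_system_reserved() {
    cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-system-reserved   # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      # assumes the default label of the worker MachineConfigPool
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    systemReserved:
      cpu: ${1}
      memory: ${2}
EOF
}

# Review the output, then apply it:
#   render_system_reserved 500m 3Gi | oc apply -f -
```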

Root Cause

The systemReserved CPU and memory resources prevent scheduled pods from consuming all the resources of a node at the expense of system-critical processes such as cri-o, kubelet, sshd and NetworkManager. The entire list of processes using the systemReserved resources can be shown as explained in the "Diagnostic Steps" section.

Diagnostic Steps

  • Check the KubeletConfig configuration for systemReserved or autoSizingReserved:

    $ oc get kubeletconfigs
    [...]
    
    $ oc get kubeletconfig [kubeletconfig_name] -o yaml
    [...]
    spec:
      [...]
      kubeletConfig:
        systemReserved:
    
    $ oc get kubeletconfig [kubeletconfig_name] -o yaml
    [...]
    spec:
      autoSizingReserved: true 
    
  • Check the entire list of processes included in the resource reservation:

    $ oc debug node/[node_name]
    [...]
    sh-4.4# chroot /host bash
    # systemd-cgls /system.slice/
    
  • Check the reserved values configured in the nodes by checking the /etc/node-sizing.env file within the nodes:

    $ for node in $(oc get node -o name); do echo "--- ${node} ---"; oc debug -q ${node} -- cat /host/etc/node-sizing.env; done
    

    If /etc/node-sizing.env does not change after configuring autoSizingReserved, ensure that it was configured correctly in the KubeletConfig resource (directly under spec), and that the newly generated Machine Config has the variable NODE_SIZING_ENABLED=true:

    $ oc get mc [new_rendered_mc] -o yaml | grep -B5 "path: /etc/node-sizing-enabled.env" | grep source | tail -n 1 | cut -d, -f2 | base64 -d | grep NODE_SIZING_ENABLED
    NODE_SIZING_ENABLED=true
    

    If autoSizingReserved is not correctly configured, the above field in the Machine Config will not be base64-encoded and will have a value similar to:

    NODE_SIZING_ENABLED%3Dfalse%0ASYSTEM_RESERVED_MEMORY%3D1Gi%0ASYSTEM_RESERVED_CPU%3D500m%0ASYSTEM_RESERVED_ES%3D1Gi
    
  • If the automatic allocation of resources is NOT enabled, it is still possible to check the values it would generate for a specific node. In current OpenShift releases, the script is already included in the MachineConfigs whose names start with 00, and can be run as follows:

    $ oc debug node/[node_name]
    [...]
    sh-4.4# chroot /host bash
    # NODE_SIZES_ENV=/tmp/node-sizing.txt /usr/local/sbin/dynamic-system-reserved-calc.sh true
    # cat /tmp/node-sizing.txt
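
When inspecting the source field of /etc/node-sizing-enabled.env in a rendered Machine Config, the value can be either base64 or URL-encoded, as shown earlier in these steps. The following bash sketch decodes both forms. It assumes the field is a data URL (a header, a comma, then the payload); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the payload of a data URL.
# ASSUMPTION: the input looks like "data:<header>,<payload>".
decode_data_url() {
    local header="${1%%,*}" payload="${1#*,}"
    case "$header" in
        # Header ends in ";base64": the payload is base64 text.
        *';base64') printf '%s' "$payload" | base64 -d ;;
        # Otherwise treat it as percent-encoded: %XX becomes \xXX, which printf %b expands.
        *) printf '%b' "${payload//%/\\x}" ;;
    esac
}
```

For example, `decode_data_url 'data:,NODE_SIZING_ENABLED%3Dfalse'` prints NODE_SIZING_ENABLED=false.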
    
