How much CPU and memory is it recommended to reserve for the system on OpenShift 4 nodes?
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat Enterprise Linux CoreOS (RHCOS) 4
Issue
- Recommended systemReserved values for OpenShift 4 clusters.
Resolution
Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.
The systemReserved CPU and memory reservations prevent scheduled pods from consuming all the resources of a node, which would cause the node to hang. The recommended values depend on the resources of each specific node.
IMPORTANT NOTE: the recommended systemReserved values can differ between OpenShift versions. Refer to the code of the dynamic_memory_sizing and dynamic_cpu_sizing functions for each specific release.
Recommended reserved Memory
At the time of writing, recommended values are:
- For nodes with 1 GiB of memory or less: 255 MiB.
- For nodes with more than 1 GiB of memory:
  - 25% of the first 4 GiB of memory.
  - 20% of the next 4 GiB of memory (between 4 GiB and 8 GiB).
  - 10% of the next 8 GiB of memory (between 8 GiB and 16 GiB).
  - 6% of the next 112 GiB of memory (up to 128 GiB).
  - 2% of the remaining memory.
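The memory tiers listed above can be sketched as a small shell function. This is only an illustration of the arithmetic described in this article, not the code shipped in the product. The function name recommended_reserved_memory_gib is hypothetical, and the input is assumed to be the node memory in whole GiB.

```shell
# Hypothetical helper: prints the recommended systemReserved memory, in GiB,
# for a node whose total memory ($1) is given in GiB.
recommended_reserved_memory_gib() {
    awk -v total="$1" 'BEGIN {
        # Nodes with 1 GiB or less: fixed 255 MiB (about 0.25 GiB)
        if (total <= 1) { printf "%.2f\n", 255 / 1024; exit }
        # Tier sizes in GiB and the fraction reserved from each tier
        n = split("4 4 8 112", size, " ")
        split("0.25 0.20 0.10 0.06", frac, " ")
        reserved = 0
        remaining = total
        for (i = 1; i <= n && remaining > 0; i++) {
            chunk = (remaining < size[i]) ? remaining : size[i]
            reserved += chunk * frac[i]
            remaining -= chunk
        }
        # 2% of any memory above 128 GiB
        reserved += remaining * 0.02
        printf "%.2f\n", reserved
    }'
}

recommended_reserved_memory_gib 32   # prints 3.56
```

The value printed for 32 GiB matches the worked example later in this article.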
Recommended reserved CPU
Important note: the minimum value to configure for the reserved CPU is 0.5 (500m). For any calculated value under 0.5 (500m) CPUs (when the node has 128 CPUs or less in 4.16 and older, or 64 CPUs or less in 4.17 and newer), configure 0.5 (500m).
- The recommended values up to OpenShift 4.16 are as follows:
  - 6% of the first core.
  - 1% of the second core.
  - 0.5% of the next 2 cores.
  - 0.25% of any remaining core.
- Starting with OpenShift 4.17, new recommendations are in place, introduced by the change "modify auto tuned system reserved cpu recommendation":

    # Base allocation for 1 CPU in fractions of a core (60 millicores = 0.06 CPU core)
    base_allocation_fraction=0.06
    # Increment per additional CPU in fractions of a core (12 millicores = 0.012 CPU core)
    increment_per_cpu_fraction=0.012
    if ((total_cpu > 1)); then
      # Calculate the total system-reserved CPU in fractions, starting with the base allocation
      # and adding the incremental fraction for each additional CPU
      recommended_systemreserved_cpu=$(awk -v base="$base_allocation_fraction" -v increment="$increment_per_cpu_fraction" -v cpus="$total_cpu" 'BEGIN {printf "%.2f\n", base + increment * (cpus - 1)}')
    else
      # For a single CPU, use the base allocation
      recommended_systemreserved_cpu=$base_allocation_fraction
    fi

It is also possible to use a script to obtain the memory and CPU values. Refer to the "Diagnostic Steps" section.
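To compare the two CPU recommendations side by side, the formulas above can be sketched as follows. This is a minimal illustration under stated assumptions, not the product script: the function names recommended_cpu_formula and apply_cpu_minimum and the "old"/"new" selector are hypothetical, and the 0.5 minimum is applied as described in the important note of this section.

```shell
# Hypothetical helper: prints the formula result in cores for $1 CPUs.
# $2 selects the formula: "old" (4.16 and older) or "new" (4.17 and newer).
recommended_cpu_formula() {
    awk -v cpus="$1" -v scheme="${2:-new}" 'BEGIN {
        if (scheme == "old") {
            r = 0.06                                    # 6% of the first core
            if (cpus > 1) r += 0.01                     # 1% of the second core
            if (cpus > 2) r += 0.005 * ((cpus < 4 ? cpus : 4) - 2)   # 0.5% of the next 2 cores
            if (cpus > 4) r += 0.0025 * (cpus - 4)      # 0.25% of any remaining core
        } else {
            r = 0.06 + 0.012 * (cpus - 1)               # base plus increment per additional CPU
        }
        printf "%.2f\n", r
    }'
}

# Hypothetical helper: raises a computed value ($1, in cores) to the 0.5 minimum.
apply_cpu_minimum() {
    awk -v v="$1" 'BEGIN { if (v < 0.5) v = 0.5; printf "%.2f\n", v }'
}

recommended_cpu_formula 16 old                           # prints 0.11
recommended_cpu_formula 16 new                           # prints 0.24
apply_cpu_minimum "$(recommended_cpu_formula 16 new)"    # prints 0.50
```

Keeping the formula and the minimum in separate functions makes it easy to see when the minimum, rather than the formula, determines the value to configure.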
Configuring the system resource reservation
Default system reservation values are explained in "What is the default setting value of reserved and eviction in OpenShift", but those defaults are not the recommended values for all node sizes.
Starting with OpenShift 4.8, the reservation can be configured to be calculated automatically based on the resources of each node. See "Automatically allocating resources for nodes" for more information. This feature is disabled by default.
Note: the automatic reservation reserves less CPU than the recommended values when the node has fewer than 64 CPUs, and bug OCPBUGS-7747 was reported to track this. Versions 4.19.3, 4.18.20, 4.17.36 and 4.16.44 (and newer releases) fix the automatic calculation.
It is also possible to manually allocate resources for nodes.
Note: The values recommended above are the ones applied when the calculation is automatic, as per the source code. If they are not suitable for a specific cluster for any reason, please contact Red Hat Support for advice.
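As an illustration of the manual option, the following sketch prints a KubeletConfig manifest with explicit systemReserved values. The function name render_kubeletconfig, the resource name set-system-reserved, and the worker pool selector label are assumptions made for this example and must be adapted to the actual cluster; the values 500m and 1Gi are placeholders only.

```shell
# Hypothetical helper: prints a KubeletConfig manifest.
# $1: reserved CPU (for example 500m); $2: reserved memory (for example 1Gi).
render_kubeletconfig() {
    cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-system-reserved   # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      # assumed label selecting the worker pool
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    systemReserved:
      cpu: $1
      memory: $2
EOF
}

# Print the manifest for review before creating it in the cluster
render_kubeletconfig 500m 1Gi
```

After review, the output can be saved to a file and created with oc create -f. For the automatic calculation, the kubeletConfig block would instead be replaced by autoSizingReserved: true directly under spec, as shown in the "Diagnostic Steps" section.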
Example of the old recommendation
Let's say that there is a node with the following hardware:
- CPU: 16 cores.
- Memory: 32 GiB.
Values recommended for the above example for OpenShift 4.16 and older:
- Recommended reserved CPU = 60m + 10m + (2 * 5m) + (12 * 2.5m) = 110m.
  - Consequent allocatable CPU = 16000m - 110m = 15890m.
- Recommended reserved memory = 25% of 4 GiB + 20% of 4 GiB + 10% of 8 GiB + 6% of 16 GiB = 3.56 GiB.
  - Consequent allocatable memory = 32 GiB - 3.56 GiB = 28.44 GiB.
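Note: the m suffix stands for millicore. 1000m = 1 core. Because the 110m obtained from the formula is below the 500m minimum described under "Recommended reserved CPU", 500m is the value to configure for this node.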
Root Cause
The systemReserved CPU and memory reservations prevent scheduled pods from consuming all the resources of a node at the expense of system-critical processes such as cri-o, kubelet, sshd and NetworkManager. The entire list of processes that use the systemReserved resources can be displayed as explained in the "Diagnostic Steps" section.
Diagnostic Steps
- Check the KubeletConfig configuration for systemReserved or autoSizingReserved:

    $ oc get kubeletconfigs
    [...]

    $ oc get kubeletconfig [kubeletconfig_name] -o yaml
    [...]
    spec:
      [...]
      kubeletConfig:
        systemReserved:

    $ oc get kubeletconfig [kubeletconfig_name] -o yaml
    [...]
    spec:
      autoSizingReserved: true

- Check the entire list of processes included in the resource reservation:

    $ oc debug node/[node_name]
    [...]
    sh-4.4# chroot /host bash
    # systemd-cgls /system.slice/

- If the automatic allocation of resources is enabled, the values in use can be checked in the /etc/node-sizing.env file on the nodes:

    $ oc debug node/[node_name]
    [...]
    sh-4.4# chroot /host bash
    # cat /etc/node-sizing.env

  If /etc/node-sizing.env does not change after configuring autoSizingReserved, ensure that it was configured correctly in the KubeletConfig resource (directly under spec), and that the newly generated Machine Config has the variable NODE_SIZING_ENABLED=true:

    $ oc get mc [new_rendered_mc] -o yaml | grep -B5 "path: /etc/node-sizing-enabled.env" | grep source | tail -n 1 | cut -d, -f2 | base64 -d | grep NODE_SIZING_ENABLED
    NODE_SIZING_ENABLED=true

  If autoSizingReserved is not correctly configured, the above field in the Machine Config will not be encoded in base64 and will have a value similar to:

    NODE_SIZING_ENABLED%3Dfalse%0ASYSTEM_RESERVED_MEMORY%3D1Gi%0ASYSTEM_RESERVED_CPU%3D500m%0ASYSTEM_RESERVED_ES%3D1Gi

- If the automatic allocation of resources is NOT enabled, the values it would generate for a specific node can be checked with the script that is already included in the MachineConfigs whose names start with 00 in current OpenShift releases, as follows:

    $ oc debug node/[node_name]
    [...]
    sh-4.4# chroot /host bash
    # NODE_SIZES_ENV=/tmp/node-sizing.txt /usr/local/sbin/dynamic-system-reserved-calc.sh true
    # cat /tmp/node-sizing.txt
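When the Machine Config field is not base64-encoded, it appears percent-encoded as in the sample value above, which is hard to read. The following sketch decodes such a string on a workstation. The function name urldecode is hypothetical, and the input is assumed to contain only well-formed %XX sequences.

```shell
# Hypothetical helper: decodes %XX sequences in the string passed as $1.
urldecode() {
    printf '%s' "$1" | awk '
    BEGIN { hex = "0123456789abcdef" }
    {
        out = ""
        s = $0
        # Replace each %XX with the character whose code is the hex value XX
        while ((i = index(s, "%")) > 0) {
            out = out substr(s, 1, i - 1)
            hi = index(hex, tolower(substr(s, i + 1, 1))) - 1
            lo = index(hex, tolower(substr(s, i + 2, 1))) - 1
            out = out sprintf("%c", hi * 16 + lo)
            s = substr(s, i + 3)
        }
        printf "%s\n", out s
    }'
}

urldecode 'NODE_SIZING_ENABLED%3Dfalse%0ASYSTEM_RESERVED_MEMORY%3D1Gi%0ASYSTEM_RESERVED_CPU%3D500m%0ASYSTEM_RESERVED_ES%3D1Gi'
```

With the sample value, this prints one VARIABLE=value pair per line, starting with NODE_SIZING_ENABLED=false.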
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.