kubelet fails on a node after a kubeletconfig is applied in RHOCP 4

Solution Verified

Environment

Red Hat OpenShift Container Platform (RHOCP) 4

Issue

The kubelet fails to start with the following error:

Aug 08 01:54:15 worker-1.abc.lab.com  kubenswrapper[3293]: E0808 01:54:15.658783    3293 run.go:74] "command failed" err="failed to run Kubelet: --system-reserved value failed to parse: failed to parse quantity \"1%\" for \"memory\" resource: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'"
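When the journal is long, it can help to reduce the error to just the resource name and the rejected value. The following is a minimal sketch, not part of the product tooling: extract_bad_quantity is a hypothetical helper name, and the pattern assumes the message wording shown above, with or without the backslash-escaped quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kubelet log lines on stdin and print
# "<resource>=<rejected value>" for each "failed to parse quantity" error.
extract_bad_quantity() {
  # \\? allows for the backslash that precedes the inner quotes in the journal.
  sed -E -n 's/.*failed to parse quantity \\?"([^"\\]*)\\?" for \\?"([^"\\]*)\\?" resource.*/\2=\1/p'
}

# Example usage on the affected node (not executed here):
#   journalctl -u kubelet --no-pager | extract_bad_quantity
```

For the error line above, the helper would print the memory resource together with the rejected 1% value.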

Resolution

  • Correct the invalid value in the kubeletconfig using the following command:
$ oc edit kubeletconfig set-allocatable
...
spec:
  kubeletConfig:
    systemReserved:
      cpu: 1000m
      memory: 1Gi  <-- Replace 1% with 1Gi
  • Log in to the node that entered the NotReady state and perform the following steps:
  1. Manually correct the values in the /etc/node-sizing.env file so that they are valid again:
$ ssh -i <private key> core@<node_name>
[core@node_name ~]$ sudo -i
[root@node_name ~]# vi /etc/node-sizing.env
SYSTEM_RESERVED_MEMORY=1Gi
SYSTEM_RESERVED_CPU=1000m
SYSTEM_RESERVED_ES=1Gi
  2. Restart the kubelet service:
# systemctl restart kubelet
  3. Force a rollout of the rendered MC using /run/machine-config-daemon-force:
# touch /run/machine-config-daemon-force
  • The node will transition to the Ready state, and the MCP will proceed with the rollout as expected.
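Before restarting the kubelet, the edited file can be checked for values that would be rejected again. The sketch below is an illustration only: check_sizing_file is a hypothetical function name, it inspects only SYSTEM_RESERVED_* lines, and it tests values against the regular expression quoted in the kubelet error. That is a syntactic check and does not guarantee the kubelet accepts the value.

```shell
#!/usr/bin/env bash
# Hypothetical pre-restart check: print every SYSTEM_RESERVED_* line whose value
# does not match the quantity pattern from the kubelet error message.
# Exit status: 0 if all checked values match, 1 otherwise.
check_sizing_file() {
  local file="$1"
  local re='^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
  local key value status=0
  while IFS='=' read -r key value || [[ -n "$key" ]]; do
    # Ignore blank lines, comments, and unrelated variables.
    [[ "$key" == SYSTEM_RESERVED_* ]] || continue
    # Strip trailing whitespace from the value.
    value="${value%"${value##*[![:space:]]}"}"
    if ! [[ "$value" =~ $re ]]; then
      printf 'invalid: %s=%s\n' "$key" "$value"
      status=1
    fi
  done < "$file"
  return "$status"
}

# Example usage on the node (not executed here):
#   check_sizing_file /etc/node-sizing.env && systemctl restart kubelet
```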

Root Cause

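An invalid entry in the kubelet configuration (here, the percentage value 1% for systemReserved memory, which is not a valid resource quantity) prevents the kubelet from starting on the node that receives the new rendered configuration. That node enters the NotReady state, and the MCP remains stuck in the updating state.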
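The error message quotes the regular expression that a quantity must match, so a candidate value can be tested against it before it is written into a kubeletconfig. The following is a minimal sketch; is_quantity_shaped is a hypothetical name, and matching the expression is only a necessary condition for the value to parse.

```shell
#!/usr/bin/env bash
# Regular expression copied from the kubelet error message.
QUANTITY_RE='^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

# Returns 0 when the argument has the syntactic shape of a quantity, 1 otherwise.
is_quantity_shaped() {
  [[ "$1" =~ $QUANTITY_RE ]]
}

# Example usage (not executed here):
#   is_quantity_shaped '1%' || echo "1% is not a valid quantity"
```

Values such as 1Gi and 1000m match the expression, while 1% does not because the percent sign is not an allowed suffix character.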

Diagnostic Steps

  • Check if the following error is present in the kubelet logs:
Aug 08 01:54:15 worker-1.abc.lab.com  kubenswrapper[3293]: E0808 01:54:15.658783    3293 run.go:74] "command failed" err="failed to run Kubelet: --system-reserved value failed to parse: failed to parse quantity \"1%\" for \"memory\" resource: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'"
  • Check the values specified in the kubelet configuration:
$ oc get kubeletconfig set-allocatable -o yaml
...
spec:
  kubeletConfig:
    systemReserved:
      cpu: 1000m
      memory: 1%
  • Verify whether the MCP is stuck in the updating state (UPDATED is False and UPDATING is True):
$ oc get mcp
...
worker   rendered-worker-5ea2e169da4e5540c5c0997af037445d   False     True       False      3              0                   0                     0                      35d
  • Identify which nodes have entered the NotReady state:
$ oc get nodes
...
worker-1.abc.lab.com   NotReady,SchedulingDisabled   worker                 35d   v1.27.6+f67aeb3
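On clusters with many nodes or pools, the last two checks can be filtered rather than read by eye. The sketch below uses two hypothetical helpers, not_ready_nodes and updating_pools, which read the command output on stdin. It assumes the default column order of the two commands: the node status is the second column, and UPDATED and UPDATING are the third and fourth columns of the MCP output, as in the sample rows above.

```shell
#!/usr/bin/env bash
# Print the names of nodes whose STATUS column includes NotReady.
# Reads `oc get nodes` output on stdin.
not_ready_nodes() {
  awk '{
    # STATUS may be a comma-separated list such as NotReady,SchedulingDisabled.
    n = split($2, st, ",")
    for (i = 1; i <= n; i++)
      if (st[i] == "NotReady") { print $1; break }
  }'
}

# Print the names of pools with UPDATED=False and UPDATING=True.
# Reads `oc get mcp` output on stdin (assumed columns: NAME CONFIG UPDATED UPDATING ...).
updating_pools() {
  awk '$3 == "False" && $4 == "True" { print $1 }'
}

# Example usage (not executed here):
#   oc get nodes | not_ready_nodes
#   oc get mcp   | updating_pools
```

A pool listed by updating_pools is still rolling out and may or may not be stuck; the kubelet log check above distinguishes the two cases.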

