kubelet fails on a node after a KubeletConfig is applied in RHOCP 4
Environment
Red Hat OpenShift Container Platform (RHOCP) 4
Issue
The kubelet fails to start with the following error:
Aug 08 01:54:15 worker-1.abc.lab.com kubenswrapper[3293]: E0808 01:54:15.658783 3293 run.go:74] "command failed" err="failed to run Kubelet: --system-reserved value failed to parse: failed to parse quantity \"1%\" for \"memory\" resource: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'"
Resolution
- Correct the invalid value in the KubeletConfig by using the following command:
$ oc edit kubeletconfig set-allocatable
...
spec:
  kubeletConfig:
    systemReserved:
      cpu: 1000m
      memory: 1Gi   <-- Replace 1% with 1Gi
- Log in to the node that entered the NotReady state and perform the following steps:
- Manually correct the values in the /etc/node-sizing.env file so that they are valid again:
$ ssh -i <private key> core@[node_name]
[core@node_name ~]$ sudo -i
[root@node_name ~]# vi /etc/node-sizing.env
SYSTEM_RESERVED_MEMORY=1Gi
SYSTEM_RESERVED_CPU=1000m
SYSTEM_RESERVED_ES=1Gi
- Restart the kubelet service:
# systemctl restart kubelet
- Force a rollout of the rendered MachineConfig (MC) by creating the /run/machine-config-daemon-force file:
# touch /run/machine-config-daemon-force
- The node transitions to the Ready state, and the MachineConfigPool (MCP) proceeds with the rollout as expected.
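The cluster-side steps can also be run non-interactively, which is convenient in scripts. The following is a minimal sketch, assuming a logged-in oc session with cluster-admin rights and that the affected pool is worker, as in the example output in this article; the 30-minute timeout is an arbitrary assumption, and fix_kubeletconfig and wait_for_worker_pool are hypothetical helper names. Because KubeletConfig is a custom resource, the patch uses --type=merge, since the default strategic merge patch is not supported for custom resources.

```shell
#!/bin/sh
# Sketch only: assumes a logged-in `oc` session with cluster-admin rights and
# that the affected pool is "worker", as in the example output in this article.

# JSON merge patch that changes only spec.kubeletConfig.systemReserved.memory;
# the cpu value and any other keys are left as they are.
KUBELETCONFIG_PATCH='{"spec":{"kubeletConfig":{"systemReserved":{"memory":"1Gi"}}}}'

# First step: non-interactive equivalent of `oc edit kubeletconfig set-allocatable`.
fix_kubeletconfig() {
  oc patch kubeletconfig set-allocatable --type=merge -p "$KUBELETCONFIG_PATCH"
}

# Last step: block until the worker pool reports Updated=True, or give up
# after 30 minutes (arbitrary timeout chosen for this sketch).
wait_for_worker_pool() {
  oc wait machineconfigpool/worker --for=condition=Updated --timeout=30m
}
```

Run fix_kubeletconfig in place of the oc edit step, and wait_for_worker_pool after creating the force file on the node.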
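The manual edit of /etc/node-sizing.env can also be scripted instead of using vi. The following is a minimal sketch, assuming the file contains plain KEY=VALUE lines as shown above; set_env_value is a hypothetical helper, not something shipped with RHCOS. It copies the edited content back over the original file rather than moving a new file into place, so the file keeps its owner, mode, and SELinux label.

```shell
#!/bin/sh
# set_env_value FILE KEY VALUE (hypothetical helper): replace the value on
# every line that starts with KEY= in FILE, leaving all other lines untouched.
set_env_value() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  # Write the edited copy to a temporary file, then copy it back over the
  # original so the existing file (and its metadata) is reused.
  sed "s|^$key=.*|$key=$value|" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, after defining the function in the root shell on the node:
[root@node_name ~]# set_env_value /etc/node-sizing.env SYSTEM_RESERVED_MEMORY 1Gi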
Root Cause
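An invalid value in the KubeletConfig (memory: 1% under systemReserved) caused the failure. The kubelet accepts only resource quantities such as 1Gi or 1000m for system-reserved resources, not percentages, so after the new configuration reached the node the kubelet failed to start. As a result, the node entered the NotReady state and the MCP became stuck in the Updating state.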
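The syntax check that rejected 1% can be reproduced before a KubeletConfig is applied. The following is a minimal pre-flight sketch, assuming a POSIX shell with grep -E; it reuses the regular expression printed in the kubelet error verbatim, and is_valid_quantity is a hypothetical helper name. It checks syntax only: a value that passes can still be rejected by the kubelet's full quantity parser (for example, an unknown suffix such as 1GG), so a pass is necessary but not sufficient.

```shell
#!/bin/sh
# Regular expression copied verbatim from the kubelet error message.
QUANTITY_RE='^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

# is_valid_quantity VALUE (hypothetical helper): succeed (exit status 0) if
# VALUE matches the quantity syntax, fail otherwise. Syntax check only.
is_valid_quantity() {
  printf '%s\n' "$1" | grep -Eq "$QUANTITY_RE"
}

# Check the values used in this article.
for value in 1% 1Gi 1000m; do
  if is_valid_quantity "$value"; then
    echo "$value: ok"
  else
    echo "$value: invalid"
  fi
done
```

With the values above, 1% is reported as invalid, while 1Gi and 1000m are reported as ok.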
Diagnostic Steps
- Check whether the following error is present in the kubelet logs:
Aug 08 01:54:15 worker-1.abc.lab.com kubenswrapper[3293]: E0808 01:54:15.658783 3293 run.go:74] "command failed" err="failed to run Kubelet: --system-reserved value failed to parse: failed to parse quantity \"1%\" for \"memory\" resource: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'"
- Check the values specified in the kubelet configuration:
$ oc get kubeletconfig set-allocatable -o yaml
...
spec:
  kubeletConfig:
    systemReserved:
      cpu: 1000m
      memory: 1%
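- Verify whether the MCP is stuck in the Updating state:
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-5ea2e169da4e5540c5c0997af037445d False True False 3 0 0 0 35d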
- Identify which nodes have entered the NotReady state:
$ oc get nodes
...
worker-1.abc.lab.com NotReady,SchedulingDisabled worker 35d v1.27.6+f67aeb3
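The diagnostic checks above can be narrowed to the relevant lines with small filters. The following is a minimal sketch, assuming a POSIX shell and the exact output formats shown in this article (the escaped quotes in the kubelet log line and the default oc get column order); extract_bad_quantity, updating_pools, and notready_nodes are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers: each reads command output on stdin and prints only
# the lines of interest. They assume the output formats shown in this article.

# extract_bad_quantity: print RESOURCE=VALUE for each
# "failed to parse quantity" error found in kubelet log lines.
extract_bad_quantity() {
  sed -n 's/.*failed to parse quantity \\"\([^"]*\)\\" for \\"\([^"]*\)\\" resource.*/\2=\1/p'
}

# updating_pools: print the name of every pool whose UPDATING column
# (4th field) is True. Expects `oc get mcp --no-headers` output.
updating_pools() {
  awk '$4 == "True" { print $1 }'
}

# notready_nodes: print the name of every node whose STATUS column
# starts with NotReady. Expects `oc get nodes --no-headers` output.
notready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}
```

For example:
[root@node_name ~]# journalctl -u kubelet --no-pager | extract_bad_quantity
$ oc get mcp --no-headers | updating_pools
$ oc get nodes --no-headers | notready_nodes
With the output shown in this article, these print lines such as memory=1%, worker, and worker-1.abc.lab.com respectively.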
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.