CRI-O and kubelet services are unable to start in RHOCP 4
Environment
Red Hat OpenShift Container Platform (RHOCP) 4
Issue
On RHOCP 4 nodes, the CRI-O and kubelet services remain in the inactive (dead) state when an attempt is made to start them:
# systemctl status crio
○ crio.service - Container Runtime Interface for OCI (CRI-O)
Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; preset: disabled)
Drop-In: /etc/systemd/system/crio.service.d
└─01-kubens.conf, 05-mco-ordering.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
Active: inactive (dead)
Docs: https://github.com/cri-o/cri-o
# systemctl status kubelet
○ kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─01-kubens.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
Active: inactive (dead)
Resolution
Bring the proxy server back online. Once the proxy is reachable again, nodeip-configuration.service can pull its required image, and the crio and kubelet services start and operate normally.
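Whether the proxy is reachable again can be checked from the affected node before waiting on the services. The following is a minimal sketch and not part of the original solution: the function names proxy_host_port and proxy_reachable and the URL http://proxy.example.com:3128 are hypothetical, and the check assumes that bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the node.

```shell
# Split a proxy URL into "host port". Hypothetical helper for illustration.
# Returns non-zero when the URL carries no explicit port.
proxy_host_port() {
    hp="${1#*://}"      # drop the scheme, if any
    hp="${hp##*@}"      # drop an optional user:password@ prefix
    hp="${hp%%/*}"      # drop any trailing path
    case "$hp" in
        *:*) printf '%s %s\n' "${hp%:*}" "${hp##*:}" ;;
        *)   return 1 ;;
    esac
}

# Succeed only if a TCP connection to the proxy opens within 5 seconds.
# Returns 2 when the URL cannot be parsed.
proxy_reachable() {
    target="$(proxy_host_port "$1")" || return 2
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "${target% *}" "${target#* }" 2>/dev/null
}

# Example (placeholder URL; substitute the cluster's real proxy):
#   proxy_reachable http://proxy.example.com:3128 && echo "proxy reachable"
```

A successful TCP connection only shows that the proxy port answers; it does not prove that the proxy forwards requests to quay.io. Because the unit's start command retries the image pull in a loop with a 5 second sleep, no manual restart should be needed once the proxy is back.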
Root Cause
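- The cluster uses a proxy for internet access.
- The proxy server was down, so nodeip-configuration.service could not pull its container image from quay.io. The crio and kubelet start jobs wait behind that service and therefore never run.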
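To confirm which proxy a node is configured to use, the proxy variables can be filtered out of a unit's environment. This is a rough sketch under stated assumptions: the helper name proxy_settings is hypothetical, and which unit, drop-in, or environment file actually carries HTTP_PROXY and HTTPS_PROXY on a given node is an assumption to verify locally. The function only filters text on standard input, so it works on the output of systemctl show --property=Environment as well as on plain NAME=value files.

```shell
# Print only the HTTP_PROXY / HTTPS_PROXY assignments found on stdin.
# Accepts "Environment=A=1 B=2" (systemctl show output), quoted
# Environment="A=1" drop-in lines, or plain "A=1" lines.
proxy_settings() {
    sed -e 's/^Environment=//' -e 's/"//g' |
        tr ' ' '\n' |
        grep -i -E '^(HTTP|HTTPS)_PROXY='
}

# Example (the unit chosen here is an assumption, not taken from the article):
#   systemctl show --property=Environment crio.service | proxy_settings
```

If the function prints nothing and returns non-zero, the inspected source defines no proxy variables, and the setting has to be looked for elsewhere.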
Diagnostic Steps
- Verify if systemctl list-jobs shows the nodeip-configuration.service in a continuously running state on failed nodes:
# systemctl list-jobs
JOB UNIT TYPE STATE
314 nodeip-configuration.service start running
312 crio.service start waiting
306 kubelet.service start waiting
- Check the status of the nodeip-configuration.service to identify the issue:
● nodeip-configuration.service - Writes IP address configuration so that kubelet and crio services select a valid node IP
Loaded: loaded (/etc/systemd/system/nodeip-configuration.service; enabled; preset: disabled)
Active: activating (start) since
...
CGroup: /system.slice/nodeip-configuration-vsphere-upi.service
├─1377 /bin/bash -c " until /usr/bin/podman run --rm --authfile /var/lib/kubelet/config.json --net=host --security-opt label=disable --volume /etc/systemd/system:/etc/systemd/system --volume /run/nodeip-configuration:/run/nodeip-configuration quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4328fc1a4408a07f8f0b1ea21d83b665b73772d4e4fe07dcc2bc41ffb3748662 node-ip set --retry-on-failure \${NODEIP_HINT:-\${KUBELET_NODEIP_HINT:-}}; do sleep 5; done"
└─7083 /usr/bin/podman run --rm --authfile /var/lib/kubelet/config.json --net=host --security-opt label=disable --volume /etc/systemd/system:/etc/systemd/system --volume /run/nodeip-configuration:/run/nodeip-configuration quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4328fc1a4408a07f8f0b1ea21d83b665b73772d4e4fe07dcc2bc41ffb3748662 node-ip set --retry-on-failure
node1.lab.example.com bash[3450]: time="xxx" level=warning msg="Failed, retrying in 1s ... (3/3). Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4328fc1a4408a07f8f0b1ea21d83b665b73772d4e4fe07dcc2bc41ffb3748662: pinging container registry quay.io: Get \"https://quay.io/v2/\": proxyconnect tcp: dial tcp 10.xxx.xx.xxx:xxxx: connect: no route to host"
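The two checks above can be condensed into small filters, which helps when several nodes have to be inspected. This is a minimal sketch, assuming the output formats shown above: the function names running_jobs and proxy_failures are hypothetical, and both read text on standard input instead of calling systemctl or journalctl themselves.

```shell
# Print the units whose start job is actually running; every other job in
# the list is queued behind them. Expects rows of: JOB UNIT TYPE STATE.
running_jobs() {
    awk '$3 == "start" && $4 == "running" { print $2 }'
}

# Print each distinct proxy connection failure found in log text on stdin,
# from "proxyconnect tcp:" up to the closing double quote of the message.
proxy_failures() {
    grep -o 'proxyconnect tcp: [^"]*' | sort -u
}

# Examples, to be run on the affected node:
#   systemctl list-jobs | running_jobs
#   journalctl -u nodeip-configuration.service --no-pager | proxy_failures
```

If running_jobs prints nodeip-configuration.service and proxy_failures prints a line ending in "no route to host", the node matches the condition described in this solution.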