CRI-O and kubelet services are unable to start in RHOCP 4

Solution Verified

Environment

Red Hat OpenShift Container Platform (RHOCP) 4

Issue

On RHOCP 4 nodes, the CRI-O (crio) and kubelet services remain in the inactive (dead) state after a start is attempted.

# systemctl status crio
○ crio.service - Container Runtime Interface for OCI (CRI-O)
     Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; preset: disabled)
    Drop-In: /etc/systemd/system/crio.service.d
             └─01-kubens.conf, 05-mco-ordering.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
     Active: inactive (dead)
       Docs: https://github.com/cri-o/cri-o

# systemctl status kubelet
○ kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─01-kubens.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
     Active: inactive (dead)

Resolution

Bring the proxy server back online. Once the proxy is reachable again, nodeip-configuration.service can pull its image and finish starting, after which the queued crio and kubelet start jobs run and the services operate normally.
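
Before expecting the services to recover, it can help to confirm from the affected node that the proxy answers again. The following is a minimal sketch, not an official procedure: the function name check_proxy and the proxy URL in the example are placeholders, and the default target is the quay.io endpoint that appears in the error message under Diagnostic Steps.

```shell
# Sketch only: check_proxy and the example proxy URL are hypothetical names.

# Return 0 if an HTTPS request to the registry gets any HTTP response
# through the proxy, or curl's non-zero exit code otherwise.
check_proxy() {
    local proxy_url="$1"
    local target="${2:-https://quay.io/v2/}"
    # --proxy routes the request through the proxy. Without --fail, curl
    # exits 0 for any HTTP response (including an authentication error
    # from the registry), which is enough to show the proxy path works.
    curl --silent --output /dev/null --connect-timeout 10 \
         --proxy "$proxy_url" "$target"
}

# Example, run as root on the affected node (placeholder proxy address):
#   check_proxy "http://proxy.example.com:3128" && systemctl list-jobs
```

If the check succeeds, systemctl list-jobs should stop listing the three start jobs once the image pull completes, and systemctl is-active crio kubelet can be used to confirm that both services are active.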

Root Cause

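  • The cluster uses a proxy for internet access.
  • The proxy server was down, so the image required by nodeip-configuration.service could not be pulled from quay.io.
  • While nodeip-configuration.service keeps retrying the pull, the crio and kubelet start jobs stay in the waiting state.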

Diagnostic Steps

  • Check whether systemctl list-jobs shows the nodeip-configuration.service start job stuck in the running state on the affected nodes, with the crio and kubelet jobs waiting:
# systemctl list-jobs
JOB UNIT                                     TYPE   STATE
314 nodeip-configuration.service             start  running
312 crio.service                             start  waiting
306 kubelet.service                          start  waiting
  • Check the status of nodeip-configuration.service to identify why the job does not complete:
# systemctl status nodeip-configuration.service
● nodeip-configuration.service - Writes IP address configuration so that kubelet and crio services select a valid node IP
     Loaded: loaded (/etc/systemd/system/nodeip-configuration.service; enabled; preset: disabled)
     Active: activating (start) since 
...
     CGroup: /system.slice/nodeip-configuration-vsphere-upi.service
             ├─1377 /bin/bash -c "    until    /usr/bin/podman run --rm    --authfile /var/lib/kubelet/config.json    --net=host    --security-opt label=disable    --volume /etc/systemd/system:/etc/systemd/system    --volume /run/nodeip-configuration:/run/nodeip-configuration    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4328fc1a4408a07f8f0b1ea21d83b665b73772d4e4fe07dcc2bc41ffb3748662    node-ip    set    --retry-on-failure    \${NODEIP_HINT:-\${KUBELET_NODEIP_HINT:-}};    do    sleep 5;    done"
             └─7083 /usr/bin/podman run --rm --authfile /var/lib/kubelet/config.json --net=host --security-opt label=disable --volume /etc/systemd/system:/etc/systemd/system --volume /run/nodeip-configuration:/run/nodeip-configuration quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4328fc1a4408a07f8f0b1ea21d83b665b73772d4e4fe07dcc2bc41ffb3748662 node-ip set --retry-on-failure

node1.lab.example.com bash[3450]: time="xxx" level=warning msg="Failed, retrying in 1s ... (3/3). Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4328fc1a4408a07f8f0b1ea21d83b665b73772d4e4fe07dcc2bc41ffb3748662: pinging container registry quay.io: Get \"https://quay.io/v2/\": proxyconnect tcp: dial tcp 10.xxx.xx.xxx:xxxx: connect: no route to host"
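
The two checks above can be sketched as small shell helpers. This is an illustrative sketch under stated assumptions: the function names running_jobs and is_proxy_failure are hypothetical, and the match string "proxyconnect tcp" is taken from the log line shown above rather than from any documented error catalogue.

```shell
# Sketch only: running_jobs and is_proxy_failure are hypothetical helper names.

# Read `systemctl list-jobs` output on stdin and print each unit whose
# start job is in the running state. The header and summary lines do not
# match because their third and fourth columns differ.
running_jobs() {
    awk '$3 == "start" && $4 == "running" { print $2 }'
}

# Return 0 if stdin contains a proxy connection failure like the one
# shown in this article.
is_proxy_failure() {
    grep -q 'proxyconnect tcp'
}

# Example, run as root on the affected node:
#   systemctl list-jobs | running_jobs
#   journalctl -u nodeip-configuration.service --no-pager | is_proxy_failure \
#       && echo "image pull is failing at the proxy"
```

If running_jobs prints nodeip-configuration.service and is_proxy_failure matches, the node is most likely in the state described under Root Cause.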

