Exec probes fail clusterwide after upgrade to cri-o-1.19.2-4 in Red Hat OpenShift Container Platform 4.x
Issue
After upgrading the cluster, readiness and liveness probes across the cluster (for containers on the RHEL worker nodes) fail frequently, and seemingly at random, with timeouts.
Seemingly innocuous probes such as this one:
name: service
readinessProbe:
  exec:
    command:
    - cat
    - /etc/hosts
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
time out with messages such as:
Jun 30 20:17:59 host hyperkube[1887]: I0630 20:17:59.805952 1887 prober.go:117] Liveness probe for "service(uuid):service" failed (failure): command timed out
Jun 30 20:17:59 host hyperkube[1887]: I0630 20:17:59.806074 1887 event.go:291] "Event occurred" object="namespace/service" kind="Pod" apiVersion="v1" type="Warning" reason="Unhealthy" message="Liveness probe failed: command timed out"
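To get a rough picture of how widespread the timeouts are on a node, the prober lines can be tallied per container. The following is a minimal sketch, not part of the original report: the function name count_probe_timeouts is hypothetical, and the pattern is modeled only on the prober.go line shown above. Matching "Readiness" in the same position is an assumption, since only a Liveness line appears in the sample output.

```python
import re
from collections import Counter

# Pattern modeled on the kubelet prober.go line quoted above, e.g.
#   ... Liveness probe for "service(uuid):service" failed (failure): command timed out
# Assumption: readiness probe failures are logged in the same shape.
PROBE_TIMEOUT = re.compile(
    r'(?P<kind>Liveness|Readiness) probe for "(?P<target>[^"]+)" '
    r'failed \(failure\): command timed out'
)


def count_probe_timeouts(lines):
    """Count exec probe timeouts per (probe kind, pod:container target)."""
    counts = Counter()
    for line in lines:
        match = PROBE_TIMEOUT.search(line)
        if match:
            counts[(match.group("kind"), match.group("target"))] += 1
    return counts
```

The function accepts any iterable of log lines, for example an open file holding journal output saved from an affected node. The event.go lines are deliberately not matched, so each failed probe is counted once.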
Environment
Red Hat OpenShift Container Platform 4.x
cri-o-1.19.2-4, cri-o-1.19.2-6