Exec probes fail clusterwide after upgrade to cri-o-1.19.2-4 in Red Hat OpenShift Container Platform 4.x

Solution In Progress - Updated -

Issue

Exec probes fail clusterwide after upgrade to cri-o-1.19.2-4 in Red Hat OpenShift Container Platform 4.x

After upgrading the cluster, exec-based readiness and liveness probes for containers on the RHEL worker nodes fail frequently with timeouts. The failures occur clusterwide and appear to be random.

Even seemingly innocuous probes such as this one:

    name: service
    readinessProbe:
      exec:
        command:
        - cat
        - /etc/hosts
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1

time out with kubelet log messages such as:

    Jun 30 20:17:59 host hyperkube[1887]: I0630 20:17:59.805952    1887 prober.go:117] Liveness probe for "service(uuid):service" failed (failure): command timed out
    Jun 30 20:17:59 host hyperkube[1887]: I0630 20:17:59.806074    1887 event.go:291] "Event occurred" object="namespace/service" kind="Pod" apiVersion="v1" type="Warning" reason="Unhealthy" message="Liveness probe failed: command timed out"
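
To gauge how widespread the timeouts are on a node, it can help to count them per pod and container. The following is a minimal sketch, not a supported tool: it assumes the kubelet log lines keep the `prober.go` format shown above, and the function name `count_probe_timeouts` is made up for illustration. The sample input is the pair of log lines from this article.

```python
import re
from collections import Counter

# Matches kubelet prober lines in the format quoted in this article, e.g.
#   prober.go:117] Liveness probe for "service(uuid):service" failed (failure): command timed out
# Assumption: other kubelet versions may word this message differently.
PROBE_TIMEOUT = re.compile(
    r'prober\.go:\d+\] (?P<kind>Liveness|Readiness|Startup) probe for '
    r'"(?P<target>[^"]+)" failed \(failure\): command timed out'
)


def count_probe_timeouts(lines):
    """Count timed-out probes, keyed by (probe kind, "pod(uid):container")."""
    counts = Counter()
    for line in lines:
        match = PROBE_TIMEOUT.search(line)
        if match:
            counts[(match.group("kind"), match.group("target"))] += 1
    return counts


# The two log lines quoted above; only the prober.go line is counted,
# the event.go line reports the same failure a second time.
SAMPLE = [
    'Jun 30 20:17:59 host hyperkube[1887]: I0630 20:17:59.805952    1887 '
    'prober.go:117] Liveness probe for "service(uuid):service" failed '
    '(failure): command timed out',
    'Jun 30 20:17:59 host hyperkube[1887]: I0630 20:17:59.806074    1887 '
    'event.go:291] "Event occurred" object="namespace/service" kind="Pod" '
    'apiVersion="v1" type="Warning" reason="Unhealthy" '
    'message="Liveness probe failed: command timed out"',
]

for (kind, target), n in count_probe_timeouts(SAMPLE).most_common():
    print(n, kind, target)
```

On a real node, the same function could be fed the lines of the kubelet journal instead of `SAMPLE`. Many different pods showing timeouts on the same node would suggest a node-level cause rather than a problem with any single probe command.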

Environment

Red Hat OpenShift Container Platform 4.x
cri-o-1.19.2-4, cri-o-1.19.2-6
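
To check whether a node runs one of the builds listed above, the installed package string (for example, as printed by `rpm -q cri-o` on the node) can be compared against them. This is a rough sketch: the helper name `is_affected` is hypothetical, and the distribution suffix in the usage example is made up, since real package strings carry a suffix that this article does not show.

```python
import re

# Builds named in the Environment section of this article.
AFFECTED_BUILDS = ("1.19.2-4", "1.19.2-6")

# Captures "<version>-<release number>" from a package string such as
# "cri-o-1.19.2-4<suffix>". Anything after the release number is ignored.
CRIO_BUILD = re.compile(r"cri-o-(\d+\.\d+\.\d+-\d+)")


def is_affected(package: str) -> bool:
    """Return True if the package string names one of the listed builds."""
    match = CRIO_BUILD.search(package)
    return bool(match) and match.group(1) in AFFECTED_BUILDS


# Hypothetical package strings, for illustration only.
for pkg in ("cri-o-1.19.2-4.example.x86_64", "cri-o-1.19.2-40.example.x86_64"):
    print(pkg, is_affected(pkg))
```

The release number is matched as a whole, so a build such as `1.19.2-40` is not mistaken for `1.19.2-4`.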
