CrashLoopBackOff of certified-operators & community-operators pods after cluster upgrade

Solution Verified - Updated 2024-06-14T00:45:45+00:00

Issue

After a cluster upgrade to 4.4.x, the community-operators and certified-operators pods crash continuously with liveness and/or readiness probe failures:

NAME                                    READY   STATUS             RESTARTS   AGE
certified-operators-5bcb56768c-64fxs    0/1     CrashLoopBackOff   27         118m
certified-operators-cddd74b58-k86fv     0/1     Running            6          14m
community-operators-698654bb96-zd4s6    0/1     CrashLoopBackOff   13         51m
community-operators-786f694c8d-gl7bj    0/1     Running            6          14m
marketplace-operator-7c4959c648-fwmn7   1/1     Running            0          15m
redhat-marketplace-5874897f8f-527hz     1/1     Running            0          14m
redhat-operators-7d877d5977-jp8wz       1/1     Running            0          14m

Events:
  Type     Reason     Age                    From                         Message
  ----     ------     ----                   ----                         -------
  Normal   Scheduled  51m                    default-scheduler            Successfully assigned openshift-marketplace/community-operators-698654bb96-zd4s6 to node01.example.com
  Normal   Started    51m                    kubelet, node01.example.com  Started container community-operators
  Warning  Unhealthy  49m (x9 over 51m)      kubelet, node01.example.com  Readiness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Normal   Created    49m (x2 over 51m)      kubelet, node01.example.com  Created container community-operators
  Normal   Killing    49m                    kubelet, node01.example.com  Container community-operators failed liveness probe, will be restarted
  Normal   Pulled     26m (x9 over 51m)      kubelet, node01.example.com  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:821853c24977f49986d51cf2a3756dc3d067fc3122c27ef60db9445f67d66c5c" already present on machine
  Warning  Unhealthy  6m46s (x125 over 51m)  kubelet, node01.example.com  Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Warning  BackOff    106s (x98 over 36m)    kubelet, node01.example.com  Back-off restarting failed container

The logs of the failing pods show no errors, only packages being downloaded.
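The symptoms above can be collected with a few standard `oc` commands. The following is a minimal sketch, not a step from the original solution: the helper name `crashlooping_pods` is hypothetical, and it assumes the default `oc get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE) and the `openshift-marketplace` namespace seen in the events.

```shell
#!/bin/sh
# Hypothetical helper: print the names of pods whose STATUS column reads
# CrashLoopBackOff. Reads `oc get pods` output on stdin and skips the header.
crashlooping_pods() {
    awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Assumed usage against a live cluster (requires a logged-in `oc` session):
#   oc get pods -n openshift-marketplace | crashlooping_pods
#   oc describe pod <pod-name> -n openshift-marketplace   # shows the Events table
#   oc logs <pod-name> -n openshift-marketplace           # container logs
```

Replace `<pod-name>` with a name printed by the helper. The filter only selects pods by status; it says nothing about why the probes fail.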

Environment

OpenShift Container Platform
- 4.4
