CrashLoopBackOff of certified-operators & community-operators pods after cluster upgrade

Solution Verified - Updated -

Issue

  • After a cluster upgrade to 4.4.x, the community-operators and certified-operators pods are continuously crashing with liveness and/or readiness probe failures:
NAME                                    READY   STATUS             RESTARTS   AGE
certified-operators-5bcb56768c-64fxs    0/1     CrashLoopBackOff   27         118m
certified-operators-cddd74b58-k86fv     0/1     Running            6          14m
community-operators-698654bb96-zd4s6    0/1     CrashLoopBackOff   13         51m
community-operators-786f694c8d-gl7bj    0/1     Running            6          14m
marketplace-operator-7c4959c648-fwmn7   1/1     Running            0          15m
redhat-marketplace-5874897f8f-527hz     1/1     Running            0          14m
redhat-operators-7d877d5977-jp8wz       1/1     Running            0          14m
Events:
  Type     Reason     Age                    From                                                                     Message
  ----     ------     ----                   ----                                                                     -------
  Normal   Scheduled  51m                    default-scheduler                                                        Successfully assigned openshift-marketplace/community-operators-698654bb96-zd4s6 to node01.example.com
  Normal   Started    51m                    kubelet, node01.example.com  Started container community-operators
  Warning  Unhealthy  49m (x9 over 51m)      kubelet, node01.example.com  Readiness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Normal   Created    49m (x2 over 51m)      kubelet, node01.example.com  Created container community-operators
  Normal   Killing    49m                    kubelet, node01.example.com  Container community-operators failed liveness probe, will be restarted
  Normal   Pulled     26m (x9 over 51m)      kubelet, node01.example.com  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:821853c24977f49986d51cf2a3756dc3d067fc3122c27ef60db9445f67d66c5c" already present on machine
  Warning  Unhealthy  6m46s (x125 over 51m)  kubelet, node01.example.com  Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Warning  BackOff    106s (x98 over 36m)    kubelet, node01.example.com  Back-off restarting failed container
  • The logs of the failing pods do not show any errors, only packages being downloaded.
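
To check quickly whether a cluster shows the same symptom, the pod listing can be filtered for pods that are not fully ready or not in the Running state. The following is only a minimal sketch: it assumes the listing has the columns NAME, READY, STATUS, RESTARTS and AGE, as in the output above, and the function names parse_pod_table and unhealthy_pods are hypothetical helpers, not part of any OpenShift tooling.

```python
# Sketch: flag unready or crashing pods in a pod listing pasted as plain text.
# Assumption: columns are NAME READY STATUS RESTARTS AGE, as in the listing above.

SAMPLE = """
NAME                                    READY   STATUS             RESTARTS   AGE
certified-operators-5bcb56768c-64fxs    0/1     CrashLoopBackOff   27         118m
certified-operators-cddd74b58-k86fv     0/1     Running            6          14m
community-operators-698654bb96-zd4s6    0/1     CrashLoopBackOff   13         51m
community-operators-786f694c8d-gl7bj    0/1     Running            6          14m
marketplace-operator-7c4959c648-fwmn7   1/1     Running            0          15m
redhat-marketplace-5874897f8f-527hz     1/1     Running            0          14m
redhat-operators-7d877d5977-jp8wz       1/1     Running            0          14m
"""


def parse_pod_table(text):
    """Parse whitespace-separated pod rows into a list of dictionaries."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    pods = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            # Rows with extra tokens (other output layouts) are skipped.
            continue
        row = dict(zip(header, fields))
        ready, total = row["READY"].split("/")
        row["ready"] = int(ready)
        row["total"] = int(total)
        row["restarts"] = int(row["RESTARTS"])
        pods.append(row)
    return pods


def unhealthy_pods(pods):
    """Return names of pods that are not fully ready or not in Running status."""
    return [
        pod["NAME"]
        for pod in pods
        if pod["ready"] < pod["total"] or pod["STATUS"] != "Running"
    ]


if __name__ == "__main__":
    for name in unhealthy_pods(parse_pod_table(SAMPLE)):
        print(name)
```

With the listing above as input, the sketch reports the four certified-operators and community-operators pods. The marketplace-operator, redhat-marketplace and redhat-operators pods are not flagged.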

Environment

  • OpenShift Container Platform
    • 4.4
