Network Tuning Operator Can CrashLoop and Block Upgrades on Clusters With a Large Number of CSVs

Solution Verified

Issue

  • A cluster with a large number of ClusterServiceVersion (CSV) and Namespace objects may cause the cluster-node-tuning-operator to CrashLoop during an upgrade, due to the length of time needed to query all of the CSV objects
  • The cluster-node-tuning-operator-* pod is in a CrashLoopBackOff state
  • The cluster-node-tuning-operator-* pod emits a log message similar to:
2023-05-29T20:16:09.413129360Z F0529 20:16:09.413090       1 main.go:136] unable to remove Performance addons OLM operator: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 515; INTERNAL_ERROR; received from peer
2023-05-29T20:16:09.413255926Z goroutine 1 [running]:
2023-05-29T20:16:09.413255926Z k8s.io/klog/v2.stacks(0x1)
2023-05-29T20:16:09.413255926Z  /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/klog/v2/klog.go:860 +0x89
  • The cluster upgrade is stuck on node-tuning:
Working towards 4.12.14: 664 of 831 done (79% complete), waiting on node-tuning
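
The symptoms above can be confirmed with standard oc commands. The following is a minimal diagnostic sketch; it assumes the default operator namespace (openshift-cluster-node-tuning-operator) and deployment name, and the CSV count that indicates exposure is an illustrative threshold, not a documented limit:

```shell
# Check whether the operator pod is crash-looping
oc get pods -n openshift-cluster-node-tuning-operator

# Look for the fatal "unable to remove Performance addons OLM operator"
# message in the operator log
oc logs -n openshift-cluster-node-tuning-operator \
    deployment/cluster-node-tuning-operator --tail=20

# Gauge how many CSV objects the operator has to list; clusters with
# many namespaces accumulate copied CSVs, inflating this count
oc get csv -A --no-headers | wc -l

# Confirm the upgrade is blocked on the node-tuning ClusterOperator
oc get clusterversion
oc get co node-tuning
```

If the CSV count is in the thousands and the node-tuning ClusterOperator is the only one not progressing, the cluster likely matches this issue.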

Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 4.11+
