Node Tuning Operator Can CrashLoop and Block Upgrades on Clusters With a Lot of CSVs
Issue
- A cluster with a large number of `ClusterServiceVersion` (CSV) and `Namespace` objects may cause the `cluster-node-tuning-operator` to CrashLoop during an upgrade due to the length of time needed to query all of the CSV objects
- The `cluster-node-tuning-operator-*` pod is in a `CrashLoopBackOff` state
- The `cluster-node-tuning-operator-*` pod is emitting a log similar to:
```
2023-05-29T20:16:09.413129360Z F0529 20:16:09.413090 1 main.go:136] unable to remove Performance addons OLM operator: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 515; INTERNAL_ERROR; received from peer
2023-05-29T20:16:09.413255926Z goroutine 1 [running]:
2023-05-29T20:16:09.413255926Z k8s.io/klog/v2.stacks(0x1)
2023-05-29T20:16:09.413255926Z /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/klog/v2/klog.go:860 +0x89
```
- The cluster upgrade is stuck on `node-tuning`:
```
Working towards 4.12.14: 664 of 831 done (79% complete), waiting on node-tuning
```
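To confirm a cluster matches these symptoms, the following `oc` commands can be used. This is a sketch assuming cluster-admin access with the `oc` CLI logged in; the label selector on the log command reflects a typical install and may vary by version.

```shell
# Count CSV objects across all namespaces; a very large count matches the
# conditions described above.
oc get clusterserviceversions --all-namespaces --no-headers | wc -l

# Confirm the operator pod is in CrashLoopBackOff.
oc get pods -n openshift-cluster-node-tuning-operator

# Check the operator's logs for the "stream error" message
# (label selector is an assumption; adjust if it does not match).
oc logs -n openshift-cluster-node-tuning-operator \
  -l name=cluster-node-tuning-operator --tail=100 | grep -i 'stream error'

# Check whether the upgrade is blocked on the node-tuning ClusterOperator.
oc get clusteroperator node-tuning
```

These commands only read cluster state and are safe to run while the upgrade is in progress.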
Environment
- Red Hat OpenShift Container Platform (OCP)
- 4.11+