ODF stretch cluster - rook-ceph-operator in CrashLoopBackOff after upgrade from 4.12.10 to 4.12.11
Issue
- This is ODF 4.12.11 in a stretch cluster, with two nodes in datacenter "A", two nodes in datacenter "B", and Ceph pools with replica 4.
- The rook-ceph-operator pod is constantly crashing:
rook-ceph-operator-f7b7996f6-xxxxx 0/1 Running 11 1h1m
- From the container status in the output of:
oc get pod rook-ceph-operator-f7b7996f6-xxxxx -o yaml
containerStatuses:
- containerID: cri-o://6b79e6d22d7fc154ac33f1c9a0c9f33284a213e6652d4bca20715a30d10b31df
  image: registry.redhat.io/odf4/rook-ceph-rhel8-operator@sha256:403417b671b3b87e5823ff707d51e88b9a78e1bcfbb0287535aa96b4b9f166b9
  imageID: registry.redhat.io/odf4/rook-ceph-rhel8-operator@sha256:403417b671b3b87e5823ff707d51e88b9a78e1bcfbb0287535aa96b4b9f166b9
  lastState:
    terminated:
      containerID: cri-o://6b79e6d22d7fc154ac33f1c9a0c9f33284a213e6652d4bca20715a30d10b31df
      exitCode: 1
      finishedAt: '2024-02-14T13:16:54Z'
      message: 'failed to run operator: gave up to run the operator manager: failed to set up overall controller-runtime manager: error listening on :8080: listen tcp :8080: bind: address already in use'
      reason: Error
      startedAt: '2024-02-14T13:16:05Z'
  name: rook-ceph-operator
  ready: false
  restartCount: 11
  started: false
  state:
    waiting:
      message: back-off 5m0s restarting failed container=rook-ceph-operator pod=rook-ceph-operator-f7b7996f6-xxxxx_openshift-storage(463addc4-88fc-4cc8-8c18-c0d9652195bd)
      reason: CrashLoopBackOff
- The operator log shows messages like the following, from the output of:
oc logs rook-ceph-operator-f7b7996f6-xxxxx
2024-02-02T10:48:39.799337550Z 2024-02-02 10:48:39.799238 I | ceph-cluster-controller: CR has changed for "ocs-storagecluster-cephcluster". diff= v1.ClusterSpec{
2024-02-02T10:48:39.799337550Z     ... // 11 identical fields
2024-02-02T10:48:39.799337550Z     WaitTimeoutForHealthyOSDInMinutes: s"0s",
2024-02-02T10:48:39.799337550Z     DisruptionManagement: {ManagePodBudgets: true, MachineDisruptionBudgetNamespace: "openshift-machine-api"},
2024-02-02T10:49:41.292040969Z     Mon: v1.MonSpec{
2024-02-02T10:49:41.292040969Z       Count: 5,
2024-02-02T10:49:41.292040969Z       AllowMultiplePerNode: false,
2024-02-02T10:49:41.292040969Z       StretchCluster: &v1.StretchClusterSpec{
2024-02-02T10:49:41.292040969Z         FailureDomainLabel: "topology.kubernetes.io/zone",
2024-02-02T10:49:41.292040969Z         SubFailureDomain: "",
2024-02-02T10:49:41.292040969Z         Zones: []v1.StretchClusterZoneSpec{
2024-02-02T10:49:41.292040969Z           {
2024-02-02T10:49:41.292040969Z -           Name: "DC-B",
2024-02-02T10:49:41.292040969Z +           Name: "DC-A",
2024-02-02T10:49:41.292040969Z             Arbiter: false,
2024-02-02T10:49:41.292040969Z             VolumeClaimTemplate: nil,
2024-02-02T10:49:41.292040969Z           },
2024-02-02T10:49:41.292040969Z           {
2024-02-02T10:49:41.292040969Z -           Name: "DC-A",
2024-02-02T10:49:41.292040969Z +           Name: "DC-B",
2024-02-02T10:49:41.292040969Z             Arbiter: false,
2024-02-02T10:49:41.292040969Z             VolumeClaimTemplate: nil,
2024-02-02T10:49:41.292040969Z           },
2024-02-02T10:49:41.292040969Z           {Name: "arbiter", Arbiter: true},
2024-02-02T10:49:41.292040969Z         },
2024-02-02T10:49:41.292040969Z       },
2024-02-02T10:49:41.292040969Z       VolumeClaimTemplate: nil,
2024-02-02T10:49:41.292040969Z     },
2024-02-02T10:49:41.292040969Z     CrashCollector: {},
2024-02-02T10:49:41.292040969Z     Dashboard: {},
2024-02-02T10:49:41.292040969Z     ... // 8 identical fields
2024-02-02T10:49:41.292040969Z   }
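The "back-off 5m0s" message and the restartCount of 11 in the pod status above are consistent with the kubelet's restart back-off for crashing containers, which Kubernetes documents as an exponential delay (10s, 20s, 40s, ...) capped at five minutes. The following Python sketch models only that documented schedule. It is an approximation: it ignores the time each attempt runs before exiting (about 49 seconds between startedAt and finishedAt above) and the kubelet's reset of the back-off once a container has run without problems for a while. The constant names are illustrative.

```python
from typing import List

# Assumed parameters, taken from the documented kubelet behaviour:
# the first restart waits 10 s, each later one doubles, capped at 300 s (5 min).
INITIAL_DELAY_S = 10
MAX_DELAY_S = 300


def backoff_delays(restarts: int) -> List[int]:
    """Return the modelled wait (in seconds) before each of the first `restarts` restarts."""
    delays = []
    for n in range(restarts):
        delays.append(min(INITIAL_DELAY_S * 2 ** n, MAX_DELAY_S))
    return delays


if __name__ == "__main__":
    delays = backoff_delays(11)  # restartCount: 11 in the pod status
    print(delays)
    print(f"total back-off: {sum(delays)} s (~{sum(delays) / 60:.0f} min)")
```

Under this model, from the sixth restart onward each attempt waits the full five minutes, so most of the pod's age is spent waiting in CrashLoopBackOff rather than running the operator.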
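The terminated state in the pod status above shows the operator exiting with code 1 because it could not listen on :8080 ("bind: address already in use"), meaning another socket already holds that TCP port in the network namespace the operator runs in. The minimal Python sketch below reproduces the same errno (EADDRINUSE) locally by binding a second socket to a port that an active listener already holds. As assumptions, it uses 127.0.0.1 and an OS-chosen port instead of :8080, and it does not identify which process holds the port in the affected pod.

```python
import errno
import socket
from typing import Optional


def second_bind_errno(host: str = "127.0.0.1") -> Optional[int]:
    """Bind two TCP sockets to the same port; return the errno of the second bind, or None."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: let the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port as the active listener
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print(code == errno.EADDRINUSE, errno.errorcode.get(code))
```

Neither socket sets SO_REUSEADDR or SO_REUSEPORT, so the second bind fails the same way the operator's listener on :8080 does.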
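In the diff above, the only lines marked - and + are the Name fields of the two non-arbiter zones: DC-A and DC-B appear in swapped positions, while the set of zones and their other fields are identical. The following Python sketch uses a hypothetical Zone type (not Rook's Go types) to illustrate why an order-sensitive comparison of the Zones list reports this as a change while a comparison keyed by zone name does not. It only illustrates the diff output; the logs shown here do not establish whether the reordering is related to the crash.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Zone:
    """Hypothetical stand-in for one entry of StretchCluster.Zones."""
    name: str
    arbiter: bool = False


# Zone order as printed on the "-" and "+" sides of the diff.
minus_side = [Zone("DC-B"), Zone("DC-A"), Zone("arbiter", arbiter=True)]
plus_side = [Zone("DC-A"), Zone("DC-B"), Zone("arbiter", arbiter=True)]


def ordered_equal(a: List[Zone], b: List[Zone]) -> bool:
    """Element-by-element comparison: position in the list matters."""
    return a == b


def keyed_equal(a: List[Zone], b: List[Zone]) -> bool:
    """Compare zones by name, ignoring position (assumes zone names are unique)."""
    by_name_a: Dict[str, Zone] = {z.name: z for z in a}
    by_name_b: Dict[str, Zone] = {z.name: z for z in b}
    return by_name_a == by_name_b


if __name__ == "__main__":
    print(ordered_equal(minus_side, plus_side))  # False: positions 0 and 1 differ
    print(keyed_equal(minus_side, plus_side))    # True: same zones, same fields
```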
Environment
- ODF 4.12.11