ODF stretch cluster - rook-ceph-operator in CrashLoopBackOff after upgrade from 4.12.10 to 4.12.11

Solution Verified - Updated -

Issue

  • ODF 4.12.11 is deployed as a stretch cluster with two nodes in datacenter "A" and two nodes in datacenter "B"; the Ceph pools use replica 4.

  • The rook-ceph-operator pod crashes and restarts continuously:

    rook-ceph-operator-f7b7996f6-xxxxx                               0/1    Running    11        1h1m
    
  • Output of oc get pod rook-ceph-operator-f7b7996f6-xxxxx -o yaml:

    containerStatuses:
    - containerID: cri-o://6b79e6d22d7fc154ac33f1c9a0c9f33284a213e6652d4bca20715a30d10b31df
      image: registry.redhat.io/odf4/rook-ceph-rhel8-operator@sha256:403417b671b3b87e5823ff707d51e88b9a78e1bcfbb0287535aa96b4b9f166b9
      imageID: registry.redhat.io/odf4/rook-ceph-rhel8-operator@sha256:403417b671b3b87e5823ff707d51e88b9a78e1bcfbb0287535aa96b4b9f166b9
      lastState:
        terminated:
          containerID: cri-o://6b79e6d22d7fc154ac33f1c9a0c9f33284a213e6652d4bca20715a30d10b31df
          exitCode: 1
          finishedAt: '2024-02-14T13:16:54Z'
          message: 'failed to run operator: gave up to run the operator manager: failed
            to set up overall controller-runtime manager: error listening on :8080:
            listen tcp :8080: bind: address already in use'
          reason: Error
          startedAt: '2024-02-14T13:16:05Z'
      name: rook-ceph-operator
      ready: false
      restartCount: 11
      started: false
      state:
        waiting:
          message: back-off 5m0s restarting failed container=rook-ceph-operator pod=rook-ceph-operator-f7b7996f6-xxxxx_openshift-storage(463addc4-88fc-4cc8-8c18-c0d9652195bd)
          reason: CrashLoopBackOff
    
  • The operator log (oc logs rook-ceph-operator-f7b7996f6-xxxxx) reports the CephCluster CR as changed, with the order of the DC-A and DC-B zones swapped:

    2024-02-02T10:48:39.799337550Z 2024-02-02 10:48:39.799238 I | ceph-cluster-controller: CR has changed for "ocs-storagecluster-cephcluster". diff=  v1.ClusterSpec{
    2024-02-02T10:48:39.799337550Z      ... // 11 identical fields
    2024-02-02T10:48:39.799337550Z      WaitTimeoutForHealthyOSDInMinutes: s"0s",
    2024-02-02T10:48:39.799337550Z      DisruptionManagement:              {ManagePodBudgets: true, MachineDisruptionBudgetNamespace: "openshift-machine-api"},
    2024-02-02T10:49:41.292040969Z      Mon: v1.MonSpec{
    2024-02-02T10:49:41.292040969Z          Count:                5,
    2024-02-02T10:49:41.292040969Z          AllowMultiplePerNode: false,
    2024-02-02T10:49:41.292040969Z          StretchCluster: &v1.StretchClusterSpec{
    2024-02-02T10:49:41.292040969Z              FailureDomainLabel: "topology.kubernetes.io/zone",
    2024-02-02T10:49:41.292040969Z              SubFailureDomain:   "",
    2024-02-02T10:49:41.292040969Z              Zones: []v1.StretchClusterZoneSpec{
    2024-02-02T10:49:41.292040969Z                  {
    2024-02-02T10:49:41.292040969Z -                    Name:                "DC-B",
    2024-02-02T10:49:41.292040969Z +                    Name:                "DC-A",
    2024-02-02T10:49:41.292040969Z                      Arbiter:             false,
    2024-02-02T10:49:41.292040969Z                      VolumeClaimTemplate: nil,
    2024-02-02T10:49:41.292040969Z                  },
    2024-02-02T10:49:41.292040969Z                  {
    2024-02-02T10:49:41.292040969Z -                    Name:                "DC-A",
    2024-02-02T10:49:41.292040969Z +                    Name:                "DC-B",
    2024-02-02T10:49:41.292040969Z                      Arbiter:             false,
    2024-02-02T10:49:41.292040969Z                      VolumeClaimTemplate: nil,
    2024-02-02T10:49:41.292040969Z                  },
    2024-02-02T10:49:41.292040969Z                  {Name: "arbiter", Arbiter: true},
    2024-02-02T10:49:41.292040969Z              },
    2024-02-02T10:49:41.292040969Z          },
    2024-02-02T10:49:41.292040969Z          VolumeClaimTemplate: nil,
    2024-02-02T10:49:41.292040969Z      },
    2024-02-02T10:49:41.292040969Z      CrashCollector: {},
    2024-02-02T10:49:41.292040969Z      Dashboard:      {},
    2024-02-02T10:49:41.292040969Z      ... // 8 identical fields
    2024-02-02T10:49:41.292040969Z   }
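
The two symptoms shown above can be picked out of the collected output mechanically. The following is only a minimal sketch, not a supported tool: the helper names (port_in_use, zone_names, order_only_change) are hypothetical, and the input strings are copied from the excerpts in this article, with the folded YAML message joined into one line. It extracts the port from the "address already in use" message and checks whether the CR diff changes only the order of the zone names. It does not establish whether the two symptoms are related.

```python
import re

# Last termination message of the operator container, copied from the
# pod YAML above (the folded YAML lines are joined with spaces).
TERMINATION_MESSAGE = (
    "failed to run operator: gave up to run the operator manager: failed "
    "to set up overall controller-runtime manager: error listening on :8080: "
    "listen tcp :8080: bind: address already in use"
)

# Zone-name lines of the CR diff, copied from the operator log above.
CR_DIFF_EXCERPT = """\
2024-02-02T10:49:41.292040969Z -                    Name:                "DC-B",
2024-02-02T10:49:41.292040969Z +                    Name:                "DC-A",
2024-02-02T10:49:41.292040969Z -                    Name:                "DC-A",
2024-02-02T10:49:41.292040969Z +                    Name:                "DC-B",
"""


def port_in_use(message):
    """Return the port named in a 'bind: address already in use' error, else None."""
    match = re.search(r"listen tcp \S*:(\d+): bind: address already in use", message)
    return int(match.group(1)) if match else None


def zone_names(diff_text, sign):
    """Return zone names, in log order, from diff lines marked with sign ('-' or '+')."""
    pattern = re.compile(
        r'^\s*(?:\d{4}-\S+Z )?' + re.escape(sign) + r'\s+Name:\s+"([^"]+)"',
        re.MULTILINE,
    )
    return pattern.findall(diff_text)


def order_only_change(diff_text):
    """True if removed and added zone names are the same set in a different order."""
    removed = zone_names(diff_text, "-")
    added = zone_names(diff_text, "+")
    return removed != added and sorted(removed) == sorted(added)


if __name__ == "__main__":
    print("port reported as in use:", port_in_use(TERMINATION_MESSAGE))
    print("zone names removed:", zone_names(CR_DIFF_EXCERPT, "-"))
    print("zone names added:", zone_names(CR_DIFF_EXCERPT, "+"))
    print("diff only reorders zones:", order_only_change(CR_DIFF_EXCERPT))
```

To use the sketch against a live cluster, the two constants would be replaced by the termination message from oc get pod ... -o yaml and the saved output of oc logs for the operator pod.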
    

Environment

  • ODF 4.12.11
