OCS operator in CrashLoopBackOff state after deployment

Environment

Red Hat OpenShift Data Foundation 4.x

Issue

The ocs-operator pod is in CrashLoopBackOff state after a new ODF deployment
The ocs-operator CSV remains in Installing state

Resolution

Enable the CSISnapshot capability in the cluster.
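
One possible way to do this from the CLI is a JSON merge patch against the ClusterVersion object named version. The sketch below is not taken from this solution: the helper names build_capabilities_patch and enable_capabilities are hypothetical, and it assumes cluster-admin access. A merge patch replaces the whole additionalEnabledCapabilities list, so any entries already present in the spec should be passed along with CSISnapshot.

```shell
# Build a JSON merge patch that sets additionalEnabledCapabilities
# to the capability names given as arguments (pure string building).
build_capabilities_patch() {
  local list="" cap
  for cap in "$@"; do
    list="${list:+$list,}\"$cap\""
  done
  printf '{"spec":{"capabilities":{"additionalEnabledCapabilities":[%s]}}}\n' "$list"
}

# Apply the patch to the ClusterVersion object named "version".
# Assumption: the caller is logged in with cluster-admin rights.
enable_capabilities() {
  oc patch clusterversion version --type merge -p "$(build_capabilities_patch "$@")"
}

# Usage:
#   enable_capabilities CSISnapshot
```

Keeping the patch builder separate from the oc call makes it possible to inspect the payload before anything is sent to the cluster.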
Review the ClusterVersion YAML after adding CSISnapshot to additionalEnabledCapabilities.

$ oc get clusterversion -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2023-10-22T21:22:59Z"
    generation: 6
    name: version
    resourceVersion: "179277544"
    uid: 2xxxx3-exx5-4xxf-axx6-exxxxxxxx5
  spec:
    capabilities:
      additionalEnabledCapabilities:
      - CSISnapshot
      baselineCapabilitySet: None
...
...

  status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - CSISnapshot
      - Console
      - marketplace
      - openshift-samples
      knownCapabilities:
      - CSISnapshot
..
      - marketplace
      - openshift-samples

Review the status of the ocs-operator pod after enabling the CSISnapshot capability

$ oc get po | grep ocs-operator
ocs-operator-5f7ffb7765-7r4l7                                     1/1     Running             3088 (3d22h ago)   19d

Root Cause

The VolumeSnapshotClass API (snapshot.storage.k8s.io/v1) was not available in the cluster, so the ocs-operator could not access it.
The VolumeSnapshot CRD was missing from the cluster because the CSISnapshot capability was disabled at install time.
This caused the operator to enter the CrashLoopBackOff (CLBO) state.
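
To confirm this condition on a live cluster, one can check whether the snapshot.storage.k8s.io API group serves VolumeSnapshotClass. The following is a minimal sketch rather than part of the original solution; the helper name has_volumesnapshotclass is hypothetical and expects the name-format output of oc api-resources on standard input.

```shell
# Succeed only if the VolumeSnapshotClass resource appears in the
# list of resource names read from stdin (one "<plural>.<group>" per line).
has_volumesnapshotclass() {
  grep -qxF 'volumesnapshotclasses.snapshot.storage.k8s.io'
}

# Usage:
#   oc api-resources --api-group=snapshot.storage.k8s.io -o name \
#     | has_volumesnapshotclass && echo "VolumeSnapshotClass API is served"
```

On an affected cluster the oc api-resources command is expected to print nothing for this group, so the helper returns a non-zero status.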

Diagnostic Steps

Check the state of the ocs-operator pod

$ oc get pods -n openshift-storage | grep operator
noobaa-operator-d484cdd-574lf                                     1/1     Running             0              17h
ocs-operator-6cd7cc845c-2kvz8                                     0/1     ContainerCreating   0              3s
odf-operator-controller-manager-7656c9d4fb-tmjrh                  2/2     Running             0              12d
rook-ceph-operator-6f8c69f9bf-5vnrl

Review the ocs-operator pod logs for the error no matches for kind "VolumeSnapshotClass" in version "snapshot.storage.k8s.io/v1"

{"level":"error","ts":"2023-12-12T21:50:52Z","logger":"controllers.StorageCluster","msg":"Failed to 'Get' SnapshotClass.","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","SnapshotClass":{"name":"ocs-storagecluster-cephfsplugin-snapclass"},"error":"no matches for kind \"VolumeSnapshotClass\" in version \"snapshot.storage.k8s.io/v1\"","stacktrace":"github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).createSnapshotClasses\n\t/remote-source/app/controllers/storagecluster/volumesnapshotterclasses.go:116\ngithub.com/red-hat-storage/ocs-
...
(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}

{"level":"error","ts":"2023-12-12T21:50:54Z","logger":"controller-runtime.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":"VolumeSnapshotClass.snapshot.storage.k8s.io","error":"no matches for kind \"VolumeSnapshotClass\" in version \"snapshot.storage.k8s.io/v1\""
..
source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:547\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go:136"}
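
Because the operator logs are long JSON lines, it can help to filter them down to the relevant error. A minimal sketch, assuming the error text appears in the raw log exactly as in the excerpts above (with escaped quotes); the helper name filter_snapshotclass_errors is hypothetical, and the pod name in the usage comment is the one from this example.

```shell
# Print only the log lines read from stdin that report the missing
# VolumeSnapshotClass kind. -F matches the text literally, including
# the backslash-escaped quotes found in the JSON log output.
filter_snapshotclass_errors() {
  grep -F 'no matches for kind \"VolumeSnapshotClass\"'
}

# Usage (--previous reads the logs of the last crashed container):
#   oc logs -n openshift-storage ocs-operator-6cd7cc845c-2kvz8 --previous \
#     | filter_snapshotclass_errors
```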

Describe the ocs-operator pod to check its state

$ oc describe pod ocs-operator-6cd7cc845c-2kvz8 -n openshift-storage
..

Command:
      ocs-operator
    Args:
      --enable-leader-election
      --health-probe-bind-address=:8081
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
...
Events:
  Type     Reason      Age                     From     Message
  ----     ------      ----                    ----     -------
  Warning  BackOff     8m (x13983 over 2d23h)  kubelet  Back-off restarting failed container ocs-operator in pod ocs-operator-6cd7cc845c-2kvz8_openshift-storage(4d98fe56-59e3-4f66-8840-f0a1d5984260)
  Warning  ProbeError  53s (x398 over 2d23h)   kubelet  Readiness probe error: Get "http://10.129.0.115:8081/readyz": dial tcp 10.129.0.115:8081: connect: connection refused
body:

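Review the ClusterVersion YAML to check which capabilities are enabled. In the affected cluster, CSISnapshot is listed under knownCapabilities but not under enabledCapabilities.

$ oc get clusterversion -o yaml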

---snip---
  status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - Console
      - marketplace
      - openshift-samples
      knownCapabilities:
      - CSISnapshot
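
Instead of reading the full YAML, the enabled capabilities can be queried directly with a JSONPath expression. This is a sketch only; the helper names list_enabled_capabilities and csisnapshot_enabled are hypothetical, and the field path is the one shown in the status output above.

```shell
# Print the enabled capabilities of the cluster, one per line.
list_enabled_capabilities() {
  oc get clusterversion version \
    -o jsonpath='{.status.capabilities.enabledCapabilities[*]}' | tr ' ' '\n'
}

# Succeed only if CSISnapshot is among the capability names read from stdin.
csisnapshot_enabled() {
  grep -qxF 'CSISnapshot'
}

# Usage:
#   list_enabled_capabilities | csisnapshot_enabled \
#     && echo "CSISnapshot is enabled" \
#     || echo "CSISnapshot is NOT enabled"
```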

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
