OCS operator in CrashLoopBackOff state after deployment
Environment
- Red Hat OpenShift Data Foundation 4.x
Issue
- The ocs-operator pod is in CrashLoopBackOff state after a new ODF deployment.
- The ocs-operator CSV is stuck in Installing state.
Resolution
- Enable the CSISnapshot capability in the cluster.
- Review the YAML of the clusterversion after adding CSISnapshot to additionalEnabledCapabilities:
$ oc get clusterversion -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2023-10-22T21:22:59Z"
    generation: 6
    name: version
    resourceVersion: "179277544"
    uid: 2xxxx3-exx5-4xxf-axx6-exxxxxxxx5
  spec:
    capabilities:
      additionalEnabledCapabilities:
      - CSISnapshot
      baselineCapabilitySet: None
  ...
  ...
  status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - CSISnapshot
      - Console
      - marketplace
      - openshift-samples
      knownCapabilities:
      - CSISnapshot
      ..
      - marketplace
      - openshift-samples
- Review the status of the ocs-operator pod after enabling the CSISnapshot capability:
$ oc get po | grep ocs-operator
ocs-operator-5f7ffb7765-7r4l7 1/1 Running 3088 (3d22h ago) 19d
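The enablement step in the first bullet of this section can be sketched as a merge patch against the clusterversion object. This is a minimal sketch, assuming the object is named `version` (as in the YAML above) and that no other entries already exist under additionalEnabledCapabilities; the `PATCH` variable name is only illustrative.

```shell
# Merge patch that sets spec.capabilities.additionalEnabledCapabilities.
# Assumption: a JSON merge patch replaces the whole list, so add any
# capabilities that are already listed there before applying it.
PATCH='{"spec":{"capabilities":{"additionalEnabledCapabilities":["CSISnapshot"]}}}'

# Apply the patch only when the oc client is available on this host.
if command -v oc >/dev/null 2>&1; then
  oc patch clusterversion version --type merge -p "$PATCH"
fi
```

Once the patch is accepted, CSISnapshot should appear under both spec.capabilities.additionalEnabledCapabilities and status.capabilities.enabledCapabilities, as in the YAML shown in this section.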
Root Cause
- The 'VolumeSnapshotClass' API was not available in the cluster, so the ocs-operator could not access it.
- The VolumeSnapshot CRD was missing from the cluster because the CSISnapshot capability was disabled at install time.
- This caused the operator to enter the CrashLoopBackOff (CLBO) state.
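One way to confirm the missing API is to list the resources served under the snapshot.storage.k8s.io group and look for VolumeSnapshotClass. The following is a sketch, not part of the original solution; the helper name `has_snapshotclass_api` is hypothetical.

```shell
# has_snapshotclass_api: succeeds if the resource list on stdin
# (one "resource.group" name per line) contains VolumeSnapshotClass.
has_snapshotclass_api() {
  grep -qxF 'volumesnapshotclasses.snapshot.storage.k8s.io'
}

# Query the cluster only when the oc client is available on this host.
if command -v oc >/dev/null 2>&1; then
  if oc api-resources --api-group=snapshot.storage.k8s.io -o name | has_snapshotclass_api; then
    echo "VolumeSnapshotClass API is served"
  else
    echo "VolumeSnapshotClass API is missing"
  fi
fi
```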
Diagnostic Steps
- Check the state of the ocs-operator pod
$ oc get pods -n openshift-storage | grep operator
noobaa-operator-d484cdd-574lf 1/1 Running 0 17h
ocs-operator-6cd7cc845c-2kvz8 0/1 ContainerCreating 0 3s
odf-operator-controller-manager-7656c9d4fb-tmjrh 2/2 Running 0 12d
rook-ceph-operator-6f8c69f9bf-5vnrl
- Review the ocs-operator pod logs for the error: no matches for kind "VolumeSnapshotClass" in version "snapshot.storage.k8s.io/v1"
{"level":"error","ts":"2023-12-12T21:50:52Z","logger":"controllers.StorageCluster","msg":"Failed to 'Get' SnapshotClass.","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","SnapshotClass":{"name":"ocs-storagecluster-cephfsplugin-snapclass"},"error":"no matches for kind \"VolumeSnapshotClass\" in version \"snapshot.storage.k8s.io/v1\"","stacktrace":"github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).createSnapshotClasses\n\t/remote-source/app/controllers/storagecluster/volumesnapshotterclasses.go:116\ngithub.com/red-hat-storage/ocs-
...
(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":"2023-12-12T21:50:54Z","logger":"controller-runtime.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":"VolumeSnapshotClass.snapshot.storage.k8s.io","error":"no matches for kind \"VolumeSnapshotClass\" in version \"snapshot.storage.k8s.io/v1\""
..
source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:547\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go:136"}
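The log review above can be narrowed to the relevant lines with a filter. This is a sketch that assumes the operator runs as a deployment named ocs-operator in the openshift-storage namespace; the helper name `count_crd_errors` is hypothetical.

```shell
# count_crd_errors: prints how many log lines on stdin report the
# "no matches for kind ... VolumeSnapshotClass" error.
count_crd_errors() {
  grep -c 'no matches for kind .*VolumeSnapshotClass'
}

# Read the operator logs only when the oc client is available on this host.
if command -v oc >/dev/null 2>&1; then
  oc logs deployment/ocs-operator -n openshift-storage | count_crd_errors
fi
```

A non-zero count points to the same condition described in this article.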
- Describe the ocs-operator pod to check its state
$ oc describe pod ocs-operator-6cd7cc845c-2kvz8 -n openshift-storage
..
    Command:
      ocs-operator
    Args:
      --enable-leader-election
      --health-probe-bind-address=:8081
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 8m (x13983 over 2d23h) kubelet Back-off restarting failed container ocs-operator in pod ocs-operator-6cd7cc845c-2kvz8_openshift-storage(4d98fe56-59e3-4f66-8840-f0a1d5984260)
Warning ProbeError 53s (x398 over 2d23h) kubelet Readiness probe error: Get "http://10.129.0.115:8081/readyz": dial tcp 10.129.0.115:8081: connect: connection refused
body:
- Review yaml of clusterversion to check which capabilities are enabled
$ oc get clusterversion -o yaml
---snip---
  status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - Console
      - marketplace
      - openshift-samples
      knownCapabilities:
      - CSISnapshot
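Instead of reading the full YAML, the enabled capabilities can be checked directly with a jsonpath query. This is a sketch that assumes the clusterversion object is named `version`; the helper name `has_capability` is hypothetical.

```shell
# has_capability NAME: succeeds if NAME appears in the
# space-separated capability list read from stdin.
has_capability() {
  tr ' ' '\n' | grep -qx "$1"
}

# Query the cluster only when the oc client is available on this host.
if command -v oc >/dev/null 2>&1; then
  if oc get clusterversion version \
      -o jsonpath='{.status.capabilities.enabledCapabilities[*]}' | has_capability CSISnapshot; then
    echo "CSISnapshot is enabled"
  else
    echo "CSISnapshot is not enabled"
  fi
fi
```

With the status shown above (Console, marketplace, openshift-samples), the check reports that CSISnapshot is not enabled, which matches the failing state.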