ocs-operator pod fails to reconcile the cluster because StorageClassName is set to null
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat OpenShift Container Storage (OCS) 4.3+
Issue
- noobaa-endpoint keeps restarting and ends up in the CrashLoopBackOff state after upgrading the OCS cluster.
- The OCS cluster is upgraded to the next version, but the Ceph cluster is not upgraded.
- ocs-operator fails to reconcile the cluster because StorageClassName is set to null in the StorageClassDeviceSet.
Resolution
- Take a backup of StorageCluster:
$ oc get StorageCluster ocs-storagecluster -oyaml > StorageCluster.yaml
- Modify the StorageCluster to explicitly specify the desired StorageClass for creating OSD PVCs:
$ oc edit StorageCluster ocs-storagecluster
<...>
spec:
  storageDeviceSets:
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata:
        creationTimestamp: null
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Ti
        storageClassName: null <----------------- Change here
        volumeMode: Block
      status: {}
    name: ocs-deviceset
    placement: {}
    portable: true
    replica: 3
    resources: {}
  version: 4.3.0
<...>
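As an alternative to editing the resource by hand, the same change can be expressed as a JSON Patch. The following is a minimal sketch, not a supported procedure: it assumes the StorageCluster has been exported as JSON (for example with -o json instead of -oyaml) and loaded into a Python dict. The function name build_storageclass_patch and the value example-storageclass are placeholders introduced here for illustration; substitute the StorageClass that actually exists in your cluster.

```python
import json


def build_storageclass_patch(storage_cluster, storage_class):
    """Return JSON Patch operations that set storageClassName on every
    device set where it is currently missing, null, or an empty string."""
    ops = []
    device_sets = (storage_cluster.get("spec") or {}).get("storageDeviceSets") or []
    for index, device_set in enumerate(device_sets):
        pvc_spec = (device_set.get("dataPVCTemplate") or {}).get("spec") or {}
        if not pvc_spec.get("storageClassName"):
            ops.append({
                # "add" on an object member also overwrites an existing null value.
                "op": "add",
                "path": f"/spec/storageDeviceSets/{index}/dataPVCTemplate/spec/storageClassName",
                "value": storage_class,
            })
    return ops


# Trimmed sample mirroring the StorageCluster excerpt in this article.
sample = {
    "spec": {
        "storageDeviceSets": [
            {
                "name": "ocs-deviceset",
                "count": 1,
                "replica": 3,
                "dataPVCTemplate": {
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": "2Ti"}},
                        "storageClassName": None,
                        "volumeMode": "Block",
                    }
                },
            }
        ]
    }
}

# "example-storageclass" is a placeholder, not a value taken from this article.
patch = build_storageclass_patch(sample, "example-storageclass")
print(json.dumps(patch))
```

The printed list could then be passed to oc patch with --type json and -p, assuming the StorageCluster lives in the openshift-storage namespace as the operator log suggests. Keep the backup taken in the first step either way.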
Root Cause
- OCS v4.2 did not have any check and allowed the StorageCluster creation to go through. When upgrading to OCS v4.3, the check was introduced in the ocs-operator, and it refused to reconcile.
- The OCS management console was inappropriately setting an empty string when selecting a StorageClass at the time of deploying the OCS cluster. A check was introduced in the bug fix to not allow an empty string as the StorageClassName for the StorageClassDeviceSet.
- The issue has been identified as a bug in RHOCP v4.3 and was tracked by the Red Hat Engineering team under BZ-1812448.
- The bug has been fixed in RHOCP v4.4 and later backported to RHOCP v4.3 as per Errata RHBA-2020:1437. If this issue still occurs after updating, open a support case in the Red Hat Customer Portal referring to this solution.
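To illustrate the kind of check described above, here is a rough Python sketch. It is not the operator's actual code; it only mimics the behavior implied by this article: a device set whose storageClassName is null or an empty string is rejected, with an error string modeled on the log message shown under Diagnostic Steps. The function name validate_device_sets is hypothetical.

```python
def validate_device_sets(device_sets):
    """Return one error string per device set that has no usable
    storageClassName (missing, null, or empty string)."""
    errors = []
    for index, device_set in enumerate(device_sets):
        pvc_spec = (device_set.get("dataPVCTemplate") or {}).get("spec") or {}
        name = pvc_spec.get("storageClassName")
        if name is None or name == "":
            errors.append(
                f"failed to validate StorageDeviceSet {index}: no StorageClass specified"
            )
    return errors


# Three illustrative device sets: null, empty string, and a placeholder name.
device_sets = [
    {"dataPVCTemplate": {"spec": {"storageClassName": None}}},
    {"dataPVCTemplate": {"spec": {"storageClassName": ""}}},
    {"dataPVCTemplate": {"spec": {"storageClassName": "example-storageclass"}}},
]

for error in validate_device_sets(device_sets):
    print(error)
```

Under this sketch, a StorageCluster created without such a check would be accepted at creation time and only start failing once the check exists, which matches the upgrade behavior described above.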
Diagnostic Steps
- Check the ocs-operator logs for the following error:
$ oc logs ocs-operator-<pod-suffix> | grep 'no StorageClass specified'
2020-06-11T16:16:53.986834199Z{"level":"error","ts":"2020-06-11T16:16:53.986Z","logger":"controller_storagecluster","msg":"Failed to validate StorageDeviceSets","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","error":"failed to validate StorageDeviceSet 0: no StorageClass specified", ...}
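When many log lines have to be scanned, the relevant fields can be pulled out programmatically. The sketch below assumes each line has the shape shown above: a timestamp prefix followed by a JSON object. The sample line omits the fields that are elided in the original output, and the function name parse_operator_log_line is introduced here only for illustration.

```python
import json
import re

ERROR_PATTERN = re.compile(r"StorageDeviceSet (\d+): no StorageClass specified")


def parse_operator_log_line(line):
    """Return namespace, StorageCluster name, and device set index if the
    line reports a missing StorageClass; otherwise return None."""
    start = line.find("{")  # skip the timestamp prefix before the JSON body
    if start == -1:
        return None
    try:
        entry = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None
    match = ERROR_PATTERN.search(str(entry.get("error", "")))
    if match is None:
        return None
    return {
        "namespace": entry.get("Request.Namespace"),
        "name": entry.get("Request.Name"),
        "device_set_index": int(match.group(1)),
    }


# Sample based on the log line above, without the elided trailing fields.
SAMPLE_LINE = (
    '2020-06-11T16:16:53.986834199Z{"level":"error","ts":"2020-06-11T16:16:53.986Z",'
    '"logger":"controller_storagecluster","msg":"Failed to validate StorageDeviceSets",'
    '"Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster",'
    '"error":"failed to validate StorageDeviceSet 0: no StorageClass specified"}'
)

result = parse_operator_log_line(SAMPLE_LINE)
print(result)
```

The device set index in the result corresponds to the position of the entry under spec.storageDeviceSets in the StorageCluster, which tells you which entry needs a storageClassName.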