Ocs-operator pod fails to reconcile the cluster due to StorageClassName set to null
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat OpenShift Container Storage (OCS) 4.3+
Issue
- noobaa-endpoint keeps restarting and ends up in CrashLoopBackOff state after upgrading the OCS cluster.
- The OCS cluster is upgraded to the next version, but the Ceph cluster is not upgraded.
- ocs-operator fails to reconcile the cluster due to StorageClassName set to null in the StorageDeviceSet.
Resolution
- Take a backup of StorageCluster:
$ oc get StorageCluster ocs-storagecluster -oyaml > StorageCluster.yaml
- Modify the StorageCluster to explicitly specify the desired StorageClass for creating OSD PVCs.
$ oc edit StorageCluster ocs-storagecluster
<...>
spec:
  storageDeviceSets:
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata:
        creationTimestamp: null
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Ti
        storageClassName: null   <----------------- Change here
        volumeMode: Block
      status: {}
    name: ocs-deviceset
    placement: {}
    portable: true
    replica: 3
    resources: {}
  version: 4.3.0
<...>
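As a non-interactive alternative to oc edit, the same change can be expressed as a JSON Patch. The following is a minimal sketch, not a verified procedure: build_sc_patch is a hypothetical helper name, <storage-class-name> is a placeholder for a StorageClass that exists on your cluster, the openshift-storage namespace is taken from the operator log shown under Diagnostic Steps, and index 0 assumes the affected device set is the first entry in storageDeviceSets.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON Patch that sets storageClassName
# on one entry of spec.storageDeviceSets.
#   $1: zero-based index of the device set
#   $2: name of an existing StorageClass (placeholder, pick your own)
# The "add" operation is used because it also overwrites an existing
# member, so it works whether the key is present as null or absent.
build_sc_patch() {
    printf '[{"op":"add","path":"/spec/storageDeviceSets/%s/dataPVCTemplate/spec/storageClassName","value":"%s"}]\n' "$1" "$2"
}

# Usage (not executed here; requires a logged-in oc session):
#   oc get storageclass
#   oc patch StorageCluster ocs-storagecluster -n openshift-storage \
#       --type json -p "$(build_sc_patch 0 <storage-class-name>)"
```

Take the backup described in the first step before applying any patch, and confirm the StorageClass name with oc get storageclass first.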
Root Cause
- OCS v4.2 did not validate this field and allowed the StorageCluster creation to go through. The check was introduced in the ocs-operator in OCS v4.3, so after the upgrade the operator refuses to reconcile.
- The OCS management console was incorrectly setting an empty string when a StorageClass was selected at the time of deploying the OCS cluster. A check was introduced in the bug fix to not allow an empty string as the StorageClassName for the StorageDeviceSet.
- The issue has been identified as a bug in RHOCP v4.3 and was tracked by the Red Hat Engineering team under BZ-1812448.
- The bug has been fixed in RHOCP v4.4 and later backported to RHOCP v4.3 as per Errata RHBA-2020:1437. If this issue still occurs after updating, open a support case in the Red Hat Customer Portal referring to this solution.
Diagnostic Steps
- Check the ocs-operator logs for the following error:
$ oc logs ocs-operator-<pod-suffix> | grep 'no StorageClass specified'
2020-06-11T16:16:53.986834199Z{"level":"error","ts":"2020-06-11T16:16:53.986Z","logger":"controller_storagecluster","msg":"Failed to validate StorageDeviceSets","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","error":"failed to validate StorageDeviceSet 0: no StorageClass specified", ...}
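Besides the log message, the StorageCluster definition itself can be inspected for the unset field. The sketch below assumes the definition was saved to a file, such as the StorageCluster.yaml backup from the Resolution section. find_unset_storageclass is a hypothetical helper name, and the pattern only covers a value written as null, as an empty double-quoted string, or left blank.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, every storageClassName
# entry in a saved StorageCluster YAML whose value is null, "" or blank.
#   $1: path to the YAML file
# Exits non-zero (grep's status) when no such entry is found.
find_unset_storageclass() {
    grep -nE 'storageClassName:[[:space:]]*(null|"")?[[:space:]]*$' "$1"
}

# Usage (not executed here; requires a logged-in oc session):
#   oc get StorageCluster ocs-storagecluster -n openshift-storage -oyaml > StorageCluster.yaml
#   find_unset_storageclass StorageCluster.yaml
```

Any line it prints is a candidate for the edit described in the Resolution section. A valid StorageClass name on that line produces no output.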