'ocs-operator' fails to reconcile the cluster because 'storageClassName' is set to 'null' in a StorageDeviceSet.

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Container Storage 4.3
  • Red Hat OpenShift Container Storage 4.4

Issue

  • The noobaa-endpoint pod keeps restarting and ends up in CrashLoopBackOff state after the OCS cluster is upgraded.

  • The OCS cluster is upgraded to the next version, but the underlying Ceph cluster is not.

  • ocs-operator fails to reconcile the cluster because storageClassName is set to null in a StorageDeviceSet.

Resolution

  • Edit the StorageCluster so that it explicitly specifies the StorageClass to use when creating the OSD PVCs.

     1. Take a backup of the StorageCluster:
     # oc get StorageCluster ocs-storagecluster -n openshift-storage -oyaml > StorageCluster.yaml
    
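     2. Edit the StorageCluster and replace the null storageClassName with the name of the StorageClass that should back the OSD PVCs (run 'oc get storageclass' to list the available StorageClasses):
     # oc edit StorageCluster ocs-storagecluster -n openshift-storage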
     <...>
     spec:
       storageDeviceSets:
       - config: {}
         count: 1
         dataPVCTemplate:
           metadata:
             creationTimestamp: null
           spec:
             accessModes:
             - ReadWriteOnce
             resources:
               requests:
                 storage: 2Ti
             storageClassName: null                   <----------------- Change here
             volumeMode: Block
           status: {}
         name: ocs-deviceset
         placement: {}
         portable: true
         replica: 3
         resources: {}
       version: 4.3.0
     <...>
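Where an interactive 'oc edit' session is inconvenient (for example, in a script), the same change can in principle be applied with 'oc patch --type json'. This is a sketch under assumptions, not part of the verified procedure above: it assumes the openshift-storage namespace seen in the operator log, device set index 0, and a StorageClass named gp2, which is only a placeholder for the StorageClass you actually want. The helper name build_sc_patch is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build_sc_patch INDEX STORAGECLASS
# Prints an RFC 6902 JSON patch that sets storageClassName on the device set
# at position INDEX in spec.storageDeviceSets.
# The "add" operation is used rather than "replace" because "add" sets the
# member whether or not it already exists, while "replace" fails if it is absent.
build_sc_patch() {
  sc_index="$1"
  sc_name="$2"
  printf '[{"op":"add","path":"/spec/storageDeviceSets/%s/dataPVCTemplate/spec/storageClassName","value":"%s"}]\n' \
    "$sc_index" "$sc_name"
}

# Example (assumptions: namespace openshift-storage, device set 0, and the
# placeholder StorageClass "gp2"; take the backup from step 1 first):
# oc patch storagecluster ocs-storagecluster -n openshift-storage \
#   --type json -p "$(build_sc_patch 0 gp2)"
```

A JSON patch addresses one element of storageDeviceSets by index and leaves the rest of the list untouched, whereas a JSON merge patch (--type merge) would replace the whole list.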
    

Root Cause

  • In OCP v4.3, the web console had a bug that incorrectly set the StorageClass name to an empty string when a StorageClass was selected while deploying the OCS cluster.

  • OCS v4.3 introduced a check that rejects an empty storageClassName in a StorageDeviceSet.

  • OCS v4.2 had no such check, so the StorageCluster was created successfully. After an upgrade to OCS v4.3, the new check in ocs-operator rejects the existing StorageCluster and the operator refuses to reconcile it, so the Ceph cluster is not upgraded.

  • This issue is fixed in OCP 4.4, BZ #1812448.

Diagnostic Steps

  • The ocs-operator logs contain the following error:
$ oc logs ocs-operator-<pod-suffix> -n openshift-storage | grep 'no StorageClass specified'
2020-06-11T16:16:53.986834199Z {"level":"error","ts":"2020-06-11T16:16:53.986Z","logger":"controller_storagecluster","msg":"Failed to validate StorageDeviceSets","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","error":"failed to validate StorageDeviceSet 0: no StorageClass specified", ...}
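The log message only gives the index of the failing device set. To see which storageClassName entries are unset, the StorageCluster YAML (for example the StorageCluster.yaml backup from step 1 of the Resolution) can be scanned. This is a rough sketch, not an official tool: the function name find_unset_storageclass is hypothetical, and it uses a line-based grep instead of a YAML parser, so it only recognizes storageClassName written on its own line as null, "", '' or with no value.

```shell
#!/bin/sh
# Hypothetical helper: find_unset_storageclass [FILE]
# Prints the line number of every storageClassName entry in a StorageCluster
# YAML dump whose value is null, "", '' or missing. Reads stdin if FILE is omitted.
# Line-based approximation only; it does not parse YAML structure.
find_unset_storageclass() {
  grep -nE '^[[:space:]]*storageClassName:[[:space:]]*(null|""|'"''"')?[[:space:]]*$' "${1:--}" |
    cut -d: -f1
}

# Example (assumes the openshift-storage namespace):
# oc get storagecluster ocs-storagecluster -n openshift-storage -oyaml | find_unset_storageclass
```

Line numbers are reported instead of device set indexes because the sketch does not track which list item a line belongs to.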

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.