How to Rebuild NooBaa in Openshift Data Foundation (ODF) 4.x?

Solution Verified - Updated

Environment

Red Hat OpenShift Container Platform (RHOCP) v4.x
Red Hat OpenShift Data Foundation (RHODF) v4.x
Red Hat OpenShift Container Storage (RHOCS) v4.x

Issue

  • The db-noobaa-db-0 PVC was deleted, causing the Backingstores to go into a NotReady state. How can NooBaa be restored to a working state?
  • Installation of openshift-storage/noobaa-default-backing-store fails with an AUTH error stating "account not found".
  • The OCS Operator is stuck in the "Installing" state and Object Storage resources are not detected.
  • The NooBaa management endpoint is accessible but no configuration is found (the NooBaa web GUI prompts to create a user because the system is not ready).
  • The OCS operator is running 0/1 pods and logs errors such as:
time="2021-03-18T08:42:07Z" level=warning msg="RPC: GetConnection creating connection to wss://noobaa-mgmt.openshift-storage.svc.cluster.local:443/rpc/ 0xc00059acd0"
time="2021-03-18T08:42:07Z" level=info msg="RPC: Connecting websocket (0xc00059acd0) &{RPC:0xc0003fd860 Address:wss://noobaa-mgmt.openshift-storage.svc.cluster.local:443/rpc/ State:init WS:<nil> PendingRequests:
map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}"
time="2021-03-18T08:42:07Z" level=info msg="RPC: Connected websocket (0xc00059acd0) &{RPC:0xc0003fd860 Address:wss://noobaa-mgmt.openshift-storage.svc.cluster.local:443/rpc/ State:init WS:<nil> PendingRequests:m
ap[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}"
time="2021-03-18T08:42:07Z" level=error msg="⚠️  RPC: system.read_system() Response Error: Code=UNAUTHORIZED Message=account not found 5e96a72c498f080030a76337"
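These messages are typically visible in the ocs-operator pod logs; a hedged way to confirm the symptom (assuming the default openshift-storage namespace):

```
$ oc logs -n openshift-storage deployment/ocs-operator | grep UNAUTHORIZED
```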

Resolution

WARNING: The solution below deletes and recreates ALL NooBaa resources, which means ALL data inside NooBaa buckets will be lost. Please reach out to the Red Hat Technical Support (sbr-ocs) team for a review of the cluster before applying the changes outlined in this KCS.

Steps for ODF versions 4.11+

  1. Allow deletion of the NooBaa custom resource by patching its cleanup policy:
$ oc patch -n openshift-storage noobaa noobaa --type='merge' -p '{"spec":{"cleanupPolicy":{"allowNoobaaDeletion":true}}}'
  2. Delete the NooBaa/Multicloud Gateway (MCG) resources:
$ oc delete -n openshift-storage noobaas.noobaa.io --all

If there are existing buckets, the above step will hang. If the command hangs, run the following in a separate terminal window:

$ oc patch -n openshift-storage noobaas/noobaa --type=merge -p '{"metadata": {"finalizers":null}}'
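While the delete command is hanging, the CR's deletion status can be inspected from the second terminal; this is a hedged check, not a required step:

```
$ oc get noobaa noobaa -n openshift-storage -o jsonpath='{.metadata.deletionTimestamp}'
```

A non-empty timestamp confirms the CR is marked for deletion and is only waiting on its finalizers.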
  3. After waiting some time for the termination and re-creation of all NooBaa resources, validate the new age of all MCG resources with the following command:
$ oc get pv,deployment,pods,sts -n openshift-storage | grep noobaa

Note: The above steps will not delete any custom backingstore, custom bucketclass, custom noobaa storageclass, OBC, or OB. You need to delete them manually:

  4. Delete the existing OBCs; this also deletes the associated OBs and secrets. Then recreate each OBC with the same name and in the same namespace as before, which creates a new OB, a new NooBaa bucket, and a new secret with new AWS credentials. The new values will need to be configured on the applications that were using the NooBaa buckets.

Example with a single OBC "test-todelete" in the "openshift-storage" namespace:

% oc get obc -A
NAMESPACE           NAME            STORAGE-CLASS                 PHASE   AGE
openshift-storage   test-todelete   openshift-storage.noobaa.io   Bound   81s

% oc get secret test-todelete -n openshift-storage
NAME            TYPE     DATA   AGE
test-todelete   Opaque   2      116s

% oc get ob                  
NAME                                  STORAGE-CLASS                 CLAIM-NAMESPACE   CLAIM-NAME   RECLAIM-POLICY   PHASE   AGE
obc-openshift-storage-test-todelete   openshift-storage.noobaa.io                                  Delete           Bound   2m22s

% oc -n openshift-storage delete obc test-todelete
objectbucketclaim.objectbucket.io "test-todelete" deleted
% 

After that we can see that these resources are gone:

% oc get obc -A
No resources found

% oc get ob                                       
No resources found

% oc get secret test-todelete -n openshift-storage
Error from server (NotFound): secrets "test-todelete" not found
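Recreating the OBC from the example above (step 4) can be sketched with a manifest like the following; the generateBucketName value is an assumption based on typical OBC usage, not taken from the original claim:

```
$ cat <<EOF | oc apply -f -
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: test-todelete
  namespace: openshift-storage
spec:
  generateBucketName: test-todelete
  storageClassName: openshift-storage.noobaa.io
EOF
```

Applying this recreates the OB, the NooBaa bucket, and a new secret with fresh credentials.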

Steps for OCS/ODF versions 4.10 and below

Follow the steps below to safely recreate the NooBaa system in OCS/ODF versions 4.10 and below:

  • For ODF version 4.9+, scale down the odf-operator-controller-manager, ocs-operator, and noobaa-operator deployments:
$ oc scale deployment odf-operator-controller-manager ocs-operator noobaa-operator --replicas=0
  • Delete the noobaa deployments/statefulsets.
    ** NOTE: For OCS 4.7.x+ the statefulset is called 'noobaa-db-pg'
$ oc delete deployments.apps noobaa-endpoint
$ oc delete statefulsets.apps noobaa-db noobaa-core
  • Delete the PVC db-noobaa-db-0
    ** NOTE: For OCS 4.7.x+ the PVC is called 'db-noobaa-db-pg-0'
$ oc delete pvc db-noobaa-db-0
  • Delete the backingstores and bucket-class.
$ oc delete bucketclasses.noobaa.io,backingstores.noobaa.io --all

NOTE: You might need to remove finalizers for the two resources in order for the deletion to complete. Run the following patch commands if this is true:
$ oc patch backingstore/noobaa-default-backing-store -n openshift-storage  --type=merge -p '{"metadata": {"finalizers":null}}'
$ oc patch bucketclass/noobaa-default-bucket-class -n openshift-storage  --type=merge -p '{"metadata": {"finalizers":null}}'
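Before patching, a hedged way to check whether the deletion is stuck on finalizers:

```
$ oc get backingstores.noobaa.io,bucketclasses.noobaa.io -n openshift-storage -o jsonpath='{range .items[*]}{.kind}{"\t"}{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}'
```

Resources that have a deletionTimestamp set but still list finalizers are the ones that need the patch.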
  • Delete the noobaa secrets.
    Note: If an external KMS is being used, removal of the "noobaa-root-master-key" secret can be skipped.
    By default, the NooBaa master root secret is stored in the k8s secret named 'noobaa-root-master-key'. However, if an external KMS is defined in the NooBaa system CR, the master root key will instead be stored in the specified backend, and the 'noobaa-root-master-key' secret will not be created by the operator.
$ oc delete secrets noobaa-admin noobaa-endpoints noobaa-operator noobaa-server noobaa-root-master-key
  • Restart ocs-operator by setting the replicas back to 1.
$ oc scale deployment ocs-operator --replicas=1
  • For ODF version 4.9+, scale up the odf-operator-controller-manager as an additional step:
$ oc scale deployment odf-operator-controller-manager --replicas=1
  • Restart noobaa-operator by setting the replicas back to 1.
$ oc scale deployment noobaa-operator --replicas=1

The operator should then perform a fresh installation of the NooBaa system.

  • Monitor the pods in openshift-storage until the noobaa pods are Running. Also check the status of the backingstore and bucketclass:
$ oc get pods,backingstore,bucketclass -n openshift-storage -o wide

Both should be in Ready status.
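Instead of polling manually, `oc wait` can be used as a sketch; the condition names assume the default resources and may vary between versions:

```
$ oc wait -n openshift-storage backingstore/noobaa-default-backing-store --for=condition=Available --timeout=300s
$ oc wait -n openshift-storage pod -l app=noobaa --for=condition=Ready --timeout=300s
```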

Note: If the rebuild of noobaa is stuck or not progressing, also delete the noobaa custom resource while deleting the other resources:

$ oc delete noobaa noobaa

Root Cause

  • The noobaa DB holds all of NooBaa's metadata: system configurations, authentication info, bucket info, and object metadata. Deleting the PV means losing all of this data; only a rebuild can bring NooBaa back to Ready status.
  • The "Code=UNAUTHORIZED Message=account not found" messages are seen due to account information missing from the noobaa-db.

Diagnostic Steps

  • Check for the Noobaa resources creation timestamps.
noobaa CR - created by OCS operator on storage cluster installation.
$ oc get noobaa noobaa -o yaml
    creationTimestamp: "2020-04-15T06:17:27Z"

noobaa-operator secret - created by noobaa-operator on noobaa CR reconciliation
# oc get secrets noobaa-operator -o yaml
    creationTimestamp: "2020-04-15T06:18:20Z"

noobaa-db Statefulset - created by noobaa-operator
# oc get sts noobaa-db -o yaml
    creationTimestamp: "2020-06-22T17:05:18Z"

PVC db-noobaa-db-0 - this should match the StatefulSet creation time (in some cases it can be older, but not newer)
# oc get pvc db-noobaa-db-0 -o yaml
    creationTimestamp: "2021-03-14T13:58:43Z"

Notice that the db-noobaa-db-0 PVC was created much later than the StatefulSet, indicating it was deleted and recreated.
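The timestamps above can be gathered in a single hedged command (custom-columns output; resource names assume the default deployment):

```
$ oc get noobaa,sts,pvc -n openshift-storage -o custom-columns=KIND:.kind,NAME:.metadata.name,CREATED:.metadata.creationTimestamp
```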

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
