Chapter 9. Restoring the monitor pods in OpenShift Container Storage

Restore the monitor pods when all three of them go down and OpenShift Container Storage cannot recover them automatically.

Procedure

  1. Scale down the rook-ceph-operator and ocs-operator deployments.

    # oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
    # oc scale deployment ocs-operator --replicas=0 -n openshift-storage
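
    Optionally, confirm that both operators are scaled down before continuing; both deployments should report 0/0 in the READY column.

    # oc get deployment rook-ceph-operator ocs-operator -n openshift-storage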
  2. Create a backup of all deployments in the openshift-storage namespace.

    # mkdir backup
    # cd backup
    # oc project openshift-storage
    # for d in $(oc get deployment|awk -F' ' '{print $1}'|grep -v NAME); do echo $d;oc get deployment $d -o yaml > oc_get_deployment.${d}.yaml; done
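
    Optionally, verify that a backup YAML file exists for each deployment:

    # ls oc_get_deployment.*.yaml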
  3. Patch the OSD deployments to remove the livenessProbe parameter and to run them with sleep as the command.

    # for i in $(oc get deployment -l app=rook-ceph-osd -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "osd", "command": ["sleep", "infinity"], "args": []}]}}}}' ; done
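
    Optionally, verify that the OSD pods restarted and are back in the Running state after the patch:

    # oc get pods -l app=rook-ceph-osd -n openshift-storage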
  4. Retrieve the monstore cluster map from all the OSDs.

    1. Create the recover_mon.sh script.

      #!/bin/bash

      # Local directory used to accumulate the monitor store rebuilt from the OSDs.
      ms=/tmp/monstore

      rm -rf $ms
      mkdir $ms

      for osd_pod in $(oc get po -l app=rook-ceph-osd -oname -n openshift-storage); do

        echo "Starting with pod: $osd_pod"

        podname=$(echo $osd_pod|sed 's/pod\///g')

        # Remove any stale monstore in the pod, then copy in the store accumulated so far.
        oc exec $osd_pod -- rm -rf $ms
        oc cp $ms $podname:$ms

        # Clear the local copy; it is copied back after this OSD's maps have been added.
        rm -rf $ms
        mkdir $ms

        echo "pod in loop: $osd_pod ; done deleting local dirs"

        # Update the monitor store from this OSD with ceph-objectstore-tool (COT).
        oc exec $osd_pod -- ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-$(oc get $osd_pod -ojsonpath='{ .metadata.labels.ceph_daemon_id }') --op update-mon-db --no-mon-config --mon-store-path $ms
        echo "Done with COT on pod: $osd_pod"

        # Pull the updated monitor store back to the local machine for the next iteration.
        oc cp $podname:$ms $ms

        echo "Finished pulling COT data from pod: $osd_pod"
      done
    2. Run the recover_mon.sh script.

      # chmod +x recover_mon.sh
      # ./recover_mon.sh
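
      The script echoes its progress for each OSD pod, so the output resembles the following, where <suffix> stands for the pod's generated name suffix:

      Starting with pod: pod/rook-ceph-osd-0-<suffix>
      pod in loop: pod/rook-ceph-osd-0-<suffix> ; done deleting local dirs
      Done with COT on pod: pod/rook-ceph-osd-0-<suffix>
      Finished pulling COT data from pod: pod/rook-ceph-osd-0-<suffix>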
  5. Patch the MON deployments to run them with sleep as the command.

    1. Edit the MON deployments.

      # for i in $(oc get deployment -l app=rook-ceph-mon -oname);do oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'; done
    2. Patch the MON deployments to increase the initialDelaySeconds.

      # oc get deployment rook-ceph-mon-a -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
      # oc get deployment rook-ceph-mon-b -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
      # oc get deployment rook-ceph-mon-c -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
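
      Optionally, confirm that the patch was applied; the output should show initialDelaySeconds: 2000.

      # oc get deployment rook-ceph-mon-a -o yaml | grep initialDelaySeconds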
  6. Copy the previously retrieved monstore to the mon-a pod.

    # oc cp /tmp/monstore/ $(oc get po -l app=rook-ceph-mon,mon=a -oname |sed 's/pod\///g'):/tmp/
  7. Navigate into the mon-a pod and change the ownership of the retrieved monstore.

    # oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
    # chown -R ceph:ceph /tmp/monstore
  8. Set the appropriate capabilities before rebuilding the MON DB.

    # oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
    # cp /etc/ceph/keyring-store/keyring /tmp/keyring
    # cat /tmp/keyring
      [mon.]
        key = AQCleqldWqm5IhAAgZQbEzoShkZV42RiQVffnA==
        caps mon = "allow *"
      [client.admin]
        key = AQCmAKld8J05KxAArOWeRAw63gAwwZO5o75ZNQ==
        auid = 0
        caps mds = "allow *"
        caps mgr = "allow *"
        caps mon = "allow *"
        caps osd = "allow *”
  9. Identify the keyring of all the other Ceph daemons (MGR, MDS, RGW, Crash, CSI and CSI provisioners) from their respective secrets.

    # oc get secret rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-keyring -ojson  | jq .data.keyring | xargs echo | base64 -d
    
    [mds.ocs-storagecluster-cephfilesystem-a]
    key = AQB3r8VgAtr6OhAAVhhXpNKqRTuEVdRoxG4uRA==
    caps mon = "allow profile mds"
    caps osd = "allow *"
    caps mds = "allow"

    Example keyring file, /etc/ceph/ceph.client.admin.keyring:

    [mon.]
    	key = AQDxTF1hNgLTNxAAi51cCojs01b4I5E6v2H8Uw==
    	caps mon = "allow *"
    [mds.ocs-storagecluster-cephfilesystem-a]
    	key = AQCKTV1horgjARAA8aF/BDh/4+eG4RCNBCl+aw==
    	caps mds = "allow"
    	caps mon = "allow profile mds"
    	caps osd = "allow *"
    [mds.ocs-storagecluster-cephfilesystem-b]
    	key = AQCKTV1hN4gKLBAA5emIVq3ncV7AMEM1c1RmGA==
    	caps mds = "allow"
    	caps mon = "allow profile mds"
    	caps osd = "allow *"
    [client.admin]
    	key = AQDxTF1hpzguOxAA0sS8nN4udoO35OEbt3bqMQ==
    	caps mds = "allow *"
    	caps mgr = "allow *"
    	caps mon = "allow *"
    	caps osd = "allow *"
    [client.crash]
    	key = AQBOTV1htO1aGRAAe2MPYcGdiAT+Oo4CNPSF1g==
    	caps mgr = "allow rw"
    	caps mon = "allow profile crash"
    [client.csi-cephfs-node]
    	key = AQBOTV1hiAtuBBAAaPPBVgh1AqZJlDeHWdoFLw==
    	caps mds = "allow rw"
    	caps mgr = "allow rw"
    	caps mon = "allow r"
    	caps osd = "allow rw tag cephfs *=*"
    [client.csi-cephfs-provisioner]
    	key = AQBNTV1hHu6wMBAAzNXZv36aZJuE1iz7S7GfeQ==
    	caps mgr = "allow rw"
    	caps mon = "allow r"
    	caps osd = "allow rw tag cephfs metadata=*"
    [client.csi-rbd-node]
    	key = AQBNTV1h+LnkIRAAWnpIN9bUAmSHOvJ0EJXHRw==
    	caps mgr = "allow rw"
    	caps mon = "profile rbd"
    	caps osd = "profile rbd"
    [client.csi-rbd-provisioner]
    	key = AQBNTV1hMNcsExAAvA3gHB2qaY33LOdWCvHG/A==
    	caps mgr = "allow rw"
    	caps mon = "profile rbd"
    	caps osd = "profile rbd"
    [mgr.a]
    	key = AQBOTV1hGYOEORAA87471+eIZLZtptfkcHvTRg==
    	caps mds = "allow *"
    	caps mon = "allow profile mgr"
    	caps osd = "allow *"
    Important
    • For the client.csi related keyrings, add the default caps after fetching the key from their respective OpenShift Container Storage secrets.
    • The OSD keyring is added automatically after recovery.
  10. Navigate into the mon-a pod.

    Verify that the monstore has a monmap.

    # ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmap
    # monmaptool /tmp/monmap --print

    If the monmap is missing, create a new monmap, as in the example after the parameter descriptions.

    # monmaptool --create --add <mon-a-id> <mon-a-ip> --add <mon-b-id> <mon-b-ip> --add <mon-c-id> <mon-c-ip> --enable-all-features --clobber /root/monmap --fsid <fsid>
    <mon-a-id>
    Is the ID of the mon-a pod
    <mon-a-ip>
    Is the IP address of the mon-a pod
    <mon-b-id>
    Is the ID of the mon-b pod
    <mon-b-ip>
    Is the IP address of the mon-b pod
    <mon-c-id>
    Is the ID of the mon-c pod
    <mon-c-ip>
    Is the IP address of the mon-c pod
    <fsid>
    Is the file system ID
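
    For example, with placeholder monitor IP addresses and an example FSID (substitute the values from your cluster; the monitor IP addresses are typically listed in the rook-ceph-mon-endpoints ConfigMap in the openshift-storage namespace):

    # monmaptool --create --add a 10.128.2.10 --add b 10.130.4.12 --add c 10.129.6.14 --enable-all-features --clobber /root/monmap --fsid f111402f-84d1-4e06-9fdb-c27607676e55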
  11. Verify the monmap.

    # monmaptool /root/monmap --print
  12. Import the monmap.

    Important

    Use the previously created keyring file.

    # ceph-monstore-tool /tmp/monstore rebuild -- --keyring /tmp/keyring --monmap /root/monmap
    # chown -R ceph:ceph /tmp/monstore
  13. Create a backup of the old store.db files.

    # mv /var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-a/store.db.corrupted
    # mv /var/lib/ceph/mon/ceph-b/store.db /var/lib/ceph/mon/ceph-b/store.db.corrupted
    # mv /var/lib/ceph/mon/ceph-c/store.db /var/lib/ceph/mon/ceph-c/store.db.corrupted
  14. Copy the rebuilt store.db file from the monstore directory to the mon-a data directory.

    # mv /tmp/monstore/store.db /var/lib/ceph/mon/ceph-a/store.db
    # chown -R ceph:ceph /var/lib/ceph/mon/ceph-a/store.db
  15. After rebuilding the monstore, copy the store.db file from the mon-a pod to the local machine, and then to the rest of the MON pods.

    # oc cp $(oc get po -l app=rook-ceph-mon,mon=a -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-a/store.db /tmp/store.db
    # oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=<id> -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-<id>
    <id>
    Is the ID of the MON pod
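
    For example, assuming the remaining monitors are mon-b and mon-c, as in the earlier steps:

    # oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=b -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-b
    # oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=c -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-c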
  16. Navigate into the rest of the MON pods and change the ownership of the copied monstore.

    # oc rsh $(oc get po -l app=rook-ceph-mon,mon=<id> -oname)
    # chown -R ceph:ceph /var/lib/ceph/mon/ceph-<id>/store.db
    <id>
    Is the ID of the MON pod
  17. Revert the patched changes using the deployment backup YAML files created earlier.

    • For MON deployments:

      # oc replace --force -f <mon-deployment.yaml>
      <mon-deployment.yaml>
      Is the MON deployment yaml file
    • For OSD deployments:

      # oc replace --force -f <osd-deployment.yaml>
      <osd-deployment.yaml>
      Is the OSD deployment yaml file
    • For MGR deployments:

      # oc replace --force -f <mgr-deployment.yaml>
      <mgr-deployment.yaml>

      Is the MGR deployment yaml file

      Important

      Ensure that the MON, MGR and OSD pods are up and running.

  18. Scale up the rook-ceph-operator and ocs-operator deployments.

    # oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1
    # oc -n openshift-storage scale deployment ocs-operator --replicas=1
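
    Optionally, confirm that the operator pods are running again:

    # oc get pods -n openshift-storage | grep -E 'rook-ceph-operator|ocs-operator'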

Restoring the CephFS

Check the Ceph status to confirm that CephFS is running.

# ceph -s

Example output:

cluster:
  id:     f111402f-84d1-4e06-9fdb-c27607676e55
  health: HEALTH_ERR
          1 filesystem is offline
          1 filesystem is online with fewer MDS than max_mds
          3 daemons have recently crashed

services:
  mon: 3 daemons, quorum b,c,a (age 15m)
  mgr: a(active, since 14m)
  mds: ocs-storagecluster-cephfilesystem:0
  osd: 3 osds: 3 up (since 15m), 3 in (since 2h)

data:
  pools:   3 pools, 96 pgs
  objects: 500 objects, 1.1 GiB
  usage:   5.5 GiB used, 295 GiB / 300 GiB avail
  pgs:     96 active+clean

If the filesystem is offline or the MDS service is missing, perform the following steps:

  1. Scale down the rook-ceph-operator and ocs-operator deployments.

    # oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
    # oc scale deployment ocs-operator --replicas=0 -n openshift-storage
  2. Patch the MDS deployments to remove the livenessProbe parameter and to run them with sleep as the command.

    # for i in $(oc get deployment -l app=rook-ceph-mds -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mds", "command": ["sleep", "infinity"], "args": []}]}}}}' ; done
  3. Recover the CephFS.

    # ceph fs reset ocs-storagecluster-cephfilesystem --yes-i-really-mean-it

    If the reset command fails, force create the default filesystem with the data and metadata pools, and then reset it.

    Note

    The reset command might fail if the cephfilesystem is missing.

    # ceph fs new ocs-storagecluster-cephfilesystem ocs-storagecluster-cephfilesystem-metadata ocs-storagecluster-cephfilesystem-data0 --force
    # ceph fs reset ocs-storagecluster-cephfilesystem --yes-i-really-mean-it
  4. Replace the MDS deployments.

    # oc replace --force -f oc_get_deployment.rook-ceph-mds-ocs-storagecluster-cephfilesystem-a.yaml
    # oc replace --force -f oc_get_deployment.rook-ceph-mds-ocs-storagecluster-cephfilesystem-b.yaml
  5. Scale up the rook-ceph-operator and ocs-operator deployments.

    # oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storage
    # oc scale deployment ocs-operator --replicas=1 -n openshift-storage
  6. Check the CephFS status.

    # ceph fs status

    The status should be active.

Important

If application pods attached to deployments that use the CephFS Persistent Volume Claims (PVCs) are stuck in the CreateContainerError state after restoring the CephFS, restart the application pods.

# oc -n <namespace> delete pods <cephfs-app-pod>
<namespace>
Is the project namespace
<cephfs-app-pod>
Is the name of the CephFS application pod
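
For example, to list the pods that are stuck in this state in a given project:

# oc get pods -n <namespace> | grep CreateContainerError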

Restoring the Multicloud Object Gateway

After restoring the MON pods, check the Multicloud Object Gateway (MCG) status. It should be active, and the backingstore and bucketclass should be in the Ready state. If they are not, restart all the MCG-related pods, and then check the MCG status to confirm that the MCG is back up and running.

  1. Check the MCG status.

    # noobaa status -n openshift-storage
  2. Restart all the pods related to the MCG.

    # oc delete pods <noobaa-operator> -n openshift-storage
    # oc delete pods <noobaa-core> -n openshift-storage
    # oc delete pods <noobaa-endpoint> -n openshift-storage
    # oc delete pods <noobaa-db> -n openshift-storage
    <noobaa-operator>
    Is the name of the MCG operator
    <noobaa-core>
    Is the name of the MCG core pod
    <noobaa-endpoint>
    Is the name of the MCG endpoint
    <noobaa-db>
    Is the name of the MCG db pod
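
    To find the pod names, list the MCG-related pods, for example:

    # oc get pods -n openshift-storage | grep noobaa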
  3. If the RADOS Object Gateway (RGW) is configured, restart the pod.

    # oc delete pods <rgw-pod> -n openshift-storage
    <rgw-pod>
    Is the name of the RGW pod
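
    The RGW pod name can be found by its label, assuming the default Rook label app=rook-ceph-rgw:

    # oc get pods -l app=rook-ceph-rgw -n openshift-storage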