Chapter 6. Backing up and restoring the undercloud and control plane nodes with collocated Ceph monitors

If an error occurs during an update or upgrade, you can use ReaR backups to restore either the undercloud or overcloud control plane nodes, or both, to their previous state.

Prerequisites

Procedure

  1. On the backup node, export the NFS directory to host the Ceph backups. Replace <IP_ADDRESS/24> with the IP address and subnet mask of the network:

    [root@backup ~]# cat >> /etc/exports << EOF
    /ceph_backups <IP_ADDRESS/24>(rw,sync,no_root_squash,no_subtree_check)
    EOF
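
    After you update /etc/exports, the NFS server must re-read the export table before the controller nodes can mount the share. A minimal sketch, assuming that the nfs-server service is already installed and running on the backup node:

    [root@backup ~]# exportfs -arv
    [root@backup ~]# showmount -e localhost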
  2. On the undercloud node, source the undercloud credentials and run the following script:

    # source stackrc
    #! /bin/bash
    for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do ssh -q heat-admin@$i 'sudo systemctl stop ceph-mon@$(hostname -s) ceph-mgr@$(hostname -s)'; done

    To verify that the ceph-mon and ceph-mgr containers have stopped, log in to a controller node and enter the following command. The command must not return any running Ceph monitor or manager containers:

    [heat-admin@overcloud-controller-x ~]$ sudo podman ps | grep ceph
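
    To check all of the controller nodes at once instead of logging in to each one, you can run a similar loop from the undercloud node. A minimal sketch that reuses the heat-admin SSH access from the previous script; if the services are stopped, the loop prints only the host names:

    # source stackrc
    for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do ssh -q heat-admin@$i 'hostname -s; sudo podman ps | grep ceph'; done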
  3. On the undercloud node, source the undercloud credentials and run the following scripts. Replace <BACKUP_NODE_IP_ADDRESS> with the IP address of the backup node:

    # source stackrc
    #! /bin/bash
    for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do ssh -q heat-admin@$i 'sudo mkdir /ceph_backups'; done
    
    #! /bin/bash
    for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do ssh -q heat-admin@$i 'sudo mount -t nfs <BACKUP_NODE_IP_ADDRESS>:/ceph_backups /ceph_backups'; done
    
    #! /bin/bash
    for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do ssh -q heat-admin@$i 'sudo mkdir /ceph_backups/$(hostname -s)'; done
    
    #! /bin/bash
    for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do ssh -q heat-admin@$i 'sudo tar -zcv --xattrs --xattrs-include="*.*" --xattrs-include=security.capability --xattrs-include=security.selinux --acls -f /ceph_backups/$(hostname -s)/$(hostname -s).tar.gz /var/lib/ceph'; done
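
    Before you continue, you can confirm that each controller node wrote an archive to the backup node. A minimal sketch, run on the backup node; the directory and archive names match the short host names of the controller nodes:

    [root@backup ~]# ls -lh /ceph_backups/*/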
  4. On the node that you want to restore, complete the following tasks:

    1. Power off the node before you proceed.
    2. Restore the node with the ReaR backup file that you created during the backup process. The file is located in the /ceph_backups directory of the backup node.
    3. From the Relax-and-Recover boot menu, select Recover <CONTROL_PLANE_NODE>, where <CONTROL_PLANE_NODE> is the name of the control plane node.
    4. At the prompt, enter the following command:

      RESCUE <CONTROL_PLANE_NODE>:~ # rear recover

      When the image restoration process completes, the console displays the following message:

      Finished recovering your system
      Exiting rear recover
      Running exit tasks
  5. For the node that you want to restore, extract the Ceph backup from the /ceph_backups directory into the /var/lib/ceph directory:

    1. Identify the system mount points:

      RESCUE <CONTROL_PLANE_NODE>:~# df -h
      Filesystem      Size  Used Avail Use% Mounted on
      devtmpfs         16G     0   16G   0% /dev
      tmpfs            16G     0   16G   0% /dev/shm
      tmpfs            16G  8.4M   16G   1% /run
      tmpfs            16G     0   16G   0% /sys/fs/cgroup
      /dev/vda2        30G   13G   18G  41% /mnt/local

      The /dev/vda2 file system is mounted on /mnt/local.

    2. Create a temporary directory and mount the NFS share from the backup node on it:

      RESCUE <CONTROL_PLANE_NODE>:~ # mkdir /tmp/restore
      RESCUE <CONTROL_PLANE_NODE>:~ # mount -v -t nfs -o rw,noatime <BACKUP_NODE_IP_ADDRESS>:/ceph_backups /tmp/restore/
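
      Before you remove any data in the next step, verify that the archive for this node is visible on the mounted share. A minimal sketch; as in the rest of this procedure, replace <CONTROL_PLANE_NODE> with the name of the control plane node:

      RESCUE <CONTROL_PLANE_NODE>:~ # ls -lh /tmp/restore/<CONTROL_PLANE_NODE>/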
    3. Remove the contents of the existing /var/lib/ceph directory on the mounted file system:

      RESCUE <CONTROL_PLANE_NODE>:~ # rm -rf /mnt/local/var/lib/ceph/*
    4. Restore the previous Ceph maps. Replace <CONTROL_PLANE_NODE> with the name of your control plane node:

      RESCUE <CONTROL_PLANE_NODE>:~ # tar -xvC /mnt/local/ -f /tmp/restore/<CONTROL_PLANE_NODE>/<CONTROL_PLANE_NODE>.tar.gz --xattrs --xattrs-include='*.*' var/lib/ceph
    5. Verify that the files are restored:

      RESCUE <CONTROL_PLANE_NODE>:~ # ls -l /mnt/local/var/lib/ceph/
      total 0
      drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-mds
      drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-osd
      drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-rbd
      drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-rgw
      drwxr-xr-x 3 root 107 31 Jun 18 18:52 mds
      drwxr-xr-x 3 root 107 31 Jun 18 18:52 mgr
      drwxr-xr-x 3 root 107 31 Jun 18 18:52 mon
      drwxr-xr-x 2 root 107  6 Jun 18 18:52 osd
      drwxr-xr-x 3 root 107 35 Jun 18 18:52 radosgw
      drwxr-xr-x 2 root 107  6 Jun 18 18:52 tmp
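
    Before you power off the node, you can flush pending writes and unmount the NFS share so that the restored files are fully written to disk. A minimal sketch:

    RESCUE <CONTROL_PLANE_NODE>:~ # sync
    RESCUE <CONTROL_PLANE_NODE>:~ # umount -v /tmp/restore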
  6. Power off the node:

    RESCUE <CONTROL_PLANE_NODE>:~ # poweroff
  7. Power on the node. The node resumes its previous state.
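
    After the node boots, you can confirm that the Ceph monitor and manager services started again. A minimal sketch, run on the restored control plane node; the service names match the units that you stopped earlier in this procedure:

    [heat-admin@<CONTROL_PLANE_NODE> ~]$ sudo systemctl status ceph-mon@$(hostname -s) ceph-mgr@$(hostname -s)
    [heat-admin@<CONTROL_PLANE_NODE> ~]$ sudo podman ps | grep ceph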