Chapter 6. Backing up and restoring the undercloud and control plane nodes with collocated Ceph monitors
If an error occurs during an update or upgrade, you can use ReaR backups to restore either the undercloud or overcloud control plane nodes, or both, to their previous state.
Prerequisites
- Install and configure ReaR. For more information, see Install and configure ReaR.
- Prepare the backup node. For more information, see Prepare the backup node.
- Execute the backup procedure. For more information, see Execute the backup procedure.
Procedure
On the backup node, export the NFS directory to host the Ceph backups. Replace <IP_ADDRESS/24> with the IP address and subnet mask of the network:
[root@backup ~]# cat >> /etc/exports << EOF
/ceph_backups <IP_ADDRESS/24>(rw,sync,no_root_squash,no_subtree_check)
EOF
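If the export step might be run more than once, appending blindly would leave duplicate lines in /etc/exports. The following is a minimal sketch of an idempotent variant. The function name add_export and the optional file argument are hypothetical and are not part of the documented procedure; the file argument exists only so that the helper can be tried on a scratch file.

```shell
#!/bin/bash
# Hypothetical helper: append the /ceph_backups export to an exports file
# only if an identical line is not already present.
# $1 = network in CIDR form, for example 192.168.24.0/24 (placeholder value)
# $2 = exports file (assumed default: /etc/exports)
add_export() {
    local network="$1"
    local exports_file="${2:-/etc/exports}"
    local entry="/ceph_backups ${network}(rw,sync,no_root_squash,no_subtree_check)"

    # -x matches whole lines only, -F treats the entry as a fixed string.
    if ! grep -qxF "$entry" "$exports_file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$exports_file"
    fi
}
```

For example, add_export 192.168.24.0/24 /tmp/exports.test writes a single line no matter how many times it is called.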
On the undercloud node, source the undercloud credentials and run the following script:
# source stackrc
#! /bin/bash
for i in $(openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'); do
    ssh -q heat-admin@$i 'sudo systemctl stop ceph-mon@$(hostname -s) ceph-mgr@$(hostname -s)'
done
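The same pipeline that selects the controller addresses is repeated in every loop of this procedure, so it can be factored into one function. This is a sketch only: the name controller_ips is hypothetical, and the sample line format in the comment (name, then network=address) is an assumption about what openstack server list prints with these options.

```shell
#!/bin/bash
# Hypothetical helper: read the output of
#   openstack server list -c Name -c Networks -f value
# on stdin and print one address per controller node.
# Assumed input format: "overcloud-controller-0 ctlplane=192.168.24.10"
controller_ips() {
    # Same filter chain as the documented loop: keep controller lines,
    # take the text after "=", then take its first whitespace-separated field.
    grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'
}

# Usage on the undercloud, after sourcing stackrc (not executed here):
# for ip in $(openstack server list -c Name -c Networks -f value | controller_ips); do
#     ssh -q heat-admin@"$ip" 'sudo systemctl stop ceph-mon@$(hostname -s) ceph-mgr@$(hostname -s)'
# done
```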
To verify that the ceph-mgr@controller.service container has stopped, enter the following command:
[heat-admin@overcloud-controller-x ~]# sudo podman ps | grep ceph
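When the check is scripted rather than read by eye, an exit status is easier to act on than grep output. The sketch below assumes that the monitor and manager containers have names containing ceph-mon or ceph-mgr; that naming, and the function name ceph_mon_mgr_stopped, are assumptions and not taken from the procedure.

```shell
#!/bin/bash
# Hypothetical helper: read `sudo podman ps` output on stdin and succeed
# only if no container whose line contains ceph-mon or ceph-mgr is listed.
ceph_mon_mgr_stopped() {
    # grep -q succeeds when a match exists, so invert its status.
    ! grep -Eq 'ceph-(mon|mgr)'
}

# Usage on a controller node (not executed here):
# sudo podman ps | ceph_mon_mgr_stopped && echo "mon and mgr are stopped"
```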
On the undercloud node, source the undercloud credentials and run the following script. Replace <BACKUP_NODE_IP_ADDRESS> with the IP address of the backup node:
# source stackrc
#! /bin/bash
for i in $(openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'); do
    ssh -q heat-admin@$i 'sudo mkdir /ceph_backups'
done
#! /bin/bash
for i in $(openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'); do
    ssh -q heat-admin@$i 'sudo mount -t nfs <BACKUP_NODE_IP_ADDRESS>:/ceph_backups /ceph_backups'
done
#! /bin/bash
for i in $(openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'); do
    ssh -q heat-admin@$i 'sudo mkdir /ceph_backups/$(hostname -s)'
done
#! /bin/bash
for i in $(openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'); do
    ssh -q heat-admin@$i 'sudo tar -zcv --xattrs-include=*.* --xattrs --xattrs-include=security.capability --xattrs-include=security.selinux --acls -f /ceph_backups/$(hostname -s)/$(hostname -s).tar.gz /var/lib/ceph'
done
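The four loops run the same four commands on every controller, so one possible consolidation is to build the command list once and send it over a single SSH session per node. This is a sketch under assumptions: the function name backup_cmd is hypothetical, 192.0.2.10 is a placeholder address, and unlike the documented loops it completes all four steps on one node before moving to the next.

```shell
#!/bin/bash
# Hypothetical helper: print the remote command sequence for one controller.
# $1 = IP address of the backup node.
backup_cmd() {
    local backup_ip="$1"
    # Single quotes keep $(hostname -s) unexpanded here, so that it is
    # evaluated on the controller and not on the undercloud.
    printf '%s\n' \
        'sudo mkdir /ceph_backups' \
        "sudo mount -t nfs ${backup_ip}:/ceph_backups /ceph_backups" \
        'sudo mkdir /ceph_backups/$(hostname -s)' \
        'sudo tar -zcv --xattrs-include=*.* --xattrs --xattrs-include=security.capability --xattrs-include=security.selinux --acls -f /ceph_backups/$(hostname -s)/$(hostname -s).tar.gz /var/lib/ceph'
}

# Usage on the undercloud (not executed here); the remote shell runs the
# four lines in order:
# ssh -q heat-admin@"$ip" "$(backup_cmd 192.0.2.10)"
```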
On the node that you want to restore, complete the following tasks:
- Power off the node before you proceed.
- Restore the node with the ReaR backup file that you have created during the backup process. The file is located in the /ceph_backups directory of the backup node.
- From the Relax-and-Recover boot menu, select Recover <CONTROL_PLANE_NODE>, where <CONTROL_PLANE_NODE> is the name of the control plane node. At the prompt, enter the following command:
RESCUE <CONTROL_PLANE_NODE>:~ # rear recover
When the image restoration process completes, the console displays the following message:
Finished recovering your system
Exiting rear recover
Running exit tasks
For the node that you want to restore, copy the Ceph backup from the /ceph_backups directory into the /var/lib/ceph directory:
Identify the system mount points:
RESCUE <CONTROL_PLANE_NODE>:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  8.4M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vda2        30G   13G   18G  41% /mnt/local
The /dev/vda2 file system is mounted on /mnt/local.
Create a temporary directory and mount the NFS export of the backup node on it:
RESCUE <CONTROL_PLANE_NODE>:~ # mkdir /tmp/restore
RESCUE <CONTROL_PLANE_NODE>:~ # mount -v -t nfs -o rw,noatime <BACKUP_NODE_IP_ADDRESS>:/ceph_backups /tmp/restore/
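Because the following steps delete and write files under /mnt/local, a scripted restore might first confirm that the restored root file system really is mounted there, as the df output earlier in this step shows. The sketch below parses df output; the function name device_for_mount is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `df` output on stdin and print the device that
# is mounted on the given mount point (assumed default: /mnt/local).
device_for_mount() {
    local mount_point="${1:-/mnt/local}"
    # Skip the header line, then compare the last column with the mount point.
    awk -v mp="$mount_point" 'NR > 1 && $NF == mp { print $1 }'
}

# Usage in the rescue shell (not executed here):
# df -h | device_for_mount /mnt/local
```

An empty result would indicate that nothing is mounted on /mnt/local and that the restore should not continue.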
On the control plane node, remove the contents of the existing /var/lib/ceph directory:
RESCUE <CONTROL_PLANE_NODE>:~ # rm -rf /mnt/local/var/lib/ceph/*
Restore the previous Ceph maps. Replace <CONTROL_PLANE_NODE> with the name of your control plane node:
RESCUE <CONTROL_PLANE_NODE>:~ # tar -xvC /mnt/local/ -f /tmp/restore/<CONTROL_PLANE_NODE>/<CONTROL_PLANE_NODE>.tar.gz --xattrs --xattrs-include='*.*' var/lib/ceph
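The restore command names the member as var/lib/ceph without a leading slash, because GNU tar strips the leading / from member names when it creates an archive. Combined with -C, this lets the same archive be unpacked under a different root such as /mnt/local. The following sketch reproduces that behavior with scratch directories only; the function name restore_demo and the file contents are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: an archive whose members are stored as var/lib/ceph/... can be
# unpacked under another root with -C, as the restore step does.
# All paths are scratch directories created with mktemp.
restore_demo() {
    local src dst archive
    src=$(mktemp -d) && dst=$(mktemp -d) || return 1
    archive="$src/node.tar.gz"

    # Build a tiny stand-in for /var/lib/ceph.
    mkdir -p "$src/var/lib/ceph/mon"
    echo "monmap" > "$src/var/lib/ceph/mon/store"

    # Create the archive relative to $src so members start with var/lib/ceph.
    tar -zcf "$archive" -C "$src" var/lib/ceph

    # Extract only var/lib/ceph under the new root, mirroring -C /mnt/local/.
    tar -zxf "$archive" -C "$dst" var/lib/ceph

    cat "$dst/var/lib/ceph/mon/store"
    rm -rf "$src" "$dst"
}
```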
Verify that the files are restored:
RESCUE <CONTROL_PLANE_NODE>:~ # ls -l
total 0
drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-mds
drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-osd
drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-rbd
drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-rgw
drwxr-xr-x 3 root 107 31 Jun 18 18:52 mds
drwxr-xr-x 3 root 107 31 Jun 18 18:52 mgr
drwxr-xr-x 3 root 107 31 Jun 18 18:52 mon
drwxr-xr-x 2 root 107  6 Jun 18 18:52 osd
drwxr-xr-x 3 root 107 35 Jun 18 18:52 radosgw
drwxr-xr-x 2 root 107  6 Jun 18 18:52 tmp
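The listing can also be checked by a script. The sketch below assumes that the listing shows the restored /mnt/local/var/lib/ceph directory and that the presence of the mon and mgr subdirectories is a sufficient sanity check; both points, and the function name ceph_dirs_restored, are assumptions rather than part of the procedure.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given Ceph data directory
# contains the mon and mgr subdirectories that appear in the listing.
# $1 = directory to check (assumed default: /mnt/local/var/lib/ceph)
ceph_dirs_restored() {
    local ceph_dir="${1:-/mnt/local/var/lib/ceph}"
    local sub
    for sub in mon mgr; do
        [ -d "$ceph_dir/$sub" ] || return 1
    done
}

# Usage in the rescue shell (not executed here):
# ceph_dirs_restored /mnt/local/var/lib/ceph && echo "Ceph directories present"
```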
Power off the node:
RESCUE <CONTROL_PLANE_NODE>:~ # poweroff
- Power on the node. The node resumes its previous state.