Undercloud and Control Plane Back Up and Restore
Procedures for backing up and restoring the undercloud and the overcloud control plane during updates and upgrades
Chapter 1. Introduction to undercloud and control plane back up and restore
Undercloud and Control Plane Back Up and Restore describes the tasks that are required to back up the state of the Red Hat OpenStack Platform 16.1 undercloud and overcloud controller nodes, also known as control plane nodes, before updates and upgrades. You can use the created backups to restore the undercloud and overcloud control plane nodes to their previous state if an error occurs during an update or upgrade.
1.1. About the ReaR disaster recovery solution
The tasks described in the Undercloud and Control Plane Back Up and Restore guide use the open source Relax and Recover (ReaR) disaster recovery solution that is written in Bash. You can use ReaR to create bootable images of the latest state of the undercloud or control plane nodes, or to back up specific files.
ReaR supports the following boot media formats:
- ISO
- USB
- eSATA
- PXE
The examples in this document were tested using the ISO boot media format.
ReaR can use the following protocols to transport files:
- HTTP/HTTPS
- SSH/SCP
- FTP/SFTP
- NFS
- CIFS (SMB)
For the purposes of backing up and restoring the Red Hat OpenStack Platform 16.1 undercloud and overcloud control plane nodes, the examples in this document were tested using NFS.
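ReaR reads its settings from /etc/rear/local.conf. The following is a hedged sketch of what an ISO-plus-NFS configuration looks like; the backup-and-restore Ansible role described in the following chapters generates the real file, and the server address and export path shown here are illustrative only:

```shell
# /etc/rear/local.conf (illustrative sketch; generated by the
# backup-and-restore role in practice)
OUTPUT=ISO                                      # produce a bootable ISO image
BACKUP=NETFS                                    # back up files over a network file system
BACKUP_PROG=tar                                 # internal backup management with tar
BACKUP_URL=nfs://192.0.2.50/ctl_plane_backups   # example NFS server and export
```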
1.2. ReaR backup management options
You can use ReaR with internal and external backup management options.
Internal backup management
You can use ReaR with the following internal backup options:
- tar
- rsync
External backup management
External backup management options include open source and proprietary solutions. You can use ReaR with the following open source solutions:
- Bacula
- Bareos
You can use ReaR with the following proprietary solutions:
- EMC NetWorker (Legato)
- HP DataProtector
- IBM Tivoli Storage Manager (TSM)
- Symantec NetBackup
Chapter 2. Configuring the backup node
Before you can create a backup of the undercloud or control plane nodes, you must configure the backup node. You can install and configure an NFS server on the backup node using the backup-and-restore
Ansible role.
Procedure
On the undercloud node, source the undercloud credentials:
[stack@undercloud-0 ~]$ source stackrc
(undercloud) [stack@undercloud ~]$
On the undercloud node, create an inventory file for the backup node and replace <IP_ADDRESS> and <USER> with the values that apply to your environment:
(undercloud) [stack@undercloud ~]$ cat <<'EOF' > ~/nfs-inventory.yaml
[BACKUP_NODE]
serverX ansible_host=<IP_ADDRESS> ansible_user=<USER>
EOF
On the undercloud node, create the following Ansible playbook and replace <BACKUP_NODE> with the host name of the backup node:
(undercloud) [stack@undercloud ~]$ cat <<'EOF' > ~/bar_nfs_setup.yaml
# Playbook
# Substitute <BACKUP_NODE> with the host name of your backup node.
- become: true
  hosts: <BACKUP_NODE>
  name: Setup NFS server for ReaR
  roles:
  - role: backup-and-restore
EOF
On the undercloud node, enter the following ansible-playbook command to configure the backup node:
(undercloud) [stack@undercloud ~]$ ansible-playbook \
    -v -i ~/nfs-inventory.yaml \
    --extra-vars="ansible_ssh_common_args='-o StrictHostKeyChecking=no'" \
    --become \
    --become-user root \
    --tags bar_setup_nfs_server \
    ~/bar_nfs_setup.yaml
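A common slip in this procedure is running the playbook with the <IP_ADDRESS> or <USER> placeholders still in place. One way to guard against that is to generate the inventory from shell variables instead of editing it by hand. This is a sketch under assumed example values; 192.0.2.50 and root are illustrative, not values from your environment, and the file path is a scratch location:

```shell
# Assumed example values -- replace with your backup node's address and user.
BACKUP_IP="192.0.2.50"
BACKUP_USER="root"
INV="/tmp/nfs-inventory.yaml"

# Write the inventory with the variables expanded (unquoted EOF).
cat > "${INV}" <<EOF
[BACKUP_NODE]
backup ansible_host=${BACKUP_IP} ansible_user=${BACKUP_USER}
EOF

# Confirm no unsubstituted placeholder survived.
grep 'ansible_host' "${INV}"
```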
Chapter 3. Installing ReaR on the undercloud and control plane nodes
Before creating a backup of the undercloud and control plane nodes, you must install the Relax and Recover (ReaR) packages on the undercloud node and on each of the controller nodes.
To install ReaR using the backup-and-restore
Ansible role, complete the following procedures:
3.1. Installing ReaR on the undercloud node
To create a backup of the undercloud node, you must install and configure Relax and Recover (ReaR) on the undercloud.
Prerequisites
- You have configured the backup node. For more information, see Configuring the backup node.
Procedure
On the undercloud node, source the undercloud credentials and use the tripleo-ansible-inventory command to generate a static inventory file that contains hosts and variables for all the overcloud nodes:
[stack@undercloud-0 ~]$ source stackrc
(undercloud) [stack@undercloud ~]$ tripleo-ansible-inventory \
    --ansible_ssh_user heat-admin \
    --static-yaml-inventory /home/stack/tripleo-inventory.yaml
On the undercloud node, create the following Ansible playbook:
(undercloud) [stack@undercloud ~]$ cat <<'EOF' > ~/bar_rear_setup-undercloud.yaml
# Playbook
# Installing and configuring ReaR on the undercloud node
- become: true
  hosts: undercloud
  name: Install ReaR
  roles:
  - role: backup-and-restore
EOF
On the undercloud node, enter the following ansible-playbook command to install ReaR:
(undercloud) [stack@undercloud ~]$ ansible-playbook \
    -v -i ~/tripleo-inventory.yaml \
    --extra-vars="ansible_ssh_common_args='-o StrictHostKeyChecking=no'" \
    --become \
    --become-user root \
    --tags bar_setup_rear \
    ~/bar_rear_setup-undercloud.yaml
3.2. Installing ReaR on the control plane nodes
To create a backup of the overcloud control plane, you must install and configure Relax and Recover (ReaR) on each of the control plane nodes.
Prerequisites
- You have configured the backup node. For more information, see Configuring the backup node.
Procedure
On the undercloud node, create the following Ansible playbook:
(undercloud) [stack@undercloud ~]$ cat <<'EOF' > ~/bar_rear_setup-controller.yaml
# Playbook
# Installing and configuring ReaR on the control plane nodes
- become: true
  hosts: Controller
  name: Install ReaR
  roles:
  - role: backup-and-restore
EOF
On the undercloud node, enter the following ansible-playbook command to install ReaR on the control plane nodes:
(undercloud) [stack@undercloud ~]$ ansible-playbook \
    -v -i ~/tripleo-inventory.yaml \
    -e tripleo_backup_and_restore_exclude_paths_controller_non_bootrapnode=false \
    --extra-vars="ansible_ssh_common_args='-o StrictHostKeyChecking=no'" \
    --become \
    --become-user root \
    --tags bar_setup_rear \
    ~/bar_rear_setup-controller.yaml
Chapter 4. Creating a backup of the undercloud and control plane nodes
To create a backup of the undercloud and control plane nodes using the backup-and-restore
Ansible role, complete the following procedures:
4.1. Creating a backup of the undercloud node
You can use the backup-and-restore
Ansible role to create a backup of the undercloud node.
Prerequisites
- You have configured the backup node. For more information, see Configuring the backup node.
- You have installed ReaR on the undercloud node. For more information, see Installing ReaR on the undercloud node.
Procedure
On the undercloud node, create the following Ansible playbook:
(undercloud) [stack@undercloud ~]$ cat <<'EOF' > ~/bar_rear_create_restore_images-undercloud.yaml
# Playbook
# Using ReaR on the undercloud node.
- become: true
  hosts: undercloud
  name: Create the recovery images for the undercloud
  roles:
  - role: backup-and-restore
EOF
To create a backup of the undercloud node, enter the following ansible-playbook command:
(undercloud) [stack@undercloud ~]$ ansible-playbook \
    -v -i ~/tripleo-inventory.yaml \
    --extra-vars="ansible_ssh_common_args='-o StrictHostKeyChecking=no'" \
    --become \
    --become-user root \
    --tags bar_create_recover_image \
    ~/bar_rear_create_restore_images-undercloud.yaml
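After the playbook finishes, you can confirm on the backup node that a fresh image was written. The following is a hedged sketch of such a check; the export path /ctl_plane_backups is an assumed role default, and the sketch seeds a scratch copy of the layout so the commands run anywhere:

```shell
# Scratch stand-in for the backup node's NFS export -- on a real backup
# node you would point BACKUP_DIR at the export, e.g. /ctl_plane_backups.
BACKUP_DIR="/tmp/ctl_plane_backups"

# Seed the layout ReaR produces (one directory per backed-up node).
# These two lines exist only so the sketch is self-contained.
mkdir -p "${BACKUP_DIR}/undercloud-0"
touch "${BACKUP_DIR}/undercloud-0/backup.tar.gz"

# The actual check: list backup archives modified within the last day.
find "${BACKUP_DIR}" -name 'backup.tar.gz' -mtime -1
```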
4.2. Creating a backup of the control plane nodes
You can use the backup-and-restore
Ansible role to create a backup of the control plane nodes.
Prerequisites
- You have configured the backup node. For more information, see Configuring the backup node.
- You have installed ReaR on the control plane nodes. For more information, see Installing ReaR on the control plane nodes.
Procedure
On the undercloud node, create the following Ansible playbook:
(undercloud) [stack@undercloud ~]$ cat <<'EOF' > ~/bar_rear_create_restore_images-controller.yaml
# Playbook
# Using ReaR on the control plane nodes.
- become: true
  hosts: ceph_mon
  name: Backup ceph authentication
  tasks:
  - name: Backup ceph authentication role
    include_role:
      name: backup-and-restore
      tasks_from: ceph_authentication
    tags:
    - bar_create_recover_image
- become: true
  hosts: Controller
  name: Create the recovery images for the control plane
  roles:
  - role: backup-and-restore
EOF
On the undercloud node, enter the following ansible-playbook command to create a backup of the control plane nodes:
Important
Do not operate the stack. When you stop the pacemaker cluster and the containers, this results in the temporary interruption of control plane services to Compute nodes. There is also disruption to network connectivity, Ceph, and the NFS data plane service. You cannot create instances, migrate instances, authenticate requests, or monitor the health of the cluster until the pacemaker cluster and the containers return to service following the final step of this procedure.
(undercloud) [stack@undercloud ~]$ ansible-playbook \
    -v -i ~/tripleo-inventory.yaml \
    --extra-vars="ansible_ssh_common_args='-o StrictHostKeyChecking=no'" \
    --become \
    --become-user root \
    --tags bar_create_recover_image \
    ~/bar_rear_create_restore_images-controller.yaml
Chapter 5. Restoring the undercloud and control plane nodes
If an error occurs during an update or upgrade, you can restore either the undercloud or the overcloud control plane nodes, or both, to their previous state using backups.
To restore the undercloud and control plane nodes using backups, complete the following procedures:
5.1. Restoring the undercloud node
If an error occurs during an update or upgrade, you can restore the undercloud node to its previous state using the backup ISO image that you created using ReaR. You can find the backup ISO images on the backup node. Burn the bootable ISO image to a DVD or download it to the undercloud node through Integrated Lights-Out (iLO) remote access.
Prerequisites
- You have created a backup of the undercloud node using ReaR. For more information, see Creating a backup of the undercloud node.
- You have access to the backup node.
Procedure
- Power off the undercloud node. Ensure that the undercloud node is powered off completely before you proceed.
- Boot the undercloud node with the backup ISO image.
- When the Relax-and-Recover boot menu displays, select Recover <UNDERCLOUD_NODE>, where <UNDERCLOUD_NODE> is the name of your undercloud node.
- Log in as the root user and restore the node. The following message displays:
Welcome to Relax-and-Recover. Run "rear recover" to restore your system!
RESCUE <UNDERCLOUD_NODE>:~ # rear recover
When the undercloud node restoration process completes, the console displays the following message:
Finished recovering your system
Exiting rear recover
Running exit tasks
When the command line interface is available, power off the node:
RESCUE <UNDERCLOUD_NODE>:~ # poweroff
On boot up, the node resumes its previous state.
5.2. Restoring the control plane nodes
If an error occurs during an update or upgrade, you can restore the control plane nodes to their previous state using the backup ISO image that you have created using ReaR.
To restore the control plane, you must restore all control plane nodes to ensure state consistency.
You can find the backup ISO images on the backup node. Burn the bootable ISO image to a DVD or download it to the undercloud node through Integrated Lights-Out (iLO) remote access.
Red Hat supports backups of Red Hat OpenStack Platform with native SDNs, such as Open vSwitch (OVS) and the default Open Virtual Network (OVN). For information about third-party SDNs, refer to the third-party SDN documentation.
Prerequisites
- You have created a backup of the control plane nodes using ReaR. For more information, see Creating a backup of the control plane nodes.
- You have access to the backup node.
Procedure
- Power off each control plane node. Ensure that the control plane nodes are powered off completely before you proceed.
- Boot each control plane node with the corresponding backup ISO image.
- When the Relax-and-Recover boot menu displays on each control plane node, select Recover <CONTROL PLANE NODE>. Replace <CONTROL PLANE NODE> with the name of the corresponding control plane node.
- On each control plane node, log in as the root user and restore the node. The following message displays:
Welcome to Relax-and-Recover. Run "rear recover" to restore your system!
RESCUE <CONTROL PLANE NODE>:~ # rear recover
When the control plane node restoration process completes, the console displays the following message:
Finished recovering your system
Exiting rear recover
Running exit tasks
When the command line interface is available on each control plane node, power off the node:
RESCUE <CONTROL PLANE NODE>:~ # poweroff
- Set the boot sequence to the normal boot device. On boot up, the node resumes its previous state.
- To ensure that the services are running correctly, check the status of pacemaker. Log in to a Controller node as the root user and enter the following command:
# pcs status
- To view the status of the overcloud, use Tempest. For more information about Tempest, see Chapter 4 of the OpenStack Integration Test Suite Guide.
Chapter 6. Scheduling control plane node backups with cron
This feature is available in this release as a Technology Preview, and therefore is not fully supported by Red Hat. It should only be used for testing, and should not be deployed in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.
You can configure a cron job to create backups of the control plane nodes with ReaR using the backup-and-restore Ansible role. You can view the logs in the /var/log/rear-cron.<date> log files.
Prerequisites
- You have installed ReaR on the undercloud and control plane nodes. For more information, see Installing ReaR on the undercloud and control plane nodes.
- You have configured the backup node. For more information, see Configuring the backup node.
- You have sufficient available disk space at your backup location to store the backup.
Procedure
On the undercloud node, enter the following command to create the backup script:
[stack@undercloud ~]$ cat <<'EOF' > /home/stack/execute-rear-cron.sh
#!/bin/bash
OWNER="stack"
TODAY=`date +%Y%m%d`
FILE="/var/log/rear-cron.${TODAY}"
sudo touch ${FILE}
sudo chown ${OWNER}:${OWNER} ${FILE}
CURRENTTIME=`date`
echo "[$CURRENTTIME] rear start" >> ${FILE}
/usr/bin/ansible-playbook -v -i /home/stack/tripleo-inventory.yaml --extra-vars="ansible_ssh_common_args='-o StrictHostKeyChecking=no'" --become --become-user root --tags bar_create_recover_image --extra-vars="tripleo_backup_and_restore_service_manager=false" /home/stack/bar_rear_create_restore_images.yaml >> ${FILE} 2>&1
CURRENTTIME=`date`
echo "[$CURRENTTIME] rear end" >> ${FILE}
EOF
Set executable privileges for the /home/stack/execute-rear-cron.sh script:
[stack@undercloud ~]$ chmod 755 /home/stack/execute-rear-cron.sh
Edit the crontab file with the crontab -e command and use an editor of your choice to add the following cron job. Ensure you save the changes to the file:
[stack@undercloud ~]$ crontab -e
#adding the following line
0 0 * * * /home/stack/execute-rear-cron.sh
The /home/stack/execute-rear-cron.sh script is scheduled to be executed by the stack user at midnight.
To verify that the cron job is scheduled, enter the following command:
[stack@undercloud ~]$ crontab -l
The command output displays the scheduled cron jobs:
0 0 * * * /home/stack/execute-rear-cron.sh
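The backup script's logging convention, one file per day with timestamped start and end markers around the playbook run, can be sketched in isolation. A scratch path stands in for /var/log here so the sketch runs without root privileges:

```shell
# Same naming scheme as execute-rear-cron.sh: one log file per day.
TODAY=$(date +%Y%m%d)
FILE="/tmp/rear-cron.${TODAY}"          # the real script uses /var/log/rear-cron.${TODAY}

# Timestamped markers bracket the long-running backup command, so the
# daily log shows when each run started and finished.
echo "[$(date)] rear start" >> "${FILE}"
echo "[$(date)] rear end" >> "${FILE}"

tail -n 2 "${FILE}"
```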
Chapter 7. Backing up and restoring the undercloud and control plane nodes with collocated Ceph monitors
If an error occurs during an update or upgrade, you can use ReaR backups to restore either the undercloud or overcloud control plane nodes, or both, to their previous state.
Prerequisites
- Install ReaR on the undercloud and control plane nodes. For more information, see Installing ReaR on the undercloud and control plane nodes.
- Configure the backup node. For more information, see Configuring the backup node.
- Create a backup of the undercloud and control plane nodes. For more information, see Creating a backup of the undercloud and control plane nodes.
Procedure
On the backup node, export the NFS directory to host the Ceph backups. Replace <IP_ADDRESS/24> with the IP address and subnet mask of the network:
[root@backup ~]# cat >> /etc/exports << EOF
/ceph_backups <IP_ADDRESS/24>(rw,sync,no_root_squash,no_subtree_check)
EOF
On the undercloud node, source the undercloud credentials and run the following script:
# source stackrc
#!/bin/bash
for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do
    ssh -q heat-admin@$i 'sudo systemctl stop ceph-mon@$(hostname -s) ceph-mgr@$(hostname -s)'
done
To verify that the ceph-mgr@controller.service container has stopped, enter the following command on a Controller node:
[heat-admin@overcloud-controller-x ~]$ sudo podman ps | grep ceph
On the undercloud node, source the undercloud credentials and run the following script. Replace <BACKUP_NODE_IP_ADDRESS> with the IP address of the backup node:
# source stackrc
#!/bin/bash
for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do
    ssh -q heat-admin@$i 'sudo mkdir /ceph_backups'
done
#!/bin/bash
for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do
    ssh -q heat-admin@$i 'sudo mount -t nfs <BACKUP_NODE_IP_ADDRESS>:/ceph_backups /ceph_backups'
done
#!/bin/bash
for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do
    ssh -q heat-admin@$i 'sudo mkdir /ceph_backups/$(hostname -s)'
done
#!/bin/bash
for i in `openstack server list -c Name -c Networks -f value | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'`; do
    ssh -q heat-admin@$i 'sudo tar -zcv --xattrs-include=*.* --xattrs --xattrs-include=security.capability --xattrs-include=security.selinux --acls -f /ceph_backups/$(hostname -s)/$(hostname -s).tar.gz /var/lib/ceph'
done
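Each loop above derives Controller IP addresses from `openstack server list` output of the form `<name> ctlplane=<ip>`. The same parsing pipeline can be tried on canned data without an OpenStack CLI; the node names and addresses below are invented for illustration:

```shell
# Canned stand-in for: openstack server list -c Name -c Networks -f value
sample='overcloud-controller-0 ctlplane=192.168.24.10
overcloud-controller-1 ctlplane=192.168.24.11
overcloud-novacompute-0 ctlplane=192.168.24.20'

# grep keeps controller rows only; the first awk splits on "=" to isolate
# the address portion; the second awk drops anything after a space.
echo "$sample" | grep controller | awk -F'=' '{print $2}' | awk -F' ' '{print $1}'
```

Only the two controller addresses survive the pipeline; the Compute node line is filtered out by the grep.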
On the node that you want to restore, complete the following tasks:
- Power off the node before you proceed.
- Restore the node with the ReaR backup file that you created during the backup process. The file is located in the /ceph_backups directory of the backup node.
- From the Relax-and-Recover boot menu, select Recover <CONTROL_PLANE_NODE>, where <CONTROL_PLANE_NODE> is the name of the control plane node.
At the prompt, enter the following command:
RESCUE <CONTROL_PLANE_NODE>:~ # rear recover
When the image restoration process completes, the console displays the following message:
Finished recovering your system
Exiting rear recover
Running exit tasks
For the node that you want to restore, copy the Ceph backup from the /ceph_backups directory into the /var/lib/ceph directory:
- Identify the system mount points:
RESCUE <CONTROL_PLANE_NODE>:~ # df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  8.4M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vda2        30G   13G   18G  41% /mnt/local
The /dev/vda2 file system is mounted on /mnt/local.
- Create a temporary directory and mount the NFS share from the backup node:
RESCUE <CONTROL_PLANE_NODE>:~ # mkdir /tmp/restore
RESCUE <CONTROL_PLANE_NODE>:~ # mount -v -t nfs -o rw,noatime <BACKUP_NODE_IP_ADDRESS>:/ceph_backups /tmp/restore/
On the control plane node, remove the existing /var/lib/ceph directory:
RESCUE <CONTROL_PLANE_NODE>:~ # rm -rf /mnt/local/var/lib/ceph/*
Restore the previous Ceph maps. Replace <CONTROL_PLANE_NODE> with the name of your control plane node:
RESCUE <CONTROL_PLANE_NODE>:~ # tar -xvC /mnt/local/ -f /tmp/restore/<CONTROL_PLANE_NODE>/<CONTROL_PLANE_NODE>.tar.gz --xattrs --xattrs-include='*.*' var/lib/ceph
Verify that the files are restored:
RESCUE <CONTROL_PLANE_NODE>:~ # ls -l
total 0
drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-mds
drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-osd
drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-rbd
drwxr-xr-x 2 root 107 26 Jun 18 18:52 bootstrap-rgw
drwxr-xr-x 3 root 107 31 Jun 18 18:52 mds
drwxr-xr-x 3 root 107 31 Jun 18 18:52 mgr
drwxr-xr-x 3 root 107 31 Jun 18 18:52 mon
drwxr-xr-x 2 root 107  6 Jun 18 18:52 osd
drwxr-xr-x 3 root 107 35 Jun 18 18:52 radosgw
drwxr-xr-x 2 root 107  6 Jun 18 18:52 tmp
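The restore step relies on tar's -C option to re-root creation and extraction: the archive stores the relative path var/lib/ceph, so extracting with -C /mnt/local/ lands the files under the mounted root. A minimal sketch of the same round-trip on scratch data; every path here is illustrative and not part of the procedure:

```shell
# Scratch directories standing in for the node root and the restore target.
SRC="/tmp/ceph-demo-src"
DST="/tmp/ceph-demo-restore"
mkdir -p "${SRC}/var/lib/ceph/mon" "${DST}"
echo "demo-keyring" > "${SRC}/var/lib/ceph/mon/keyring"

# Pack relative to SRC, so the archive holds "var/lib/ceph/..." paths.
tar -zcf /tmp/ceph-demo.tar.gz -C "${SRC}" var/lib/ceph

# Extract relative to DST -- the same pattern as "tar -xvC /mnt/local/ ...".
tar -zxf /tmp/ceph-demo.tar.gz -C "${DST}" var/lib/ceph

cat "${DST}/var/lib/ceph/mon/keyring"
```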
Power off the node:
RESCUE <CONTROL_PLANE_NODE>:~ # poweroff
- Power on the node. The node resumes its previous state.