Appendix B. Restoring the overcloud
B.1. Restoring the overcloud control plane services
This procedure is currently under assessment by Red Hat due to some known issues.
The following procedure restores backups of the overcloud databases and configuration. In this situation, it is recommended to open three terminal windows so that you can perform certain operations on all three Controller nodes simultaneously. It is also recommended to select one Controller node on which to perform the high availability operations. This procedure refers to this node as the bootstrap Controller node.
This procedure restores only the control plane services. It does not restore Compute node workloads or data on Ceph Storage nodes.
Procedure
If you are restoring from a failed major version upgrade, you might need to reverse any `yum` transactions that occurred on all nodes. This involves the following on each node:
Enable the repositories for the previous versions. For example:
# sudo subscription-manager repos --enable=rhel-7-server-openstack-10-rpms
# sudo subscription-manager repos --enable=rhel-7-server-openstack-11-rpms
# sudo subscription-manager repos --enable=rhel-7-server-openstack-12-rpms
Check the `yum` history:
# sudo yum history list all
Identify the transactions that occurred during the upgrade process. Most of these operations will have occurred on one of the Controller nodes (the Controller node selected as the bootstrap node during the upgrade). If you need to view a particular transaction, use the `history info` subcommand:
# sudo yum history info 25
Note: To force `yum history list all` to display the command run for each transaction, set `history_list_view=commands` in your `yum.conf` file.
Revert any `yum` transactions that occurred since the upgrade. For example:
# sudo yum history undo 25
# sudo yum history undo 24
# sudo yum history undo 23
...
Make sure to start from the last transaction and continue in descending order. You can also revert multiple transactions in one execution using the `rollback` option. For example, the following command rolls back all transactions from the last transaction down to transaction 23:
# sudo yum history rollback 23
Important: It is recommended to use `undo` for each transaction instead of `rollback` so that you can verify the reversal of each transaction.
After the relevant `yum` transactions have been reversed, enable only the original OpenStack Platform repository on all nodes. For example:
# sudo subscription-manager repos --disable=rhel-7-server-openstack-*-rpms
# sudo subscription-manager repos --enable=rhel-7-server-openstack-10-rpms
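The descending `undo` sequence can also be scripted. The following is a minimal sketch that prints each command before you run it; the transaction IDs 25 and 23 are hypothetical and should be taken from the output of `yum history list all`:

```shell
# Transaction IDs are hypothetical; replace FIRST and LAST with the most
# recent and the earliest upgrade transactions from `yum history list all`.
FIRST=25
LAST=23
for TXN in $(seq "$FIRST" -1 "$LAST"); do
  # Print the command first; remove the `echo` to execute the undo.
  echo "sudo yum history undo $TXN"
done
```

Keeping the `echo` in place lets you confirm the descending order before reversing anything.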
Restore the database:
- Copy the database backups to the bootstrap Controller node.
Stop connections to the database port on all Controller nodes:
# MYSQLIP=$(hiera mysql_bind_host)
# sudo /sbin/iptables -I INPUT -d $MYSQLIP -p tcp --dport 3306 -j DROP
This blocks incoming connections to the database port so that no services can reach the database during the restore.
On the bootstrap Controller node, disable pacemaker management of Galera:
# pcs resource unmanage galera
Comment out the `wsrep_cluster_address` parameter in `/etc/my.cnf.d/galera.cnf` on all Controller nodes:
# grep wsrep_cluster_address /etc/my.cnf.d/galera.cnf
# vi /etc/my.cnf.d/galera.cnf
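If you prefer a non-interactive edit over `vi`, the parameter can be commented out with `sed`. This is a sketch, assuming the default file path and that the parameter starts its line:

```shell
# Comment out wsrep_cluster_address in place; the .bak suffix keeps a
# copy of the original file for rollback.
sudo sed -i.bak 's/^\s*wsrep_cluster_address/#&/' /etc/my.cnf.d/galera.cnf
# Confirm the line is now commented out.
grep wsrep_cluster_address /etc/my.cnf.d/galera.cnf
```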
Stop the MariaDB database on all the Controller nodes:
# mysqladmin -u root shutdown
Note: You might get a warning from HAProxy that the database is disabled.
Move the existing MariaDB data directories and prepare new data directories on all Controller nodes:
# mv /var/lib/mysql/ /var/lib/mysql.old
# mkdir /var/lib/mysql
# chown mysql:mysql /var/lib/mysql
# chmod 0755 /var/lib/mysql
# mysql_install_db --datadir=/var/lib/mysql --user=mysql
# chown -R mysql:mysql /var/lib/mysql/
# restorecon -R /var/lib/mysql
Move the root configuration and cluster check to a backup file on all Controller nodes:
# sudo mv /root/.my.cnf /root/.my.cnf.old
# sudo mv /etc/sysconfig/clustercheck /etc/sysconfig/clustercheck.old
On the bootstrap Controller node, set Pacemaker to manage the Galera cluster:
# pcs resource manage galera
# pcs resource cleanup galera
Wait for the Galera cluster to come up properly. Run the following command to see if all nodes are set as masters:
# watch "pcs status | grep -C3 galera"
Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
If the output does not show all Controller nodes as masters, run the cleanup command again:
# pcs resource cleanup galera
On the bootstrap Controller node, restore the OpenStack database. This will be replicated to the other Controller nodes by Galera:
# mysql -u root < openstack_database.sql
On the bootstrap Controller node, restore the users and permissions:
# mysql -u root < grants.sql
On the bootstrap Controller node, reset the database password to its original password:
# /usr/bin/mysqladmin -u root password "$(hiera mysql::server::root_password)"
On the bootstrap Controller node, run `pcs status` to show the Galera resource:
# pcs status | grep -C3 galera
The command might report an error because the database is now using the wrong username and password to connect and poll the database status. On all Controller nodes, restore the database configuration:
# sudo mv /root/.my.cnf.old /root/.my.cnf
# sudo mv /etc/sysconfig/clustercheck.old /etc/sysconfig/clustercheck
Test the cluster check locally for each Controller node:
# /bin/clustercheck
On the bootstrap Controller node, perform a cleanup in Pacemaker to reprobe the state of Galera:
# pcs resource cleanup galera
Test the cluster check on each Controller node:
# curl overcloud-controller-0:9200
# curl overcloud-controller-1:9200
# curl overcloud-controller-2:9200
Remove the firewall rule from each Controller node to restore service access to the database:
# sudo /sbin/iptables -D INPUT -d $MYSQLIP -p tcp --dport 3306 -j DROP
Uncomment the `wsrep_cluster_address` parameter in `/etc/my.cnf.d/galera.cnf` on all Controller nodes:
# vi /etc/my.cnf.d/galera.cnf
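As with commenting the parameter out, this edit can be done non-interactively with `sed`. This is a sketch, assuming the default file path and a single leading `#` on the commented line:

```shell
# Remove the leading comment marker from wsrep_cluster_address; the .bak
# suffix keeps a copy of the commented file.
sudo sed -i.bak 's/^#\s*\(wsrep_cluster_address\)/\1/' /etc/my.cnf.d/galera.cnf
# Confirm the parameter is active again.
grep wsrep_cluster_address /etc/my.cnf.d/galera.cnf
```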
Restore the filesystem:
Copy the backup `tar` file for each Controller node to a temporary directory and uncompress all the data:
# mkdir /var/tmp/filesystem_backup/data/
# cd /var/tmp/filesystem_backup/data/
# mv <backup_file>.tar.gz .
# tar --xattrs -xvzf <backup_file>.tar.gz
Note: Do not extract the archive directly into the `/` directory. This overwrites your current filesystem. Extract the file in a temporary directory instead.
Restore the `/usr/libexec/os-apply-config/templates/etc/os-net-config/config.json` file:
$ cp /var/tmp/filesystem_backup/data/usr/libexec/os-apply-config/templates/etc/os-net-config/config.json /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json
- Retain this directory in case you need any configuration files.
Clean up the redis resource:
# pcs resource cleanup redis
After restoring the overcloud control plane data, check that each relevant service is enabled and running correctly:
For high availability services on Controller nodes:
# pcs resource enable [SERVICE]
# pcs resource cleanup [SERVICE]
For Systemd services on Controller and Compute nodes:
# systemctl start [SERVICE]
# systemctl enable [SERVICE]
The next few sections provide a reference of services that should be enabled.
B.2. Restored High Availability Services
The following is a list of high availability services that should be active on OpenStack Platform 10 Controller nodes after a restore. If any of these services are disabled, use the following commands to enable them:
# pcs resource enable [SERVICE]
# pcs resource cleanup [SERVICE]
| Controller Services |
|---|
| galera |
| haproxy |
| openstack-cinder-volume |
| rabbitmq |
| redis |
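As a convenience, the resources listed above can be enabled and cleaned up in a single pass. This is a minimal sketch, to be run on the bootstrap Controller node:

```shell
# Enable and clean up each high availability resource from the table above.
for SERVICE in galera haproxy openstack-cinder-volume rabbitmq redis; do
  pcs resource enable "$SERVICE"
  pcs resource cleanup "$SERVICE"
done
```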
B.3. Restored Controller Services
The following is a list of core Systemd services that should be active on OpenStack Platform 10 Controller nodes after a restore. If any of these services are disabled, use the following commands to enable them:
# systemctl start [SERVICE]
# systemctl enable [SERVICE]
| Controller Services |
|---|
| httpd |
| memcached |
| neutron-dhcp-agent |
| neutron-l3-agent |
| neutron-metadata-agent |
| neutron-openvswitch-agent |
| neutron-ovs-cleanup |
| neutron-server |
| ntpd |
| openstack-aodh-evaluator |
| openstack-aodh-listener |
| openstack-aodh-notifier |
| openstack-ceilometer-central |
| openstack-ceilometer-collector |
| openstack-ceilometer-notification |
| openstack-cinder-api |
| openstack-cinder-scheduler |
| openstack-glance-api |
| openstack-glance-registry |
| openstack-gnocchi-metricd |
| openstack-gnocchi-statsd |
| openstack-heat-api-cfn |
| openstack-heat-api-cloudwatch |
| openstack-heat-api |
| openstack-heat-engine |
| openstack-nova-api |
| openstack-nova-conductor |
| openstack-nova-consoleauth |
| openstack-nova-novncproxy |
| openstack-nova-scheduler |
| openstack-swift-account-auditor |
| openstack-swift-account-reaper |
| openstack-swift-account-replicator |
| openstack-swift-account |
| openstack-swift-container-auditor |
| openstack-swift-container-replicator |
| openstack-swift-container-updater |
| openstack-swift-container |
| openstack-swift-object-auditor |
| openstack-swift-object-expirer |
| openstack-swift-object-replicator |
| openstack-swift-object-updater |
| openstack-swift-object |
| openstack-swift-proxy |
| openvswitch |
| os-collect-config |
| ovs-delete-transient-ports |
| ovs-vswitchd |
| ovsdb-server |
| pacemaker |
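To check these services in bulk, the table can be turned into a loop. This is a minimal sketch with the service list truncated for brevity; extend `SERVICES` with the remaining entries from the table above:

```shell
# Start and enable each core Controller service. The list here is a
# truncated example; add the remaining services from the table above.
SERVICES="httpd memcached neutron-server openstack-nova-api"
for SERVICE in $SERVICES; do
  # Start the service only if it is not already running, then enable it.
  systemctl is-active --quiet "$SERVICE" || sudo systemctl start "$SERVICE"
  sudo systemctl enable "$SERVICE"
done
```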
B.4. Restored Overcloud Compute Services
The following is a list of core Systemd services that should be active on OpenStack Platform 10 Compute nodes after a restore. If any of these services are disabled, use the following commands to enable them:
# systemctl start [SERVICE]
# systemctl enable [SERVICE]
| Compute Services |
|---|
| neutron-openvswitch-agent |
| neutron-ovs-cleanup |
| ntpd |
| openstack-ceilometer-compute |
| openstack-nova-compute |
| openvswitch |
| os-collect-config |
