Chapter 2. Preparing for an OpenStack Platform Upgrade
This process prepares your OpenStack Platform environment for a full update. It involves the following steps:
- Back up both the undercloud and overcloud
- Update the undercloud packages and run the upgrade command
- Reboot the undercloud in case a newer kernel or newer system packages are installed
- Update the overcloud using the overcloud upgrade command
- Reboot the overcloud nodes in case a newer kernel or newer system packages are installed
- Perform a validation check on both the undercloud and overcloud
These procedures ensure your OpenStack Platform environment is in the best possible state before proceeding with the upgrade.
2.1. Backing up the undercloud
A full undercloud backup includes the following databases and files:
- All MariaDB databases on the undercloud node
- MariaDB configuration file on the undercloud (so that you can accurately restore databases)
- The configuration data: /etc
- Log data: /var/log
- Image data: /var/lib/glance
- Certificate generation data if using SSL: /var/lib/certmonger
- Any container image data: /var/lib/docker and /var/lib/registry
- All swift data: /srv/node
- All data in the stack user home directory: /home/stack
Confirm that you have sufficient disk space available on the undercloud before performing the backup process. Expect the archive file to be at least 3.5 GB, if not larger.
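The free-space check can be scripted before starting the backup. The following is a minimal sketch, not part of the director tooling: the has_free_kb function name and the MIN_BACKUP_KB variable are hypothetical, and the threshold is the 3.5 GB minimum quoted above, which might be too low for your environment.

```shell
# Hypothetical helper: succeed if the filesystem that holds PATH has at
# least REQUIRED_KB kilobytes available.
has_free_kb() {
    path=$1
    required_kb=$2
    # "df -Pk" prints one POSIX-format line per filesystem;
    # column 4 is the available space in 1 KB blocks.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$required_kb" ]
}

# 3.5 GB expressed in KB (3.5 * 1024 * 1024). Assumption: treat the
# minimum archive size as the lowest acceptable amount of free space.
MIN_BACKUP_KB=3670016
```

For example, running has_free_kb /backup "$MIN_BACKUP_KB" returns a non-zero exit status when the target filesystem is too small, which you can use to stop a backup script early.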
Procedure
- Log into the undercloud as the root user.
Create a backup directory, and change the user ownership of the directory to the stack user:
[root@director ~]# mkdir /backup
[root@director ~]# chown stack: /backup
From the backup directory, back up the database:
[root@director ~]# cd /backup
[root@director ~]# mysqldump --opt --all-databases > /root/undercloud-all-databases.sql
Archive the database backup and the configuration files:
[root@director ~]# tar --xattrs --ignore-failed-read -cf \
    undercloud-backup-`date +%F`.tar \
    /root/undercloud-all-databases.sql \
    /etc \
    /var/log \
    /var/lib/glance \
    /var/lib/certmonger \
    /var/lib/docker \
    /var/lib/registry \
    /srv/node \
    /root \
    /home/stack
- The --ignore-failed-read option skips any directory that does not apply to your undercloud.
- The --xattrs option includes extended attributes, which are required to store metadata for Object Storage (swift).
This creates a file named undercloud-backup-<date>.tar, where <date> is the system date. Copy this tar file to a secure location.
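Before copying the archive elsewhere, it can be useful to confirm that it contains the directories you expect. The following is a minimal sketch under stated assumptions: archive_has_member is a hypothetical helper name, and the check only inspects the archive listing, not the file contents.

```shell
# Hypothetical helper: succeed if ARCHIVE lists MEMBER or anything under it.
# GNU tar strips the leading "/" from absolute paths when archiving,
# so "/etc" is stored as "etc"; the helper strips it the same way.
# Note: MEMBER is used inside a regular expression, so this sketch assumes
# plain path names without regex metacharacters that need escaping.
archive_has_member() {
    archive=$1
    member=${2#/}
    tar -tf "$archive" | grep -Eq "^${member}(/|\$)"
}
```

For example, archive_has_member undercloud-backup-<date>.tar /etc succeeds when the archive contains the /etc directory. Because of the --ignore-failed-read option, a missing directory does not stop the tar command, so a check of this kind is one way to notice an omission.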
2.2. Backing up containerized overcloud control plane services
The following procedure creates a backup of the containerized overcloud databases and configuration. A backup of the overcloud database and services ensures you have a snapshot of a working environment, which you can use to restore the overcloud to its original state after an operational failure.
This procedure only includes crucial control plane services. It does not include backups of Compute node workloads, data on Ceph Storage nodes, or any additional services.
Procedure
Perform the database backup:
Log into a Controller node. You can access the overcloud from the undercloud:
$ ssh heat-admin@192.0.2.100
Change to the root user:
$ sudo -i
Create a temporary directory to store the backups:
# mkdir -p /var/tmp/mysql_backup/
Obtain the database password and store it in the MYSQLDBPASS environment variable. The password is stored in the mysql::server::root_password variable within the /etc/puppet/hieradata/service_configs.json file. Use the following command to store the password:
# MYSQLDBPASS=$(sudo hiera mysql::server::root_password)
Back up the database:
# mysql -uroot -p$MYSQLDBPASS -s -N -e "select distinct table_schema from information_schema.tables where engine='innodb' and table_schema != 'mysql';" | xargs mysqldump -uroot -p$MYSQLDBPASS --single-transaction --databases > /var/tmp/mysql_backup/openstack_databases-`date +%F`-`date +%T`.sql
This dumps a database backup called /var/tmp/mysql_backup/openstack_databases-<date>.sql, where <date> is the system date and time. Copy this database dump to a secure location.
Back up all the users and permissions information:
# mysql -uroot -p$MYSQLDBPASS -s -N -e "SELECT CONCAT('\"SHOW GRANTS FOR ''',user,'''@''',host,''';\"') FROM mysql.user where (length(user) > 0 and user NOT LIKE 'root')" | xargs -n1 mysql -uroot -p$MYSQLDBPASS -s -N -e | sed 's/$/;/' > /var/tmp/mysql_backup/openstack_databases_grants-`date +%F`-`date +%T`.sql
This dumps a database backup called /var/tmp/mysql_backup/openstack_databases_grants-<date>.sql, where <date> is the system date and time. Copy this database dump to a secure location.
Back up the OpenStack Telemetry database:
Connect to any controller and get the IP of the MongoDB primary instance:
# MONGOIP=$(sudo hiera mongodb::server::bind_ip)
Create the backup:
# mkdir -p /var/tmp/mongo_backup/
# mongodump --oplog --host $MONGOIP --out /var/tmp/mongo_backup/
- Copy the database dump in /var/tmp/mongo_backup/ to a secure location.
Back up the Redis cluster:
Obtain the Redis endpoint from HAProxy:
# REDISIP=$(sudo hiera redis_vip)
Obtain the master password for the Redis cluster:
# REDISPASS=$(sudo hiera redis::masterauth)
Check connectivity to the Redis cluster:
# redis-cli -a $REDISPASS -h $REDISIP ping
Dump the Redis database:
# redis-cli -a $REDISPASS -h $REDISIP bgsave
This stores the database backup in the default /var/lib/redis/ directory. Copy this database dump to a secure location.
Back up the filesystem on each Controller node:
Create a directory for the backup:
# mkdir -p /var/tmp/filesystem_backup/
Run the following tar command:
# tar --ignore-failed-read --xattrs \
    -zcvf /var/tmp/filesystem_backup/fs_backup-`date '+%Y-%m-%d-%H-%M-%S'`.tar.gz \
    /var/lib/config-data \
    /var/log/containers \
    /etc/corosync \
    /etc/logrotate.d \
    /etc/openvswitch \
    /var/log/openvswitch \
    /srv/node \
    /home/heat-admin
The --ignore-failed-read option ignores any missing directories, which is useful if certain services are not used or are separated on their own custom roles.
- Copy the resulting tar file to a secure location.
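When you copy the dumps and archives to a secure location, a checksum manifest lets you confirm that the copies are intact. The following is a minimal sketch, assuming the sha256sum utility is available on the node; the write_manifest and verify_manifest function names and the SHA256SUMS file name are hypothetical.

```shell
# Hypothetical helper: record a SHA-256 checksum for every file in DIR
# in a manifest file named SHA256SUMS inside that directory.
write_manifest() {
    dir=$1
    ( cd "$dir" && \
      find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS )
}

# Hypothetical helper: succeed only if every file listed in the manifest
# still matches its recorded checksum.
verify_manifest() {
    dir=$1
    ( cd "$dir" && sha256sum -c SHA256SUMS > /dev/null )
}
```

For example, run write_manifest /var/tmp/mysql_backup on the Controller node before the copy, then run verify_manifest against the copied directory on the destination host.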
2.3. Performing a minor update of an undercloud
The director provides commands to update the packages on the undercloud node. This allows you to perform a minor update within the current version of your OpenStack Platform environment.
Procedure
- Log into the director as the stack user.
Update the python-tripleoclient package and its dependencies to ensure you have the latest scripts for the minor version update:
$ sudo yum update -y python-tripleoclient
The director uses the openstack undercloud upgrade command to update the undercloud environment. Run the command:
$ openstack undercloud upgrade
- Wait until the undercloud upgrade process completes.
Reboot the undercloud to update the operating system’s kernel and other system packages:
$ sudo reboot
- Wait until the node boots.
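The reboot matters mainly when the update installed a newer kernel. The following is a minimal sketch of one way to check this, assuming that the first line of rpm -q --last kernel output names the most recently installed kernel package, which is not necessarily the highest version. The function names are hypothetical.

```shell
# Hypothetical helper: read "rpm -q --last kernel" output on stdin and
# print the version-release.arch string of the first (most recently
# installed) kernel package.
newest_installed_kernel() {
    awk 'NR == 1 { sub(/^kernel-/, "", $1); print $1 }'
}

# Hypothetical helper: succeed if the running kernel differs from the
# most recently installed kernel, which suggests a reboot is pending.
reboot_needed() {
    running=$1
    newest=$2
    [ "$running" != "$newest" ]
}
```

For example, reboot_needed "$(uname -r)" "$(rpm -q --last kernel | newest_installed_kernel)" succeeds when the node still runs an older kernel. Other updated system packages can also require a reboot, so treat a negative result as a hint only.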
2.4. Performing a minor update of a containerized overcloud
The director provides commands to update the packages on all overcloud nodes. This allows you to perform a minor update within the current version of your OpenStack Platform environment.
Procedure
Find the latest tag for the containerized service images:
$ openstack overcloud container image tag discover \
    --image registry.access.redhat.com/rhosp12/openstack-base:latest \
    --tag-from-label version-release
Make a note of the most recent tag.
Create an updated environment file for your container image source using the openstack overcloud container image prepare command. For example, to use images from registry.access.redhat.com:
$ openstack overcloud container image prepare \
    --namespace=registry.access.redhat.com/rhosp12 \
    --prefix=openstack- \
    --tag [TAG] \
    --set ceph_namespace=registry.access.redhat.com/rhceph \
    --set ceph_image=rhceph-2-rhel7 \
    --set ceph_tag=latest \
    --env-file=/home/stack/templates/overcloud_images.yaml \
    -e /home/stack/templates/custom_environment_file.yaml
Replace [TAG] with the tag you noted in the previous step.
For more information about generating this environment file for different source types, see "Configuring a container image source" in the Director Installation and Usage guide.
Run the openstack overcloud update stack command to update the container image locations in your overcloud:
$ openstack overcloud update stack --init-minor-update \
    --container-registry-file /home/stack/templates/overcloud_images.yaml
The --init-minor-update option only performs an update of the parameters in the overcloud stack. It does not perform the actual package or container update. Wait until this command completes.
Perform a package and container update using the openstack overcloud update stack command. Use the --nodes option to update the nodes for each role. For example, the following command updates nodes in the Controller role:
$ openstack overcloud update stack --nodes Controller
Run this command for each role group in the following order:
- Controller
- CephStorage
- Compute
- ObjectStorage
- Any custom roles such as Database, MessageBus, Networker, and so forth.
- The update process for the chosen role starts. The director uses an Ansible playbook to perform the update and displays the output of each task.
- Update the next role group. Repeat until you have updated all nodes.
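The role-by-role order above can be expressed as a small loop. The following is a minimal sketch, not a director feature: the UPDATE_ROLES variable and the update_roles_in_order function are hypothetical names, and the sketch stops at the first role whose update command fails.

```shell
# Default role order from the procedure above. Assumption: append any
# custom roles (for example Database or Networker) to the end of the list.
UPDATE_ROLES="Controller CephStorage Compute ObjectStorage"

# Hypothetical helper: run the update command for each role in order.
# Any arguments are used as a command prefix, so passing "echo" gives a
# dry run that only prints the commands.
update_roles_in_order() {
    for role in $UPDATE_ROLES; do
        "$@" openstack overcloud update stack --nodes "$role" || return 1
    done
}
```

Running update_roles_in_order echo prints the four commands without executing them, which is a simple way to review the order before running update_roles_in_order with no arguments.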
2.5. Rebooting controller and composable nodes
The following procedure reboots controller nodes and standalone nodes based on composable roles. This excludes Compute nodes and Ceph Storage nodes.
Procedure
Select a node to reboot. Log into it and stop the cluster before rebooting:
[heat-admin@overcloud-controller-0 ~]$ sudo pcs cluster stop
Reboot the node:
[heat-admin@overcloud-controller-0 ~]$ sudo reboot
- Wait until the node boots.
Re-enable the cluster for the node:
[heat-admin@overcloud-controller-0 ~]$ sudo pcs cluster start
Log into the node and check the services. For example:
If the node uses Pacemaker services, check the node has rejoined the cluster:
[heat-admin@overcloud-controller-0 ~]$ sudo pcs status
If the node uses Systemd services, check all services are enabled:
[heat-admin@overcloud-controller-0 ~]$ sudo systemctl status
2.6. Rebooting a Ceph Storage (OSD) cluster
The following procedure reboots a cluster of Ceph Storage (OSD) nodes.
Procedure
Log into a Ceph MON or Controller node and disable Ceph Storage cluster rebalancing temporarily:
$ sudo ceph osd set noout
$ sudo ceph osd set norebalance
- Select the first Ceph Storage node to reboot and log into it.
Reboot the node:
$ sudo reboot
- Wait until the node boots.
Log into the node and check the cluster status:
$ sudo ceph -s
Check that the pgmap reports all pgs as normal (active+clean).
- Log out of the node, reboot the next node, and check its status. Repeat this process until you have rebooted all Ceph storage nodes.
When complete, log into a Ceph MON or Controller node and enable cluster rebalancing again:
$ sudo ceph osd unset noout
$ sudo ceph osd unset norebalance
Perform a final status check to verify the cluster reports HEALTH_OK:
$ sudo ceph status
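Several steps in this procedure involve waiting, for a node to boot or for the cluster to settle. The following is a minimal sketch of a generic retry helper that could wrap such checks; wait_until is a hypothetical name and the timeout values in the example are arbitrary.

```shell
# Hypothetical helper: wait_until TIMEOUT INTERVAL COMMAND...
# Re-run COMMAND every INTERVAL seconds until it succeeds, and give up
# with a non-zero status once more than TIMEOUT seconds have elapsed.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@"; do
        elapsed=$((elapsed + interval))
        if [ "$elapsed" -gt "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

For example, after unsetting the flags, wait_until 300 10 sh -c 'sudo ceph health | grep -q HEALTH_OK' polls the cluster every 10 seconds for up to five minutes and fails if it never reports HEALTH_OK.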
2.7. Rebooting compute nodes
The following procedure reboots Compute nodes. To ensure minimal downtime of instances in your OpenStack Platform environment, this procedure also includes instructions on migrating instances from the chosen Compute node. This involves the following workflow:
- Select a Compute node to reboot and disable it so that it does not provision new instances
- Migrate the instances to another Compute node
- Reboot the empty Compute node and enable it
Procedure
- Log into the undercloud as the stack user.
List all Compute nodes and their UUIDs:
$ source ~/stackrc
(undercloud) $ openstack server list --name compute
Identify the UUID of the Compute node you want to reboot.
From the undercloud, select a Compute Node and disable it:
$ source ~/overcloudrc
(overcloud) $ openstack compute service list
(overcloud) $ openstack compute service set [hostname] nova-compute --disable
List all instances on the Compute node:
(overcloud) $ openstack server list --host [hostname] --all-projects
Use one of the following commands to migrate your instances:
Migrate the instance to a specific host of your choice:
(overcloud) $ openstack server migrate [instance-id] --live [target-host] --wait
Let nova-scheduler automatically select the target host:
(overcloud) $ nova live-migration [instance-id]
Live migrate all instances at once:
$ nova host-evacuate-live [hostname]
Note: The nova command might cause some deprecation warnings, which are safe to ignore.
- Wait until migration completes.
Confirm the migration was successful:
(overcloud) $ openstack server list --host [hostname] --all-projects
- Continue migrating instances until none remain on the chosen Compute Node.
Log into the Compute Node and reboot it:
[heat-admin@overcloud-compute-0 ~]$ sudo reboot
- Wait until the node boots.
Enable the Compute Node again:
$ source ~/overcloudrc
(overcloud) $ openstack compute service set [hostname] nova-compute --enable
Check whether the Compute node is enabled:
(overcloud) $ openstack compute service list
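Migrating instances one at a time can be scripted by feeding the instance list into a loop. The following is a minimal sketch under stated assumptions: migrate_instances is a hypothetical helper, it uses the nova live-migration form shown above so that the scheduler selects the target host, and it stops at the first migration request that fails.

```shell
# Hypothetical helper: read instance IDs from stdin, one per line, and
# request a live migration for each. Any arguments are used as a command
# prefix, so passing "echo" gives a dry run.
migrate_instances() {
    while read -r instance_id; do
        # Skip blank lines in the input.
        [ -n "$instance_id" ] || continue
        # Redirect stdin so the command cannot consume the ID list.
        "$@" nova live-migration "$instance_id" < /dev/null || return 1
    done
}
```

For example, with the overcloud credentials sourced, openstack server list --host [hostname] --all-projects -f value -c ID | migrate_instances echo prints the migration commands for every instance on the node. The helper only submits the requests, so you still need to confirm that the migrations completed before rebooting.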
2.8. Validating the undercloud
The following is a set of steps to check the functionality of your undercloud.
Procedure
Source the undercloud access details:
$ source ~/stackrc
Check for failed Systemd services:
(undercloud) $ sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd' 'docker'
Check the undercloud free space:
(undercloud) $ df -h
Use the "Undercloud Requirements" as a basis to determine if you have adequate free space.
If you have NTP installed on the undercloud, check that clocks are synchronized:
(undercloud) $ sudo ntpstat
Check the undercloud network services:
(undercloud) $ openstack network agent list
All agents should be Alive and their state should be UP.
Check the undercloud compute services:
(undercloud) $ openstack compute service list
All agents' status should be enabled and their state should be up.
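Checking the service table by eye becomes error-prone as the list grows. The following is a minimal sketch of a filter that prints only the services that need attention, assuming input with four whitespace-separated columns in the order Binary, Host, Status, State. The report_unhealthy_services name is hypothetical.

```shell
# Hypothetical helper: read "Binary Host Status State" lines on stdin and
# print every line whose status is not "enabled" or whose state is not "up".
# Exits non-zero if at least one such line was found.
report_unhealthy_services() {
    awk '$3 != "enabled" || $4 != "up" { print; bad = 1 }
         END { exit bad + 0 }'
}
```

For example, openstack compute service list -f value -c Binary -c Host -c Status -c State | report_unhealthy_services prints nothing and exits with status 0 when every service is enabled and up. A service that you disabled on purpose also appears in the output.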
Related Information
- The following solution article shows how to remove deleted stack entries in your OpenStack Orchestration (heat) database: https://access.redhat.com/solutions/2215131
2.9. Validating a containerized overcloud
The following is a set of steps to check the functionality of your containerized overcloud.
Procedure
Source the undercloud access details:
$ source ~/stackrc
Check the status of your bare metal nodes:
(undercloud) $ openstack baremetal node list
All nodes should have a valid power state (on) and maintenance mode should be false.
Check for failed Systemd services:
(undercloud) $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd' 'docker' 'ceph*'" ; done
Check for failed containerized services:
(undercloud) $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo docker ps -f 'exited=1' --all" ; done
(undercloud) $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo docker ps -f 'status=dead' -f 'status=restarting'" ; done
Check the HAProxy connection to all services. Obtain the Control Plane VIP address and authentication details for the haproxy.stats service:
(undercloud) $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE sudo 'grep "listen haproxy.stats" -A 6 /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg'
Use these details in the following cURL request:
(undercloud) $ curl -s -u admin:<PASSWORD> "http://<IP ADDRESS>:1993/;csv" | egrep -vi "(frontend|backend)" | awk -F',' '{ print $1" "$2" "$18 }'
Replace <PASSWORD> and <IP ADDRESS> with the respective details from the haproxy.stats service. The resulting list shows the OpenStack Platform services on each node and their connection status.
Check overcloud database replication health:
(undercloud) $ for NODE in $(openstack server list --name controller -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo docker exec clustercheck clustercheck" ; done
Check RabbitMQ cluster health:
(undercloud) $ for NODE in $(openstack server list --name controller -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo docker exec $(ssh heat-admin@$NODE "sudo docker ps -f 'name=.*rabbitmq.*' -q") rabbitmqctl node_health_check" ; done
Check Pacemaker resource health:
(undercloud) $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo pcs status"
Look for:
- All cluster nodes online.
- No resources stopped on any cluster nodes.
- No failed pacemaker actions.
Check the disk space on each overcloud node:
(undercloud) $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo df -h --output=source,fstype,avail -x overlay -x tmpfs -x devtmpfs" ; done
Check overcloud Ceph Storage cluster health. The following command runs the ceph tool on a Controller node to check the cluster:
(undercloud) $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph -s"
Check Ceph Storage OSD for free space. The following command runs the ceph tool on a Controller node to check the free space:
(undercloud) $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph df"
Check that clocks are synchronized on overcloud nodes:
(undercloud) $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo ntpstat" ; done
Source the overcloud access details:
(undercloud) $ source ~/overcloudrc
Check the overcloud network services:
(overcloud) $ openstack network agent list
All agents should be Alive and their state should be UP.
Check the overcloud compute services:
(overcloud) $ openstack compute service list
All agents' status should be enabled and their state should be up.
Check the overcloud volume services:
(overcloud) $ openstack volume service list
All agents' status should be enabled and their state should be up.
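Most of the checks in this procedure repeat the same loop over the overcloud node addresses. The following is a minimal sketch that factors the loop into two helpers; the node_addresses and on_each_node names are hypothetical, and the sketch assumes Networks values of the form ctlplane=<address>, as the cut commands above do.

```shell
# Hypothetical helper: read lines such as "ctlplane=192.0.2.10" on stdin
# and print only the address part.
node_addresses() {
    cut -d= -f2
}

# Hypothetical helper: on_each_node REMOTE_COMMAND [PREFIX...]
# Read node addresses on stdin and run REMOTE_COMMAND on each node over
# SSH as the heat-admin user. Optional PREFIX words are placed before the
# ssh command, so passing "echo" gives a dry run.
on_each_node() {
    remote_cmd=$1
    shift
    while read -r node; do
        [ -n "$node" ] || continue
        echo "=== $node ==="
        # Redirect stdin so ssh cannot consume the address list.
        "$@" ssh heat-admin@"$node" "$remote_cmd" < /dev/null
    done
}
```

For example, with the undercloud credentials sourced, openstack server list -f value -c Networks | node_addresses | on_each_node "sudo ntpstat" runs the clock check on every node with the same header lines as the loops above.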
Related Information
- Review the article "How can I verify my OpenStack environment is deployed with Red Hat recommended configurations?". This article provides some information on how to check your Red Hat OpenStack Platform environment and tune the configuration to Red Hat’s recommendations.
