Chapter 2. Preparing for an OpenStack Platform upgrade
This process prepares your OpenStack Platform environment for the upgrade and involves the following steps:
- Backing up both the undercloud and overcloud.
- Updating the undercloud to the latest minor version of OpenStack Platform 10, including the latest Open vSwitch.
- Rebooting the undercloud in case a newer kernel or newer system packages are installed.
- Updating the overcloud to the latest minor version of OpenStack Platform 10, including the latest Open vSwitch.
- Rebooting the overcloud nodes in case a newer kernel or newer system packages are installed.
- Performing validation checks on both the undercloud and overcloud.
These procedures ensure your OpenStack Platform environment is in the best possible state before proceeding with the upgrade.
2.1. Backing up the undercloud
A full undercloud backup includes the following databases and files:
- All MariaDB databases on the undercloud node
- The MariaDB configuration file on the undercloud (so that you can accurately restore databases)
- The configuration data: /etc
- Log data: /var/log
- Image data: /var/lib/glance
- Certificate generation data if using SSL: /var/lib/certmonger
- Any container image data: /var/lib/docker and /var/lib/registry
- All swift data: /srv/node
- All data in the stack user home directory: /home/stack
Confirm that you have sufficient disk space available on the undercloud before performing the backup process. Expect the archive file to be at least 3.5 GB, if not larger.
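For example, you can check the free space on the file system that will hold the backup directory (created in the procedure below) and compare it against the expected archive size:

[root@director ~]# df -h /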
Procedure
- Log into the undercloud as the root user.
- Create a backup directory and change the user ownership of the directory to the stack user:

[root@director ~]# mkdir /backup
[root@director ~]# chown stack: /backup

- From the backup directory, back up the database:

[root@director ~]# cd /backup
[root@director ~]# mysqldump --opt --all-databases > /root/undercloud-all-databases.sql

- Archive the database backup and the configuration files:

[root@director ~]# tar --xattrs --ignore-failed-read -cf \
  undercloud-backup-`date +%F`.tar \
  /root/undercloud-all-databases.sql \
  /etc \
  /var/log \
  /var/lib/glance \
  /var/lib/certmonger \
  /var/lib/docker \
  /var/lib/registry \
  /srv/node \
  /root \
  /home/stack

- The --ignore-failed-read option skips any directory that does not apply to your undercloud.
- The --xattrs option includes extended attributes, which are required to store metadata for Object Storage (swift).

This creates a file named undercloud-backup-<date>.tar, where <date> is the system date. Copy this tar file to a secure location.
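Before copying the archive off the undercloud, you can optionally confirm that it was created and list its contents. This is a minimal check that assumes a single archive in /backup:

[root@director ~]# ls -lh /backup/undercloud-backup-*.tar
[root@director ~]# tar -tf /backup/undercloud-backup-*.tar | head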
Related Information
- If you need to restore the undercloud backup, see Appendix A, Restoring the undercloud.
2.2. Backing up the overcloud control plane services
The following procedure creates a backup of the overcloud databases and configuration. A backup of the overcloud database and services ensures you have a snapshot of a working environment. This snapshot helps you restore the overcloud to its original state if you encounter an operational failure.
This procedure only includes crucial control plane services. It does not include backups of Compute node workloads, data on Ceph Storage nodes, or any additional services.
Procedure
Perform the database backup:
Log into a Controller node. You can access the overcloud from the undercloud:
$ ssh heat-admin@192.0.2.100
Change to the root user:

$ sudo -i
Create a temporary directory to store the backups:
# mkdir -p /var/tmp/mysql_backup/
Obtain the database password and store it in the MYSQLDBPASS environment variable. The password is stored in the mysql::server::root_password variable within the /etc/puppet/hieradata/service_configs.json file. Use the following command to store the password:

# MYSQLDBPASS=$(sudo hiera mysql::server::root_password)
Back up the database:
# mysql -uroot -p$MYSQLDBPASS -s -N -e "select distinct table_schema from information_schema.tables where engine='innodb' and table_schema != 'mysql';" | xargs mysqldump -uroot -p$MYSQLDBPASS --single-transaction --databases > /var/tmp/mysql_backup/openstack_databases-`date +%F`-`date +%T`.sql
This dumps a database backup called /var/tmp/mysql_backup/openstack_databases-<date>.sql, where <date> is the system date and time. Copy this database dump to a secure location.

Back up all the users and permissions information:

# mysql -uroot -p$MYSQLDBPASS -s -N -e "SELECT CONCAT('\"SHOW GRANTS FOR ''',user,'''@''',host,''';\"') FROM mysql.user where (length(user) > 0 and user NOT LIKE 'root')" | xargs -n1 mysql -uroot -p$MYSQLDBPASS -s -N -e | sed 's/$/;/' > /var/tmp/mysql_backup/openstack_databases_grants-`date +%F`-`date +%T`.sql

This dumps a database backup called /var/tmp/mysql_backup/openstack_databases_grants-<date>.sql, where <date> is the system date and time. Copy this database dump to a secure location.
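Before copying the dumps, you can optionally confirm that both files exist and are non-empty:

# ls -lh /var/tmp/mysql_backup/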
Back up the OpenStack Telemetry database:
Connect to any controller and get the IP of the MongoDB primary instance:
# MONGOIP=$(sudo hiera mongodb::server::bind_ip)
Create the backup:
# mkdir -p /var/tmp/mongo_backup/
# mongodump --oplog --host $MONGOIP --out /var/tmp/mongo_backup/
Copy the database dump in /var/tmp/mongo_backup/ to a secure location.
Back up the Redis cluster:
Obtain the Redis endpoint from HAProxy:
# REDISIP=$(sudo hiera redis_vip)
Obtain the master password for the Redis cluster:
# REDISPASS=$(sudo hiera redis::masterauth)
Check connectivity to the Redis cluster:
# redis-cli -a $REDISPASS -h $REDISIP ping
Dump the Redis database:
# redis-cli -a $REDISPASS -h $REDISIP bgsave
This stores the database backup in the default /var/lib/redis/ directory. Copy this database dump to a secure location.
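Because bgsave runs in the background, you can optionally confirm that the snapshot completed by checking the Unix timestamp of the last successful save before copying the dump:

# redis-cli -a $REDISPASS -h $REDISIP lastsave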
Back up the filesystem on each Controller node:
Create a directory for the backup:
# mkdir -p /var/tmp/filesystem_backup/
Run the following tar command:

# tar --ignore-failed-read --xattrs \
  -zcvf /var/tmp/filesystem_backup/`hostname`-filesystem-`date '+%Y-%m-%d-%H-%M-%S'`.tar \
  /etc \
  /srv/node \
  /var/log \
  /var/lib/nova \
  --exclude /var/lib/nova/instances \
  /var/lib/glance \
  /var/lib/keystone \
  /var/lib/cinder \
  /var/lib/heat \
  /var/lib/heat-config \
  /var/lib/heat-cfntools \
  /var/lib/rabbitmq \
  /var/lib/neutron \
  /var/lib/haproxy \
  /var/lib/openvswitch \
  /var/lib/redis \
  /usr/libexec/os-apply-config \
  /home/heat-admin

The --ignore-failed-read option ignores any missing directories, which is useful if certain services are not used or are separated onto their own custom roles.
Copy the resulting tar file to a secure location.
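As a minimal example of copying the archive to a secure location, you might copy it to the backup directory on the undercloud. The 192.0.2.1 address and /backup path are placeholders; substitute values for your environment:

# scp /var/tmp/filesystem_backup/*.tar stack@192.0.2.1:/backup/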
Related Information
- If you need to restore the overcloud backup, see Appendix B, Restoring the overcloud.
2.3. Preparing updates for NFV-enabled environments
If your environment has network function virtualization (NFV) enabled, you need to follow these steps before updating your undercloud and overcloud.
Procedure
- Add the content from this sample post-install.yaml file to any existing post-install.yaml file.
- Change the vhost user socket directory in a custom environment file, for example, network-environment.yaml:

parameter_defaults:
  NeutronVhostuserSocketDir: "/var/lib/vhost_sockets"
- Add the ovs-dpdk-permissions.yaml file to your openstack overcloud deploy command to configure the qemu group setting as hugetlbfs for OVS-DPDK, as shown in the sketch after this list:

-e environments/ovs-dpdk-permissions.yaml
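A minimal sketch of how the file might be appended to an existing deploy command. The full template path and the network-environment.yaml location are assumptions based on the examples elsewhere in this chapter; use the environment files your deployment already includes:

$ openstack overcloud deploy --templates \
  -e /home/stack/templates/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ovs-dpdk-permissions.yaml \
  [-e <environment_file>|...]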
2.4. Updating the current undercloud packages for OpenStack Platform 10.z
The director provides commands to update the packages on the undercloud node. This allows you to perform a minor update within the current version of your OpenStack Platform environment, in this case a minor update within OpenStack Platform 10.
This step also updates the undercloud operating system to the latest version of Red Hat Enterprise Linux 7 and Open vSwitch to version 2.9.
Procedure
- Log into the undercloud as the stack user.
- Stop the main OpenStack Platform services:
$ sudo systemctl stop 'openstack-*' 'neutron-*' httpd
Note: This causes a short period of downtime for the undercloud. The overcloud is still functional during the undercloud upgrade.
Update the python-tripleoclient package and its dependencies to ensure you have the latest scripts for the minor version update:

$ sudo yum update -y python-tripleoclient
Run the openstack undercloud upgrade command:

$ openstack undercloud upgrade
- Wait until the command completes its execution.
Reboot the undercloud to update the operating system’s kernel and other system packages:
$ sudo reboot
- Wait until the node boots.
- Log into the undercloud as the stack user.
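After logging back in, you can optionally confirm that no undercloud services failed to restart after the reboot, using the same kind of check that appears in the validation steps later in this chapter:

$ sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd'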
In addition to the undercloud package update, it is recommended to keep your overcloud images up to date so that the image configuration stays in sync with the latest openstack-tripleo-heat-templates package. This ensures successful deployment and scaling operations between the current preparation stage and the actual fast forward upgrade. The next section shows how to update your images in this scenario. If you aim to upgrade your environment immediately after preparing it, you can skip the next section.
2.5. Updating the current overcloud images for OpenStack Platform 10.z
The undercloud update process might download new image archives from the rhosp-director-images and rhosp-director-images-ipa packages. This process updates these images on your undercloud within Red Hat OpenStack Platform 10.
Prerequisites
- You have updated to the latest minor release of your current undercloud version.
Procedure
Check the yum log to determine if new image archives are available:

$ sudo grep "rhosp-director-images" /var/log/yum.log
If new archives are available, replace your current images with the new images. To install the new images, first remove any existing images from the images directory in the stack user's home directory (/home/stack/images):

$ rm -rf ~/images/*
Extract the archives:
$ cd ~/images
$ for i in /usr/share/rhosp-director-images/overcloud-full-latest-10.0.tar /usr/share/rhosp-director-images/ironic-python-agent-latest-10.0.tar; do tar -xvf $i; done
Import the latest images into the director and configure nodes to use the new images:

$ cd ~
$ openstack overcloud image upload --update-existing --image-path /home/stack/images/
$ openstack overcloud node configure $(openstack baremetal node list -c UUID -f csv --quote none | sed "1d" | paste -s -d " ")
To finalize the image update, verify the existence of the new images:
$ openstack image list
$ ls -l /httpboot
The director also retains the old images and renames them using the timestamp of when they were updated. If you no longer need these images, delete them.
The director is now updated and using the latest images. You do not need to restart any services after the update.
The undercloud is now using updated OpenStack Platform 10 packages. Next, update the overcloud to the latest minor release.
2.6. Updating the current overcloud packages for OpenStack Platform 10.z
The director provides commands to update the packages on all overcloud nodes. This allows you to perform a minor update within the current version of your OpenStack Platform environment, in this case a minor update within Red Hat OpenStack Platform 10.
This step also updates the overcloud nodes' operating system to the latest version of Red Hat Enterprise Linux 7 and Open vSwitch to version 2.9.
Prerequisites
- You have updated to the latest minor release of your current undercloud version.
- You have performed a backup of the overcloud.
Procedure
Update the current plan using your original openstack overcloud deploy command and include the --update-plan-only option. For example:

$ openstack overcloud deploy --update-plan-only \
  --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /home/stack/templates/storage-environment.yaml \
  -e /home/stack/templates/rhel-registration/environment-rhel-registration.yaml \
  [-e <environment_file>|...]
The --update-plan-only option only updates the overcloud plan stored in the director. Use the -e option to include environment files relevant to your overcloud and its update path. The order of the environment files is important because the parameters and resources defined in subsequent environment files take precedence. Use the following list as an example of the environment file order:

- Any network isolation files, including the initialization file (environments/network-isolation.yaml) from the heat template collection, and then your custom NIC configuration file.
- Any external load balancing environment files.
- Any storage environment files.
- Any environment files for Red Hat CDN or Satellite registration.
- Any other custom environment files.
Perform a package update on all nodes using the openstack overcloud update command:

$ openstack overcloud update stack -i overcloud

The -i option runs an interactive mode that updates each node sequentially. When the update process completes a node update, the script provides a breakpoint for you to confirm. Without the -i option, the update remains paused at the first breakpoint. Therefore, it is mandatory to include the -i option.

The script performs the following functions:
The script runs on nodes one-by-one:
- For Controller nodes, this means a full package update.
- For other nodes, this means an update of Puppet modules only.
Puppet runs on all nodes at once:
- For Controller nodes, the Puppet run synchronizes the configuration.
- For other nodes, the Puppet run updates the rest of the packages and synchronizes the configuration.
The update process starts. During this process, the director reports an IN_PROGRESS status and periodically prompts you to clear breakpoints. For example:

starting package update on stack overcloud
IN_PROGRESS
IN_PROGRESS
WAITING
on_breakpoint: [u'overcloud-compute-0', u'overcloud-controller-2', u'overcloud-controller-1', u'overcloud-controller-0']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 49913767-e2dd-4772-b648-81e198f5ed00), no=cancel update, C-c=quit interactive mode:
Press Enter to clear the breakpoint for the last node on the on_breakpoint list. This begins the update for that node.

The script automatically predefines the update order of nodes:
- Each Controller node individually
- Each Compute node individually
- Each Ceph Storage node individually
- All other nodes individually
It is recommended to use this order to ensure a successful update, specifically:
- Clear the breakpoint of each Controller node individually. Each Controller node requires an individual package update in case the node’s services must restart after the update. This reduces disruption to highly available services on other Controller nodes.
- After the Controller node update, clear the breakpoints for each Compute node. You can also type a Compute node name to clear a breakpoint on a specific node, or use a Python-based regular expression to clear breakpoints on multiple Compute nodes at once (see the example after this list).
- Clear the breakpoints for each Ceph Storage node. You can also type a Ceph Storage node name to clear a breakpoint on a specific node, or use a Python-based regular expression to clear breakpoints on multiple Ceph Storage nodes at once.
- Clear any remaining breakpoints to update the remaining nodes. You can also type a node name to clear a breakpoint on a specific node, or use a Python-based regular expression to clear breakpoints on multiple nodes at once.
- Wait until all nodes have completed their update.
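For example, when the "Breakpoint reached, continue?" prompt appears, you might enter a pattern such as the following to clear the breakpoints on all Compute nodes at once. This assumes the default overcloud-compute-* node names; adjust the pattern to match your environment:

overcloud-compute-.*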
The update command reports a COMPLETE status when the update completes:

...
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
COMPLETE
update finished with status COMPLETE
If you configured fencing for your Controller nodes, the update process might disable it. When the update process completes, reenable fencing with the following command on one of the Controller nodes:
$ sudo pcs property set stonith-enabled=true
The update process does not reboot any nodes in the Overcloud automatically. Updates to the kernel and other system packages require a reboot. Check the /var/log/yum.log file on each node to see if either the kernel or openvswitch packages have updated their major or minor versions. If they have, reboot each node using the following procedures.
2.7. Rebooting controller and composable nodes
The following procedure reboots controller nodes and standalone nodes based on composable roles. This excludes Compute nodes and Ceph Storage nodes.
Procedure
Select a node to reboot. Log into it and stop the cluster before rebooting:
[heat-admin@overcloud-controller-0 ~]$ sudo pcs cluster stop
Reboot the node:
[heat-admin@overcloud-controller-0 ~]$ sudo reboot
- Wait until the node boots.
Re-enable the cluster for the node:
[heat-admin@overcloud-controller-0 ~]$ sudo pcs cluster start
Log into the node and check the services. For example:
If the node uses Pacemaker services, check the node has rejoined the cluster:
[heat-admin@overcloud-controller-0 ~]$ sudo pcs status
If the node uses Systemd services, check all services are enabled:
[heat-admin@overcloud-controller-0 ~]$ sudo systemctl status
2.8. Rebooting a Ceph Storage (OSD) cluster
The following procedure reboots a cluster of Ceph Storage (OSD) nodes.
Procedure
Log into a Ceph MON or Controller node and disable Ceph Storage cluster rebalancing temporarily:
$ sudo ceph osd set noout
$ sudo ceph osd set norebalance
- Select the first Ceph Storage node to reboot and log into it.
Reboot the node:
$ sudo reboot
- Wait until the node boots.
Log into the node and check the cluster status:
$ sudo ceph -s
Check that the pgmap reports all pgs as normal (active+clean); see the example after this procedure.

- Log out of the node, reboot the next node, and check its status. Repeat this process until you have rebooted all Ceph Storage nodes.
When complete, log into a Ceph MON or Controller node and enable cluster rebalancing again:
$ sudo ceph osd unset noout
$ sudo ceph osd unset norebalance
Perform a final status check to verify the cluster reports HEALTH_OK:

$ sudo ceph status
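One way to confirm that all placement groups have returned to active+clean after each node reboot, as mentioned in the pgmap check above, is the placement group summary:

$ sudo ceph pg stat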
2.9. Rebooting compute nodes
The following procedure reboots Compute nodes. To ensure minimal downtime of instances in your OpenStack Platform environment, this procedure also includes instructions on migrating instances from the chosen Compute node. This involves the following workflow:
- Select a Compute node to reboot and disable it so that it does not provision new instances
- Migrate the instances to another Compute node
- Reboot the empty Compute node and enable it
Procedure
- Log into the undercloud as the stack user.
- List all Compute nodes and their UUIDs:

$ source ~/stackrc
(undercloud) $ openstack server list --name compute
Identify the UUID of the Compute node that you want to reboot.

From the undercloud, select a Compute node and disable it:

$ source ~/overcloudrc
(overcloud) $ openstack compute service list
(overcloud) $ openstack compute service set [hostname] nova-compute --disable
List all instances on the Compute node:
(overcloud) $ openstack server list --host [hostname] --all-projects
Use one of the following commands to migrate your instances:
Migrate the instance to a specific host of your choice:
(overcloud) $ openstack server migrate [instance-id] --live [target-host] --wait
Let nova-scheduler automatically select the target host:

(overcloud) $ nova live-migration [instance-id]
Live migrate all instances at once:
$ nova host-evacuate-live [hostname]
Note: The nova commands might display some deprecation warnings, which are safe to ignore.
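While waiting, you can optionally watch the status and current host of a migrating instance. This is a minimal check run with admin credentials; the OS-EXT-SRV-ATTR:host field name assumes the standard extended server attributes and may vary by client version:

(overcloud) $ openstack server show [instance-id] -c status -c OS-EXT-SRV-ATTR:host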
- Wait until migration completes.
Confirm the migration was successful:
(overcloud) $ openstack server list --host [hostname] --all-projects
- Continue migrating instances until none remain on the chosen Compute Node.
Log into the Compute Node and reboot it:
[heat-admin@overcloud-compute-0 ~]$ sudo reboot
- Wait until the node boots.
Enable the Compute Node again:
$ source ~/overcloudrc
(overcloud) $ openstack compute service set [hostname] nova-compute --enable
Check whether the Compute node is enabled:
(overcloud) $ openstack compute service list
2.10. Verifying system packages
Before the upgrade, all nodes should be using the latest versions of the following packages:
| Package | Version |
|---|---|
| openvswitch | At least 2.9 |
| qemu-img-rhev | At least 2.10 |
| qemu-kvm-common-rhev | At least 2.10 |
| qemu-kvm-rhev | At least 2.10 |
| qemu-kvm-tools-rhev | At least 2.10 |
Use the following procedure on each node to check the package versions.
Procedure
- Log into a node.
Run yum to check the system packages:

$ sudo yum list qemu-img-rhev qemu-kvm-common-rhev qemu-kvm-rhev qemu-kvm-tools-rhev openvswitch
Run ovs-vsctl to check the version of Open vSwitch currently running:

$ sudo ovs-vsctl --version
The undercloud now uses updated OpenStack Platform 10 packages. Use the next few procedures to check that the system is in a working state.
2.11. Validating an OpenStack Platform 10 undercloud
The following is a set of steps to check the functionality of your Red Hat OpenStack Platform 10 undercloud before an upgrade.
Procedure
Source the undercloud access details:
$ source ~/stackrc
Check for failed Systemd services:
$ sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd' 'docker'
Check the undercloud free space:
$ df -h
Use the "Undercloud Reqirements" as a basis to determine if you have adequate free space.
If you have NTP installed on the undercloud, check the clock is synchronized:
$ sudo ntpstat
Check the undercloud network services:
$ openstack network agent list
All agents should be Alive and their state should be UP.

Check the undercloud compute services:
$ openstack compute service list
All agents' status should be enabled and their state should be up.
Related Information
- The following solution article shows how to remove deleted stack entries in your OpenStack Orchestration (heat) database: https://access.redhat.com/solutions/2215131
2.12. Validating an OpenStack Platform 10 overcloud
The following is a set of steps to check the functionality of your Red Hat OpenStack Platform 10 overcloud before an upgrade.
Procedure
Source the undercloud access details:
$ source ~/stackrc
Check the status of your bare metal nodes:
$ openstack baremetal node list
All nodes should have a valid power state (on) and maintenance mode should be false.

Check for failed Systemd services:
$ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd' 'docker' 'ceph*'" ; done
Check the HAProxy connection to all services. Obtain the Control Plane VIP address and authentication details for the haproxy.stats service:

$ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE sudo 'grep "listen haproxy.stats" -A 6 /etc/haproxy/haproxy.cfg'
Use these details in the following cURL request:
$ curl -s -u admin:<PASSWORD> "http://<IP ADDRESS>:1993/;csv" | egrep -vi "(frontend|backend)" | awk -F',' '{ print $1" "$2" "$18 }'

Replace the <PASSWORD> and <IP ADDRESS> details with the respective details from the haproxy.stats service. The resulting list shows the OpenStack Platform services on each node and their connection status.

Check overcloud database replication health:
$ for NODE in $(openstack server list --name controller -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo clustercheck" ; done
Check RabbitMQ cluster health:
$ for NODE in $(openstack server list --name controller -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo rabbitmqctl node_health_check" ; done
Check Pacemaker resource health:
$ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo pcs status"
Look for:

- All cluster nodes online.
- No resources stopped on any cluster nodes.
- No failed Pacemaker actions.
Check the disk space on each overcloud node:
$ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo df -h --output=source,fstype,avail -x overlay -x tmpfs -x devtmpfs" ; done
Check overcloud Ceph Storage cluster health. The following command runs the ceph tool on a Controller node to check the cluster:

$ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph -s"
Check the Ceph Storage OSDs for free space. The following command runs the ceph tool on a Controller node to check the free space:

$ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph df"
Check that clocks are synchronized on overcloud nodes:
$ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo ntpstat" ; done
Source the overcloud access details:
$ source ~/overcloudrc
Check the overcloud network services:
$ openstack network agent list
All agents should be Alive and their state should be UP.

Check the overcloud compute services:
$ openstack compute service list
All agents' status should be enabled and their state should be up.

Check the overcloud volume services:
$ openstack volume service list
All agents' status should be enabled and their state should be up.
Related Information
- Review the article "How can I verify my OpenStack environment is deployed with Red Hat recommended configurations?". This article provides some information on how to check your Red Hat OpenStack Platform environment and tune the configuration to Red Hat’s recommendations.
- Review the article "Database Size Management for Red Hat Enterprise Linux OpenStack Platform" to check and clean unused database records for OpenStack Platform services on the overcloud.
2.13. Finalizing updates for NFV-enabled environments
If your environment has network function virtualization (NFV) enabled, you need to follow these steps after updating your undercloud and overcloud.
Procedure
You need to migrate your existing OVS-DPDK instances to ensure that the vhost socket mode changes from dpdkvhostuser to dpdkvhostuserclient mode in the OVS ports. We recommend that you snapshot existing instances and launch new instances based on the snapshot images. See Manage Instance Snapshots for complete details on instance snapshots.
To snapshot an instance and boot a new instance from the snapshot:
Find the server ID for the instance you want to take a snapshot of:
# openstack server list
Shut down the source instance before you take the snapshot to ensure that all data is flushed to disk:

# openstack server stop SERVER_ID

Create the snapshot image of the instance:

# openstack server image create --name SNAPSHOT_NAME SERVER_ID
Boot a new instance with this snapshot image:
# openstack server create --flavor DPDK_FLAVOR --nic net-id=DPDK_NET_ID --image SNAPSHOT_NAME INSTANCE_NAME
Optionally, verify that the new instance status is ACTIVE:

# openstack server list
Repeat this procedure for all instances that you need to snapshot and relaunch.
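To optionally confirm that a relaunched instance uses the new socket mode, you can inspect the Open vSwitch interface types on the hosting Compute node and look for dpdkvhostuserclient entries. This is a minimal check; interface names vary by deployment:

# ovs-vsctl --columns=name,type list Interface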
2.14. Next Steps
With the preparation stage complete, you can now perform an upgrade of the undercloud from 10 to 13 using the steps in Chapter 3, Upgrading the undercloud.
