Chapter 2. Preparing for an OpenStack Platform Upgrade

This process prepares your OpenStack Platform environment for the upgrade. It involves the following workflow:

  • Back up both the undercloud and the overcloud
  • Update the undercloud packages and run the undercloud upgrade command
  • Reboot the undercloud if a newer kernel or newer system packages are installed
  • Update the overcloud using the overcloud update commands
  • Reboot the overcloud nodes if a newer kernel or newer system packages are installed
  • Perform a validation check on both the undercloud and overcloud

These procedures ensure your OpenStack Platform environment is in the best possible state before proceeding with the upgrade.

2.1. Backing up the undercloud

A full undercloud backup includes the following databases and files:

  • All MariaDB databases on the undercloud node
  • MariaDB configuration file on the undercloud (so that you can accurately restore databases)
  • The configuration data: /etc
  • Log data: /var/log
  • Image data: /var/lib/glance
  • Certificate generation data if using SSL: /var/lib/certmonger
  • Any container image data: /var/lib/docker and /var/lib/registry
  • All swift data: /srv/node
  • All data in the stack user home directory: /home/stack
Note

Confirm that you have sufficient disk space available on the undercloud before performing the backup process. Expect the archive file to be at least 3.5 GB, if not larger.
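To estimate the size of the archive before you create it, you can sum the sizes of the directories included in the backup. The following is a minimal sketch; run it as the root user and remove any paths that do not exist on your undercloud:

    [root@director ~]# du -csh /etc /var/log /var/lib/glance /var/lib/certmonger \
        /var/lib/docker /var/lib/registry /srv/node /root /home/stack

The final line of the output shows the combined total, which approximates the uncompressed archive size.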

Procedure

  1. Log into the undercloud as the root user.
  2. Create a backup directory, and change the user ownership of the directory to the stack user:

    [root@director ~]# mkdir /backup
    [root@director ~]# chown stack: /backup
  3. From the backup directory, back up the database:

    [root@director ~]# cd /backup
    [root@director ~]# mysqldump --opt --all-databases > /root/undercloud-all-databases.sql
  4. Archive the database backup and the configuration files:

    [root@director ~]# tar --xattrs --ignore-failed-read -cf \
        undercloud-backup-`date +%F`.tar \
        /root/undercloud-all-databases.sql \
        /etc \
        /var/log \
        /var/lib/glance \
        /var/lib/certmonger \
        /var/lib/docker \
        /var/lib/registry \
        /srv/node \
        /root \
        /home/stack
    • The --ignore-failed-read option skips any directory that does not apply to your undercloud.
    • The --xattrs option includes extended attributes, which are required to store metadata for Object Storage (swift).

    This creates a file named undercloud-backup-<date>.tar, where <date> is the system date. Copy this tar file to a secure location.
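For example, to copy the archive to a remote backup host over SSH, you might use scp. The host name and destination path below are placeholders; replace them with your own secure backup location:

    [root@director ~]# scp /backup/undercloud-backup-*.tar backup-user@backup.example.com:/srv/backups/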

2.2. Backing up containerized overcloud control plane services

The following procedure creates a backup of the containerized overcloud databases and configuration. A backup of the overcloud database and services ensures you have a snapshot of a working environment. Having this snapshot helps if you need to restore the overcloud to its original state after an operational failure.

Important

This procedure only includes crucial control plane services. It does not include backups of Compute node workloads, data on Ceph Storage nodes, or any additional services.

Procedure

  1. Perform the database backup:

    1. Log into a Controller node. You can access the overcloud from the undercloud:

      $ ssh heat-admin@192.0.2.100
    2. Change to the root user:

      $ sudo -i
    3. Create a temporary directory to store the backups:

      # mkdir -p /var/tmp/mysql_backup/
    4. Obtain the database password and store it in the MYSQLDBPASS environment variable. The password is stored in the mysql::server::root_password variable within the /etc/puppet/hieradata/service_configs.json file. Use the following command to store the password:

      # MYSQLDBPASS=$(sudo hiera mysql::server::root_password)
    5. Back up the database:

      # mysql -uroot -p$MYSQLDBPASS -s -N -e "select distinct table_schema from information_schema.tables where engine='innodb' and table_schema != 'mysql';" | xargs mysqldump -uroot -p$MYSQLDBPASS --single-transaction --databases > /var/tmp/mysql_backup/openstack_databases-`date +%F`-`date +%T`.sql

      This dumps a database backup called /var/tmp/mysql_backup/openstack_databases-<date>.sql where <date> is the system date and time. Copy this database dump to a secure location.

    6. Back up all user and permission information:

      # mysql -uroot -p$MYSQLDBPASS -s -N -e "SELECT CONCAT('\"SHOW GRANTS FOR ''',user,'''@''',host,''';\"') FROM mysql.user where (length(user) > 0 and user NOT LIKE 'root')" | xargs -n1 mysql -uroot -p$MYSQLDBPASS -s -N -e | sed 's/$/;/' > /var/tmp/mysql_backup/openstack_databases_grants-`date +%F`-`date +%T`.sql

      This dumps a database backup called /var/tmp/mysql_backup/openstack_databases_grants-<date>.sql where <date> is the system date and time. Copy this database dump to a secure location.

  2. Back up the OpenStack Telemetry database:

    1. Connect to any Controller node and get the IP address of the MongoDB primary instance:

      # MONGOIP=$(sudo hiera mongodb::server::bind_ip)
    2. Create the backup:

      # mkdir -p /var/tmp/mongo_backup/
      # mongodump --oplog --host $MONGOIP --out /var/tmp/mongo_backup/
    3. Copy the database dump in /var/tmp/mongo_backup/ to a secure location.
  3. Back up the Redis cluster:

    1. Obtain the Redis endpoint from HAProxy:

      # REDISIP=$(sudo hiera redis_vip)
    2. Obtain the master password for the Redis cluster:

      # REDISPASS=$(sudo hiera redis::masterauth)
    3. Check connectivity to the Redis cluster:

      # redis-cli -a $REDISPASS -h $REDISIP ping
    4. Dump the Redis database:

      # redis-cli -a $REDISPASS -h $REDISIP bgsave

      This stores the database backup in the default /var/lib/redis/ directory. Copy this database dump to a secure location.

  4. Back up the filesystem on each Controller node:

    1. Create a directory for the backup:

      # mkdir -p /var/tmp/filesystem_backup/
    2. Run the following tar command:

      # tar --ignore-failed-read --xattrs \
          -zcvf /var/tmp/filesystem_backup/fs_backup-`date '+%Y-%m-%d-%H-%M-%S'`.tar.gz \
          /var/lib/config-data \
          /var/log/containers \
          /etc/corosync \
          /etc/logrotate.d \
          /etc/openvswitch \
          /var/log/openvswitch \
          /srv/node \
          /home/heat-admin

      The --ignore-failed-read option ignores any missing directories, which is useful if certain services are not used or are separated onto their own custom roles.

  5. Copy the resulting tar file to a secure location. A quick verification sketch follows this procedure.
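Before copying the backups off the node, you can confirm that each backup exists and is not empty. The following is a minimal verification sketch; it assumes the default Redis dump location of /var/lib/redis/dump.rdb:

    # ls -lh /var/tmp/mysql_backup/ \
        /var/tmp/mongo_backup/ \
        /var/lib/redis/dump.rdb \
        /var/tmp/filesystem_backup/

Check that each listed file has a reasonable, non-zero size before transferring it.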

2.3. Performing a minor update of an undercloud

The director provides commands to update the packages on the undercloud node. This allows you to perform a minor update within the current version of your OpenStack Platform environment.

Procedure

  1. Log into the director as the stack user.
  2. Update the python-tripleoclient package and its dependencies to ensure you have the latest scripts for the minor version update:

    $ sudo yum update -y python-tripleoclient
  3. The director uses the openstack undercloud upgrade command to update the undercloud environment. Run the command:

    $ openstack undercloud upgrade
  4. Wait until the undercloud upgrade process completes.
  5. Reboot the undercloud to update the operating system’s kernel and other system packages:

    $ sudo reboot
  6. Wait until the node boots.
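To confirm that the undercloud is running the newly installed kernel after the reboot, you can compare the running kernel with the most recently installed kernel package. This is a simple sketch:

    $ uname -r
    $ sudo rpm -q --last kernel | head -1

If the version reported by uname -r does not match the newest installed kernel package, reboot again or investigate the boot configuration.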

2.4. Performing a minor update of a containerized overcloud

The director provides commands to update the packages on all overcloud nodes. This allows you to perform a minor update within the current version of your OpenStack Platform environment.

Procedure

  1. Find the latest tag for the containerized service images:

    $ openstack overcloud container image tag discover \
      --image registry.access.redhat.com/rhosp12/openstack-base:latest \
      --tag-from-label version-release

    Make a note of the most recent tag.

  2. Create an updated environment file for your container image source using the openstack overcloud container image prepare command. For example, to use images from registry.access.redhat.com:

    $ openstack overcloud container image prepare \
      --namespace=registry.access.redhat.com/rhosp12 \
      --prefix=openstack- \
      --tag [TAG] \ 1
      --set ceph_namespace=registry.access.redhat.com/rhceph \
      --set ceph_image=rhceph-2-rhel7 \
      --set ceph_tag=latest \
      --env-file=/home/stack/templates/overcloud_images.yaml \
      -e /home/stack/templates/custom_environment_file.yaml 2
    1  Replace [TAG] with the tag obtained from the previous step.
    2  Include all additional environment files with the -e parameter. The director checks the custom resources in all included environment files and identifies the container images required for the containerized services.

    For more information about generating this environment file for different source types, see "Configuring a container image source" in the Director Installation and Usage guide.

  3. Run the openstack overcloud update stack command to update the container image locations in your overcloud:

    $ openstack overcloud update stack --init-minor-update \
      --container-registry-file /home/stack/templates/overcloud_images.yaml

    The --init-minor-update option only updates the parameters in the overcloud stack. It does not perform the actual package or container update. Wait until this command completes.

  4. Perform a package and container update using the openstack overcloud update stack command. Use the --nodes option to update the nodes for each role. For example, the following command updates nodes in the Controller role:

    $ openstack overcloud update stack --nodes Controller

    Run this command for each role group in the following order:

    • Controller
    • CephStorage
    • Compute
    • ObjectStorage
    • Any custom roles such as Database, MessageBus, Networker, and so forth.
  5. The update process for the chosen role starts. The director uses an Ansible playbook to perform the update and displays the output of each task.
  6. Update the next role group. Repeat until you have updated all nodes. A looped sketch of this sequence follows the procedure.
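If you prefer to run the per-role updates from a single shell session, the following sketch loops over the default role names in the recommended order. It is a convenience only; add any custom roles to the list and review the Ansible output of each run before continuing to the next role:

    $ for ROLE in Controller CephStorage Compute ObjectStorage; do
        openstack overcloud update stack --nodes $ROLE || break  # stop if a role update fails
      done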

2.5. Rebooting controller and composable nodes

The following procedure reboots Controller nodes and standalone nodes based on composable roles. It excludes Compute nodes and Ceph Storage nodes.

Procedure

  1. Select a node to reboot. Log into it and stop the cluster before rebooting:

    [heat-admin@overcloud-controller-0 ~]$ sudo pcs cluster stop
  2. Reboot the node:

    [heat-admin@overcloud-controller-0 ~]$ sudo reboot
  3. Wait until the node boots. A polling sketch for this wait follows the procedure.
  4. Re-enable the cluster for the node:

    [heat-admin@overcloud-controller-0 ~]$ sudo pcs cluster start
  5. Log into the node and check the services. For example:

    1. If the node uses Pacemaker services, check the node has rejoined the cluster:

      [heat-admin@overcloud-controller-0 ~]$ sudo pcs status
    2. If the node uses Systemd services, check all services are enabled:

      [heat-admin@overcloud-controller-0 ~]$ sudo systemctl status
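If you perform these steps remotely from the undercloud, the following sketch waits until the rebooted node is reachable over SSH again before you re-enable the cluster. The address is a placeholder for the rebooted node:

    $ NODE=192.0.2.101
    $ until ssh -o ConnectTimeout=5 heat-admin@$NODE true; do sleep 10; done
    $ echo "$NODE is reachable again"

Once the node responds, continue with the pcs cluster start step.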

2.6. Rebooting a Ceph Storage (OSD) cluster

The following procedure reboots a cluster of Ceph Storage (OSD) nodes.

Procedure

  1. Log into a Ceph MON or Controller node and disable Ceph Storage cluster rebalancing temporarily:

    $ sudo ceph osd set noout
    $ sudo ceph osd set norebalance
  2. Select the first Ceph Storage node to reboot and log into it.
  3. Reboot the node:

    $ sudo reboot
  4. Wait until the node boots.
  5. Log into the node and check the cluster status:

    $ sudo ceph -s

    Check that the pgmap reports all pgs as normal (active+clean).

  6. Log out of the node, reboot the next node, and check its status. Repeat this process until you have rebooted all Ceph Storage nodes.
  7. When complete, log into a Ceph MON or Controller node and enable cluster rebalancing again:

    $ sudo ceph osd unset noout
    $ sudo ceph osd unset norebalance
  8. Perform a final status check to verify that the cluster reports HEALTH_OK (a polling sketch for this check follows the procedure):

    $ sudo ceph status
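The following is a rough polling sketch for this final check: it waits until the cluster reports HEALTH_OK after the noout and norebalance flags are unset. Run it on a Ceph MON or Controller node and adjust the sleep interval to suit your cluster:

    $ until sudo ceph health | grep -q HEALTH_OK; do
        echo "Waiting for the cluster to return to HEALTH_OK..."
        sleep 30
      done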

2.7. Rebooting compute nodes

The following procedure reboots Compute nodes. To ensure minimal downtime of instances in your OpenStack Platform environment, this procedure also includes instructions on migrating instances from the chosen Compute node. This involves the following workflow:

  • Select a Compute node to reboot and disable it so that it does not provision new instances
  • Migrate the instances to another Compute node
  • Reboot the empty Compute node and enable it

Procedure

  1. Log into the undercloud as the stack user.
  2. List all Compute nodes and their UUIDs:

    $ source ~/stackrc
    (undercloud) $ openstack server list --name compute

    Identify the UUID of the Compute node that you want to reboot.

  3. From the undercloud, select a Compute node and disable it:

    $ source ~/overcloudrc
    (overcloud) $ openstack compute service list
    (overcloud) $ openstack compute service set [hostname] nova-compute --disable
  4. List all instances on the Compute node:

    (overcloud) $ openstack server list --host [hostname] --all-projects
  5. Use one of the following commands to migrate your instances:

    1. Migrate the instance to a specific host of your choice:

      (overcloud) $ openstack server migrate [instance-id] --live [target-host] --wait
    2. Let nova-scheduler automatically select the target host:

      (overcloud) $ nova live-migration [instance-id]
    3. Live migrate all instances at once:

      $ nova host-evacuate-live [hostname]
      Note

      The nova command might cause some deprecation warnings, which are safe to ignore.

  6. Wait until migration completes.
  7. Confirm the migration was successful:

    (overcloud) $ openstack server list --host [hostname] --all-projects
  8. Continue migrating instances until none remain on the chosen Compute node. A polling sketch for this check follows the procedure.
  9. Log into the Compute node and reboot it:

    [heat-admin@overcloud-compute-0 ~]$ sudo reboot
  10. Wait until the node boots.
  11. Enable the Compute node again:

    $ source ~/overcloudrc
    (overcloud) $ openstack compute service set [hostname] nova-compute --enable
  12. Check whether the Compute node is enabled:

    (overcloud) $ openstack compute service list
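To confirm that no instances remain on the node before you reboot it, you can poll the server list until it returns no results. This is a minimal sketch; replace [hostname] with the Compute node host name, as in the earlier commands:

    (overcloud) $ until [ -z "$(openstack server list --host [hostname] --all-projects -f value -c ID)" ]; do
        echo "Instances still present, waiting for migration to finish..."
        sleep 30
      done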

2.8. Validating the undercloud

The following is a set of steps to check the functionality of your undercloud.

Procedure

  1. Source the undercloud access details:

    $ source ~/stackrc
  2. Check for failed Systemd services:

    (undercloud) $ sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd' 'docker'
  3. Check the undercloud free space:

    (undercloud) $ df -h

    Use the "Undercloud Reqirements" as a basis to determine if you have adequate free space.

  4. If you have NTP installed on the undercloud, check that clocks are synchronized:

    (undercloud) $ sudo ntpstat
  5. Check the undercloud network services:

    (undercloud) $ openstack network agent list

    All agents should be Alive and their state should be UP.

  6. Check the undercloud compute services:

    (undercloud) $ openstack compute service list

    All agents' status should be enabled and their state should be up.


2.9. Validating a containerized overcloud

The following is a set of steps to check the functionality of your containerized overcloud. Many of these checks loop over every overcloud node; a helper sketch that wraps this pattern follows the procedure.

Procedure

  1. Source the undercloud access details:

    $ source ~/stackrc
  2. Check the status of your bare metal nodes:

    (undercloud) $ openstack baremetal node list

    All nodes should have a valid power state (on) and maintenance mode should be false.

  3. Check for failed Systemd services:

    (undercloud) $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd' 'docker' 'ceph*'" ; done
  4. Check for failed containerized services:

    (undercloud) $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo docker ps -f 'exited=1' --all" ; done
    (undercloud) $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo docker ps -f 'status=dead' -f 'status=restarting'" ; done
  5. Check the HAProxy connection to all services. Obtain the Control Plane VIP address and authentication details for the haproxy.stats service:

    (undercloud) $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE sudo 'grep "listen haproxy.stats" -A 6 /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg'

    Use these details in the following cURL request:

    (undercloud) $ curl -s -u admin:<PASSWORD> "http://<IP ADDRESS>:1993/;csv" | egrep -vi "(frontend|backend)" | awk -F',' '{ print $1" "$2" "$18 }'

    Replace <PASSWORD> and <IP ADDRESS> details with the respective details from the haproxy.stats service. The resulting list shows the OpenStack Platform services on each node and their connection status.

  6. Check overcloud database replication health:

    (undercloud) $ for NODE in $(openstack server list --name controller -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo docker exec clustercheck clustercheck" ; done
  7. Check RabbitMQ cluster health:

    (undercloud) $ for NODE in $(openstack server list --name controller -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo docker exec $(ssh heat-admin@$NODE "sudo docker ps -f 'name=.*rabbitmq.*' -q") rabbitmqctl node_health_check" ; done
  8. Check Pacemaker resource health:

    (undercloud) $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo pcs status"

    Look for:

    • All cluster nodes online.
    • No resources stopped on any cluster nodes.
    • No failed pacemaker actions.
  9. Check the disk space on each overcloud node:

    (undercloud) $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo df -h --output=source,fstype,avail -x overlay -x tmpfs -x devtmpfs" ; done
  10. Check overcloud Ceph Storage cluster health. The following command runs the ceph tool on a Controller node to check the cluster:

    (undercloud) $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph -s"
  11. Check Ceph Storage OSD for free space. The following command runs the ceph tool on a Controller node to check the free space:

    (undercloud) $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph df"
  12. Check that clocks are synchronized on the overcloud nodes:

    (undercloud) $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo ntpstat" ; done
  13. Source the overcloud access details:

    (undercloud) $ source ~/overcloudrc
  14. Check the overcloud network services:

    (overcloud) $ openstack network agent list

    All agents should be Alive and their state should be UP.

  15. Check the overcloud compute services:

    (overcloud) $ openstack compute service list

    All agents' status should be enabled and their state should be up.

  16. Check the overcloud volume services:

    (overcloud) $ openstack volume service list

    All agents' status should be enabled and their state should be up.
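Many of the checks in this procedure repeat the same pattern: loop over every overcloud node and run a command over SSH. The following convenience sketch wraps that pattern in a shell function. It is not a documented tool, only a shorthand for the loops above, and it must run with the undercloud credentials (stackrc) sourced because it uses the undercloud server list:

    $ source ~/stackrc
    (undercloud) $ overcloud_run() {
        # Run the given command on every overcloud node as heat-admin
        for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do
            echo "=== $NODE ==="
            ssh heat-admin@$NODE "$1"
        done
    }
    (undercloud) $ overcloud_run "sudo df -h"

The final line shows an example call that checks disk usage on every node.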
