Chapter 2. Preparing for an OpenStack Platform upgrade

This process prepares your OpenStack Platform environment for the upgrade. It involves the following steps:

  • Backing up both the undercloud and overcloud.
  • Updating the undercloud to the latest minor version of OpenStack Platform 10, including the latest Open vSwitch.
  • Rebooting the undercloud in case a newer kernel or newer system packages are installed.
  • Updating the overcloud to the latest minor version of OpenStack Platform 10, including the latest Open vSwitch.
  • Rebooting the overcloud nodes in case a newer kernel or newer system packages are installed.
  • Performing validation checks on both the undercloud and overcloud.

These procedures ensure your OpenStack Platform environment is in the best possible state before proceeding with the upgrade.

2.1. Creating a baremetal Undercloud backup

A full undercloud backup includes the following databases and files:

  • All MariaDB databases on the undercloud node
  • MariaDB configuration file on the undercloud (so that you can accurately restore databases)
  • The configuration data: /etc
  • Log data: /var/log
  • Image data: /var/lib/glance
  • Certificate generation data if using SSL: /var/lib/certmonger
  • Any container image data: /var/lib/docker and /var/lib/registry
  • All swift data: /srv/node
  • All data in the stack user home directory: /home/stack
Note

Confirm that you have sufficient disk space available on the undercloud before performing the backup process. Expect the archive file to be at least 3.5 GB, if not larger.
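
For example, a quick way to check the available space before you begin (a sketch; adjust the path if you store the backup somewhere other than the root file system):

    [root@director ~]# df -h /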

Procedure

  1. Log into the undercloud as the root user.
  2. Back up the database:

    [root@director ~]# mysqldump --opt --all-databases > /root/undercloud-all-databases.sql
  3. Create a backup directory and change the user ownership of the directory to the stack user:

    [root@director ~]# mkdir /backup
    [root@director ~]# chown stack: /backup

    You will use this directory to store the archive containing the undercloud database and file system.

  4. Change to the backup directory:

    [root@director ~]# cd /backup
  5. Archive the database backup and the configuration files:

    [root@director ~]# tar --xattrs --xattrs-include='*.*' --ignore-failed-read -cf \
        undercloud-backup-$(date +%F).tar \
        /root/undercloud-all-databases.sql \
        /etc \
        /var/log \
        /var/lib/glance \
        /var/lib/certmonger \
        /var/lib/docker \
        /var/lib/registry \
        /srv/node \
        /root \
        /home/stack
    • The --ignore-failed-read option skips any directory that does not apply to your undercloud.
    • The --xattrs and --xattrs-include='*.*' options include extended attributes, which are required to store metadata for Object Storage (swift) and SELinux.

    This creates a file named undercloud-backup-<date>.tar, where <date> is the system date. Copy this tar file to a secure location.
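
    For example, you can copy the archive to a remote backup host with scp; the host name and destination path shown here are placeholders for your own environment:

    [root@director ~]# scp /backup/undercloud-backup-<date>.tar stack@<backup_host>:/path/to/secure/location/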

2.2. Backing up the overcloud control plane services

The following procedure creates a backup of the overcloud databases and configuration. A backup of the overcloud databases and services provides a snapshot of a working environment, which helps you restore the overcloud to its original state if an operational failure occurs.

Important

This procedure only includes crucial control plane services. It does not include backups of Compute node workloads, data on Ceph Storage nodes, or any additional services.

Procedure

  1. Perform the database backup:

    1. Log into a Controller node. You can access the overcloud from the undercloud:

      $ ssh heat-admin@192.0.2.100
    2. Change to the root user:

      $ sudo -i
    3. Create a temporary directory to store the backups:

      # mkdir -p /var/tmp/mysql_backup/
    4. Obtain the database password and store it in the MYSQLDBPASS environment variable. The password is stored in the mysql::server::root_password variable within the /etc/puppet/hieradata/service_configs.json file. Use the following command to store the password:

      # MYSQLDBPASS=$(sudo hiera -c /etc/puppet/hiera.yaml mysql::server::root_password)
    5. Back up the database:

      # mysql -uroot -p$MYSQLDBPASS -s -N -e "select distinct table_schema from information_schema.tables where engine='innodb' and table_schema != 'mysql';" | xargs mysqldump -uroot -p$MYSQLDBPASS --single-transaction --databases > /var/tmp/mysql_backup/openstack_databases-$(date +%F)-$(date +%T).sql

      This dumps a database backup called /var/tmp/mysql_backup/openstack_databases-<date>.sql where <date> is the system date and time. Copy this database dump to a secure location.

    6. Back up all the user and permission information:

      # mysql -uroot -p$MYSQLDBPASS -s -N -e "SELECT CONCAT('\"SHOW GRANTS FOR ''',user,'''@''',host,''';\"') FROM mysql.user where (length(user) > 0 and user NOT LIKE 'root')" | xargs -n1 mysql -uroot -p$MYSQLDBPASS -s -N -e | sed 's/$/;/' > /var/tmp/mysql_backup/openstack_databases_grants-$(date +%F)-$(date +%T).sql

      This dumps a database backup called /var/tmp/mysql_backup/openstack_databases_grants-<date>.sql where <date> is the system date and time. Copy this database dump to a secure location.

  2. Back up the Pacemaker configuration:

    1. Log into a Controller node.
    2. Run the following command to create an archive of the current Pacemaker configuration:

      # sudo pcs config backup pacemaker_controller_backup
    3. Copy the resulting archive (pacemaker_controller_backup.tar.bz2) to a secure location.
  3. Back up the OpenStack Telemetry database:

    1. Connect to any controller and get the IP of the MongoDB primary instance:

      # MONGOIP=$(sudo hiera -c /etc/puppet/hiera.yaml mongodb::server::bind_ip)
    2. Create the backup:

      # mkdir -p /var/tmp/mongo_backup/
      # mongodump --oplog --host $MONGOIP --out /var/tmp/mongo_backup/
    3. Copy the database dump in /var/tmp/mongo_backup/ to a secure location.
  4. Back up the Redis cluster:

    1. Obtain the Redis endpoint from HAProxy:

      # REDISIP=$(sudo hiera -c /etc/puppet/hiera.yaml redis_vip)
    2. Obtain the master password for the Redis cluster:

      # REDISPASS=$(sudo hiera -c /etc/puppet/hiera.yaml redis::masterauth)
    3. Check connectivity to the Redis cluster:

      # redis-cli -a $REDISPASS -h $REDISIP ping
    4. Dump the Redis database:

      # redis-cli -a $REDISPASS -h $REDISIP bgsave

      This stores the database backup in the default /var/lib/redis/ directory. Copy this database dump to a secure location.
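
      For example, a minimal check that the background save has completed, assuming the same endpoint and password as in the previous steps (the LASTSAVE timestamp advances once BGSAVE finishes):

      # redis-cli -a $REDISPASS -h $REDISIP lastsave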

  5. Back up the filesystem on each Controller node:

    1. Create a directory for the backup:

      # mkdir -p /var/tmp/filesystem_backup/
    2. Run the following tar command:

      # tar --acls --ignore-failed-read --xattrs --xattrs-include='*.*' \
          -zcvf /var/tmp/filesystem_backup/`hostname`-filesystem-`date '+%Y-%m-%d-%H-%M-%S'`.tar \
          /etc \
          /srv/node \
          /var/log \
          /var/lib/nova \
          --exclude /var/lib/nova/instances \
          /var/lib/glance \
          /var/lib/keystone \
          /var/lib/cinder \
          /var/lib/heat \
          /var/lib/heat-config \
          /var/lib/heat-cfntools \
          /var/lib/rabbitmq \
          /var/lib/neutron \
          /var/lib/haproxy \
          /var/lib/openvswitch \
          /var/lib/redis \
          /var/lib/os-collect-config \
          /usr/libexec/os-apply-config \
          /usr/libexec/os-refresh-config \
          /home/heat-admin

      The --ignore-failed-read option ignores any missing directories, which is useful if certain services are not used or separated on their own custom roles.

    3. Copy the resulting tar file to a secure location.
  6. Archive deleted rows on the overcloud:

    1. Check for deleted instances that are not yet archived:

      $ source ~/overcloudrc
      $ nova list --all-tenants --deleted
    2. If the command lists any deleted instances, archive them by entering the following command on one of the overcloud Controller nodes:

      # su - nova -s /bin/bash -c "nova-manage --debug db archive_deleted_rows --max_rows 1000"

      Rerun this command until you have archived all deleted instances.

    3. Purge all the archived deleted instances by entering the following command on one of the overcloud Controller nodes:

      # su - nova -s /bin/bash -c "nova-manage --debug db purge --all --all-cells"
    4. Verify that no deleted instances remain:

      $ nova list --all-tenants --deleted

2.3. Updating the current undercloud packages for OpenStack Platform 10.z

The director provides commands to update the packages on the undercloud node. Use these commands to perform a minor update within the current version of your OpenStack Platform environment, in this case OpenStack Platform 10.

Note

This step also updates the undercloud operating system to the latest version of Red Hat Enterprise Linux 7 and Open vSwitch to version 2.9.

Procedure

  1. Log in to the undercloud as the stack user.
  2. Stop the main OpenStack Platform services:

    $ sudo systemctl stop 'openstack-*' 'neutron-*' httpd
    Note

    This causes a short period of downtime for the undercloud. The overcloud is still functional during the undercloud upgrade.

  3. Set the RHEL version to RHEL 7.7:

    $ sudo subscription-manager release --set=7.7
  4. Update the python-tripleoclient package and its dependencies to ensure you have the latest scripts for the minor version update:

    $ sudo yum update -y python-tripleoclient
  5. Run the openstack undercloud upgrade command:

    $ openstack undercloud upgrade
  6. Wait until the command completes its execution.
  7. Reboot the undercloud to update the operating system’s kernel and other system packages:

    $ sudo reboot
  8. Wait until the node boots.
  9. Log into the undercloud as the stack user.

In addition to updating the undercloud packages, it is recommended to keep your overcloud images up to date so that the image configuration stays in sync with the latest openstack-tripleo-heat-templates package. This ensures successful deployment and scaling operations between the current preparation stage and the actual fast forward upgrade. The next section shows how to update your images in this scenario. If you plan to upgrade your environment immediately after this preparation stage, you can skip the next section.

2.4. Preparing updates for NFV-enabled environments

If your environment has network function virtualization (NFV) enabled, follow these steps after you update your undercloud, and before you update your overcloud.

Procedure

  1. Change the vhost user socket directory in a custom environment file, for example, network-environment.yaml:

    parameter_defaults:
      NeutronVhostuserSocketDir: "/var/lib/vhost_sockets"
  2. Add the ovs-dpdk-permissions.yaml file to your openstack overcloud deploy command to configure the qemu group setting as hugetlbfs for OVS-DPDK:

     -e environments/ovs-dpdk-permissions.yaml
  3. Ensure that vHost user ports for all instances are in dpdkvhostuserclient mode. For more information, see Manually changing the vhost user port mode.
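
For reference, a hypothetical openstack overcloud deploy command that combines the custom environment file from step 1 with the ovs-dpdk-permissions.yaml file from step 2 might look like the following sketch; the file paths and any additional environment files are placeholders for your own deployment:

    $ openstack overcloud deploy --templates \
      -e /home/stack/templates/network-environment.yaml \
      -e /usr/share/openstack-tripleo-heat-templates/environments/ovs-dpdk-permissions.yaml \
      [-e <environment_file>|...]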

2.5. Updating the current overcloud images for OpenStack Platform 10.z

The undercloud update process might download new image archives from the rhosp-director-images and rhosp-director-images-ipa packages. This process updates these images on your undercloud within Red Hat OpenStack Platform 10.

Prerequisites

  • You have updated to the latest minor release of your current undercloud version.

Procedure

  1. Check the yum log to determine if new image archives are available:

    $ sudo grep "rhosp-director-images" /var/log/yum.log
  2. If new archives are available, replace your current images with new images. To install the new images, first remove any existing images from the images directory on the stack user’s home (/home/stack/images):

    $ rm -rf ~/images/*
  3. On the undercloud node, source the undercloud credentials:

    $ source ~/stackrc
  4. Extract the archives:

    $ cd ~/images
    $ for i in /usr/share/rhosp-director-images/overcloud-full-latest-10.0.tar /usr/share/rhosp-director-images/ironic-python-agent-latest-10.0.tar; do tar -xvf $i; done
  5. Import the latest images into the director and configure nodes to use the new images:

    $ cd ~/images
    $ openstack overcloud image upload --update-existing --image-path /home/stack/images/
    $ openstack overcloud node configure $(openstack baremetal node list -c UUID -f csv --quote none | sed "1d" | paste -s -d " ")
  6. To finalize the image update, verify the existence of the new images:

    $ openstack image list
    $ ls -l /httpboot

    Director also retains the old images and renames them using the timestamp of when they were updated. If you no longer need these images, delete them.
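
    For example, a minimal way to remove an old image that you no longer need, assuming it still appears in the image list under its timestamped name (the name shown here is a placeholder):

    $ openstack image delete <timestamped_image_name>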

Director is now updated and using the latest images. You do not need to restart any services after the update.

The undercloud is now using updated OpenStack Platform 10 packages. Next, update the overcloud to the latest minor release.

2.6. Updating the current overcloud packages for OpenStack Platform 10.z

The director provides commands to update the packages on all overcloud nodes. Use these commands to perform a minor update within the current version of your OpenStack Platform environment, in this case Red Hat OpenStack Platform 10.

Note

This step also updates the overcloud nodes' operating system to the latest version of Red Hat Enterprise Linux 7 and Open vSwitch to version 2.9.

Prerequisites

  • You have updated to the latest minor release of your current undercloud version.
  • You have performed a backup of the overcloud.

Procedure

  1. Check your subscription management configuration for the rhel_reg_release parameter. If this parameter is not set, you must include it and set it to version 7.7:

    parameter_defaults:
      ...
      rhel_reg_release: "7.7"

    Ensure that you save the changes to the overcloud subscription management environment file.

  2. Update the current plan using your original openstack overcloud deploy command and including the --update-plan-only option. For example:

    $ openstack overcloud deploy --update-plan-only \
      --templates  \
      -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
      -e /home/stack/templates/network-environment.yaml \
      -e /home/stack/templates/storage-environment.yaml \
      -e /home/stack/templates/rhel-registration/environment-rhel-registration.yaml \
      [-e <environment_file>|...]

    The --update-plan-only option updates only the overcloud plan stored in the director. Use the -e option to include environment files relevant to your overcloud and its update path. The order of the environment files is important because the parameters and resources defined in subsequent environment files take precedence. Use the following list as an example of the environment file order:

    • Any network isolation files, including the initialization file (environments/network-isolation.yaml) from the heat template collection and then your custom NIC configuration file.
    • Any external load balancing environment files.
    • Any storage environment files.
    • Any environment files for Red Hat CDN or Satellite registration.
    • Any other custom environment files.
  3. Create a static inventory file of your overcloud:

    $ tripleo-ansible-inventory --ansible_ssh_user heat-admin --static-yaml-inventory ~/inventory.yaml

    If your overcloud uses a name other than the default name overcloud, set the name of your overcloud with the --plan option.
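
    For example, if your overcloud plan is named mycloud (a hypothetical name), the command might look like this:

    $ tripleo-ansible-inventory --plan mycloud --ansible_ssh_user heat-admin --static-yaml-inventory ~/inventory.yaml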

  4. Create a playbook that contains a task to set the operating system version to Red Hat Enterprise Linux 7.7 on all nodes:

    $ cat > ~/set_release.yaml <<'EOF'
    - hosts: all
      gather_facts: false
      tasks:
        - name: set release to 7.7
          command: subscription-manager release --set=7.7
          become: true
    EOF
  5. Run the set_release.yaml playbook:

    $ ansible-playbook -i ~/inventory.yaml -f 25 ~/set_release.yaml --limit undercloud,Controller,Compute

    Use the --limit option to apply the content to all Red Hat OpenStack Platform nodes.

  6. Perform a package update on all nodes using the openstack overcloud update command:

    $ openstack overcloud update stack -i overcloud

    The -i option runs the update in interactive mode, updating each node sequentially. When the update process completes a node update, the script provides a breakpoint for you to confirm. Without the -i option, the update remains paused at the first breakpoint, so you must always include the -i option.

    The script performs the following functions:

    1. The script runs on nodes one-by-one:

      1. For Controller nodes, this means a full package update.
      2. For other nodes, this means an update of Puppet modules only.
    2. Puppet runs on all nodes at once:

      1. For Controller nodes, the Puppet run synchronizes the configuration.
      2. For other nodes, the Puppet run updates the rest of the packages and synchronizes the configuration.
  7. The update process starts. During this process, the director reports an IN_PROGRESS status and periodically prompts you to clear breakpoints. For example:

    starting package update on stack overcloud
    IN_PROGRESS
    IN_PROGRESS
    WAITING
    on_breakpoint: [u'overcloud-compute-0', u'overcloud-controller-2', u'overcloud-controller-1', u'overcloud-controller-0']
    Breakpoint reached, continue? Regexp or Enter=proceed (will clear 49913767-e2dd-4772-b648-81e198f5ed00), no=cancel update, C-c=quit interactive mode:

    Press Enter to clear the breakpoint from the last node on the on_breakpoint list. This begins the update for that node.

  8. The script automatically predefines the update order of nodes:

    • Each Controller node individually
    • Each Compute node individually
    • Each Ceph Storage node individually
    • All other nodes individually

    It is recommended to use this order to ensure a successful update, specifically:

    1. Clear the breakpoint of each Controller node individually. Each Controller node requires an individual package update in case the node’s services must restart after the update. This reduces disruption to highly available services on other Controller nodes.
    2. After the Controller node update, clear the breakpoints for each Compute node. You can also type a Compute node name to clear a breakpoint on a specific node or use a Python-based regular expression to clear breakpoints on multiple Compute nodes at once.
    3. Clear the breakpoints for each Ceph Storage node. You can also type a Ceph Storage node name to clear a breakpoint on a specific node or use a Python-based regular expression to clear breakpoints on multiple Ceph Storage nodes at once.
    4. Clear any remaining breakpoints to update the remaining nodes. You can also type a node name to clear a breakpoint on a specific node or use a Python-based regular expression to clear breakpoints on multiple nodes at once.
    5. Wait until all nodes have completed their update.
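
    For example, to clear the breakpoints on all Compute nodes at once, you can type a Python-based regular expression at the breakpoint prompt; the pattern below assumes the default overcloud-compute-* node naming shown in the earlier example output:

    overcloud-compute-.*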
  9. The update command reports a COMPLETE status when the update completes:

    ...
    IN_PROGRESS
    IN_PROGRESS
    IN_PROGRESS
    COMPLETE
    update finished with status COMPLETE
  10. If you configured fencing for your Controller nodes, the update process might disable it. When the update process completes, re-enable fencing with the following command on one of the Controller nodes:

    $ sudo pcs property set stonith-enabled=true

The update process does not reboot any nodes in the Overcloud automatically. Updates to the kernel and other system packages require a reboot. Check the /var/log/yum.log file on each node to see if either the kernel or openvswitch packages have updated their major or minor versions. If they have, reboot each node using the following procedures.
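
For example, a quick check that you can run on each node to spot kernel or Open vSwitch updates (a sketch; adjust as needed for your environment):

    $ sudo grep -E "kernel|openvswitch" /var/log/yum.log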

2.7. Rebooting controller and composable nodes

The following procedure reboots controller nodes and standalone nodes based on composable roles. This excludes Compute nodes and Ceph Storage nodes.

Procedure

  1. Log in to the node that you want to reboot.
  2. Optional: If the node uses Pacemaker resources, stop the cluster:

    [heat-admin@overcloud-controller-0 ~]$ sudo pcs cluster stop
  3. Reboot the node:

    [heat-admin@overcloud-controller-0 ~]$ sudo reboot
  4. Wait until the node boots.
  5. Check the services. For example:

    1. If the node uses Pacemaker services, check that the node has rejoined the cluster:

      [heat-admin@overcloud-controller-0 ~]$ sudo pcs status
    2. If the node uses Systemd services, check that all services are enabled:

      [heat-admin@overcloud-controller-0 ~]$ sudo systemctl status
  6. Repeat these steps for all Controller and composable nodes.

2.8. Rebooting a Ceph Storage (OSD) cluster

The following procedure reboots a cluster of Ceph Storage (OSD) nodes.

Procedure

  1. Log in to a Ceph MON or Controller node and disable Ceph Storage cluster rebalancing temporarily:

    $ sudo ceph osd set noout
    $ sudo ceph osd set norebalance
  2. Select the first Ceph Storage node to reboot and log into it.
  3. Reboot the node:

    $ sudo reboot
  4. Wait until the node boots.
  5. Log in to a Ceph MON or Controller node and check the cluster status:

    $ sudo ceph -s

    Check that the pgmap reports all pgs as normal (active+clean).
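
    For example, a quick way to see the aggregate placement group states from the same Ceph MON or Controller node (a sketch):

    $ sudo ceph pg stat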

  6. Log out of the Ceph MON or Controller node, reboot the next Ceph Storage node, and check its status. Repeat this process until you have rebooted all Ceph storage nodes.
  7. When complete, log into a Ceph MON or Controller node and enable cluster rebalancing again:

    $ sudo ceph osd unset noout
    $ sudo ceph osd unset norebalance
  8. Perform a final status check to verify the cluster reports HEALTH_OK:

    $ sudo ceph status

2.9. Rebooting Compute nodes

Rebooting a Compute node involves the following workflow:

  • Select a Compute node to reboot and disable it so that it does not provision new instances.
  • Migrate the instances to another Compute node to minimize instance downtime.
  • Reboot the empty Compute node and enable it.

Procedure

  1. Log in to the undercloud as the stack user.
  2. To identify the Compute node that you intend to reboot, list all Compute nodes:

    $ source ~/stackrc
    (undercloud) $ openstack server list --name compute
  3. From the overcloud, select a Compute node and disable it:

    $ source ~/overcloudrc
    (overcloud) $ openstack compute service list
    (overcloud) $ openstack compute service set <hostname> nova-compute --disable
  4. List all instances on the Compute node:

    (overcloud) $ openstack server list --host <hostname> --all-projects
  5. Migrate your instances. For more information on migration strategies, see Migrating virtual machines between Compute nodes.
  6. Log into the Compute node and reboot it:

    [heat-admin@overcloud-compute-0 ~]$ sudo reboot
  7. Wait until the node boots.
  8. Enable the Compute node:

    $ source ~/overcloudrc
    (overcloud) $ openstack compute service set <hostname> nova-compute --enable
  9. Verify that the Compute node is enabled:

    (overcloud) $ openstack compute service list

2.10. Verifying system packages

Before the upgrade, the undercloud node and all overcloud nodes should be using the latest versions of the following packages:

Package                   Version
openvswitch               At least 2.9
qemu-img-rhev             At least 2.10
qemu-kvm-common-rhev      At least 2.10
qemu-kvm-rhev             At least 2.10
qemu-kvm-tools-rhev       At least 2.10

Procedure

  1. Log into a node.
  2. Run yum to check the system packages:

    $ sudo yum list qemu-img-rhev qemu-kvm-common-rhev qemu-kvm-rhev qemu-kvm-tools-rhev openvswitch
  3. Run ovs-vsctl to check the version currently running:

    $ sudo ovs-vsctl --version
  4. Repeat these steps for all nodes.

The undercloud now uses updated OpenStack Platform 10 packages. Use the next few procedures to check that the system is in a working state.

2.11. Validating an OpenStack Platform 10 undercloud

The following is a set of steps to check the functionality of your Red Hat OpenStack Platform 10 undercloud before an upgrade.

Procedure

  1. Source the undercloud access details:

    $ source ~/stackrc
  2. Check for failed Systemd services:

    $ sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd' 'docker'
  3. Check the undercloud free space:

    $ df -h

    Use the "Undercloud Requirements" as a basis to determine if you have adequate free space.

  4. If you have NTP installed on the undercloud, check the clock is synchronized:

    $ sudo ntpstat
  5. Check the undercloud network services:

    $ openstack network agent list

    All agents should be Alive and their state should be UP.

  6. Check the undercloud compute services:

    $ openstack compute service list

    All agents' status should be enabled and their state should be up.

2.12. Validating an OpenStack Platform 10 overcloud

The following is a set of steps to check the functionality of your Red Hat OpenStack Platform 10 overcloud before an upgrade.

Procedure

  1. Source the undercloud access details:

    $ source ~/stackrc
  2. Check the status of your bare metal nodes:

    $ openstack baremetal node list

    All nodes should have a valid power state (on) and maintenance mode should be false.

  3. Check for failed Systemd services:

    $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd' 'docker' 'ceph*'" ; done
  4. Check the HAProxy connection to all services. Obtain the Control Plane VIP address and authentication information for the haproxy.stats service:

    $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE sudo 'grep "listen haproxy.stats" -A 6 /etc/haproxy/haproxy.cfg'
  5. Use the connection and authentication information obtained from the previous step to check the connection status of RHOSP services.

    If SSL is not enabled, use these details in the following cURL request:

    $ curl -s -u admin:<PASSWORD> "http://<IP ADDRESS>:1993/;csv" | egrep -vi "(frontend|backend)" | awk -F',' '{ print $1" "$2" "$18 }'

    If SSL is enabled, use these details in the following cURL request:

    $ curl -s -u admin:<PASSWORD> "https://<HOSTNAME>:1993/;csv" | egrep -vi "(frontend|backend)" | awk -F',' '{ print $1" "$2" "$18 }'

    Replace the <PASSWORD> and <IP ADDRESS> or <HOSTNAME> values with the respective information from the haproxy.stats service. The resulting list shows the OpenStack Platform services on each node and their connection status.

  6. Check overcloud database replication health:

    $ for NODE in $(openstack server list --name controller -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo clustercheck" ; done
  7. Check RabbitMQ cluster health:

    $ for NODE in $(openstack server list --name controller -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo rabbitmqctl node_health_check" ; done
  8. Check Pacemaker resource health:

    $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo pcs status"

    Look for:

    • All cluster nodes online.
    • No resources stopped on any cluster nodes.
    • No failed pacemaker actions.
  9. Check the disk space on each overcloud node:

    $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo df -h --output=source,fstype,avail -x overlay -x tmpfs -x devtmpfs" ; done
  10. Check overcloud Ceph Storage cluster health. The following command runs the ceph tool on a Controller node to check the cluster:

    $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph -s"
  11. Check Ceph Storage OSD for free space. The following command runs the ceph tool on a Controller node to check the free space:

    $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph df"
    Important

    The number of placement groups (PGs) for each Ceph object storage daemon (OSD) must not exceed 250 by default. Upgrading Ceph nodes with more PGs per OSD results in a warning state and might fail the upgrade process. You can increase the number of PGs per OSD before you start the upgrade process. For more information about diagnosing and troubleshooting this issue, see the article OpenStack FFU from 10 to 13 times out because Ceph PGs allocated in one or more OSDs is higher than 250.
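
    To see the current number of PGs on each OSD before you start, you can run the ceph osd df command, which reports a PGS column for each OSD (a sketch that follows the pattern of the previous steps):

    $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph osd df"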

  12. Check that clocks are synchronized on overcloud nodes:

    $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo ntpstat" ; done
  13. Source the overcloud access details:

    $ source ~/overcloudrc
  14. Check the overcloud network services:

    $ openstack network agent list

    All agents should be Alive and their state should be UP.

  15. Check the overcloud compute services:

    $ openstack compute service list

    All agents' status should be enabled and their state should be up.

  16. Check the overcloud volume services:

    $ openstack volume service list

    All agents' status should be enabled and their state should be up.

2.13. Finalizing updates for NFV-enabled environments

If your environment has network function virtualization (NFV) enabled, follow these steps after you update your undercloud and overcloud.

Procedure

You need to migrate your existing OVS-DPDK instances to ensure that the vhost socket mode changes from dpdkvhostuser to dpdkvhostuserclient mode in the OVS ports. We recommend that you snapshot existing instances and rebuild a new instance based on that snapshot image. See Manage Instance Snapshots for complete details on instance snapshots.

To snapshot an instance and boot a new instance from the snapshot:

  1. Source the overcloud access details:

    $ source ~/overcloudrc
  2. Find the server ID for the instance you want to take a snapshot of:

    $ openstack server list
  3. Shut down the source instance before you take the snapshot to ensure that all data is flushed to disk:

    $ openstack server stop SERVER_ID
  4. Create the snapshot image of the instance:

    $ openstack image create --id SERVER_ID SNAPSHOT_NAME
  5. Boot a new instance with this snapshot image:

    $ openstack server create --flavor DPDK_FLAVOR --nic net-id=DPDK_NET_ID --image SNAPSHOT_NAME INSTANCE_NAME
  6. Optionally, verify that the new instance status is ACTIVE:

    $ openstack server list

Repeat this procedure for all instances that you need to snapshot and relaunch.

2.14. Retaining YUM history

After completing a minor update of the overcloud, retain the yum history. This information is useful if you need to undo yum transactions for any possible rollback operations.

Procedure

  1. On each node, run the following command to save the entire yum history of the node in a file:

    $ sudo yum history list all > /home/heat-admin/$(hostname)-yum-history-all
  2. On each node, run the following command to save the ID of the last yum history item:

    $ sudo yum history list all | head -n 5 | tail -n 1 | awk '{print $1}' > /home/heat-admin/$(hostname)-yum-history-all-last-id
  3. Copy these files to a secure location.

2.15. Next Steps

With the preparation stage complete, you can now perform an upgrade of the undercloud from 10 to 13 using the steps in Chapter 3, Upgrading the undercloud.