Chapter 2. Planning and preparation for an in-place upgrade

Before you conduct an in-place upgrade of your OpenStack Platform environment, create a plan for the upgrade and accommodate any potential obstacles that might block a successful upgrade.

2.1. Familiarize yourself with Red Hat OpenStack Platform 16.2

Before you perform an upgrade, familiarize yourself with Red Hat OpenStack Platform 16.2 to help you understand the resulting environment and any potential version-to-version changes that might affect your upgrade. To familiarize yourself with Red Hat OpenStack Platform 16.2, follow these suggestions:

2.2. High-level changes in Red Hat OpenStack Platform 16.2

The following high-level changes occur during the upgrade to Red Hat OpenStack Platform 16.2:

  • OpenStack Platform director 16.2 configures the overcloud using an Ansible-driven method called config-download. This replaces the standard heat-based configuration method. Director still uses heat to orchestrate provisioning operations.
  • The director installation uses the same method as the overcloud deployment. Therefore, the undercloud also uses openstack-tripleo-heat-templates as a blueprint for installing and configuring each service.
  • The undercloud runs OpenStack services in containers.
  • The undercloud pulls and stores container images through a new method. Instead of pulling container images before deploying the overcloud, the undercloud pulls all relevant container images during the deployment process.
  • The overcloud deployment process includes an Advanced Subscription Management method to register nodes. This method incorporates an Ansible role to register OpenStack Platform nodes. The new method also applies different subscriptions to different node roles if necessary.
  • The overcloud now uses Open Virtual Network (OVN) as the default ML2 mechanism driver. It is possible to migrate your Open vSwitch (OVS) service to OVN, which you perform after the completion of a successful upgrade.
  • The undercloud and overcloud both run on Red Hat Enterprise Linux 8.
  • openstack-tripleo-heat-templates includes a unified composable service template collection in the deployment directory. This directory now includes templates with merged content from both the containerized service and Puppet-based composable service templates.
  • The OpenStack Data Processing service (sahara) is no longer supported.

    Important

    If you have sahara enabled in your Red Hat OpenStack Platform 13 environment, do not continue with this upgrade and contact Red Hat Global Support Services.

  • The OpenStack Telemetry components are deprecated in favor of the Service Telemetry Framework (STF).
  • Starting with Red Hat Enterprise Linux (RHEL) version 8.3, support for the Intel Transactional Synchronization Extensions (TSX) feature is disabled by default. This causes issues with instance live migration between hosts when migrating from hosts that run Red Hat OpenStack Platform 13 with RHEL version 7.9, to hosts that run Red Hat OpenStack Platform 16.2 with RHEL version 8.4.

    Instance live migration fails after you reboot the Compute nodes. To ensure that the upgraded nodes are booted with the TSX feature enabled and that you can successfully live migrate your instances, add tsx=off to your KernelArgs role parameter for the Compute node and reboot the node.

    For more information, see the Red Hat Knowledgebase solution Guidance on Intel TSX impact on OpenStack guests (applies for RHEL 8.3 and above).
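
After the Compute node reboots, you might want to confirm that the kernel argument is present on the running kernel. The following is a minimal sketch and not part of the documented procedure. The has_kernel_arg function name is hypothetical, and the example assumes that the kernel command line is readable from /proc/cmdline.

```shell
#!/bin/bash
# has_kernel_arg CMDLINE ARG
# Succeeds if ARG appears as a whole, space-separated word in CMDLINE.
has_kernel_arg() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example on a Compute node (assumes /proc/cmdline is readable):
#   has_kernel_arg "$(cat /proc/cmdline)" "tsx=off" && echo "tsx=off is set"
```

Matching on whole words avoids a false positive when a different argument merely contains the same text.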

2.3. Changes in Red Hat Enterprise Linux 8

The undercloud and overcloud both run on Red Hat Enterprise Linux 8. This includes new tools and functions relevant to the undercloud and overcloud:

  • The undercloud and overcloud use the Red Hat Container Toolkit. Instead of using docker to build containers and control their lifecycle, Red Hat Enterprise Linux 8 includes buildah to build new container images and podman for container management.
  • Red Hat Enterprise Linux 8 does not include the docker-distribution package. The undercloud now includes a private HTTP registry to provide container images to overcloud nodes.
  • The upgrade process from Red Hat Enterprise Linux 7 to 8 uses the leapp tool.
  • Red Hat Enterprise Linux 8 does not use the ntp service. Instead, Red Hat Enterprise Linux 8 uses chronyd.
  • Red Hat Enterprise Linux 8 includes new versions of high availability tools.

Red Hat OpenStack Platform 16.2 uses Red Hat Enterprise Linux 8.4 as the base operating system. As a part of the upgrade process, you upgrade the base operating system of nodes to Red Hat Enterprise Linux 8.4.

For more information about the key differences between Red Hat Enterprise Linux 7 and 8, see Considerations in adopting RHEL 8. For general information about Red Hat Enterprise Linux 8, see Product Documentation for Red Hat Enterprise Linux 8.

2.4. Leapp upgrade usage in Red Hat OpenStack Platform

The long-life Red Hat OpenStack Platform upgrade requires a base operating system upgrade from Red Hat Enterprise Linux 7 to Red Hat Enterprise Linux 8. Red Hat Enterprise Linux 7 uses the Leapp utility to perform the upgrade to Red Hat Enterprise Linux 8. To ensure that Leapp and its dependencies are available, verify that the following Red Hat Enterprise Linux 7 repositories are enabled:

  • Red Hat Enterprise Linux 7 Server RPMs x86_64 7Server or Red Hat Enterprise Linux 7 Server RPMs x86_64 7.9

    rhel-7-server-rpms
    x86_64 7Server

    or:

    rhel-7-server-rpms
    x86_64 7.9

  • Red Hat Enterprise Linux 7 Server - Extras RPMs x86_64

    rhel-7-server-extras-rpms
    x86_64

For more information, see Preparing a RHEL 7 system for the upgrade.
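
One way to check the repositories listed above is to compare them against the enabled repositories on the node. The following is a sketch under stated assumptions: the missing_repos function name is hypothetical, and it assumes that the output of subscription-manager repos --list-enabled contains one "Repo ID:" line per enabled repository.

```shell
#!/bin/bash
# missing_repos REPO_ID...
# Reads `subscription-manager repos --list-enabled` output on stdin and
# prints each REPO_ID that is not listed as enabled.
missing_repos() {
  local enabled repo
  # Keep only the value of each "Repo ID:" line.
  enabled=$(awk '/^Repo ID:/ {print $3}')
  for repo in "$@"; do
    printf '%s\n' "$enabled" | grep -qxF -- "$repo" || printf '%s\n' "$repo"
  done
}

# Example on a RHEL 7 node (run as root):
#   subscription-manager repos --list-enabled | \
#     missing_repos rhel-7-server-rpms rhel-7-server-extras-rpms
```

If the function prints nothing, both repositories are enabled. Any printed repository ID still needs to be enabled before you run Leapp.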

The undercloud and overcloud use separate processes for performing the operating system upgrade.

Undercloud process

Run the leapp upgrade manually before you run the openstack undercloud upgrade command. The undercloud upgrade includes instructions for performing the leapp upgrade.

Overcloud process

The overcloud upgrade framework automatically runs the leapp upgrade.

Limitations

For information about potential limitations that might affect your upgrade, see the following sections from the Upgrading from RHEL 7 to RHEL 8 guide:

In particular, you cannot perform a Leapp upgrade on nodes that use encryption of the whole disk or a partition, such as LUKS encryption, or file-system encryption. This limitation affects Ceph OSD nodes that you have configured with the dmcrypt: true parameter.
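
To get a first indication of whether a node is affected, you can look for LUKS-formatted block devices. The following is a rough sketch only: the luks_devices function name is hypothetical, it relies on lsblk reporting the crypto_LUKS file system type, and it does not detect file-system-level encryption.

```shell
#!/bin/bash
# luks_devices
# Reads `lsblk -rno NAME,FSTYPE` output on stdin and prints the names of
# devices whose file system type is crypto_LUKS.
luks_devices() {
  awk '$2 == "crypto_LUKS" {print $1}'
}

# Example on a node that you plan to upgrade:
#   lsblk -rno NAME,FSTYPE | luks_devices
```

An empty result does not prove that the node is unaffected, so treat the output as a hint and review the limitations in the guide.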

If any known limitations affect your environment, seek advice from the Red Hat Technical Support Team.

Troubleshooting

For information about troubleshooting potential Leapp issues, see Troubleshooting in Upgrading from RHEL 7 to RHEL 8.

2.5. Supported upgrade scenarios

Before proceeding with the upgrade, check that your overcloud is supported.

Note

If you are uncertain whether a particular scenario not mentioned in these lists is supported, seek advice from the Red Hat Technical Support Team.

Supported scenarios

The following in-place upgrade scenarios are tested and supported.

  • Standard environments with default role types: Controller, Compute, and Ceph Storage OSD
  • Split-Controller composable roles
  • Ceph Storage composable roles, including Ceph Storage custom configurations, such as CephConfigOverrides and CephAnsibleExtraConfig
  • Hyper-Converged Infrastructure: Compute and Ceph Storage OSD services on the same node
  • Environments with Network Functions Virtualization (NFV) technologies: Single-root input/output virtualization (SR-IOV) and Data Plane Development Kit (DPDK)
  • Environments with Instance HA enabled

    Note

    During an upgrade procedure, nova live migrations are supported. However, evacuations initiated by Instance HA are not supported. When you upgrade a Compute node, the node is shut down cleanly and any workload running on the node is not evacuated by Instance HA automatically. Instead, you must perform live migration manually.

Technology preview scenarios

The framework for upgrades is considered a Technology Preview when you use it in conjunction with these features, and therefore is not fully supported by Red Hat. You should only test this scenario in a proof-of-concept environment and not upgrade in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.

  • Edge and Distributed Compute Node (DCN) scenarios

2.6. Considerations for upgrading with external Ceph deployments

If you have deployed a Red Hat Ceph Storage system separately and then used director to deploy and configure OpenStack, you can use the Red Hat OpenStack Platform framework for upgrades to perform an in-place upgrade with external Ceph deployments. This scenario is different from upgrading a Ceph cluster that was deployed using director.

The differences that you must take into account when planning and preparing for an in-place upgrade with external Ceph deployments are the following:

  1. Before you can upgrade your Red Hat OpenStack Platform deployment from version 13 to version 16.2, you must upgrade your Red Hat Ceph Storage cluster from version 3 to version 4. For more information, see Upgrading a Red Hat Ceph Storage cluster in the Red Hat Ceph Storage 4 Installation Guide.
  2. After you upgrade your Red Hat Ceph Storage cluster from version 3 to version 4, Red Hat OpenStack Platform 13 might still run RHCSv3 client components; however, these are compatible with the RHCSv4 cluster.
  3. You can follow the upgrade path described in the Framework For Upgrades (13 to 16.2) document, and where applicable, you must complete the conditional steps that support this particular scenario. A conditional step starts with the following statement: "If you are upgrading with external Ceph deployments".
  4. When you upgrade with external Ceph deployments, you install RHCSv4 ceph-ansible as part of the overcloud upgrade process. When you upgrade a Ceph cluster that was deployed using director, you install RHCSv4 ceph-ansible after the overcloud upgrade process is complete.

Important

When you upgrade a Red Hat Ceph Storage cluster from a previous supported version to version 4.2z2, the upgrade completes with the storage cluster in a HEALTH_WARN state with a warning message that states monitors are allowing insecure global_id reclaim. This is due to the patched CVE (CVE-2021-20288). For more information, see Ceph HEALTH_WARN with 'mons are allowing insecure global_id reclaim' after install/upgrade to RHCS 4.2z2 (or newer).

Because the HEALTH_WARN state is displayed due to the CVE, it is possible to mute health warnings temporarily. However, if you mute warnings, you risk losing visibility of potential older and unpatched clients that are connected to your cluster. For more information about muting health warnings, see Upgrading a Red Hat Ceph Storage cluster in the Red Hat Ceph Storage documentation.

2.7. Known issues that might block an upgrade

Review the following known issues that might affect a successful upgrade.

BZ#2228414 - Missing service_user for nova_compute causes nova hybrid state to fail

A service token is now required for OpenStack Compute (nova) and OpenStack Block Storage (cinder) services. During an upgrade from Red Hat OpenStack Platform (RHOSP) 13 to 16.2, if the service token is not configured, live migrations fail with the following error in nova-compute.log:

"2023-xx-xx xx:xx:xx.xxx 8 ERROR oslo_messaging.rpc.server […​] Exception during message handling: cinderclient.exceptions.ClientException: ConflictNovaUsingAttachment: Detach volume from instance XXXXXX using the Compute API (HTTP 409) (Request-ID: req-XXXXXX)"

To avoid this issue, apply the fix from RHBA-2023:5163 - Bug Fix Advisory. You must apply the fix after the undercloud upgrade, but before starting the overcloud adoption.

BZ#1902849 - osp13-osp16.1 ffu fails on clusters previously upgraded from osp8, osp10

Red Hat OpenStack Platform (RHOSP) environments that were previously upgraded from RHOSP 10 require the python-docker package to avoid BZ#1902849. For more information, see the Red Hat Knowledgebase solution osp13-osp16.1 ffu fails on older environments missing python-docker package.

BZ#1925078 - RHOSP13-16.1 FFU: Overcloud upgrade hangs in controller after failed attempt with reference to wrong ceph image

Systems that use UEFI boot and a UEFI bootloader in OSP13 might run into a UEFI issue that results in:

  • /etc/fstab not being updated
  • grub-install being incorrectly used on an EFI system

For more information, see the Red Hat Knowledgebase solution FFU 13 to 16.1: Leapp fails to update the kernel on UEFI based systems and /etc/fstab does not contain the EFI partition.

If your systems use UEFI, contact Red Hat Technical Support.

BZ#1895887 - ovs+dpdk fail to attach device OvsDpdkHCI

After upgrading with the Leapp utility, the Compute node with OVS-DPDK workload does not function properly. To resolve this issue, perform one of the following steps:

  • Remove the /etc/modules-load.d/vfio-pci.conf file before you upgrade the Compute node.
  • Restart the ovs-vswitchd service on the Compute node after you upgrade the Compute node.

This issue affects RHOSP 16.1.3. For more information, see the Red Hat Knowledgebase solution OVS-DPDK errors after Framework Upgrade from OSP 13 to 16.1 on HCI compute node.

BZ#1923165 - OSP-16.2 (Upgrades)(TripleO) Add a config to disable Intel "TSX" on RHEL-8.3 kernel

Starting with Red Hat Enterprise Linux (RHEL) version 8.3, support for the Intel Transactional Synchronization Extensions (TSX) feature is disabled by default. This causes issues with instance live migration between hosts in the following migration scenario:

  • Migrating from hosts where the TSX kernel argument is enabled to hosts where the TSX kernel argument is disabled.

Live migration can be unsuccessful on Intel hosts that support the TSX feature. For more information about the CPUs that are affected by this issue, see Affected Configurations.

For more information, see the Red Hat Knowledgebase solution Guidance on Intel TSX impact on OpenStack guests.

BZ#2016144 - FFU 13-16.1: During Leapp upgrade reboot, openvswitch failed to start with error Starting ovsdb-server ovsdb-server: /var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)

Red Hat OpenStack Platform (RHOSP) environments that have been upgraded from previous versions might contain unnecessary files in /etc/systemd/system/ovs*. You must remove these files before you begin the overcloud upgrade process from RHOSP 13 to RHOSP 16.2.

BZ#2021525 - openstack overcloud upgrade run times out / HAProxy container fails to start

An upgrade from Red Hat OpenStack Platform (RHOSP) 13 to RHOSP 16.2 might fail during the deployment step because of invalid SELinux labels. For a resolution and more information, see the Red Hat Knowledgebase solution Pacemaker managed services might not restart during an OSP13 - OSP16.x FFU.

BZ#2027787 - Undercloud upgrade to 16.2 fails because of missing dependencies of swtpm

There is a known issue with the advanced-virt-for-rhel-8-x86_64-rpms and advanced-virt-for-rhel-8-x86_64-eus-rpms repositories that prevents a successful upgrade. To disable these repositories before upgrading, see the Red Hat Knowledgebase solution advanced-virt-for-rhel-8-x86_64-rpms are no longer required in OSP 16.2.

BZ#2024447 - Identity service (keystone) password for the placement user was overridden by NovaPassword during FFU RHOSP 13 to 16

During an upgrade from Red Hat OpenStack Platform 13 to 16.2, if you define a value for the NovaPassword parameter but not the PlacementPassword parameter, the NovaPassword parameter overrides the OpenStack Identity service (keystone) password for the placement user. To preserve the Identity service password, do not set the NovaPassword or the PlacementPassword in the parameter_defaults section.

If you set both passwords in the parameter_defaults section, the Compute nodes might not be able to communicate with the control plane until they are upgraded. For more information about upgrading Compute nodes, see Upgrading Compute nodes.

Additionally, if you deployed the overcloud on RHOSP 13 by using the NovaPassword, PlacementPassword, or both, you must remove those passwords from the template and run the openstack overcloud deploy command on RHOSP 13 before upgrading to RHOSP 16.2.
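
Before you plan the upgrade, it can help to find out which of your environment files set these passwords. The following is a minimal sketch, not an official tool: the files_setting_param function name and the templates path in the example are hypothetical, and the check only matches uncommented lines that begin with the parameter name.

```shell
#!/bin/bash
# files_setting_param PARAM FILE...
# Prints each FILE that contains an uncommented "PARAM:" key.
files_setting_param() {
  local param="$1"
  shift
  grep -lE "^[[:space:]]*${param}:" -- "$@"
}

# Example on the undercloud (the templates path is hypothetical):
#   for p in NovaPassword PlacementPassword; do
#     files_setting_param "$p" /home/stack/templates/*.yaml
#   done
```

The function prints nothing and returns a non-zero status when no file sets the parameter.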

BZ#2141186 - Live migration fails due to qemu error during in-place upgrade

During or after an in-place upgrade from Red Hat OpenStack Platform (RHOSP) 13 to RHOSP 16.2, live migration between 16.2 Compute nodes fails on instances with the following configuration:

  • Multi-queue is enabled.
  • The number of allocated vCPUs is 9 or more.
  • The instance is running on RHOSP 13.

To successfully migrate your Compute nodes during an upgrade, add the following parameter to your custom environment file:

parameter_defaults:
  ComputeExtraConfig:
    nova::compute::libvirt::max_queues: 8

Include your updated custom environment file when you run the following commands during the upgrade:

  • openstack overcloud upgrade prepare
  • openstack overcloud upgrade converge

Optionally, after you complete the upgrade, include the custom environment file with the parameter when you run the openstack overcloud deploy command.

For more information, see the Red Hat Knowledgebase solution Live migration fails due to qemu error in in-place upgrades environment.

BZ#2141393 - cephvolumescan actor fails

If your environment includes both Ceph and non-Ceph containers, the Leapp upgrade fails because the cephvolumescan actor cannot retrieve the ceph volumes list.

To disable the cephvolumescan actor and complete the Leapp upgrade, add the following parameter to your template:

parameter_defaults:
  LeappActorsToRemove: ['cephvolumescan']

BZ#2164396 - FFU: Redhat satellite tools repository to be enabled for FFU (13 to 16.2)

If you are using Satellite version 6.7, the upgrade fails when you enable the Red Hat Satellite Tools for RHEL 8 Server RPMs x86_64 repository. The failure occurs because the appropriate packages cannot be installed. The Red Hat engineering team is investigating a solution to this issue.

BZ#2245602 - Upgrade (OSP16.2 →OSP17.1) controller-0 does not perform leapp upgrade due to packages missing ovn2.15 openvswitch2.15

If you upgrade from Red Hat OpenStack Platform (RHOSP) 13 to 16.1 or 16.2, or from RHOSP 16.2 to 17.1, do not include the system_upgrade.yaml file in the --answers-file answer-upgrade.yaml file. If the system_upgrade.yaml file is included in that file, the environments/lifecycle/upgrade-prepare.yaml file overwrites the parameters in the system_upgrade.yaml file. To avoid this issue, append the system_upgrade.yaml file to the openstack overcloud upgrade prepare command. For example:

$ openstack overcloud upgrade prepare --answers-file answer-upgrade.yaml \
-r roles-data.yaml \
-n networking-data.yaml \
-e system_upgrade.yaml \
-e upgrade_environment.yaml

With this workaround, the parameters that are configured in the system_upgrade.yaml file overwrite the default parameters in the environments/lifecycle/upgrade-prepare.yaml file.
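
To confirm that your answers file does not reference system_upgrade.yaml, you can search its uncommented lines. The following is a sketch only; the answers_file_includes function name is hypothetical, and the check is a plain text match rather than a YAML parse.

```shell
#!/bin/bash
# answers_file_includes ANSWERS_FILE NAME
# Succeeds if an uncommented line of ANSWERS_FILE mentions NAME.
answers_file_includes() {
  local answers="$1" name="$2"
  grep -v '^[[:space:]]*#' -- "$answers" | grep -qF -- "$name"
}

# Example on the undercloud:
#   if answers_file_includes answer-upgrade.yaml system_upgrade.yaml; then
#     echo "Remove system_upgrade.yaml from answer-upgrade.yaml and pass it with -e"
#   fi
```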

Red Hat Ceph Storage Issues

BZ#1855813 - Ceph tools repository should be switched from RHCS3 to RHCS4 only after converge, before running external-upgrade

The ceph-ansible playbook collection on the undercloud deploys Red Hat Ceph Storage containers on the overcloud. To upgrade your environment, you must have the Red Hat Ceph Storage 3 version of ceph-ansible to maintain Ceph Storage 3 containers through the upgrade. This guide includes instructions on how to retain ceph-ansible version 3 over the course of the upgrade until you are ready to upgrade to Ceph Storage 4. Before performing the 13 to 16.2 upgrade, you must perform a minor version update of your Red Hat OpenStack Platform 13 environment and ensure you have ceph-ansible version 3.2.46 or later.
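
The minimum ceph-ansible version can be checked with a simple version comparison. The following is a sketch under stated assumptions: the version_at_least function name is hypothetical, the comparison relies on GNU sort -V, and the example assumes that ceph-ansible is installed as an RPM on the undercloud. It checks only the lower bound, so it does not tell you whether a version 4 package is already installed.

```shell
#!/bin/bash
# version_at_least INSTALLED REQUIRED
# Succeeds if INSTALLED is the same as or newer than REQUIRED.
version_at_least() {
  local installed="$1" required="$2"
  # sort -V orders version strings; if REQUIRED sorts first (or both are
  # equal), the installed version is new enough.
  [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)" = "$required" ]
}

# Example on the undercloud (assumes an RPM installation of ceph-ansible):
#   version_at_least "$(rpm -q --queryformat '%{VERSION}' ceph-ansible)" 3.2.46 \
#     || echo "Update ceph-ansible before the upgrade"
```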

2.8. Backup and restore

Before you upgrade your Red Hat OpenStack Platform 13 environment, back up the undercloud and overcloud control plane. For more information about backing up nodes with the Relax-and-recover (ReaR) utility, see the Undercloud and Control Plane Back Up and Restore guide.

2.9. Minor version update

Before you upgrade your Red Hat OpenStack Platform environment, update the environment to the latest minor version of your current release. For example, update your Red Hat OpenStack Platform 13 environment to the latest 13 minor version before running the upgrade to Red Hat OpenStack Platform 16.2.

For instructions on performing a minor version update for Red Hat OpenStack Platform 13, see Keeping Red Hat OpenStack Platform Updated.

2.10. Proxy configuration

If you use a proxy with your Red Hat OpenStack Platform 13 environment, the proxy configuration in the /etc/environment file persists through both the operating system upgrade and the Red Hat OpenStack Platform 16.2 upgrade.

2.11. Deleting RHEL registration resources

If the DeleteOnRHELUnregistration parameter is set to true in an existing environment file or the rhel-registration.yaml template, the overcloud upgrade cannot proceed. In this case, when you perform a minor update to the latest Red Hat OpenStack Platform 13z version, set the DeleteOnRHELUnregistration parameter to false.

Procedure

  1. In the parameter_defaults section of your environment file, if the DeleteOnRHELUnregistration parameter is set to true, set the parameter to false.
  2. Run the openstack overcloud update prepare command.
  3. Run the openstack undercloud upgrade command.
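
Before you start this procedure, you can check whether a given environment file sets the parameter to true. The following is a minimal sketch; the delete_on_unregistration_enabled function name and the file path in the example are hypothetical, and the match is a text search that accepts true with or without quotation marks, in any letter case.

```shell
#!/bin/bash
# delete_on_unregistration_enabled FILE
# Succeeds if FILE sets DeleteOnRHELUnregistration to true.
delete_on_unregistration_enabled() {
  grep -qiE '^[[:space:]]*DeleteOnRHELUnregistration:[[:space:]]*"?true"?[[:space:]]*$' -- "$1"
}

# Example on the undercloud (the file path is hypothetical):
#   delete_on_unregistration_enabled /home/stack/templates/rhel-registration.yaml \
#     && echo "Set DeleteOnRHELUnregistration to false"
```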

2.12. Validating Red Hat OpenStack Platform 13 before the upgrade

Before you upgrade to Red Hat OpenStack Platform 16.2, validate your undercloud and overcloud with the tripleo-validations playbooks. In Red Hat OpenStack Platform 13, you run these playbooks through the OpenStack Workflow Service (mistral).

Note

If you use CDN or Satellite as repository sources, the validation fails. To resolve this issue, see the Red Hat Knowledgebase solution, repos validation fails because of SSL certificate error.

Procedure

  1. Log in to the undercloud as the stack user.
  2. Source the stackrc file:

    $ source ~/stackrc
  3. Create a bash script called pre-upgrade-validations.sh and include the following content in the script:

    #!/bin/bash
    # Look up the overcloud stack name once; it is passed as the plan name.
    STACK_NAME=$(openstack stack list -f value -c 'Stack Name')
    # jq keeps the JSON quotes around each ID so that it can be embedded in the workflow input.
    for VALIDATION in $(openstack action execution run tripleo.validations.list_validations '{"groups": ["pre-upgrade"]}' | jq ".result[] | .id")
    do
      echo "=== Running validation: $VALIDATION ==="
      ID=$(openstack workflow execution create -f value -c ID tripleo.validations.v1.run_validation "{\"validation_name\": $VALIDATION, \"plan\": \"$STACK_NAME\"}")
      # Poll until the workflow execution leaves the RUNNING state.
      while [ "$(openstack workflow execution show "$ID" -f value -c State)" == "RUNNING" ]
      do
        sleep 1
      done
      echo ""
      openstack workflow execution output show "$ID" | jq -r ".stdout"
      echo ""
    done
  4. Add permission to run the script:

    $ chmod +x pre-upgrade-validations.sh
  5. Run the script:

    $ ./pre-upgrade-validations.sh

    Review the script output to determine which validations succeed and fail:

    === Running validation: "check-ftype" ===
    
    Success! The validation passed for all hosts:
    * undercloud
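
When many validations run, it can be easier to list only the ones that did not report success. The following is a sketch, assuming that you saved the script output to a file and that it uses the "=== Running validation:" header and the "Success!" line shown above. The failed_validations function name and the validations.log file name are hypothetical.

```shell
#!/bin/bash
# failed_validations
# Reads saved script output on stdin and prints the name of every validation
# whose section does not contain a line that starts with "Success!".
failed_validations() {
  awk '
    /^[[:space:]]*=== Running validation: / {
      if (name != "" && !ok) print name
      name = $4; ok = 0
      next
    }
    /^[[:space:]]*Success!/ { ok = 1 }
    END { if (name != "" && !ok) print name }
  '
}

# Example on the undercloud:
#   ./pre-upgrade-validations.sh | tee validations.log
#   failed_validations < validations.log
```

The names are printed with the quotation marks that appear in the header lines.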