Chapter 1. Introduction

This document provides a workflow to help keep your Red Hat OpenStack Platform 13 environment updated with the latest packages and containers.

This guide provides an upgrade path through the following versions:

Old Overcloud Version             New Overcloud Version

Red Hat OpenStack Platform 13     Red Hat OpenStack Platform 13.z

1.1. High level workflow

The following table provides an outline of the steps required for the upgrade process:

Step                              Description

Obtaining new container images    Create a new environment file containing the latest container images for OpenStack Platform 13 services.

Updating the undercloud           Update the undercloud to the latest OpenStack Platform 13.z version.

Updating the overcloud            Update the overcloud to the latest OpenStack Platform 13.z version.

Updating the Ceph Storage nodes   Upgrade all Ceph Storage 3 services.

Finalize the upgrade              Run the convergence command to refresh your overcloud stack.
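
For reference, the following is a minimal sketch of the final convergence step. The overcloud_images.yaml path is an assumption; your actual list of environment files must match the ones you used to deploy the overcloud:

    $ openstack overcloud update converge \
        --templates \
        -e /home/stack/templates/overcloud_images.yaml \
        -e <other environment files from your deployment>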

1.2. Known issues that might block an update

Review the following known issues that might affect a successful minor version update.

Minor updates of Red Hat Ceph Storage 3 can cause OSD corruption

Red Hat Ceph Storage 3 relies on docker for containerized deployments that run on EL7. The ceph-ansible fix for BZ#1846830 updates the systemd units that control Ceph containers so that the units require the docker service to be up and running. This requirement is essential to implement a safe update path and to avoid service disruption or even data corruption during uncontrolled updates of the docker package.

Updating the ceph-ansible package is not sufficient for the ceph-ansible fix to be effective. You must also update the systemd units of the containers by rerunning the deployment playbook. For information about resolving the issue in your director-driven Ceph Storage deployment, refer to the Red Hat Knowledgebase solution Issue affecting minor updates of Red Hat Ceph Storage 3 can cause OSDs corruption.
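
After you rerun the deployment playbook, a hedged way to confirm that the Ceph container systemd units now depend on the docker service is to inspect one of the units on a Ceph Storage node. The ceph-osd@0.service unit name is an assumption; use an OSD ID that exists on that node:

    $ sudo systemctl cat ceph-osd@0.service | grep -E 'Requires=|After='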

OSP13 update might appear to fail although it is eventually successful

The python tripleo-client that is used in the openstack overcloud update run command might time out before the update process can complete. This results in the openstack overcloud update run command returning a failure, while the update process continues to run in the background until it completes.

To avoid this failure, before you update your overcloud nodes, edit the ttl parameter in the tripleo-client/plugin.py file to increase the tripleo-client timeout. For more information, see the Red Hat Knowledgebase solution OSP 13 update process appears to fail while the update process runs in the background and completes successfully.
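
To locate the ttl parameter before you edit it, you can search the installed client source. The path below assumes a Python 2.7 undercloud and is only an illustration; the location on your system might differ:

    $ sudo grep -n 'ttl' /usr/lib/python2.7/site-packages/tripleoclient/plugin.py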

Slight cut in rabbitmq connectivity triggered a data plane loss after a full sync

If you update your environment from a release earlier than RHOSP 13 z10 (19 December 2019 maintenance release), see the Red Hat Knowledgebase solution Stale namespaces on OSP13 can create data plane cut during update to avoid the data plane connectivity loss that is described in bug BZ#1955538.

During ceph upgrade all the OSDs (and other ceph services) went down

If you are using Ceph, to avoid bug BZ#1910842, see the Red Hat Knowledgebase solution During minor update of OSP13/RHCS3 to latest packages Ceph services go offline and need to be manually restarted before you complete the following procedures:

  • Updating all Controller nodes
  • Updating all HCI Compute nodes
  • Updating all Ceph Storage nodes
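
After you complete these procedures, a hedged way to confirm that the Ceph services came back online on a node is to list the Ceph containers and their systemd units; the following commands are only a sketch:

    $ sudo docker ps | grep ceph
    $ sudo systemctl list-units 'ceph-*'
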
Octavia and LB issues after z11 upgrade

During an update, the load-balancing service (octavia) containers might continuously restart because of a missing file named /var/lib/config-data/puppet-generated/octavia/etc/octavia/conf.d/common/post-deploy.conf. This file was introduced during the Red Hat OpenStack Platform 13 lifecycle to configure octavia services after the Amphora deployment, and it is currently generated during the openstack overcloud update converge step of an update.

To work around this issue, continue with the update. The octavia containers start normally after you run the openstack overcloud update converge command. The Red Hat OpenStack Platform engineering team is currently investigating a resolution to this issue.
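
A hedged way to confirm that the workaround took effect after the converge step is to check that the file exists and that the octavia containers are running; these checks are only a sketch:

    $ sudo ls -l /var/lib/config-data/puppet-generated/octavia/etc/octavia/conf.d/common/post-deploy.conf
    $ sudo docker ps | grep octavia
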
DBAPIError exception wrapped from (pymysql.err.InternalError) (1054, u"Unknown column 'pool.tls_certificate_id' in 'field list'")

If you use the load-balancing service (octavia) and you update from a release earlier than RHOSP 13 z13 (8 October 2020 maintenance release), to avoid bug BZ#1927169 you must run the database migrations that upgrade the load-balancing service in the correct order: update the bootstrap Controller node before you update the rest of the control plane.

  1. To identify your current maintenance release, run the following command:

    $ cat /etc/rhosp-release
  2. On the undercloud node, to identify the bootstrap Controller node, run the following command and replace <any_controller_node_IP_address> with the IP address of any of the Controller nodes in your deployment:

    $ ssh heat-admin@<any_controller_node_IP_address> sudo hiera -c /etc/puppet/hiera.yaml octavia_api_short_bootstrap_node_name
  3. On the undercloud node, run the openstack overcloud update run command to update the bootstrap Controller node. Replace <bootstrap_node_name> with the node name that the previous command returned:

    $ openstack overcloud update run --nodes <bootstrap_node_name>
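
After the bootstrap Controller node update completes, you can continue with the rest of the control plane. The following is a hedged sketch that assumes the default Controller role name:

    $ openstack overcloud update run --nodes Controller
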
Minor update to 13z16 failed with "Unable to find constraint"

When you restart an update of a Red Hat OpenStack Platform 13z16 overcloud node, you might see an Unable to find constraint error. This error occurs because of a discrepancy in RabbitMQ versions during the update. To ensure that the new RabbitMQ version can start, you must clear any pacemaker bans that might exist in the overcloud.

For more information about this issue, see the Red Hat Knowledgebase solution Cannot restart Update of the OSP13z16 controllers.
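
A hedged sketch of checking for and clearing a pacemaker ban on a Controller node follows; the rabbitmq-bundle resource name assumes a default containerized deployment, and the Knowledgebase solution contains the authoritative steps:

    $ sudo pcs constraint location
    $ sudo pcs resource clear rabbitmq-bundle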

Cannot stop ceph-mon on controllers: No such container: ceph-mon controller-2

If you use Red Hat Ceph Storage version 3.3 z5 or earlier and update the docker package to docker-1.13.1-209, the RHOSP 13 update fails. The RHOSP 13 update does not stop the ceph-mon container before the docker package is updated. This results in an orphaned ceph-mon process, which blocks the new ceph-mon container from starting.

For more information about this issue, see the Red Hat Knowledgebase solution Updating Red Hat OpenStack Platform 13.z12 and earlier with Ceph Storage might fail during controller update.
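
A hedged way to check an affected Controller node for the orphaned process is to compare the running containers with the host processes; the Knowledgebase solution describes the supported recovery steps:

    $ sudo docker ps -a | grep ceph-mon
    $ sudo pgrep -a ceph-mon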

1.3. Troubleshooting

  • If the update process takes longer than expected, it might time out with the error socket is already closed. This occurs because the undercloud authentication token is set to expire after a set duration. For more information, see Recommendations for Large Deployments. A hedged sketch for checking the current token setting follows this list.
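
A hedged way to inspect the current token expiration on the undercloud is shown below; the keystone.conf path assumes a default, non-containerized OSP 13 undercloud, and Recommendations for Large Deployments describes the supported way to adjust it:

    $ sudo grep -E '^\s*expiration' /etc/keystone/keystone.conf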