Chapter 1. Introduction
This document provides a workflow to help keep your Red Hat OpenStack Platform 13 environment updated with the latest packages and containers.
This guide provides an upgrade path through the following versions:
| Old Overcloud Version | New Overcloud Version |
| --- | --- |
| Red Hat OpenStack Platform 13 | Red Hat OpenStack Platform 13.z |
1.1. High-level workflow
The following list outlines the steps in the update process:

1. Obtaining new container images: Create a new environment file that contains the latest container images for OpenStack Platform 13 services.
2. Updating the undercloud: Update the undercloud to the latest OpenStack Platform 13.z version.
3. Updating the overcloud: Update the overcloud to the latest OpenStack Platform 13.z version.
4. Updating the Ceph Storage nodes: Update all Ceph Storage 3 services.
5. Finalizing the update: Run the convergence command to refresh your overcloud stack.
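Assuming a director-based deployment, the steps above correspond roughly to the command sequence below. This is a sketch only: the environment file name is a placeholder, each command takes additional options that the individual procedures describe, and the snippet wraps every command in a dry-run helper so it only prints what would run.

```shell
# Dry-run helper: with DRY_RUN=1 the commands are only printed, so this
# sketch can be read (and exercised) without an undercloud present.
DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# 1. Update undercloud packages and services to the latest 13.z
run sudo yum -y update python-tripleoclient
run openstack undercloud upgrade

# 2. Prepare the overcloud update with the new container image
#    environment file (containers-prepare-parameter.yaml is a placeholder)
run openstack overcloud update prepare --templates \
    -e containers-prepare-parameter.yaml

# 3. Update overcloud nodes role by role; Ceph service containers are
#    updated in a separate step, as described in the Ceph update procedure
run openstack overcloud update run --nodes Controller
run openstack overcloud update run --nodes Compute
run openstack overcloud update run --nodes CephStorage

# 4. Finalize: run the convergence command to refresh the overcloud stack
run openstack overcloud update converge --templates \
    -e containers-prepare-parameter.yaml
```

With `DRY_RUN=1` the script only echoes each command prefixed with `+`; setting `DRY_RUN=0` on a real undercloud would execute them.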
1.2. Known issues that might block an update
Review the following known issues that might affect a successful minor version update.
- Minor updates of Red Hat Ceph Storage 3 can cause OSD corruption
Red Hat Ceph Storage 3 relies on docker for containerized deployments that run on EL7. The ceph-ansible fix for BZ#1846830 updates the systemd units that control Ceph containers, making the systemd units require the docker service to be up and running for execution. This requirement is essential to implement a safe update path and avoid service disruption or even data corruption on uncontrolled updates of the docker package.
Updating the ceph-ansible package is not sufficient for the ceph-ansible fix to be effective. You must also update the systemd units of the containers by rerunning the deployment playbook. For information about resolving the issue in your director-driven Ceph Storage deployment, refer to the Red Hat Knowledgebase solution Issue affecting minor updates of Red Hat Ceph Storage 3 can cause OSDs corruption.
- OSP13 update may appear to fail while it’s eventually successful
The tripleo-client that is used in the openstack overcloud update run command might time out before the update process can complete. This results in the openstack overcloud update run command returning a failure, while the update process continues to run in the background until it completes.
To avoid this failure, edit the value of the ttl parameter in the tripleo-client/plugin.py file to increase the tripleo-client timeout value before you update your overcloud nodes. For more information, see the Red Hat Knowledgebase solution OSP 13 update process appears to fail while the update process runs in the background and completes successfully.
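The exact location of plugin.py and the default ttl value vary by package version, so treat the following as an illustration only: it raises a hypothetical ttl value with sed on a local sample file rather than on the real installed module (which typically lives under the tripleoclient Python package directory).

```shell
# Create a local sample that mimics the relevant line in tripleo-client's
# plugin.py. The default value of 360 here is an assumption for the sake
# of the example; check the actual file on your undercloud.
cat > plugin_sample.py <<'EOF'
# ... other client setup code ...
ttl = 360  # hypothetical default
EOF

# Raise the ttl so long-running 'openstack overcloud update run' calls
# do not time out before the background update process finishes.
sed -i 's/^ttl = [0-9]*/ttl = 3600/' plugin_sample.py

grep '^ttl' plugin_sample.py
```

On a real undercloud you would apply the same edit to the installed plugin.py before starting the overcloud update, as the Knowledgebase solution describes.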
- Slight cut in rabbitmq connectivity triggered a data plane loss after a full sync
- If you update your environment from a release earlier than RHOSP 13 z10 (19 December 2019 maintenance release), see the Red Hat Knowledgebase solution Stale namespaces on OSP13 can create data plane cut during update to avoid the data plane connectivity loss that is described in bug BZ#1955538.
- During ceph upgrade all the OSDs (and other ceph services) went down
If you use Ceph, before you complete the following procedures, see the Red Hat Knowledgebase solution During minor update of OSP13/RHCS3 to latest packages Ceph services go offline and need to be manually restarted to avoid bug BZ#1910842:
- Updating all Controller nodes
- Updating all HCI Compute nodes
- Updating all Ceph Storage nodes
- Octavia and LB issues after z11 upgrade
During an update, the load-balancing service (Octavia) containers might restart continuously due to a missing file named
/var/lib/config-data/puppet-generated/octavia/etc/octavia/conf.d/common/post-deploy.conf. This file was introduced during the Red Hat OpenStack Platform 13 lifecycle to configure Octavia services after the Amphora deployment, and it is currently generated during the openstack overcloud update converge step of an update. To work around this issue, continue with the update. The Octavia containers start normally after you run the openstack overcloud update converge command. The Red Hat OpenStack Platform engineering team is currently investigating a resolution to this issue.
- DBAPIError exception wrapped from (pymysql.err.InternalError) (1054, u"Unknown column 'pool.tls_certificate_id' in 'field list'")
If you use the load-balancing service (Octavia) and want to update from a release earlier than RHOSP 13 z13 (8 October 2020 maintenance release), to avoid bug BZ#1927169, you must run the database migrations that upgrade the load-balancing service in the correct order. You must update the bootstrap Controller node before you can update the rest of the control plane.
To identify your current maintenance release, run the following command:
$ cat /etc/rhosp-release
On the undercloud node, to identify the bootstrap Controller node, run the following command and replace <any_controller_node_IP_address> with the IP address of any of the Controller nodes in your deployment:
$ ssh heat-admin@<any_controller_node_IP_address> sudo hiera -c /etc/puppet/hiera.yaml octavia_api_short_bootstrap_node_name
On the undercloud node, run the openstack overcloud update run command to update the bootstrap Controller node:
$ openstack overcloud update run --nodes <bootstrap_node_name>
- Minor update to 13z16 failed with "Unable to find constraint"
When you restart an update of a Red Hat OpenStack Platform 13z16 overcloud node, you might encounter the error Unable to find constraint. This error occurs because of a discrepancy in the RabbitMQ version during the update. To ensure that the new RabbitMQ version can start, you must clear any Pacemaker bans that might exist in the overcloud.
For more information about this issue, see the Red Hat Knowledgebase solution Cannot restart Update of the OSP13z16 controllers.
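Pacemaker bans created with pcs resource ban appear as location constraints whose IDs are prefixed with cli-ban-. The snippet below is a sketch that filters a sample constraint listing offline; the resource and node names are invented for illustration. On a real Controller node you would inspect the live output of pcs constraint location and clear each banned resource.

```shell
# Hypothetical excerpt of `sudo pcs constraint location --full` output,
# captured as a string so the filter can be demonstrated offline.
sample='
Location Constraints:
  Resource: rabbitmq-bundle
    Disabled on: controller-2 (score:-INFINITY) (role:Started) (id:cli-ban-rabbitmq-bundle-on-controller-2)
'

# Constraints created by `pcs resource ban` get IDs prefixed cli-ban-;
# extract any such IDs from the listing.
printf '%s\n' "$sample" | grep -o 'cli-ban-[^)]*'

# On a real node, clear the ban so the updated RabbitMQ container can start:
#   sudo pcs resource clear rabbitmq-bundle
```

Clearing the ban with pcs resource clear removes the cli-ban- constraint without touching the other location constraints in the cluster.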
- Cannot stop ceph-mon on controllers: No such container: ceph-mon controller-2
If you use Red Hat Ceph Storage version 3.3 z5 or earlier and update the docker package to docker-1.13.1-209, the RHOSP 13 update fails. The RHOSP 13 update does not stop the ceph-mon container before the docker package updates. This results in an orphan ceph-mon process, which blocks the new ceph-mon container from starting.
For more information about this issue, see the Red Hat Knowledgebase solution Updating Red Hat OpenStack Platform 13.z12 and earlier with Ceph Storage might fail during controller update.
If the update process takes longer than expected, it might time out with the error socket is already closed. This can happen because the undercloud's authentication token is set to expire after a set duration. For more information, see Recommendations for Large Deployments.