About the impact of 2-step (minor) upgrades in Red Hat OpenStack Platform

Solution In Progress - Updated -

Issue

According to the following text, one would expect director to fully update each overcloud node sequentially (after removal of the respective breakpoints) before moving to the next one:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/sect-updating_the_environment#sect-Updating_the_Overcloud

2.2.3. Updating the Overcloud Packages
...
Performing a package update on all nodes using the openstack overcloud update command. For example:

$ openstack overcloud update stack -i overcloud

Running an update on all nodes in parallel might cause problems. For example, an update of a package might involve restarting a service, which can disrupt other nodes. This is why the process updates each node using a set of breakpoints. This means nodes are updated one by one. When one node completes the package update, the update process moves to the next node. The update process also requires the -i option, which puts the command in an interactive mode that requires confirmation at each breakpoint. Without the -i option, the update remains paused at the first breakpoint.

In practice, the behavior is slightly different. A minor update (every release before Pike) runs in two steps:

1. Run yum_update.sh on all nodes via the interactive prompt. On controllers, this step also stops the cluster, updates the packages, and starts the cluster again. On non-controller nodes, no package update happens at this step.
2. Run Puppet on all nodes. Non-controller nodes also receive their package updates here, before the configuration is applied.
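
The two steps above can be restated as a small Python model. This is only an illustrative sketch of the behavior described in this article, not director code: the function name `node_state`, the role string `"controller"`, and the dictionary keys are all assumptions introduced for the example.

```python
def node_state(role, step1_done, step2_done):
    """Model what has happened on a node, per the two-step description.

    role: "controller" or any other string (treated as non-controller).
    step1_done: yum_update.sh has run on the node (breakpoint removed).
    step2_done: the Puppet run (convergence) has finished on the node.
    All names here are hypothetical and exist only for this sketch.
    """
    is_controller = role == "controller"
    # Controllers get packages in Step 1; other roles only in Step 2.
    packages_updated = step1_done if is_controller else step2_done
    # Configuration is applied by Puppet in Step 2 for every role.
    config_applied = step2_done
    return {
        "packages_updated": packages_updated,
        "config_applied": config_applied,
    }


if __name__ == "__main__":
    # A compute node whose breakpoint was removed, before convergence.
    print(node_state("compute", step1_done=True, step2_done=False))
    # A controller at the same point in the workflow.
    print(node_state("controller", step1_done=True, step2_done=False))
```

Under this model, a non-controller node that has passed Step 1 has neither new packages nor new configuration yet, which is the gap the rest of this article asks about.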

This workflow is not transparent to the user: the update command returns completed after Step 1, and the convergence (Step 2) is "hidden" behind what appears to be a longer update process for the last node.

One would expect to be able to perform the evacuation before removing the breakpoint, and bring workloads back when computes have been marked as completed.

Referring to the output below:

completed: [u'overcloud-compute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-compute-1']
  1. Is it safe to consider a compute node (compute-0) updated at this stage?
  2. Is it safe to evacuate compute-1 and migrate workloads back to compute-0?
  3. Assuming Step 2 could harm workloads (OVS?), is there a way to "stage" updates so that a group of computes could be completely upgraded before moving to the next group? [This looks similar to bz#1509272]
  4. Do the same considerations apply to "networker" nodes?
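
When tracking many nodes, the two status lines shown above can be read programmatically. The following Python sketch assumes the status text has exactly the `completed:` / `on_breakpoint:` shape quoted in this article; the helper name `parse_breakpoint_status` is hypothetical. Per the description above, a node listed under `completed` has only passed Step 1, so the result should not be read as "fully updated".

```python
import ast


def parse_breakpoint_status(text):
    """Parse 'completed:' and 'on_breakpoint:' lines into Python lists.

    Assumes each line looks like the output quoted in this article,
    e.g. "completed: [u'overcloud-compute-0']". Other lines are ignored.
    """
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no "key: value" structure on this line
        key = key.strip()
        if key in ("completed", "on_breakpoint"):
            # literal_eval safely evaluates the list literal; the u'' string
            # prefix is accepted by Python 3.3 and later.
            status[key] = ast.literal_eval(value.strip())
    return status


if __name__ == "__main__":
    sample = (
        "completed: [u'overcloud-compute-0', u'overcloud-controller-0']\n"
        "on_breakpoint: [u'overcloud-compute-1']\n"
    )
    status = parse_breakpoint_status(sample)
    # "completed" here means Step 1 finished, not that convergence has run.
    print("Passed Step 1:", status["completed"])
    print("Waiting at breakpoint:", status["on_breakpoint"])
```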

Environment

Red Hat OpenStack Platform 8
Red Hat OpenStack Platform 9
Red Hat OpenStack Platform 10
Red Hat OpenStack Platform 11
