About the impact of 2-step (minor) upgrades in Red Hat OpenStack Platform
Issue
According to the following text, one would expect director to fully update each overcloud node sequentially (after removal of the respective breakpoints) before moving to the next one:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/sect-updating_the_environment#sect-Updating_the_Overcloud
2.2.3. Updating the Overcloud Packages
...
Performing a package update on all nodes using the openstack overcloud update command. For example:
$ openstack overcloud update stack -i overcloud
Running an update on all nodes in parallel might cause problems. For example, an update of a package might involve restarting a service, which can disrupt other nodes. This is why the process updates each node using a set of breakpoints. This means nodes are updated one by one. When one node completes the package update, the update process moves to the next node. The update process also requires the -i option, which puts the command in an interactive mode that requires confirmation at each breakpoint. Without the -i option, the update remains paused at the first breakpoint.
In reality, the process differs slightly. A minor update (every release before Pike) basically consists of two steps:
1. Run yum_update.sh on all nodes via the interactive prompt. On controllers, this step also stops the cluster, updates the packages, and starts the cluster again. On non-controllers, the package update does not happen here.
2. Run Puppet on all nodes. Non-controllers also get their packages updated in this step, before the configuration is applied.
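The two steps can be sketched as a small Python model to make the ordering visible. This is only an illustration of the description given here, not director's actual implementation: the Node class, the function names, and the sample node list are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for an overcloud node."""
    name: str
    role: str  # "controller" or any other role name
    packages_updated: bool = False
    config_applied: bool = False


def step1_yum_update(nodes):
    """Step 1: yum_update.sh runs on every node, one breakpoint at a time."""
    for node in nodes:
        if node.role == "controller":
            # Controllers: stop cluster, update packages, start cluster.
            node.packages_updated = True
        # Non-controllers: no package update happens in this step.


def step2_puppet(nodes):
    """Step 2: Puppet runs on all nodes (the convergence)."""
    for node in nodes:
        if not node.packages_updated:
            # Non-controllers get their packages here, before config is applied.
            node.packages_updated = True
        node.config_applied = True


nodes = [
    Node("overcloud-controller-0", "controller"),
    Node("overcloud-compute-0", "compute"),
    Node("overcloud-compute-1", "compute"),
]

step1_yum_update(nodes)
# Snapshot of package state once every breakpoint has been cleared.
after_step1 = {n.name: n.packages_updated for n in nodes}

step2_puppet(nodes)
after_step2 = {n.name: n.packages_updated for n in nodes}
```

In this model, after_step1 shows the compute nodes still carrying their old packages even though every breakpoint has been removed; only after_step2 shows them updated.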
This workflow is not transparent to the user, as the update command returns completed after Step 1, and the convergence (Step 2) is somehow "hidden" behind what appears to be a longer update process for the last node.
One would expect to be able to perform the evacuation before removing the breakpoint, and bring workloads back when computes have been marked as completed.
Referring to the output below:
completed: [u'overcloud-compute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-compute-1']
- Is it safe to consider a compute node (compute-0) updated at this stage?
- Is it safe to evacuate compute-1 and migrate workloads back to compute-0?
- Assuming Step 2 could harm workloads (OVS?), is there a way to "stage" updates so that a group of computes could be completely upgraded before moving to the next group? [This looks similar to bz#1509272]
- How about "networker" nodes?
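To reason about the status output quoted above, it can be parsed with a few lines of Python. This is a minimal sketch that assumes the output consists of "key: list" lines exactly as shown; the function names are hypothetical, and identifying controllers by the substring "controller" in the node name is an assumption based on the sample names only.

```python
import ast


def parse_update_status(text):
    """Turn 'key: [u'node', ...]' lines into a dict of node-name lists."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'key: value'
        # The value is a Python list literal; u'...' strings parse fine.
        status[key.strip()] = ast.literal_eval(value.strip())
    return status


def awaiting_step2(status):
    """Non-controller nodes listed as completed.

    Per the two-step description, such nodes have only passed Step 1,
    so their packages may not be updated yet.
    """
    return [n for n in status.get("completed", []) if "controller" not in n]


sample = """completed: [u'overcloud-compute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-compute-1']"""

status = parse_update_status(sample)
pending = awaiting_step2(status)
```

For the sample, pending contains overcloud-compute-0, which illustrates why "completed" alone may not be enough to treat a compute node as updated.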
Environment
Red Hat OpenStack Platform 8
Red Hat OpenStack Platform 9
Red Hat OpenStack Platform 10
Red Hat OpenStack Platform 11