When the custom roles file is missed in the update-plan-only step, the minor update fails and returns the nodes to the pool
Issue
We deployed the overcloud normally on 3 Controller and 6 Compute nodes, with the Compute nodes assigned to 2 different composable roles.
After the initial deployment, we tried to update the overcloud via "openstack overcloud update", as explained in the product documentation.
The update went through the first 2 Controller nodes and then failed after the breakpoint on the 3rd was cleared, with:
[stack]$ ./overcloud-update.sh
Started Mistral Workflow tripleo.package_update.v1.package_update_plan. Execution ID: 17716474-2106-4469-8102-a45c929961a5
Waiting for messages on queue 'ac38ee4a-d63a-4d34-ae10-0c77147056e4' with no timeout.
WAITING
not_started: [u'overcloud-computelenovo-4', u'overcloud-computelenovo-2', u'overcloud-computelenovo-3', u'overcloud-computelenovo-1', u'overcloud-computelenovo-0', u'overcloud-computehp_v1-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-controller-2', u'overcloud-controller-0']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 6ce54680-07d3-4910-8745-921adc280095), C-c=quit interactive mode:
WAITING
not_started: [u'overcloud-computelenovo-4', u'overcloud-computelenovo-2', u'overcloud-computelenovo-3', u'overcloud-computelenovo-1', u'overcloud-computelenovo-0', u'overcloud-computehp_v1-0']
completed: [u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-controller-2']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 46766fb0-7c53-45ec-8f66-2ccdf3faf42d), C-c=quit interactive mode:
WAITING
not_started: [u'overcloud-computelenovo-4', u'overcloud-computelenovo-2', u'overcloud-computelenovo-3', u'overcloud-computelenovo-1', u'overcloud-computelenovo-0', u'overcloud-computehp_v1-0']
completed: [u'overcloud-controller-2', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 0923044e-a1be-48e5-9b5b-f42eacf34b74), C-c=quit interactive mode:
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
ERROR: The specified reference "ComputeLenovoDeployment_Step5" (in ComputeExtraConfigPost) is incorrect.
After that, all Compute nodes are immediately removed from the overcloud and show up as available in "openstack baremetal node list". The complete stack then needs to be redeployed.
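The title attributes this failure to the custom roles file being left out of the update-plan-only step. The following is a minimal sketch of a pre-flight check, not a verified fix. It assumes the plan is refreshed with "openstack overcloud deploy --update-plan-only" and that the roles file is passed with -r or --roles-file. The helper name has_roles_file and the file paths are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of any Red Hat tooling).
# Returns 0 if the given deploy arguments include a roles file option
# (-r FILE, --roles-file FILE or --roles-file=FILE), 1 otherwise.
has_roles_file() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            -r|--roles-file|--roles-file=*) return 0 ;;
        esac
    done
    return 1
}

# Assumed arguments for the plan update; the paths are placeholders and
# must match the files used for the initial deployment.
plan_args=(--update-plan-only --templates
           -r /home/stack/templates/roles_data.yaml
           -e /home/stack/templates/environment.yaml)

# This sketch only prints the command it would run; it does not call the CLI.
if has_roles_file "${plan_args[@]}"; then
    echo "roles file present: openstack overcloud deploy ${plan_args[*]}"
else
    echo "refusing to update the plan: no custom roles file (-r) given" >&2
fi
```

The check is deliberately simple: it only confirms that a roles file option appears in the argument list, not that the file defines the same roles (for example the ComputeLenovo role named in the error) as the original deployment.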
Environment
- Red Hat OpenStack Platform 11.0