[RHOSO] Instances are not evacuated during host failure
Environment
- Red Hat OpenStack Services on OpenShift (RHOSO) 18
Issue
Instance HA is enabled, but instances are not evacuated when a host fails. Only the host is rebooted.
Resolution
Correct the Instance HA configuration in one of two ways: disable the host aggregate filter, or create a tagged host aggregate as described in the documentation.
- Edit the instanceha-config ConfigMap to disable host aggregate validation, then save the change.
    $ oc edit cm <config_map_name>
    ...
    apiVersion: v1
    data:
      config.yaml: |
        config:
          EVACUABLE_TAG: "evacuable"
          TAGGED_IMAGES: "true"
          TAGGED_FLAVORS: "true"
          TAGGED_AGGREGATES: "false"
    ...
- Alternatively, create a host aggregate as described in the documentation and add to it the Compute nodes that should be evacuated.
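The second option can be sketched as a small shell function. This is a minimal sketch, not the documented procedure: the function name make_evacuable_aggregate is hypothetical, and it assumes that an aggregate is tagged by setting a property named after the EVACUABLE_TAG value ("evacuable" in the configuration above). Confirm the exact tagging step in the product documentation before relying on it.

```shell
# Hypothetical helper: create a host aggregate, tag it, and add Compute nodes.
# Usage: make_evacuable_aggregate <aggregate_name> <host> [<host> ...]
make_evacuable_aggregate() {
  aggregate=$1
  shift

  # Create the aggregate.
  openstack aggregate create "$aggregate" || return 1

  # Assumption: the tag is a property whose key matches EVACUABLE_TAG.
  openstack aggregate set --property evacuable=true "$aggregate" || return 1

  # Add every remaining argument as a member host.
  for host in "$@"; do
    openstack aggregate add host "$aggregate" "$host" || return 1
  done
}
```

For example, `make_evacuable_aggregate <aggregate_name> compute-0.ctlplane.example.com` would create the aggregate, tag it, and add the Compute node named in the warning from the Diagnostic Steps section.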
Root Cause
By default, the TAGGED_AGGREGATES Instance HA service parameter is set to true, so the Instance HA service checks for tagged host aggregates and does not recover Compute nodes that are not part of one. If you set the TAGGED_AGGREGATES parameter to false, the Instance HA service does not check for tagged host aggregates and therefore evacuates all eligible Compute nodes.
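The default described above can be captured in a small check. The following is a minimal sketch: the function name aggregate_check_state is hypothetical, it reads the config.yaml text on standard input, and it assumes that any value other than false (or a missing parameter) leaves the aggregate check enabled.

```shell
# Hypothetical helper: print "enabled" or "disabled" for the aggregate check.
# Reads the Instance HA config.yaml content on stdin.
aggregate_check_state() {
  awk '
    /^[ \t]*TAGGED_AGGREGATES:/ {
      v = $0
      sub(/^[^:]*:[ \t]*/, "", v)   # drop the key and the separator
      gsub(/[" \t\r]/, "", v)       # drop quotes and stray whitespace
      # Assumption: only an explicit false disables the check.
      state = (tolower(v) == "false") ? "disabled" : "enabled"
      found = 1
      exit
    }
    # A missing parameter falls back to the default, which is enabled.
    END { print (found ? state : "enabled") }
  '
}
```

One way to feed it the live configuration is `oc get cm instanceha-config -o jsonpath='{.data.config\.yaml}' | aggregate_check_state`.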
Diagnostic Steps
- Check the logs of the Instance HA pod for a warning like the following.
    $ oc logs instanceha-0-54f865b6dd-w6h4t | grep evacuable
    2024-09-15 23:21:38,105 WARNING The following computes are not part of an evacuable aggregate, so they will not be recovered: ['compute-0.ctlplane.example.com']
- Check whether host aggregate validation is enabled. If TAGGED_AGGREGATES is not set, validation is enabled by default.
    $ oc get cm instanceha-config -o yaml | grep TAGGED_AGGREGATES
- Check whether the Compute node is part of a host aggregate.
    # openstack host show <host>
    # openstack aggregate show <aggregate_name>
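To turn the log warning into a list of hosts to check, the host names can be extracted from the log text. The following is a minimal sketch: the function name skipped_computes is hypothetical, and it assumes the warning always ends with a bracketed, comma-separated list of quoted host names, as in the sample log line in this article.

```shell
# Hypothetical helper: print each Compute host that Instance HA reported as
# not part of an evacuable aggregate, one per line, without duplicates.
# Reads Instance HA log text on stdin.
skipped_computes() {
  # Keep only the warning lines, cut them down to the bracketed list,
  # then strip quotes and spaces before splitting on commas.
  sed -n "/not part of an evacuable aggregate/{s/.*\[//;s/\].*//;s/[' ]//g;p;}" \
    | tr ',' '\n' \
    | sort -u
}
```

For example, `oc logs <instanceha_pod> | skipped_computes` yields names that can be passed to the `openstack host show` check above.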