Upgrade maintenance delayed or failed during the Pre-Health Check in an OSD/ROSA cluster
Environment
- Red Hat OpenShift Service on AWS (ROSA) 4
- Red Hat OpenShift Dedicated (OSD) 4
- Managed Upgrade Operator (MUO)
Issue
- The cluster upgrade fails, is retried every two hours, and fails again. The cluster history shows notifications about firing health alerts:
Upgrade maintenance delayed Cluster upgrade to version 4.y.z was cancelled during the Pre-Health Check step. Health alerts are firing in the cluster which could impact the upgrade's operation, so the upgrade did not proceed.
Upgrade maintenance delayed Cluster upgrade to version 4.y.z is experiencing a delay as health alerts are firing in the cluster which could impact the upgrade's operation. The upgrade will continue to retry. This is an informational notification and no action is required by you
- The cluster upgrade failed and was cancelled with the message:
Upgrade maintenance failed Your Cluster upgrade to version 4.y.z was cancelled during the Pre-Health Check step. Health alerts are firing in the cluster which could impact the upgrade's operation, so the upgrade did not proceed. Automated upgrades will be retried on their next scheduling cycle. If you have manually scheduled an upgrade instead, it must now be rescheduled.
Resolution
In some cases, the issue resolves itself and the upgrade proceeds on a later retry without any action.
If the upgrade cannot proceed, or it ultimately fails, check the managed-upgrade-operator logs to find the reason:
$ oc get pods -n openshift-managed-upgrade-operator
[...]
$ oc logs -n openshift-managed-upgrade-operator managed-upgrade-operator-xxxxxxxxxx-xxxxx
An example of an error that prevents the cluster from being upgraded:
level=error logger=controller_upgradeconfig msg="error when ClusterHealthyBeforeUpgrade" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config error="critical alert(s) firing: MultipleVersionsOfEFSCSIDriverInstalled"
For this specific alert, check if two different EFS operators are installed in the cluster.
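One way to look for duplicate EFS operators is to list the installed ClusterServiceVersions and keep only the EFS-related entries. The following is a minimal sketch, assuming the default `oc get csv -A` table output (namespace in the first column, CSV name in the second) and that the operator names contain "efs"; the function name `list_efs_csvs` is hypothetical.

```shell
# Hypothetical helper: reads `oc get csv -A` table output on stdin and
# prints "namespace/name" for every CSV whose name contains "efs"
# (case-insensitive). The header row is skipped.
list_efs_csvs() {
  awk 'NR > 1 && tolower($2) ~ /efs/ { print $1 "/" $2 }'
}

# Usage (requires access to the cluster):
#   oc get csv -A | list_efs_csvs
```

If the output shows more than one distinct EFS operator, that is consistent with the alert in the example above. Note that the same CSV can appear in several namespaces, so compare the CSV names rather than counting lines.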
Root Cause
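The Managed Upgrade Operator (MUO) checks that the cluster is in a healthy state before starting a cluster upgrade. If critical alerts are firing, the Pre-Health Check fails and the upgrade is delayed or cancelled.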
Diagnostic Steps
Check the MUO logs:
$ oc get pods -n openshift-managed-upgrade-operator
[...]
$ oc logs -n openshift-managed-upgrade-operator managed-upgrade-operator-xxxxxxxxxx-xxxxx
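To narrow the operator logs down to the alerts that blocked the upgrade, a small filter can help. The following is a minimal sketch, assuming the log lines follow the format of the example error in the Resolution section (`error="critical alert(s) firing: ..."`); the function name `extract_firing_alerts` is hypothetical.

```shell
# Hypothetical helper: reads MUO log lines on stdin and prints the text
# that follows "critical alert(s) firing: " (up to the closing quote),
# one unique entry per line.
extract_firing_alerts() {
  grep 'critical alert(s) firing' \
    | sed -E 's/.*critical alert\(s\) firing: ([^"]*)".*/\1/' \
    | sort -u
}

# Usage (requires access to the cluster; replace the pod name placeholder
# with the name returned by `oc get pods`):
#   oc logs -n openshift-managed-upgrade-operator managed-upgrade-operator-xxxxxxxxxx-xxxxx | extract_firing_alerts
```

Each line of output is the alert text from one distinct error message, which gives a short list of alerts to investigate before the next upgrade retry.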
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.