Upgrade maintenance delayed or failed during the Pre-Health Check in OSD/ROSA cluster

Solution Verified - Updated June 13 2024 at 6:43 PM

Environment

Red Hat OpenShift Service on AWS (ROSA)
- 4
Red Hat OpenShift Dedicated (OSD)
- 4
Managed Upgrade Operator (MUO)

Issue

The cluster upgrade fails, retries every two hours, and fails again. The cluster history shows the following notifications about firing health alerts:

Upgrade maintenance delayed
Cluster upgrade to version 4.y.z was cancelled during the Pre-Health Check step. Health alerts are firing in the cluster which could impact the upgrade's operation, so the upgrade did not proceed.

Upgrade maintenance delayed
Cluster upgrade to version 4.y.z is experiencing a delay as health alerts are firing in the cluster which could impact the upgrade's operation. The upgrade will continue to retry. This is an informational notification and no action is required by you

The cluster upgrade fails and is canceled with the message:

Upgrade maintenance failed
Your Cluster upgrade to version 4.y.z was cancelled during the Pre-Health Check step. Health alerts are firing in the cluster which could impact the upgrade's operation, so the upgrade did not proceed. Automated upgrades will be retried on their next scheduling cycle. If you have manually scheduled an upgrade instead, it must now be rescheduled.

Resolution

In some cases, the issue resolves itself and the upgrade continues on a later retry without any action.

If the cluster cannot continue with the upgrade, or the upgrade ultimately fails, check the managed-upgrade-operator logs for the reason:

$ oc get pods -n openshift-managed-upgrade-operator
[...]
$ oc logs -n openshift-managed-upgrade-operator managed-upgrade-operator-xxxxxxxxxx-xxxxx
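The operator log can be long, so it may help to narrow it to error-level entries. The following is a minimal sketch, not part of the original solution: `muo_errors` is a hypothetical helper name, and the filter assumes the log lines carry a `level=error` field, as the example error in this article does.

```shell
# Hypothetical helper: keep only error-level lines from operator log
# text supplied on standard input. Returns success even when no line
# matches, so it is safe to use at the end of a pipeline.
muo_errors() {
  grep -F 'level=error' || true
}

# Usage against a live cluster (replace the pod name with the one
# returned by "oc get pods"):
#   oc logs -n openshift-managed-upgrade-operator managed-upgrade-operator-xxxxxxxxxx-xxxxx | muo_errors
```

If the filter prints nothing, the log contains no error-level entries for the period covered by the current pod.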

An example of an error that prevents the cluster from being upgraded:

level=error logger=controller_upgradeconfig msg="error when ClusterHealthyBeforeUpgrade" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config error="critical alert(s) firing: MultipleVersionsOfEFSCSIDriverInstalled"
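The alert name is the last part of the `error=` field. As a rough sketch, the names can be pulled out of the log text with `sed`. `firing_alerts` is a hypothetical helper name, and the comma handling is an assumption: this article only shows a single alert, so how the operator separates several alerts is not confirmed here.

```shell
# Hypothetical helper: read operator log text on standard input and
# print each alert named after "critical alert(s) firing:", one per line.
# Assumption: multiple alerts, if present, are separated by commas.
firing_alerts() {
  sed -n 's/.*critical alert(s) firing: \([^"]*\)".*/\1/p' |
    tr ',' '\n' |
    sed 's/^ *//; s/ *$//; /^$/d' |
    sort -u
}

# Usage against a live cluster:
#   oc logs -n openshift-managed-upgrade-operator managed-upgrade-operator-xxxxxxxxxx-xxxxx | firing_alerts
```

For the example line above, the helper prints `MultipleVersionsOfEFSCSIDriverInstalled`.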

For this specific alert, check whether two different EFS operators are installed in the cluster.
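One way to look for duplicate EFS operators is to list the Operator Lifecycle Manager subscriptions and keep the rows that mention EFS. This is a sketch under assumptions: it presumes the EFS operators were installed through OLM, that their subscription or package names contain the string "efs" (this article does not name them), and that the account has permission to list subscriptions in all namespaces. `efs_rows` is a hypothetical helper name.

```shell
# Hypothetical helper: keep the rows of a listing on standard input
# that mention "efs" (case-insensitive). Returns success even when
# nothing matches.
efs_rows() {
  grep -i 'efs' || true
}

# Usage against a live cluster:
#   oc get subscriptions.operators.coreos.com --all-namespaces | efs_rows
```

More than one matching row suggests that two EFS operators are installed side by side, which is the condition the alert name points to.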

Root Cause

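The Managed Upgrade Operator (MUO) checks that the cluster is in a healthy state before starting a cluster upgrade. If critical alerts are firing, the pre-upgrade health check fails and the upgrade does not proceed.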

Diagnostic Steps

Check the MUO logs:

$ oc get pods -n openshift-managed-upgrade-operator
[...]
$ oc logs -n openshift-managed-upgrade-operator managed-upgrade-operator-xxxxxxxxxx-xxxxx

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
