Failed to delete cluster after manually removing cluster cloud resources
Environment
- Red Hat OpenShift Service on AWS [ROSA]
  - 4.x
- Red Hat OpenShift Dedicated [OSD]
  - 4.x
Issue
- What is preventing my cluster from being deleted?
- Clusters may not be deleted successfully if any resources in the underlying cloud provider infrastructure, including the IAM roles and credentials, are accidentally deleted first. The sample below is an error observed in an AWS environment:
CLUSTERS-MGMT-400: Failed to delete cluster 21xxxxxxxXXXXXXXXXXXXXirg191lko: Add 'arn:aws:iam::71xxxxxxxx33:role/RH-Managed-OpenShift-Installer' to the trust policy on IAM role 'arn:aws:iam::9-----------5:role/ManagedOpenShift-Installer-Role' Operation ID: bexxxxxx-XXXX-xxxx-XXXX-cexxxxXXXXdb
Resolution
- As a best practice, uninstall clusters correctly by following the official Red Hat documentation before deleting any cloud provider resources. For ROSA clusters, follow "Deleting a ROSA cluster"; for OSD clusters, follow "Deleting an OpenShift Dedicated cluster".
- If the cloud provider resources were accidentally deleted and the cluster still appears in the Ready state, try to delete the cluster by adding the best_effort='true' parameter as per the command below. This best-effort parameter currently skips the account roles verification and puts the cluster in uninstalling with no pre-flight checks for the operation.
  NOTE: This command only accepts the Internal Cluster ID, which can be displayed by using the commands ocm list cluster or ocm describe cluster <cluster>.
  $ ocm delete cluster <Internal Cluster ID> -p best_effort='true'
- If the cluster fails to uninstall correctly, please contact Red Hat Support for further troubleshooting.
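Because the best-effort delete only accepts the Internal Cluster ID, it can help to check the value before running the command. The following is a minimal sketch, not an official tool. The helper names is_internal_id and best_effort_delete_cmd are hypothetical, and the check assumes that an Internal Cluster ID is 32 characters drawn from 0-9 and a-z, which should be confirmed against the output of ocm describe cluster <cluster>. The sketch only prints the delete command so it can be reviewed first; it does not contact OCM.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of ocm or rosa.

# ASSUMPTION: an Internal Cluster ID is exactly 32 characters, all in 0-9 or a-z.
# Cluster names and UUID-style external IDs should fail this check.
is_internal_id() {
  [ "${#1}" -eq 32 ] || return 1
  case "$1" in
    *[!0-9a-z]*) return 1 ;;
  esac
  return 0
}

# Print the best-effort delete command from this article for review.
# Nothing is executed against OCM here.
best_effort_delete_cmd() {
  if ! is_internal_id "$1"; then
    echo "error: '$1' does not look like an Internal Cluster ID" >&2
    return 1
  fi
  printf "ocm delete cluster %s -p best_effort='true'\n" "$1"
}

# Example (the ID shown is a made-up placeholder):
#   best_effort_delete_cmd 0123456789abcdefghijklmnopqrstuv
```

Passing a cluster name such as my-cluster makes the helper return a non-zero status and print an error instead of a command, which guards against the most likely mistake described in the NOTE above.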
Root Cause
- When the cloud provider resources are deleted directly, the OpenShift Hive Operator may be unaware that the cluster has been deleted, so Red Hat still has active entitlements and objects that need to be cleaned up.
- When trying to delete the cluster by issuing rosa delete cluster --cluster=<clusterName>, the deletion fails because one of the first things Hive does is assess the cluster resources.