Failed to delete cluster after manually removing cluster cloud resources

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Service on AWS [ROSA]
    • 4.x
  • Red Hat OpenShift Dedicated [OSD]
    • 4.x

Issue

  • What is preventing my cluster from being deleted?
  • A cluster may fail to delete if resources in the underlying cloud provider infrastructure, including the IAM roles and credentials, were deleted before the cluster was uninstalled. The sample below is an error observed in an AWS environment:

    CLUSTERS-MGMT-400: Failed to delete cluster 21xxxxxxxXXXXXXXXXXXXXirg191lko: Add 'arn:aws:iam::71xxxxxxxx33:role/RH-Managed-OpenShift-Installer' to the trust policy on IAM role 'arn:aws:iam::9-----------5:role/ManagedOpenShift-Installer-Role' Operation ID: bexxxxxx-XXXX-xxxx-XXXX-cexxxxXXXXdb
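
The error above names two role ARNs: the Red Hat installer role and the role in the customer account whose trust policy is being checked. As a minimal sketch, assuming the error text has been captured in a shell variable and that the local grep supports the -o and -E options, the ARNs can be pulled out for inspection. The function name extract_role_arns and the account IDs in the usage comment are hypothetical placeholders, not values from a real cluster.

```shell
#!/bin/sh
# Sketch only: extract every IAM role ARN from a CLUSTERS-MGMT error message.
# Assumption: the ARNs appear in single quotes, as in the sample error.

extract_role_arns() {
    # $1 is the full error text; prints one ARN per line.
    printf '%s\n' "$1" | grep -oE "arn:aws:iam::[^:']*:role/[^' ]+"
}

# Usage (placeholder account IDs):
#   msg="... Add 'arn:aws:iam::111111111111:role/RH-Managed-OpenShift-Installer' to the trust policy on IAM role 'arn:aws:iam::222222222222:role/ManagedOpenShift-Installer-Role' ..."
#   extract_role_arns "$msg"
```

The second ARN printed is the role in the customer account. Knowing its name makes it easier to confirm whether that role was among the resources that were removed.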
    

Resolution

  • As a best practice, uninstall clusters by following the official Red Hat documentation before deleting any cloud provider resources. For ROSA clusters, follow Deleting a ROSA cluster; for OSD clusters, follow Deleting an OpenShift Dedicated cluster.

  • If the cloud provider resources were accidentally deleted and the cluster still appears in the Ready state, try deleting the cluster with the best_effort='true' parameter, as in the command below. This parameter currently skips the account roles verification and puts the cluster into the uninstalling state with no pre-flight checks.

    NOTE: This command only accepts the Internal Cluster ID, which can be displayed with ocm list clusters or ocm describe cluster <cluster>.

    $ ocm delete cluster <Internal Cluster ID> -p best_effort='true'
    
  • If the cluster fails to uninstall correctly, please contact Red Hat Support for further troubleshooting.
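
Because the best-effort delete skips pre-flight checks, it can help to wrap it in a small guard that rejects an obviously malformed ID and supports a dry run. The following is a minimal sketch, not an official tool: the function names best_effort_delete_cmd and best_effort_delete and the DRY_RUN variable are hypothetical, and the sketch assumes the ocm CLI is installed and logged in.

```shell
#!/bin/sh
# Sketch only: guard around the best-effort delete command.
# Assumption: ocm is installed and already logged in.

# Print the delete command for one Internal Cluster ID.
# Fails if the ID is empty or contains a space.
best_effort_delete_cmd() {
    case "$1" in
        ''|*' '*)
            echo "error: expected a single Internal Cluster ID" >&2
            return 1
            ;;
    esac
    printf "ocm delete cluster %s -p best_effort='true'\n" "$1"
}

# Run the delete, or only print it when DRY_RUN=1 is set.
best_effort_delete() {
    cmd=$(best_effort_delete_cmd "$1") || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        ocm delete cluster "$1" -p best_effort='true'
    fi
}

# Usage:
#   DRY_RUN=1 best_effort_delete <Internal Cluster ID>   # print only
#   best_effort_delete <Internal Cluster ID>             # actually delete
```

Running with DRY_RUN=1 first gives a chance to confirm that the value passed is the Internal Cluster ID and not the cluster name before the uninstall is triggered.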

Root Cause

  • When cloud provider resources are deleted directly, the OpenShift Hive Operator may be unaware that the cluster has been deleted, so Red Hat still has active entitlements and objects that need to be cleaned up.

  • Deleting the cluster with rosa delete cluster --cluster=<clusterName> then fails, because one of the first things Hive does is assess the cluster resources.
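
To confirm that a missing IAM role is what the assessment is tripping over, the role can be looked up directly in the AWS account. The sketch below assumes the AWS CLI is installed and configured for the account that hosted the cluster. The function names and the AWS_CMD override (used only to exercise the check without AWS access) are hypothetical.

```shell
#!/bin/sh
# Sketch only: check whether an IAM role named in the error still exists.
# Assumption: the aws CLI is installed and configured for the right account.

# Strip everything up to the last "/" so only the role name remains.
role_name_from_arn() {
    printf '%s\n' "${1##*/}"
}

# Succeeds if `aws iam get-role` can retrieve the role.
# A failure can also mean missing credentials or permissions,
# so treat it as a hint rather than proof that the role is gone.
role_exists() {
    "${AWS_CMD:-aws}" iam get-role --role-name "$1" >/dev/null 2>&1
}

# Usage:
#   name=$(role_name_from_arn "<role ARN from the error message>")
#   if role_exists "$name"; then echo "role present"; else echo "role not found"; fi
```

If the role is not found, the situation matches the one described in this article, and the best-effort delete from the Resolution section is the path to try.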

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
