Cluster installation failed with error code OCM3018: NoWorkerNodes

Solution Verified

Environment

  • Red Hat OpenShift Service on AWS (ROSA 4)

Issue

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

  • Cluster installation failed with error code OCM3018: NoWorkerNodes
  • OCM shows the following message:
    No worker nodes could be created. Check that your machine-api role is correct and try again.
  • Install log contains:
    Got 0 worker nodes, X master nodes.

Resolution

This usually happens when your cluster attempts to use roles left over from a previous installation attempt. If you have previously installed a cluster in this AWS account, be sure to delete all old, unused roles before retrying the installation. Otherwise, verify that the trust relationship for your ManagedOpenShift-openshift-machine-api-aws-cloud-credentials role references the OIDC provider for the current cluster ID, and try again.

To delete unused roles, refer to the official AWS documentation.

You can verify the trust relationship for your ManagedOpenShift-openshift-machine-api-aws-cloud-credentials role by following the steps below:

  1. In the navigation pane of the IAM console, choose Roles.

  2. Choose the name of the role ManagedOpenShift-openshift-machine-api-aws-cloud-credentials, and select the Trust relationships tab on the details page.
    NOTE: The trusted entity has a cluster ID.

  3. Verify that the trust relationship for your ManagedOpenShift-openshift-machine-api-aws-cloud-credentials role references the OIDC provider for the current cluster ID (and not the previous cluster ID).
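
The same check can also be scripted. The following is a minimal sketch, not an official tool: it assumes you have already saved the role's trust policy as JSON (for example, from the output of `aws iam get-role --role-name ManagedOpenShift-openshift-machine-api-aws-cloud-credentials --query Role.AssumeRolePolicyDocument`) and that the cluster ID appears as plain text in the Federated principal, as noted in step 2. The function names, the sample account number, the OIDC host name, and the cluster IDs are all hypothetical.

```python
import json


def federated_principals(trust_policy):
    """Return every Federated principal ARN found in a trust policy document."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        # A policy may hold a single statement that is not wrapped in a list.
        statements = [statements]
    found = []
    for statement in statements:
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # for example Principal: "*", which has no Federated entry
        federated = principal.get("Federated", [])
        if isinstance(federated, str):
            federated = [federated]
        found.extend(federated)
    return found


def trusts_cluster(trust_policy, cluster_id):
    """Return True if any Federated principal mentions the given cluster ID."""
    return any(cluster_id in arn for arn in federated_principals(trust_policy))


# Hypothetical sample data: the account number, OIDC host, and cluster IDs
# are made up for illustration and do not come from a real cluster.
SAMPLE_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.example.com/oldcluster111"
      },
      "Action": "sts:AssumeRoleWithWebIdentity"
    }
  ]
}
""")

if __name__ == "__main__":
    # The role still trusts the previous cluster, so the current ID is not found.
    print(trusts_cluster(SAMPLE_POLICY, "newcluster222"))
    print(trusts_cluster(SAMPLE_POLICY, "oldcluster111"))
```

If `trusts_cluster` returns False for the current cluster ID, the role most likely still trusts the OIDC provider of an earlier installation attempt and should be deleted or recreated before retrying.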

If you need help from Red Hat, please open a support case with us by clicking here.

Root Cause

The STS role ManagedOpenShift-openshift-machine-api-aws-cloud-credentials is the role that the machine-api-operator in the cluster uses to authenticate to AWS. To do that, the role needs a trust relationship with the cluster's OIDC provider. The OIDC provider is therefore required for the life of the cluster for any EC2 operation that the machine-api-operator performs in AWS, such as creating worker nodes.

A common scenario is that you create a cluster with this role and the installation fails for some reason. You delete the failed cluster and retry the installation. The new cluster then uses the existing machine-api role, whose trust relationship still references the OIDC provider of the previous cluster. Note that the trusted entity has a cluster ID in it.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
