OCM3999 Unknown error trying to install an OSD or ROSA cluster

Solution Unverified - Updated -

Environment

  • Red Hat OpenShift Service on AWS (ROSA)
    • 4
  • Red Hat OpenShift Dedicated (OSD)
    • 4

Issue

  • The error OCM3999 Unknown error is shown in the OCM web console when trying to install an OSD or ROSA cluster.

Resolution

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

Sometimes a transient error occurs during the cluster installation, and simply reinstalling the cluster is enough. Before trying a new installation, however, check the cluster in the OCM web console and look in the "Cluster History" tab for any Service Log with information about the issue.

For ROSA clusters, if the OCM web console shows no Service Logs regarding the error:

  • Ensure the rosa CLI version in use is the latest one available in the mirror, or download the latest one.

    • With an up-to-date rosa CLI version, execute the following commands to verify the version and the permissions:

      $ rosa version
      $ rosa verify permissions
      
  • If the cluster is using AWS STS:

    • Ensure that AWS STS is enabled in the desired region following the AWS documentation.
    • Create or update the account-roles and operator-roles as explained in ROSA STS requires user action before install or upgrade (if using a prefix for the account-roles, add --prefix [prefix_name] to the account-roles command):

      $ rosa create account-roles -c ${CLUSTER} -f
      $ rosa create operator-roles -c ${CLUSTER} -f
      
  • If the installation includes a cluster-wide proxy, refer to ROSA installation fails when using cluster-wide proxy.

Once the rosa CLI is up to date, the account-roles and operator-roles are created or updated, and the prerequisites for the cluster-wide proxy (only if one is used) are checked, remove the failing cluster, wait for its uninstallation to finish, and then try a new installation.
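The checks above can be chained so that the sequence stops at the first failing step. The following is only a minimal sketch: the function name rosa_prereinstall_checks and its optional prefix argument are illustrative assumptions, the role commands apply to AWS STS clusters only, and the rosa commands are the ones already shown in this article.

```shell
# Sketch: run the checks from this article in order, stopping at the first failure.
# rosa_prereinstall_checks is a hypothetical helper name, not part of the rosa CLI.
rosa_prereinstall_checks() {
  local cluster="$1"
  local prefix="${2:-}"

  if [ -z "$cluster" ]; then
    echo "usage: rosa_prereinstall_checks <cluster> [account-roles-prefix]" >&2
    return 2
  fi

  # Confirm the CLI version and the AWS permissions first.
  rosa version || return 1
  rosa verify permissions || return 1

  # AWS STS clusters only: create or update the roles.
  # --prefix is added only when a prefix is used for the account-roles.
  if [ -n "$prefix" ]; then
    rosa create account-roles -c "$cluster" -f --prefix "$prefix" || return 1
  else
    rosa create account-roles -c "$cluster" -f || return 1
  fi
  rosa create operator-roles -c "$cluster" -f || return 1
}
```

After sourcing the function, it could be called as rosa_prereinstall_checks "${CLUSTER}" or, with a prefix, rosa_prereinstall_checks "${CLUSTER}" my-prefix. The removal of the failing cluster is deliberately left out of the sketch, since it is destructive and should be a conscious manual step.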

If the new installation fails again after following the above steps, and the cause still cannot be identified after following the Troubleshooting documents and checking the prerequisites:

  • Collect the information as shown in Troubleshooting installations and Troubleshooting cluster deployments.
  • Keep the cluster in the error state (without uninstalling it) to allow Red Hat to investigate the issue, and create a Support Case with Red Hat providing:
    • The clusterID.
    • The name of the cluster.
    • The output of the rosa verify permissions command.
    • The information from the troubleshooting documents.
    • All the commands executed for the cluster installation and the logs/command outputs.
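
Gathering these outputs into a single directory can make them easier to attach to the Support Case. The sketch below is one possible way to do it and is not taken from the referenced troubleshooting documents: the function name collect_rosa_diagnostics, the file names, and the default output directory are illustrative assumptions, and the choice of rosa describe cluster and rosa logs install as relevant commands is also an assumption that should be checked against those documents and rosa --help for the CLI version in use.

```shell
# Sketch: save the outputs requested for the Support Case into one directory.
# collect_rosa_diagnostics and the file names below are hypothetical.
collect_rosa_diagnostics() {
  local cluster="$1"
  local outdir="${2:-./rosa-diagnostics}"

  if [ -z "$cluster" ]; then
    echo "usage: collect_rosa_diagnostics <cluster-name-or-id> [output-dir]" >&2
    return 2
  fi
  if ! command -v rosa >/dev/null 2>&1; then
    echo "rosa CLI not found in PATH" >&2
    return 1
  fi

  mkdir -p "$outdir" || return 1

  # Each command writes stdout and stderr to its own file. A failing command
  # does not stop the collection, because its error output is useful evidence.
  rosa version                        > "$outdir/rosa-version.txt" 2>&1
  rosa verify permissions             > "$outdir/rosa-verify-permissions.txt" 2>&1
  rosa describe cluster -c "$cluster" > "$outdir/rosa-describe-cluster.txt" 2>&1
  rosa logs install -c "$cluster"     > "$outdir/rosa-logs-install.txt" 2>&1

  echo "Diagnostics written to $outdir"
}
```

After sourcing the function, collect_rosa_diagnostics "${CLUSTER}" ./ocm3999-case would write four text files under ./ocm3999-case. Because the cluster is kept in the error state, the commands can be rerun later if Red Hat asks for fresh output.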

Diagnostic Steps

It's possible to check if there are missing permissions for deploying a ROSA STS cluster with the script in Verify Permissions for ROSA STS Deployment.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
