OCM3999 Unknown error trying to install an OSD or ROSA cluster
Environment
- Red Hat OpenShift Service on AWS (ROSA) 4
- Red Hat OpenShift Dedicated (OSD) 4
Issue
- The error "OCM3999 Unknown error" is shown in the OCM web console when trying to install an OSD or ROSA cluster.
Resolution
Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.
Sometimes a transient error occurs during the cluster installation, and simply reinstalling the cluster is enough. Before trying a new installation, however, check the cluster in the OCM web console and look in the "Cluster History" tab for any Service Log with information about the issue.
If there are no Service Logs in the OCM web console regarding the error, for ROSA clusters:
- Ensure the rosa CLI version in use is the latest one in the mirror, or download the latest one.
- With an up-to-date rosa CLI version, execute the following commands to verify the version and the permissions:
  $ rosa version
  $ rosa verify permissions
- If the cluster is using AWS STS:
  - Ensure that AWS STS is enabled in the desired region following the AWS documentation.
  - Create or update the account-roles and operator-roles as explained in ROSA STS requires user action before install or upgrade (if using a prefix for the account-roles, add --prefix [prefix_name] to the account-roles command):
    $ rosa create account-roles -c ${CLUSTER} -f
    $ rosa create operator-roles -c ${CLUSTER} -f
- If the installation includes a cluster-wide proxy, refer to ROSA installation fails when using cluster-wide proxy.
With the rosa version up to date, the account-roles and operator-roles created or updated, and the prerequisites for the cluster-wide proxy checked (only if a proxy is used), remove the failing cluster. Once its uninstallation has finished, try a new installation.
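The checks above can be chained in a small wrapper so that they run in the same order every time. The following is a minimal sketch, not an official Red Hat script: the function name preflight and its optional second argument "sts" are hypothetical names introduced for illustration, while the rosa commands are the ones listed in this article.

```shell
# Sketch only: run the checks from this article in order, stopping at the first failure.
# "preflight" and the "sts" switch are hypothetical; the rosa commands are copied
# from the steps above.
preflight() {
  pf_cluster="$1"
  pf_mode="${2:-}"

  if [ -z "$pf_cluster" ]; then
    echo "usage: preflight <cluster-name-or-id> [sts]" >&2
    return 2
  fi
  if ! command -v rosa >/dev/null 2>&1; then
    echo "rosa CLI not found in PATH" >&2
    return 1
  fi

  # Show which rosa CLI version is in use.
  rosa version || return 1

  # Verify the permissions of the AWS credentials in use.
  rosa verify permissions || return 1

  # STS clusters only: create or update the roles.
  # If a prefix is used, add --prefix [prefix_name] to the account-roles command.
  if [ "$pf_mode" = "sts" ]; then
    rosa create account-roles -c "$pf_cluster" -f || return 1
    rosa create operator-roles -c "$pf_cluster" -f || return 1
  fi

  echo "preflight checks finished for $pf_cluster"
}
```

For example, `preflight my-cluster sts` would run all four commands for an STS cluster, and `preflight my-cluster` only the first two. The wrapper does not remove or reinstall the cluster; those steps are left as manual actions.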
If the new installation fails again after following the above steps:
- Check the cluster in the OCM web console, and check if there is any Service Log with information about the issue in the "Cluster History" tab.
- Refer to the ROSA documentation for Troubleshooting installations and Troubleshooting cluster deployments.
- Review the prerequisites:
- For ROSA clusters with STS, please review the AWS prerequisites for ROSA with STS.
- For non-STS ROSA clusters, please refer to AWS prerequisites for ROSA.
- For OSD clusters, please refer to Understanding your cloud deployment options.
If, after following the troubleshooting documents and reviewing the prerequisites, it is still not possible to identify the cause of the issue:
- Collect the information as shown in Troubleshooting installations and Troubleshooting cluster deployments.
- Keep the cluster in the error state (without uninstalling it) to allow Red Hat to investigate the issue, and create a Support Case with Red Hat providing:
- The clusterID.
- The name of the cluster.
- The output of the rosa verify permissions command.
- The information from the troubleshooting documents.
- All the commands executed for the cluster installation and the logs/command outputs.
- The
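Part of the information listed above can be captured into files before opening the Support Case. The following is a minimal sketch under stated assumptions: the function names collect_case_info and cci_run, and the output file names, are hypothetical. The commands rosa describe cluster and rosa logs install are not named in this article; they are included on the assumption that the installed rosa CLI provides them.

```shell
# Sketch only: save command output for a support case into one directory.
# collect_case_info, cci_run and the file names are hypothetical.

# Run a command, writing stdout and stderr to the given file.
# A failing command is reported but does not stop the collection.
cci_run() {
  cci_file="$1"
  shift
  "$@" > "$cci_file" 2>&1 || echo "warning: '$*' failed; see $cci_file" >&2
}

collect_case_info() {
  cci_cluster="$1"
  cci_out="${2:-./rosa-case-info}"

  if [ -z "$cci_cluster" ]; then
    echo "usage: collect_case_info <cluster-name-or-id> [output-dir]" >&2
    return 2
  fi
  mkdir -p "$cci_out" || return 1

  cci_run "$cci_out/rosa-version.txt" rosa version
  cci_run "$cci_out/verify-permissions.txt" rosa verify permissions
  # Assumed commands (not listed in this article): cluster details and install logs.
  cci_run "$cci_out/describe-cluster.txt" rosa describe cluster -c "$cci_cluster"
  cci_run "$cci_out/install-logs.txt" rosa logs install -c "$cci_cluster"

  # Print the directory so that it can be archived and attached to the case.
  echo "$cci_out"
}
```

Calling `collect_case_info my-cluster` would leave four text files under ./rosa-case-info. The information from the troubleshooting documents and the list of commands used for the installation still need to be added by hand.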
Diagnostic Steps
It's possible to check if there are missing permissions for deploying a ROSA STS cluster with the script in Verify Permissions for ROSA STS Deployment.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.