OpenShift Data Science deployment requires specific AWS tags for pre-existing VPC

Solution In Progress

Environment

  • Red Hat OpenShift Data Science (RHODS)
  • Red Hat OpenShift Service on AWS (ROSA)
    • non-STS
    • 4
  • Red Hat OpenShift Dedicated (OSD)
    • 4

Issue

  • When OSD is deployed into a pre-existing Virtual Private Cloud (VPC), no tags are added to the pre-existing resources. The Cloud Resource Operator (CRO), part of Red Hat OpenShift Data Science, requires certain tags on these resources, so they must be added manually before installing OpenShift Data Science.
  • The following message is shown when trying to install OpenShift Data Science:

    failed to check cluster vpc subnets: unable to get vpc: error, no vpc found
    

Resolution

Note: RHODS is not yet supported in ROSA clusters with STS. Red Hat is aware of this issue and RHODS-3965 was created to track it.

Important: this is a workaround, not a resolution, and it can have negative impacts. Do not attempt it in production environments.
Editing these tags manually can have side effects, including the potential removal of the manually tagged VPC when OSD is uninstalled.

Workaround

Note: if the installation has already failed, it must be uninstalled and the installation started again after applying this workaround.

To add the required tags manually, perform the following steps:

1) Get the names of the machines:

$ oc get machines -A

The output looks similar to this:

NAMESPACE               NAME                                      PHASE     TYPE         REGION      ZONE         AGE
openshift-machine-api   mycluster-9q4cg-infra-us-east-1a-4mccc    Running   r5.xlarge    us-east-1   us-east-1a   155m
openshift-machine-api   mycluster-9q4cg-infra-us-east-1a-vk6dq    Running   r5.xlarge    us-east-1   us-east-1a   155m
openshift-machine-api   mycluster-9q4cg-master-0                  Running   m5.2xlarge   us-east-1   us-east-1a   3h
openshift-machine-api   mycluster-9q4cg-master-1                  Running   m5.2xlarge   us-east-1   us-east-1a   3h
openshift-machine-api   mycluster-9q4cg-master-2                  Running   m5.2xlarge   us-east-1   us-east-1a   3h
openshift-machine-api   mycluster-9q4cg-worker-us-east-1a-khqdt   Running   m5.2xlarge   us-east-1   us-east-1a   176m
openshift-machine-api   mycluster-9q4cg-worker-us-east-1a-xn8w7   Running   m5.2xlarge   us-east-1   us-east-1a   176m

2) Save the "cluster" string from the output above.
This is the randomly generated string in the NAME column, between the cluster name and the role of the machine.
In the example output above, the "cluster" string is 9q4cg.
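The same value can be pulled out of a machine name with a small shell helper. This is a sketch that assumes machine names follow the `<cluster name>-<cluster string>-<role>-...` pattern shown above, with the role being `master`, `worker` or `infra`; the function names are hypothetical.

```shell
# Hypothetical helpers, assuming machine names of the form
# <cluster name>-<cluster string>-<role>-..., with role master, worker or infra.

# Print "<cluster name>-<cluster string>" (for example mycluster-9q4cg)
# by dropping everything from the first "-master-", "-worker-" or "-infra-".
infra_id_from_machine() {
  printf '%s\n' "$1" | sed -E 's/-(master|worker|infra)-.*$//'
}

# Print only the random "cluster" string (for example 9q4cg),
# i.e. the part after the last "-" of "<cluster name>-<cluster string>".
cluster_string_from_machine() {
  local infra_id
  infra_id="$(infra_id_from_machine "$1")"
  printf '%s\n' "${infra_id##*-}"
}

# Usage:
#   cluster_string_from_machine mycluster-9q4cg-master-0
```

On OpenShift 4, the combined `<cluster name>-<cluster string>` value can also be read from the cluster's Infrastructure resource with `oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}'`, which avoids parsing machine names.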

3) Add the correct tags to the AWS VPC.
In AWS, locate the VPC that the cluster is deployed in.
Add the following tags, replacing the string 9q4cg with the saved "cluster" value and mycluster with the name of the cluster.

Key:                                    Value:
tagforCRO                               mycluster-9q4cg-vpc
kubernetes.io/cluster/mycluster-9q4cg   manual
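If the AWS CLI is used instead of the console, the two VPC tags can be applied with `aws ec2 create-tags`. The sketch below only builds the `--tags` arguments from the `<cluster name>-<cluster string>` value (for example mycluster-9q4cg); the function name and the VPC ID `vpc-0123456789abcdef0` are placeholders, and the AWS command is left in a comment so it can be reviewed before it touches the account.

```shell
# Hypothetical helper: print the --tags arguments for the two VPC tags.
# $1 is "<cluster name>-<cluster string>", for example mycluster-9q4cg.
vpc_tags() {
  local infra_id="$1"
  printf 'Key=tagforCRO,Value=%s-vpc Key=kubernetes.io/cluster/%s,Value=manual\n' \
    "$infra_id" "$infra_id"
}

# Example (vpc-0123456789abcdef0 is a placeholder for the cluster's VPC ID).
# $(vpc_tags ...) is left unquoted on purpose so it splits into two tag arguments.
#   aws ec2 create-tags --resources vpc-0123456789abcdef0 --tags $(vpc_tags mycluster-9q4cg)
```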

4) Add the correct tag to the 2 route tables in that VPC, replacing the string 9q4cg with the saved "cluster" value and mycluster with the name of the cluster.
Route table associated with the private subnet:

Key:                                    Value:
kubernetes.io/cluster/mycluster-9q4cg   owned

Other route table (public):

Key:                                    Value:
kubernetes.io/cluster/mycluster-9q4cg   owned
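Both route tables receive the same tag, so they can be tagged in a single AWS CLI call. This is a sketch: the helper name, the VPC ID and the `rtb-...` IDs are placeholders. Filtering `describe-route-tables` by VPC can also return other route tables, such as the VPC's main route table, so check the subnet associations and tag only the two route tables described above.

```shell
# Hypothetical helper: print the --tags argument for the route table tag.
# $1 is "<cluster name>-<cluster string>", for example mycluster-9q4cg.
route_table_tag() {
  printf 'Key=kubernetes.io/cluster/%s,Value=owned\n' "$1"
}

# Example (all IDs are placeholders):
#   # List the route tables in the VPC together with their associated subnets.
#   aws ec2 describe-route-tables \
#     --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
#     --query 'RouteTables[].[RouteTableId,Associations[].SubnetId]' --output json
#   # Tag the private and public route tables identified above.
#   aws ec2 create-tags --resources rtb-11111111111111111 rtb-22222222222222222 \
#     --tags "$(route_table_tag mycluster-9q4cg)"
```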

Once these tags are in place, proceed with the installation of Red Hat OpenShift Data Science.

Root Cause

When OSD is installed into a pre-existing VPC, the tags required by CRO are not added, so they are missing.
If the VPC was created by the OSD installer, some of the required tags may have been removed manually after the VPC was created.

Diagnostic Steps

In the redhat-ods-operator namespace, locate the cloud-resource-operator pod:

$ oc get pods -n redhat-ods-operator | grep cloud-resource-operator

In the log of this pod, look for the message "failed to check cluster vpc subnets: unable to get vpc: error, no vpc found":

$ oc logs -n redhat-ods-operator [cloud-resource-operator_pod_name] | grep "failed to check cluster vpc subnets"
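The two diagnostic commands can be combined so the pod name does not have to be copied by hand. This is a sketch assuming a single cloud-resource-operator pod: `oc get pods -o name` prints names as `pod/<name>`, which `oc logs` accepts directly. The function name is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if the log text on stdin contains
# the CRO error reporting that the cluster VPC could not be found.
has_vpc_error() {
  grep -q "failed to check cluster vpc subnets: unable to get vpc"
}

# Example against a live cluster (assumes exactly one matching pod):
#   pod="$(oc get pods -n redhat-ods-operator -o name | grep cloud-resource-operator)"
#   oc logs -n redhat-ods-operator "$pod" | has_vpc_error && echo "CRO cannot find the cluster VPC"
```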

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
