CustomDomain NotReady in AWS PrivateLink ROSA cluster

Solution Unverified

Environment

  • Red Hat OpenShift Service on AWS (ROSA)
  • AWS PrivateLink ROSA cluster

Issue

  • Creating a CustomDomain in an AWS PrivateLink ROSA cluster never finishes.
  • The CustomDomain CR stays in NotReady status in an AWS PrivateLink ROSA cluster, with reason Creating:

    status:
      conditions:
      - message: Creating Apps Custom Domain (apps.tc01686-dev.afs1-nprd.aws-za.sbgrp.cloud)
        reason: Creating
        status: "True"
        type: Creating
    
  • The ingress ClusterOperator is degraded with the following message:

      message: 'Some ingresscontrollers are degraded: ingresscontroller "console-domain" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)'
      reason: IngressControllersDegraded
      status: "True"
      type: Degraded
    
  • The ingresscontroller generated by the CustomDomain CR shows the following messages:

    message: 'One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)'
    
    message: 'One or more other status conditions indicate a degraded state: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)'
    
  • The ingress-operator shows the following errors:

    ERROR   operator.ingress_controller     controller/controller.go:244    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: could not find any suitable subnets for creating the ELB\nThe kube-controller-manager logs may contain more details.)"}
    
    ERROR   operator.ingress_controller     controller/controller.go:244    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)"}
    

Resolution

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

Custom Domains require a public-facing ELB, and PrivateLink ROSA clusters do not have a public-facing ELB by default. Internal Custom Domains can be configured in PrivateLink ROSA clusters. Refer to the documentation for configuring custom domains for applications.

There is an example for adding a public Ingress endpoint to a ROSA PrivateLink cluster, but note that the steps shown in that article are not supported by Red Hat Support or the Red Hat SRE Team.

Root Cause

Custom Domains require a public-facing ELB. If the cluster has no public subnets, the load balancer cannot be created, because the controller will not place a public ELB on private subnets.
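
One way to confirm that a subnet is private is to inspect the routes in its route table. The sketch below rests on an assumption that is not stated in this article: that a subnet is treated as public only when its route table contains a route through an internet gateway, that is, a gateway ID starting with igw-. The function name has_igw_route is hypothetical. It reads gateway IDs on standard input and succeeds if any of them is an internet gateway.

```shell
# Hypothetical helper: succeed (exit 0) if any whitespace-separated gateway ID
# read from stdin belongs to an internet gateway (prefix "igw-").
# Assumption: a subnet counts as public only if its route table has such a route.
has_igw_route() {
  # Put each ID on its own line, then look for one that starts with "igw-".
  tr -s '[:space:]' '\n' | grep -q '^igw-'
}
```

The gateway IDs could come, for example, from aws ec2 describe-route-tables --filters Name=association.subnet-id,Values=[subnet_id] --query 'RouteTables[].Routes[].GatewayId' --output text. A subnet without an explicit route table association uses the main route table of the VPC, so that query returns nothing for it and the main route table has to be checked instead.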

Diagnostic Steps

Check the CustomDomain status and messages:

$ oc get customdomain
NAME             ENDPOINT   DOMAIN                    STATUS
console-domain              [my_cluster_domain]       NotReady

$ oc get customdomain console-domain -o yaml
[...]
status:
  conditions:
  [...]
  - message: Creating Apps Custom Domain (apps.tc01686-dev.afs1-nprd.aws-za.sbgrp.cloud)
    reason: Creating
    status: "True"
    type: Creating
[...]
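
When a cluster has several CustomDomain CRs, the table output can be filtered for the ones that are stuck. This is a minimal sketch, assuming the column layout shown above; the function name notready_customdomains is hypothetical.

```shell
# Hypothetical helper: print the NAME of every row whose last column (STATUS)
# is NotReady. Reads the table printed by "oc get customdomain" on stdin.
# The last field is used because the ENDPOINT column can be empty.
notready_customdomains() {
  awk 'NR > 1 && $NF == "NotReady" { print $1 }'
}
```

It is meant to be used as oc get customdomain | notready_customdomains, and prints nothing when every CustomDomain is ready.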

Check the messages in the ingresscontroller generated by the CustomDomain CR:

$ oc get ingresscontroller console-domain -n openshift-ingress-operator -o yaml
[...]
  - message: 'One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)'
    reason: IngressControllerUnavailable
    status: "False"
    type: Available
  [...]
  - message: 'One or more other status conditions indicate a degraded state: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)'
    reason: DegradedConditions
    status: "True"
    type: Degraded
[...]

Check the ingress ClusterOperator status and messages:

$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
[...]
ingress                                    4.7.29    False       True          True       15h
[...]

$ oc get co ingress -o yaml
[...]
status:
  conditions:
  [...]
  - message: 'Some ingresscontrollers are degraded: ingresscontroller "console-domain" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)'
    reason: IngressControllersDegraded
    status: "True"
    type: Degraded
[...]

Check the ingress-operator logs:

$ oc logs -n openshift-ingress-operator -c ingress-operator ingress-operator-7694d685cf-d6jkb
[...]
ERROR   operator.ingress_controller     controller/controller.go:244    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: could not find any suitable subnets for creating the ELB\nThe kube-controller-manager logs may contain more details.)"}
[...]
ERROR   operator.ingress_controller     controller/controller.go:244    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)"}
[...]
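
The ingress-operator log repeats these errors every minute, so it can help to reduce it to the distinct reasons reported for LoadBalancerReady=False. The following is a sketch only: the function name lb_failure_reasons is hypothetical, it assumes each log entry arrives on a single line, and it relies on grep -o, which GNU and BSD grep provide.

```shell
# Hypothetical helper: list the distinct reasons that follow
# "LoadBalancerReady=False (" in ingress-operator log text read from stdin.
lb_failure_reasons() {
  grep -o 'LoadBalancerReady=False ([A-Za-z]*' \
    | sed 's/.*(//' \
    | sort -u
}
```

Piping the oc logs command above into lb_failure_reasons should, for the log excerpt in this article, list LoadBalancerPending and SyncLoadBalancerFailed. The SyncLoadBalancerFailed entry is the one that carries the "could not find any suitable subnets" detail.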

Check the kube-controller-manager logs:

$ oc get pods -n openshift-kube-controller-manager -l kube-controller-manager
NAME                                         READY   STATUS    RESTARTS   AGE
kube-controller-manager-[master-0_name]      4/4     Running   2          17d
kube-controller-manager-[master-1_name]      4/4     Running   3          17d
kube-controller-manager-[master-2_name]      4/4     Running   4          17d

$ oc logs -n openshift-kube-controller-manager kube-controller-manager-[master-0_name]
[...]
I1129 21:55:50.800103       1 controller.go:368] Ensuring load balancer for service openshift-ingress/router-console-domain
I1129 21:55:50.800182       1 aws.go:3788] EnsureLoadBalancer(rosapoc-6s9x2, openshift-ingress, router-console-domain, af-south-1, , [{http TCP <nil> 80 {1 0 http} 32458} {https TCP <nil> 443 {1 0 https} 31694}], map[service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold:2 service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval:5 service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout:4 service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold:2 service.beta.kubernetes.io/aws-load-balancer-proxy-protocol:*])
I1129 21:55:50.800235       1 event.go:291] "Event occurred" object="openshift-ingress/router-console-domain" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I1129 21:55:50.917073       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-0768119a8539c210e"
I1129 21:55:50.917095       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-07bd8f81179783c5a"
I1129 21:55:50.917101       1 aws.go:3440] Ignoring private subnet for public ELB "subnet-08c1489cee92c365a"
E1129 21:55:50.917139       1 controller.go:275] error processing service openshift-ingress/router-console-domain (will retry): failed to ensure load balancer: could not find any suitable subnets for creating the ELB
I1129 21:55:50.917223       1 event.go:291] "Event occurred" object="openshift-ingress/router-console-domain" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: could not find any suitable subnets for creating the ELB"
[...]
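
The "Ignoring private subnet for public ELB" lines name the subnets that were rejected. A small filter can collect them from a long log; this is a sketch under the assumption that the log format matches the excerpt above, and the function name ignored_private_subnets is hypothetical.

```shell
# Hypothetical helper: print the unique subnet IDs that were skipped while
# looking for subnets for a public ELB. Reads kube-controller-manager log
# text on stdin.
ignored_private_subnets() {
  grep 'Ignoring private subnet for public ELB' \
    | grep -o 'subnet-[0-9a-f]*' \
    | sort -u
}
```

Piping the oc logs command above into ignored_private_subnets gives one subnet ID per line. If every subnet of the cluster appears in that list, no subnet was considered suitable for a public load balancer, which matches the "could not find any suitable subnets" error.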

Check if the cluster is an AWS PrivateLink ROSA cluster:

$ ocm describe cluster [cluster_id]
[...]
PrivateLink:   true
[...]
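
The same check can be scripted. The sketch below assumes the "PrivateLink:   true" line format shown above; the function name is_privatelink is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if the "ocm describe cluster" output
# read from stdin contains a PrivateLink line whose value is true.
is_privatelink() {
  grep -Eq '^[[:space:]]*PrivateLink:[[:space:]]+true[[:space:]]*$'
}
```

For example, ocm describe cluster [cluster_id] | is_privatelink && echo "PrivateLink cluster" prints the message only for a PrivateLink cluster.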

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
