Authentication operator degraded with error x509 certificate is not valid for any names in OCP4


Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4

Issue

  • The authentication operator is in a degraded state with the following error after a successful UPI installation:

    $ oc get clusteroperator/authentication -o yaml | grep "x509: certificate is not valid for any names" -B2
    - lastTransitionTime: "2023-04-25T09:10:15Z"
      message: 'OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.cluster.com/healthz":
        x509: certificate is not valid for any names, but wanted to match oauth-openshift.apps.cluster.com'
    

Resolution

Review the load balancer configuration and make sure it runs in Raw TCP, SSL Passthrough, or SSL Bridge mode. If SSL Bridge mode is used, Server Name Indication (SNI) must be enabled for the ingress routes. The documentation lists the full requirements and includes an HAProxy configuration example.
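
As a rough first check on the load balancer side, the sketch below scans an HAProxy configuration for settings that usually mean TLS is terminated on the load balancer instead of being passed through. It assumes the load balancer is HAProxy. The function name `find_tls_termination` and the path used in the usage note are illustrative and not part of the original article, and a match is only a hint that the section serving port 443 needs review.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) HAProxy config lines that
# suggest TLS termination on the load balancer:
#   - a "bind" line carrying the "ssl" keyword, or
#   - a section running in "mode http".
# Returns non-zero when no such line is found in the file given as $1.
find_tls_termination() {
  grep -nE '^[[:space:]]*(bind[[:space:]].*[[:space:]]ssl([[:space:]]|$)|mode[[:space:]]+http([[:space:]]|$))' "$1"
}
```

For example, `find_tls_termination /etc/haproxy/haproxy.cfg` (path assumed) lists nothing for a port 443 listener that uses a plain `bind *:443` with `mode tcp`. Sections unrelated to ingress, such as a `defaults` block with `mode http`, are listed as well, so read the output in context.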

Root Cause

The load balancer used for the .apps domain during the cluster installation intercepts the SSL communication between the client host and the cluster and presents its own, unexpected certificate instead of the ingress certificate.

See the documentation for more details on the load balancer configuration requirements.

Diagnostic Steps

  • Verify the status of the authentication operator:

    $ oc get clusteroperator/authentication -o yaml | grep "x509: certificate is not valid for any names" -B2
    - lastTransitionTime: "2023-04-25T09:10:15Z"
      message: 'OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.cluster.com/healthz":
        x509: certificate is not valid for any names, but wanted to match oauth-openshift.apps.cluster.com'
    
  • The clusterversion operator may be unavailable:

    $ oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version             False       True          1h2m    Unable to apply 4.12.11: some cluster operators are not available
    
  • Additionally, the console operator may not be available, showing a similar error:

    $ oc get co/console -o yaml | grep "x509: certificate is not valid for any names" -B2
    - lastTransitionTime: "2023-04-25T10:04:13Z"
      message: 'RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.cluster.com):
        Get "https://console-openshift-console.apps.cluster.com": x509: certificate
        is not valid for any names, but wanted to match console-openshift-console.apps.cluster.com'
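
Several operators can report the same x509 error for different routes, so it can help to list every hostname that failed validation in one pass. The sketch below is a hypothetical helper (the name `wanted_hosts` is not from the original article) that reads operator status text on stdin and prints each hostname following "wanted to match". It relies on `grep -o`, which GNU and BSD grep provide.

```shell
#!/bin/sh
# Hypothetical helper: read operator status text on stdin and print, once
# each, every hostname that a failing x509 check expected the certificate
# to cover.
wanted_hosts() {
  grep -oE 'wanted to match [A-Za-z0-9.-]+' | awk '{print $4}' | sort -u
}
```

With a logged-in session, `oc get clusteroperators -o json | wanted_hosts` is one way to feed it. JSON output keeps each message on a single line, whereas YAML output may wrap a long message across lines.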
    
  1. The certificate configured in the Ingress Operator is copied to the oauth-serving-cert config map, which holds the certificate that the oauth route exposes. For more information on how this config map is created, see the authentication documentation and the documentation on replacing the default ingress certificate. Inspect the certificate:

    $ oc get cm/oauth-serving-cert -n openshift-config-managed -o json | jq -r '.data."ca-bundle.crt"' | openssl x509 -noout -serial -subject -text
    
    serial=6289B65115C9E5F0
    subject=CN = *.apps.cluster.com
    Certificate:
      Data:
          Version: 3 (0x2)
          Serial Number: 7...
          Signature Algorithm: sha256WithRSAEncryption
          Issuer: CN = ingress-operator@1682413764
          Validity
              Not Before: Apr 25 09:09:23 2023 GMT
              Not After : Apr 24 09:09:24 2025 GMT
          Subject: CN = *.apps.cluster.com
          Subject Public Key Info:
              Public Key Algorithm: rsaEncryption
                  RSA Public-Key: (2048 bit)
                  Modulus:
                      ...
                      e9:2e:3c:38:57:36:ae:24:7a:43:06:e8:51:69:5d:
                      09:59
                  Exponent: 65537 (0x10001)
          ...
    
              X509v3 Subject Alternative Name: 
                  DNS:*.apps.cluster.com
      Signature Algorithm: sha256WithRSAEncryption
           ...
           a1:10:4f:dc
    
  2. Verify that the above certificate matches the one exposed by the router pods. To bypass any load balancer or DNS issues in the middle, query a router directly from the node it runs on:

    $ oc get pods -n openshift-ingress -o wide
    NAME                   READY   STATUS    RESTARTS   AGE     IP            NODE
    router-default-b7zw6   1/1     Running   0          1h15m   10.165.1.73   <worker-1>
    router-default-xsmp2   1/1     Running   0          1h15m   10.165.1.72   <worker-2>
    
    
    $ oc debug node/<worker-1>
    ...
    $ openssl s_client -showcerts -servername oauth-openshift.apps.cluster.com -connect 127.0.0.1:443
    
    CONNECTED(00000003)
    depth=1 CN = ingress-operator@1682413764
    verify error:num=19:self signed certificate in certificate chain
    verify return:1
    depth=1 CN = ingress-operator@1682413764
    verify return:1
    depth=0 CN = *.apps.cluster.com
    verify return:1
    
    
    Certificate chain
    0 s:CN = *.apps.cluster.com
     i:CN = ingress-operator@1682413764
    -----BEGIN CERTIFICATE-----
    MIIDZTCCAk2gAwIBAgIIYom2URXJ5fAwDQYJKoZIhvcNAQELBQAwJjEkMCIGA1UE
    ...
    P8pz5HKhEE/c
    -----END CERTIFICATE-----
    1 s:CN = ingress-operator@1682413764
     i:CN = ingress-operator@1682413764
    -----BEGIN CERTIFICATE-----
    MIIDDDCCAfSgAwIBAgIBATANBgkqhkiG9w0BAQsFADAmMSQwIgYDVQQDDBtpbmdy
    ...
    ZFzZhYlJuCJplolrJJZ1XA==
    -----END CERTIFICATE-----
    
    Server certificate
    subject=CN = *.apps.cluster.com
    
    issuer=CN = ingress-operator@1682413764
    
    
    Acceptable client certificate CA names
    OU = openshift, CN = admin-kubeconfig-signer
    OU = openshift, CN = kube-control-plane-signer
    OU = openshift, CN = kube-apiserver-to-kubelet-signer
    OU = openshift, CN = kubelet-bootstrap-kubeconfig-signer
    CN = openshift-kube-apiserver-operator_node-system-admin-signer@1682413688
    CN = openshift-kube-controller-manager-operator_csr-signer-signer@1682481186
    CN = kube-csr-signer_@1682481510
    CN = openshift-kube-apiserver-operator_aggregator-client-signer@1682481188
    ...
    
    
    
  3. When querying the oauth route through the load balancer, the returned certificate does not match: its subject is CN = localhost.localdomain instead of CN = *.apps.cluster.com:

    $ openssl s_client -showcerts -connect oauth-openshift.apps.cluster.com:443
    CONNECTED(00000003)
    depth=0 C = US, ST = WA, L = Seattle, O = MyCompany, OU = IT, CN = localhost.localdomain, emailAddress = root@localhost.localdomain
    verify error:num=18:self signed certificate
    verify return:1
    depth=0 C = US, ST = WA, L = Seattle, O = MyCompany, OU = IT, CN = localhost.localdomain, emailAddress = root@localhost.localdomain
    verify return:1
    
    Certificate chain
    0 s:C = US, ST = WA, L = Seattle, O = MyCompany, OU = IT, CN = localhost.localdomain, emailAddress = root@localhost.localdomain
     i:C = US, ST = WA, L = Seattle, O = MyCompany, OU = IT, CN = localhost.localdomain, emailAddress = root@localhost.localdomain
    -----BEGIN CERTIFICATE-----
    MIIDrjCCApagAwIBAgIEEnm8YTANBgkqhkiG9w0BAQsFADCBmDELMAkGA1UEBhMC
    ...
    1dEE1YE4pEi0oOXeFtjHBDNjEcKuq5mraQLT6ZrqKXiDOQ==
    -----END CERTIFICATE-----
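
The comparison in steps 2 and 3 comes down to which name each endpoint presents and whether that name covers the route hostname. The sketch below shows one way to script that comparison. Both function names are hypothetical and not part of the original article. `server_cn` assumes the `subject=` line format shown in this article or the older slash-separated format. `name_covers_host` is a simplified, case-sensitive version of the single-label wildcard rule, not a replacement for real TLS hostname verification.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original article.

# Read `openssl s_client` output on stdin and print the CN from the first
# "subject=" line, which describes the server certificate.
server_cn() {
  sed -n 's/^[[:space:]]*subject=.*CN *= *\([^,/]*\).*/\1/p' | head -n 1
}

# Succeed if certificate name $1 covers hostname $2. A leading "*." stands
# for exactly one DNS label; any other name must match exactly.
name_covers_host() {
  name=$1
  host=$2
  case $name in
    \*.*)
      parent=${host#*.}   # host without its first label
      suffix=${name#\*.}  # certificate name without the leading "*."
      [ "$host" != "$parent" ] && [ "$parent" = "$suffix" ]
      ;;
    *)
      [ "$name" = "$host" ]
      ;;
  esac
}
```

For example, with `host=oauth-openshift.apps.cluster.com`, running `cn=$(openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null | server_cn)` and then `name_covers_host "$cn" "$host"` would be expected to fail for the load balancer output in step 3 and succeed for the router output in step 2. Clients match against the Subject Alternative Name, not the CN, so using the CN as a stand-in is only reasonable here because the ingress certificate above carries the same name in both fields. For a check on the certificate itself, `openssl x509 -checkhost` performs the hostname match.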
    

