Manually create the master and API certificates when API is down and redeploy-certificates playbook fails


Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 3.11

Issue

  • The redeploy-certificates playbook fails due to the already expired API certificates.
  • Master node certificates are expired.

Resolution

Info
The playbooks/redeploy-certificates.yml and playbooks/openshift-master/redeploy-certificates.yml playbooks fail because they check whether the API is up and accessible. If the API is down on all masters, the playbooks fail because the check runs against the load balancer API URL instead of the API running on each master.

Almost all of the certificates present inside /etc/origin/master/ can be regenerated manually with openssl to bring the API up.

NOTE: The oc adm command to generate the certificate needs to be run on the first master node where the /etc/origin/master/ca.serial.txt file is present.

  • The /etc/origin/master/master.server.crt is the API server certificate, which is required to keep the API running. The common name (CN) and Subject Alternative Name need to be obtained from the expired certificate, because the hostnames and IP addresses are required when the new certificate is created.

    # openssl x509 -in /etc/origin/master/master.server.crt -text -noout
    Certificate:
    Data:
    .....
    .....
    .....
        Subject: CN=10.74.249.116
    .....
    .....
        X509v3 Subject Alternative Name: 
          DNS:external.example.com, DNS:internal.example.com, DNS:master-1.example.com, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:openshift, DNS:10.74.249.116, DNS:172.30.0.1, IP Address:10.74.249.116, IP Address:172.30.0.1
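
The hostname list for the next step can be derived from the output above instead of being copied by hand. A minimal sketch, assuming the Subject Alternative Name values appear on the single line that follows the "X509v3 Subject Alternative Name" header, as in the output shown here; san_to_hostnames and the CERT variable are names chosen for this illustration:

```shell
#!/bin/sh
# Hypothetical helper: turn the SAN line of `openssl x509 -text` output
# into a comma-separated list suitable for --hostnames.
san_to_hostnames() {
    grep -A1 'Subject Alternative Name' \
        | tail -n 1 \
        | tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' \
              -e 's/^DNS://' \
              -e 's/^IP Address://' \
              -e 's/[[:space:]]*$//' \
        | awk 'NF && !seen[$0]++' \
        | paste -sd, -
}

# Only runs when the expired certificate is readable on this host.
CERT=${CERT:-/etc/origin/master/master.server.crt}
if [ -r "$CERT" ]; then
    openssl x509 -in "$CERT" -noout -text | san_to_hostnames
fi
```

Entries that appear both as DNS and as IP Address are emitted once, in their original order. Compare the result with the certificate output before passing it to oc adm.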
    
  • Generate the new API server certificate with the oc adm command, specifying the hostnames and IP addresses retrieved in the previous step. This step and the previous step need to be performed on each master node individually.

    # oc adm ca create-server-cert --signer-cert=/etc/origin/master/ca.crt --signer-key=/etc/origin/master/ca.key --signer-serial=/etc/origin/master/ca.serial.txt --hostnames='external.example.com,internal.example.com,master-1.example.com,kubernetes,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster.local,openshift,10.74.249.116,172.30.0.1' --cert=/etc/origin/master/master.server.crt  --key=/etc/origin/master/master.server.key
    
  • The /etc/origin/master/openshift-master.crt certificate has to be generated on each master node individually. After generating the certificate and key, add both to /etc/origin/master/openshift-master.kubeconfig in base64-encoded format on the respective master node.

    # openssl genrsa -out /etc/origin/master/openshift-master.key 2048
    
    # cat extension.ext 
    keyUsage             = critical,digitalSignature,keyEncipherment
    extendedKeyUsage     = clientAuth
    basicConstraints     = critical,CA:false
    
    # openssl req -new -key /etc/origin/master/openshift-master.key -subj "/O=system:masters/O=system:openshift-master/CN=system:openshift-master" -out /etc/origin/master/openshift-master.csr
    
    # openssl x509 -req -in /etc/origin/master/openshift-master.csr -CA /etc/origin/master/ca.crt -CAkey /etc/origin/master/ca.key -CAcreateserial -out /etc/origin/master/openshift-master.crt -days 730 -sha256 -extfile extension.ext
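
The extension.ext file listed above has to exist before the signing command runs. A minimal sketch for creating it with a here-document and for checking the signed certificate afterwards; check_signed_cert and the EXT_FILE and MASTER_DIR variables are names chosen here for illustration, not part of the original procedure:

```shell
#!/bin/sh
# Write the extension file with the same three settings shown above.
EXT_FILE=${EXT_FILE:-extension.ext}
cat > "$EXT_FILE" <<'EOF'
keyUsage             = critical,digitalSignature,keyEncipherment
extendedKeyUsage     = clientAuth
basicConstraints     = critical,CA:false
EOF

# Hypothetical helper: confirm that a certificate chains to the CA and
# print its subject and expiry date.
check_signed_cert() {
    ca=$1
    crt=$2
    openssl verify -CAfile "$ca" "$crt" &&
        openssl x509 -in "$crt" -noout -subject -enddate
}

# Only runs when the certificate has already been created on this host.
MASTER_DIR=${MASTER_DIR:-/etc/origin/master}
if [ -r "$MASTER_DIR/openshift-master.crt" ]; then
    check_signed_cert "$MASTER_DIR/ca.crt" "$MASTER_DIR/openshift-master.crt"
fi
```

The same check can be applied to the other client certificates created later in this procedure.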
    
  • Encode the contents of /etc/origin/master/openshift-master.crt and /etc/origin/master/openshift-master.key in base64 format and replace the existing client-certificate-data and client-key-data values inside /etc/origin/master/openshift-master.kubeconfig with the new ones.

    # cat /etc/origin/master/openshift-master.crt | base64 -w 0
    LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JURKakNDQWc2Z0F3SUJBZ0lKQU5aTXFTOFl4RnNHTUEwR0NTcUdTSWIzRFFFQkN3VUFNQkV4RHpBTkJnTlYKQkFNVEJuSnZiM1JEUVRBZUZ3MHlNREExTXpFd09EQTNNVGRhRncweU1qQTFNekV3T0RBM01UZGFNRjB4RnpBVgpCZ05WQkFvTURuTjVjM1JsYlRwdFlYTjBaWEp6TVNBd0hnWURWUVFLREJkemVYTjBaVzA2YjNCbGJuTm9hV1owCkxXMWhjM1JsY2pFZ01CNEdBMVVFQXd3WGMzbHpkR1Z0T205d1pXNXphR2xtZEMxdFl.....
    
    # cat /etc/origin/master/openshift-master.key | base64 -w 0
    LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb3mR0MnN1TU5EZ0hmY3hxOGpZcUh3cmk5SXNIeEtDNnBVCmJXTjRxR25iZkNRRVZUeHNMRFp2RFdoeE5zZjFXN29nTlRpM20xb2VXQmpPQklQVE9RTTZyczJGQWtKSFNPdGQKRlQyK21YaGJYaFhodkZ.....
    
    # cat /etc/origin/master/openshift-master.kubeconfig
    apiVersion: v1
    clusters:
    ...
    ...
        client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JURKakNDQWc2Z0F3SUJBZ0lKQU5aT.....
        client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb3mR0MnN1TU5EZ0hmY3hxOGpZ.....
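
Pasting long base64 strings by hand is error-prone, so the replacement can also be scripted. A minimal sketch, assuming the kubeconfig contains exactly one client-certificate-data line and one client-key-data line as in the excerpt above; update_kubeconfig_creds is a name chosen for this illustration:

```shell
#!/bin/sh
# Hypothetical helper: replace the embedded client certificate and key in
# a kubeconfig with the base64-encoded contents of the given files.
# The previous kubeconfig is kept as <kubeconfig>.bak.
update_kubeconfig_creds() {
    kc_file=$1
    kc_crt=$2
    kc_key=$3
    # Encode on a single line; stripping newlines avoids the GNU-only -w 0.
    kc_crt_b64=$(base64 < "$kc_crt" | tr -d '\n')
    kc_key_b64=$(base64 < "$kc_key" | tr -d '\n')
    cp -p "$kc_file" "$kc_file.bak" || return 1
    # The base64 alphabet contains no '|', '&' or '\', so the values are
    # safe to use in the sed replacement.
    sed -e "s|^\([[:space:]]*client-certificate-data:\).*|\1 ${kc_crt_b64}|" \
        -e "s|^\([[:space:]]*client-key-data:\).*|\1 ${kc_key_b64}|" \
        "$kc_file.bak" > "$kc_file"
}

# Example call on a master node:
# update_kubeconfig_creds /etc/origin/master/openshift-master.kubeconfig \
#     /etc/origin/master/openshift-master.crt \
#     /etc/origin/master/openshift-master.key
```

Review the resulting file against the backup before relying on it.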
    
  • The /etc/origin/master/master.kubelet-client.crt key pair is the same on all master nodes, so it can be generated on any one master node and copied to the others.

    # openssl genrsa -out /etc/origin/master/master.kubelet-client.key 2048
    
    # openssl req -new -key /etc/origin/master/master.kubelet-client.key -subj "/O=system:node-admins/CN=system:openshift-node-admin" -out /etc/origin/master/master.kubelet-client.csr
    
    # Use the same extension file from the previous step.
    
    # openssl x509 -req -in /etc/origin/master/master.kubelet-client.csr -CA /etc/origin/master/ca.crt -CAkey /etc/origin/master/ca.key -CAcreateserial -out /etc/origin/master/master.kubelet-client.crt -days 730 -sha256 -extfile extension.ext
    
  • The /etc/origin/master/master.proxy-client.crt key pair is also the same on all master nodes, so it can be generated on any one master node and copied to the others.

    # openssl genrsa -out /etc/origin/master/master.proxy-client.key 2048
    
    # openssl req -new -key /etc/origin/master/master.proxy-client.key -subj "/CN=system:master-proxy" -out /etc/origin/master/master.proxy-client.csr
    
    # Use the same extension file from the previous step.
    
    # openssl x509 -req -in /etc/origin/master/master.proxy-client.csr -CA /etc/origin/master/ca.crt -CAkey /etc/origin/master/ca.key -CAcreateserial -out /etc/origin/master/master.proxy-client.crt -days 730 -sha256 -extfile extension.ext
    
  • Now wait a few minutes until the API comes up. The certificates created above are sufficient to bring the API up. If any other certificates on the master nodes have also expired, they can be re-created manually or by running the playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-certificates.yml.

  • The playbook will run properly now as the API is recovered.

NOTE: Make sure to run the playbook afterwards, as this guide only recovers the master certificates. The playbook not only redeploys new certificates but also restarts the necessary services, such as the web console and others.

Start the control plane manually

Because the API is down, hyperkube will not start the master services automatically. Start them manually with the docker command.

  • Check the previously running containers:

    # docker ps -a | grep master-api
    a71000045a3a   51f70394a454   "/bin/bash -c '#!/..."   4 days ago   Exited (2) 2 days ago   k8s_api_master-api-my-cluster_kube-system_1ab1ce8dbbe107a24e4a04ff31f706fb_0
    31156342d174   registry.redhat.io/openshift3/ose-pod:v3.11.420   "/usr/bin/pod"   4 days ago   Exited (0) 2 days ago   k8s_POD_master-api-my-cluster_kube-system_1ab1ce8dbbe107a24e4a04ff31f706fb_0

  • Start the POD container first and then the actual API container (in the output above, from bottom to top):

    # docker start 31156342d174
    # docker start a71000045a3a

  • Check that the pods are running (2 containers should be running):

    # docker ps | grep master-api
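
The start order can also be derived from the docker ps -a output rather than read off by eye. A minimal sketch, assuming the container names contain k8s_POD_master-api and k8s_api_master-api as in the listing above and that the most recent containers are listed first; master_api_start_order is a name chosen for this illustration:

```shell
#!/bin/sh
# Hypothetical helper: read `docker ps -a` output on stdin and print the
# IDs of the master-api containers in start order, POD container first.
# Only the first (most recent) match of each kind is used.
master_api_start_order() {
    awk '
        /k8s_POD_master-api/ && pod == "" { pod = $1 }
        /k8s_api_master-api/ && api == "" { api = $1 }
        END {
            if (pod != "") print pod
            if (api != "") print api
        }
    '
}

# Example use on a master node:
# docker ps -a | master_api_start_order | while read -r id; do
#     docker start "$id"
# done
```

Printing the IDs first and starting the containers in a separate step leaves room to inspect the selection before anything is started.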

On the other masters, restart hyperkube to start the control plane services; this works now that at least one API server is up.

Root Cause

  • The API server certificate and other master node certificates had already expired or were near their expiry date, which caused the redeploy-certificates.yml playbook to fail.

  • The playbook checks whether the API is running by querying the load balancer URL, so it expects at least one master to be up and running correctly.

Diagnostic Steps

  • Check the expiry date of the certificates by running the playbooks/openshift-checks/certificate_expiry/easy-mode.yaml playbook which will generate a JSON and HTML report.

  • Manually check the expiry date of all the certificates by following the article "How to list all OpenShift TLS certificate expire date".
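
For a quick manual look at the master certificates only, the expiry dates can be listed with openssl. A minimal sketch, not a replacement for the report generated by the playbook; list_cert_expiry is a name chosen for this illustration, and for files that bundle several certificates only the first one is inspected:

```shell
#!/bin/sh
# Hypothetical helper: print state, expiry date and path for every .crt
# file in a directory (default: /etc/origin/master).
list_cert_expiry() {
    dir=${1:-/etc/origin/master}
    for crt in "$dir"/*.crt; do
        [ -f "$crt" ] || continue
        # Skip files that openssl cannot parse as a certificate.
        end=$(openssl x509 -in "$crt" -noout -enddate 2>/dev/null) || continue
        # -checkend 0 succeeds only if the certificate has not expired yet.
        if openssl x509 -in "$crt" -noout -checkend 0 >/dev/null 2>&1; then
            state=valid
        else
            state=EXPIRED
        fi
        printf '%s\t%s\t%s\n' "$state" "${end#notAfter=}" "$crt"
    done
}

list_cert_expiry "${MASTER_DIR:-/etc/origin/master}"
```

Passing a larger value to -checkend, for example 2592000 for 30 days, would also flag certificates that are close to expiry.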


1 Comment

In the step where the /etc/origin/master/openshift-master.crt certificate is created, would it not be cleaner to also use the /etc/origin/master/ca.serial.txt file to keep track of and use a unique serial number?