Recovering from a kube-apiserver pod crash loop caused by bad certificates, leading to a cluster outage (OCP4)

Solution Verified - Updated -

Issue

  • During an automatic certificate roll-out, empty secrets can be published, leaving you without access to the cluster and the cluster unable to recover on its own.
  • Applying certificates with null content can cause the kube-apiserver pods to crash-loop on all 3 master nodes, so oc commands stop working and the normal bypass procedures fail to recover the cluster.
  • The kube-apiserver pods crash with the following error, visible via crictl logs <kube-apiserver-container-id>:
I1130 17:14:52.083740 18 server.go:220] Version: v1.21.11+5cc9227
I1130 17:14:52.084239 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "serving-cert::/etc/kubernetes/static-pod-certs/secrets/service-network-serving-certkey/tls.crt::/etc/kubernetes/static-pod-certs/secrets/service-network-serving-certkey/tls.key"
I1130 17:14:52.084436 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "sni-serving-cert::/etc/kubernetes/static-pod-certs/secrets/localhost-serving-cert-certkey/tls.crt::/etc/kubernetes/static-pod-certs/secrets/localhost-serving-cert-certkey/tls.key"
I1130 17:14:52.084712 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "sni-serving-cert::/etc/kubernetes/static-pod-certs/secrets/service-network-serving-certkey/tls.crt::/etc/kubernetes/static-pod-certs/secrets/service-network-serving-certkey/tls.key"
I1130 17:14:52.084990 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "sni-serving-cert::/etc/kubernetes/static-pod-certs/secrets/external-loadbalancer-serving-certkey/tls.crt::/etc/kubernetes/static-pod-certs/secrets/external-loadbalancer-serving-certkey/tls.key"
I1130 17:14:52.085274 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "sni-serving-cert::/etc/kubernetes/static-pod-certs/secrets/internal-loadbalancer-serving-certkey/tls.crt::/etc/kubernetes/static-pod-certs/secrets/internal-loadbalancer-serving-certkey/tls.key"
I1130 17:14:52.085541 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "sni-serving-cert::/etc/kubernetes/static-pod-resources/secrets/localhost-recovery-serving-certkey/tls.crt::/etc/kubernetes/static-pod-resources/secrets/localhost-recovery-serving-certkey/tls.key"
Error: failed to load SNI cert and key: tls: failed to parse private key
I1130 17:14:52.088051 1 main.go:198] Termination finished with exit code 1
I1130 17:14:52.088067 1 main.go:151] Deleting termination lock file "/var/log/kube-apiserver/.terminating"
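
The error line above does not say which of the loaded cert/key pairs failed to parse. As a rough diagnostic aid, the following Python sketch walks a directory tree and reports every tls.crt or tls.key file that is empty or contains no PEM header, which is what an empty secret would look like on disk. It is only an illustration and not the recovery procedure for this issue. The function name find_suspect_pem_files and the constant PEM_MARKER are hypothetical, and the directory to scan is an assumption: the paths in the log are as seen by the kube-apiserver container, so the matching location on the node, or a copy of it, has to be passed in explicitly.

```python
import sys
from pathlib import Path

# Every PEM-encoded certificate or key contains a "-----BEGIN ...-----" line.
PEM_MARKER = b"-----BEGIN "


def find_suspect_pem_files(root):
    """Return sorted (path, reason) pairs for tls.crt/tls.key files under
    root that are empty or contain no PEM header."""
    suspects = []
    for path in sorted(Path(root).rglob("tls.*")):
        if path.name not in ("tls.crt", "tls.key") or not path.is_file():
            continue
        data = path.read_bytes()
        if not data.strip():
            # Zero bytes or whitespace only: matches the "empty secret" case.
            suspects.append((str(path), "empty"))
        elif PEM_MARKER not in data:
            suspects.append((str(path), "no PEM header"))
    return suspects


if __name__ == "__main__":
    # Each command-line argument is a directory to scan (assumed to hold
    # the static pod secrets, or a copy of them).
    for root in sys.argv[1:]:
        for path, reason in find_suspect_pem_files(root):
            print(f"{reason}: {path}")
```

This check is a heuristic: a file that passes can still hold a key that fails to parse, so a clean result does not rule out a bad certificate. It only narrows down the obviously empty or non-PEM files.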

Environment

  • Red Hat OpenShift Container Platform (OCP) 4.x
