Recovering from kube-apiserver pod crashloop from bad certificates leading to cluster outage (OCP4)
Issue
- During an automatic certificate roll-out, secrets with empty content may be published. You can then lose access to the cluster, and the cluster cannot recover on its own.
- Applying certificates with empty (null) content can send the kube-apiserver pods on all three master nodes into a crash loop. oc commands stop working, and the normal bypass procedures fail to recover the cluster.
- The kube-apiserver pods crash with the following error, visible with crictl logs <kube-apiserver-container-id>:
I1130 17:14:52.083740 18 server.go:220] Version: v1.21.11+5cc9227
I1130 17:14:52.084239 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "serving-cert::/etc/kubernetes/static-pod-certs/secrets/service-network-serving-certkey/tls.crt::/etc/kubernetes/static-pod-certs/secrets/service-network-serving-certkey/tls.key"
I1130 17:14:52.084436 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "sni-serving-cert::/etc/kubernetes/static-pod-certs/secrets/localhost-serving-cert-certkey/tls.crt::/etc/kubernetes/static-pod-certs/secrets/localhost-serving-cert-certkey/tls.key"
I1130 17:14:52.084712 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "sni-serving-cert::/etc/kubernetes/static-pod-certs/secrets/service-network-serving-certkey/tls.crt::/etc/kubernetes/static-pod-certs/secrets/service-network-serving-certkey/tls.key"
I1130 17:14:52.084990 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "sni-serving-cert::/etc/kubernetes/static-pod-certs/secrets/external-loadbalancer-serving-certkey/tls.crt::/etc/kubernetes/static-pod-certs/secrets/external-loadbalancer-serving-certkey/tls.key"
I1130 17:14:52.085274 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "sni-serving-cert::/etc/kubernetes/static-pod-certs/secrets/internal-loadbalancer-serving-certkey/tls.crt::/etc/kubernetes/static-pod-certs/secrets/internal-loadbalancer-serving-certkey/tls.key"
I1130 17:14:52.085541 18 dynamic_serving_content.go:111] Loaded a new cert/key pair for "sni-serving-cert::/etc/kubernetes/static-pod-resources/secrets/localhost-recovery-serving-certkey/tls.crt::/etc/kubernetes/static-pod-resources/secrets/localhost-recovery-serving-certkey/tls.key"
Error: failed to load SNI cert and key: tls: failed to parse private key
I1130 17:14:52.088051 1 main.go:198] Termination finished with exit code 1
I1130 17:14:52.088067 1 main.go:151] Deleting termination lock file "/var/log/kube-apiserver/.terminating"
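The "tls: failed to parse private key" message is reported by Go's crypto/tls package, which the kube-apiserver uses to load each cert/key pair listed above. Before you attempt recovery, it helps to know which pairs on a master node are affected. The following is a minimal Python sketch, not a Red Hat tool. It assumes you have a shell on the master node. It also assumes that the host copies of the mounted secrets sit under /etc/kubernetes/static-pod-resources, which is a hypothetical default you should adjust. The paths in the log are the in-container mount points. The script flags tls.crt and tls.key files that are empty, contain no PEM block, or contain a PEM block whose body is empty or not valid base64.

```python
import base64
import binascii
import os
import re
import sys

# One PEM block: captures the label (e.g. "CERTIFICATE", "RSA PRIVATE KEY")
# and the base64 body between the BEGIN line and the matching END line.
PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----\s*(.*?)\s*-----END \1-----", re.DOTALL
)


def pem_text_problem(text):
    """Return a short description of what is wrong with PEM text, or None."""
    if not text.strip():
        return "empty"
    blocks = PEM_BLOCK.findall(text)
    if not blocks:
        return "no PEM block found"
    for label, body in blocks:
        try:
            # Drop line breaks, then require strict base64 content.
            decoded = base64.b64decode("".join(body.split()), validate=True)
        except binascii.Error:
            return f"{label} block is not valid base64"
        if not decoded:
            return f"{label} block has an empty body"
    return None


def scan_secrets(root):
    """Check every tls.crt / tls.key found under root; return {path: problem}."""
    problems = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in ("tls.crt", "tls.key"):
            if name not in filenames:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="ascii", errors="replace") as f:
                    problem = pem_text_problem(f.read())
            except OSError as exc:
                problem = f"unreadable ({exc.strerror})"
            if problem:
                problems[path] = problem
    return problems


if __name__ == "__main__":
    # Hypothetical default: the host directory assumed to hold the
    # kube-apiserver static pod secrets; pass another path as argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "/etc/kubernetes/static-pod-resources"
    for path, problem in sorted(scan_secrets(root).items()):
        print(f"{path}: {problem}")
```

This check is deliberately shallow and relies only on the standard library. It does not parse the X.509 structure or the private key, and it does not confirm that a key matches its certificate. A file that passes can still be rejected by the kube-apiserver, so treat a clean result as "not obviously empty" rather than "valid".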
Environment
- Red Hat OpenShift Container Platform (OCP) 4.x