Recovering OKD Cluster After Long Downtime – TLS Certificates Expired
Hello community,
I am facing a challenging situation with my OKD cluster and would appreciate guidance from experienced members.
Server Version: 4.19.0-okd-scos.6
Kubernetes Version: v1.32.5-dirty
Background:
- The cluster was down for an extended period.
- During this time, many TLS certificates (API server, kubelet, kube-controller-manager, console, etc.) expired.
- As a result, the components cannot communicate properly anymore.
What I have attempted:
- I manually regenerated certificates for various components (API server, kubelet, kube-controller-manager).
- I updated the corresponding secrets and static pod certificates.
- I temporarily restored some functionality by manually updating CA bundles in config maps (e.g., kubelet-serving-ca).
- Some components, such as the OpenShift console, still fail to connect due to expired client certificates or CA mismatches.
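Before rotating anything further, it helps to inventory which certificates are actually expired and which are merely close to expiry. A minimal sketch of the check, using a throwaway self-signed certificate as a stand-in for one extracted from a cluster secret (the secret path in the comment is illustrative, not a real name from this cluster):

```shell
# Stand-in for a cert pulled from a cluster secret, e.g. something like
# `oc get secret <name> -n <ns> -o jsonpath='{.data.tls\.crt}' | base64 -d`
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 1 -subj "/CN=demo" 2>/dev/null

# Print the expiry date of the certificate
openssl x509 -in /tmp/demo.crt -noout -enddate

# -checkend N exits 0 if the cert is still valid N seconds from now, 1 if not.
# Here: will it still be valid in 7 days (604800 s)? A 1-day cert will not be.
if openssl x509 -in /tmp/demo.crt -noout -checkend 604800 >/dev/null; then
  echo "valid for at least 7 more days"
else
  echo "expires within 7 days"
fi
```

Running this loop over every `tls.crt` in the relevant namespaces gives a concrete list of what is expired, instead of rotating by guesswork.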
Challenges:
- It is unclear which certificates must be updated first to restore proper communication.
- Some secrets are automatically recreated by operators, overwriting manual changes.
- Directly accessing logs via kubectl / oc logs is not possible while the API server's certificates are invalid.
- The cluster has a mix of static pods and operator-managed resources, making manual intervention complex.
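While the API server is unreachable, logs can still be read at the node level through the container runtime and the kubelet's journal. A sketch of that workflow, run on a control-plane node (the container ID is a placeholder you would take from the `crictl ps` output):

```shell
# On a control-plane node (e.g. via `ssh core@<node>`), bypass the API entirely:
sudo crictl ps -a                     # list containers, including exited ones
sudo crictl logs <container-id>       # logs of a static pod (kube-apiserver, etc.)
sudo journalctl -u kubelet -b --no-pager | tail -50   # kubelet's own view
```

This is usually the only way to see why the kube-apiserver static pod is crash-looping when certificate problems prevent any `oc` access.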
Goal:
I would like to know the recommended or supported procedure for recovering an OKD cluster that has been down for a long period and now suffers widespread certificate expiration. Specifically:
- Which certificates must be rotated first to reach minimal viable cluster operation?
- How can client and server certificates be regenerated safely without conflicting with the operators?
- What are best practices for updating CA bundles so that all components trust the new certificates?
- How can I keep operators from overwriting manual interventions?
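For reference, the documented OpenShift/OKD 4.x recovery path for expired control-plane certificates relies on the operators regenerating certificates themselves once the control plane can start, with the administrator mainly approving the kubelet CSRs that follow. A sketch, assuming `oc` access has been restored (the secret name in the last command is hypothetical; operator-owned secrets are generally recreated when deleted rather than edited in place):

```shell
# Approve pending kubelet CSRs; repeat until none remain, since new CSRs
# can appear a few minutes after the previous batch is approved:
oc get csr
oc get csr -o name | xargs -r oc adm certificate approve

# Operators own most certificate secrets: rather than patching them,
# delete the secret and let the owning operator mint a fresh one.
# (Hypothetical example name; verify what exists in your cluster first.)
oc delete secret -n openshift-kube-apiserver serving-cert
```

Recent `oc` builds also ship an `oc adm ocp-certificates` subcommand group for explicitly regenerating leaf and signer certificates; check `oc adm ocp-certificates --help` in your 4.19 client before relying on it.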
Any guidance, documentation references, or examples from similar recovery scenarios would be immensely helpful.
Thank you in advance for your support!