Recovering from expired control plane certificates

This question comes from the Red Hat Open Course "Red Hat OpenShift Container Platform 4 Troubleshooting: Cluster Recovery".

First, I ran the following on a VM (classroom):

[user1@classroom ~]$ export KUBECONFIG=/home/user1/training1/auth/kubeconfig 
[user1@classroom ~]$ oc get nodes
Unable to connect to the server: x509: certificate has expired or is not yet valid

Then I followed the documentation to fix the problem and made sure that all CSRs were approved.

But I still got the same error message when I tried to run oc get nodes from that VM.
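To double-check that nothing is still waiting for approval, the request list can be filtered by its CONDITION column. The following is only a sketch: `pending_csr_names` and `approve_pending_csrs` are hypothetical helper names, and it assumes that `oc get csr` prints CONDITION as the last column and that the active KUBECONFIG can actually reach an API server (during recovery that may be a recovery kubeconfig rather than the original one).

```shell
# Read `oc get csr` table output on stdin and print the NAME of every
# request whose last column (CONDITION) is exactly "Pending".
pending_csr_names() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Approve every pending CSR visible through the current KUBECONFIG.
# Hypothetical helper: nothing runs until you call it yourself.
approve_pending_csrs() {
  oc get csr | pending_csr_names | while read -r name; do
    oc adm certificate approve "$name"
  done
}
```

Running `oc get csr | pending_csr_names` first shows what would be approved. New requests can appear after the first batch is approved, so the check may need to be repeated.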

I wonder whether approving CSRs through the temporary kube-apiserver has any effect at all on the original contents of the KUBECONFIG file on that VM.
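One way to look into this is to check the expiry date of the client certificate stored in the kubeconfig file itself. The sketch below assumes the certificate is inlined under `client-certificate-data` (as it usually is in an installer-generated kubeconfig) and that `base64` and `openssl` are installed; the function names are hypothetical. Note that the x509 error can also come from the server's certificate, so a valid client certificate does not rule out the problem.

```shell
# Print the base64 client certificate embedded in a kubeconfig file.
# Assumes the certificate is inlined under "client-certificate-data".
kubeconfig_client_cert_b64() {
  awk '$1 == "client-certificate-data:" { print $2; exit }' "$1"
}

# Decode that certificate and print its notAfter date with openssl.
kubeconfig_client_cert_enddate() {
  kubeconfig_client_cert_b64 "$1" | base64 -d | openssl x509 -noout -enddate
}
```

For example, `kubeconfig_client_cert_enddate "$KUBECONFIG"` prints the date after which that client certificate is no longer valid.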

If I would like to oc login to the cluster from that VM while the certificates are valid, what should I do?

Responses

Same for me and my experience in this course.

Maybe someone experienced can help here?

https://stackoverflow.com/questions/59087871/openshift-4-2-unable-to-connect-to-the-server-x509-certificate-has-expired-o has a good answer for this. Alternatively, you can follow https://docs.openshift.com/container-platform/4.7/backup_and_restore/disaster_recovery/scenario-3-expired-certs.html to recover.

That said, it may be faster to reinstall.

Hi Rick,

thanks for answering. The issue we are facing is in the context of the training "Red Hat OpenShift Container Platform 4 Troubleshooting: Cluster Recovery", which is linked in the original post.

Basically, one has to follow the steps described in "Recovering from expired control plane certificates" for 4.2. In my case, I can get through the steps and approve the Pending certificate requests, but that doesn't fix the cluster as expected and as outlined in the Guided Solution of the training.

Thanks Thomas

I have the same issue.

After restarting the kubelet with

sudo systemctl stop kubelet
sudo rm -rf /var/lib/kubelet/pki /var/lib/kubelet/kubeconfig
sudo systemctl start kubelet

the certificate doesn't appear in /var/lib/kubelet/pki.
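A quick check on the node can confirm this and surface the kubelet's own error messages. This is a minimal sketch with hypothetical helper names. It assumes the kubelet stores its client certificate as `kubelet-client-current.pem` in the PKI directory and that the certificate is normally written only after the node's CSR has been approved and issued.

```shell
# Succeed if the kubelet client certificate exists in the given PKI
# directory (default: /var/lib/kubelet/pki, the path from the post above).
kubelet_client_cert_present() {
  dir=${1:-/var/lib/kubelet/pki}
  [ -e "$dir/kubelet-client-current.pem" ]
}

# Report the result; if the certificate is missing, show the last
# kubelet log lines, where bootstrap errors would normally appear.
report_kubelet_cert() {
  if kubelet_client_cert_present "$1"; then
    echo "kubelet client certificate present"
  else
    echo "kubelet client certificate missing"
    sudo journalctl -u kubelet --no-pager -n 50
  fi
}
```

Calling `report_kubelet_cert` without arguments on the affected node checks the default directory.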

Regards Radek

Hello Radek,

I finally made it through the class somehow. Unfortunately, it was more of a trial-and-error approach, and I can't remember the exact steps that made it work. As far as I can remember, I followed "Recovering from expired control plane certificates" for 4.2, stopping the kubelet, removing its files, and starting it again on one server after another.

One thing I noticed: when it did not work, the new CSRs stayed in the "Approved" state, whereas when it worked, the new CSRs also moved on to "Approved,Issued" like the old ones.
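The stuck state described above can be spotted by filtering for requests whose condition is "Approved" alone, meaning they were approved but no certificate has been issued for them yet. A small sketch, assuming `oc get csr` prints CONDITION as the last column; `approved_not_issued_csr_names` is a hypothetical helper name.

```shell
# Read `oc get csr` table output on stdin and print the NAME of every
# request that is approved but has not been issued a certificate,
# i.e. whose CONDITION is "Approved" rather than "Approved,Issued".
approved_not_issued_csr_names() {
  awk 'NR > 1 && $NF == "Approved" { print $1 }'
}
```

Usage would be `oc get csr | approved_not_issued_csr_names`. Empty output suggests every approved request has also been issued.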

Good luck, and maybe someone from Red Hat wants to have a look at this?

Anyone landing here: if you are using the Red Hat Learning subscription, it is a paid product and is therefore supported through tickets as well. You can submit tickets for abnormal system behavior and similar problems you run into while using the subscription. A solution that works for one person (such as the example above) may work for another, but not in all cases, as I'm told by the Red Hat person who deals with that service.

The fastest way to get Red Hat to look at this issue is to submit a ticket (if it is the Red Hat Learning subscription, or another supported product).

You can also register at learn.redhat.com for direct Learning Community support.

Regards,
RJ