Investigating OpenShift CSR Issues
Environment
- Red Hat OpenShift Container Platform (RHOCP) 3
Issue
- During an OCP 3.10+ installation, upgrade, or scaleup, a certificate approval failure has occurred
- "Could not find csr for nodes" when installing OpenShift 3.11
- One or more nodes have a "NotReady" status
- Logs are not visible in the console, and oc logs, oc exec, etc. fail with "x509: certificate has expired or is not yet valid"
Resolution
- Ensure that all pending CSRs are approved:
  oc get csr -o name | xargs oc adm certificate approve
- Ensure that the atomic-openshift-node service is running on all relevant nodes:
  systemctl status atomic-openshift-node
- Ensure that the API server can proxy a request to the node's kubelet:
  oc get --raw /api/v1/nodes/${NAME}/proxy/healthz
  Alternatively, to check all nodes:
  for i in $(oc get nodes --no-headers -o=custom-columns=NAME:.metadata.name); do printf "${i}\n"; oc get --raw /api/v1/nodes/${i}/proxy/healthz; printf "\n"; done
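Note that the blanket approval command above approves every outstanding CSR at once. Where more caution is warranted, the tabular output of oc get csr can be filtered down to Pending requests first. A minimal sketch, assuming the standard oc get csr column layout; the helper name pending_csrs and the canned sample output are illustrative, not part of the product:

```shell
# Sketch: filter `oc get csr` tabular output to only Pending CSR names.
# pending_csrs is a hypothetical helper; CONDITION is the last column.
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Demonstration against canned output (real usage: oc get csr | pending_csrs):
pending_csrs <<'EOF'
NAME        AGE   REQUESTOR                   CONDITION
csr-aaaaa   10m   system:node:node1.example   Approved,Issued
csr-bbbbb   2m    system:node:node2.example   Pending
EOF
# Prints: csr-bbbbb
```

In real use, the filtered names can then be piped to `xargs -r oc adm certificate approve` so that only outstanding requests are touched.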
Diagnostic Steps
- First, familiarize yourself with OpenShift Node TLS Bootstrapping
- For more details on the creation of the node certificates and bootstrap authentication, see the KCS: Manually recreate OpenShift Node TLS bootstrapped certificates and kubeconfig files.
- Review the journal of the atomic-openshift-node service for errors
- Determine the status of the node's client and server certificates:
  # ls -la /etc/origin/node/certificates/
  -rw-------. 1 root root 1167 Nov  5 14:40 kubelet-client-2018-11-05-14-40-27.pem
  lrwxrwxrwx. 1 root root   68 Nov  5 14:40 kubelet-client-current.pem -> /etc/origin/node/certificates/kubelet-client-2018-11-05-14-40-27.pem
  -rw-------. 1 root root 1366 Nov  5 14:40 kubelet-server-2018-11-05-14-40-31.pem
  lrwxrwxrwx. 1 root root   68 Nov  5 14:40 kubelet-server-current.pem -> /etc/origin/node/certificates/kubelet-server-2018-11-05-14-40-31.pem
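Because expired kubelet certificates produce the "x509: certificate has expired or is not yet valid" errors noted above, it can also help to inspect the validity window of the current certificate files with openssl. A hedged sketch: the throwaway self-signed certificate below exists only to make the demo self-contained; on a real node you would point openssl directly at /etc/origin/node/certificates/kubelet-client-current.pem instead:

```shell
# On a node, inspect the real certificate:
#   openssl x509 -noout -dates -in /etc/origin/node/certificates/kubelet-client-current.pem
# Self-contained demo: generate a throwaway self-signed cert, then print its dates.
cert=$(mktemp)
openssl req -x509 -newkey rsa:2048 -nodes -keyout /dev/null \
  -subj "/CN=demo" -days 1 -out "$cert" 2>/dev/null
openssl x509 -noout -dates -in "$cert"
# Prints notBefore=... and notAfter=...; a notAfter in the past means the
# certificate has expired and must be rotated or recreated.
```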
- If either the kubelet-client-current.pem or kubelet-server-current.pem symlink is missing, check for pending CSRs; review them and approve as necessary:
  oc get csr
  oc adm certificate approve csr-ABCDEF
- If both the kubelet-client-current.pem and kubelet-server-current.pem symlinks are present, the check that proxies a request to the node's kubelet has likely failed due to external factors. The following command should indicate why:
  oc get --loglevel=9 --raw /api/v1/nodes/${NAME}/proxy/healthz
- Review the API server logs for indications of failure:
  /usr/local/bin/master-logs api api
- Increase the logging verbosity of Ansible by adding '-vvv', and use 'tee' to save the output to a log while also displaying it on the console:
  ansible-playbook -i INVENTORY PLAYBOOK_PATH -vvv | tee ~/ansible.log
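One caveat when piping through tee: by default the pipeline's exit status is tee's (usually 0), which can mask a failed playbook run from wrapper scripts. In bash, `set -o pipefail` preserves the failing command's status instead. A small sketch, using `false` as a stand-in for a failing ansible-playbook invocation:

```shell
# With pipefail, a pipeline reports the first failing command's status,
# not tee's. `false` stands in for a failing ansible-playbook run.
set -o pipefail
status=0
false | tee /tmp/demo.log || status=$?
# status is now 1 (from `false`) rather than 0 (from tee).
```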
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.