Investigating OpenShift CSR Issues

Environment

Red Hat OpenShift Container Platform (RHOCP)
- 3

Issue

During an OCP 3.10+ installation, upgrade, or scale-up, a certificate approval failure occurs
The error "Could not find csr for nodes" appears when installing OpenShift 3.11
One or more nodes have a "NotReady" status
Logs are not visible in the console, and oc logs, oc exec, etc. fail with "x509: certificate has expired or is not yet valid"

Resolution

Ensure that all pending CSRs are approved

oc get csr -o name | xargs oc adm certificate approve
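
The command above passes every CSR to the approve step, including ones that are already approved. As a minimal sketch, the helper below narrows the list to pending requests only. The function name `pending_csrs` is hypothetical, and it assumes that CONDITION is the last column of `oc get csr --no-headers` output.

```shell
# Sketch: print the names of CSRs whose CONDITION column is "Pending".
# Reads `oc get csr --no-headers` output on stdin.
# Assumption: the condition is the last whitespace-separated column.
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}

# Hypothetical usage on a master (commented out; requires cluster access):
# oc get csr --no-headers | pending_csrs | xargs -r oc adm certificate approve
```

Review the resulting names before approving them, as recommended in the diagnostic steps.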

Ensure that the atomic-openshift-node service is running on all relevant nodes
```
systemctl status atomic-openshift-node
```

Ensure that the API server can proxy a request to the node's kubelet

oc get --raw /api/v1/nodes/${NAME}/proxy/healthz
Alternatively, check all nodes:
for i in $(oc get nodes --no-headers -o=custom-columns=NAME:.metadata.name); do printf '%s\n' "${i}"; oc get --raw "/api/v1/nodes/${i}/proxy/healthz"; printf '\n'; done

Diagnostic Steps

First, familiarize yourself with OpenShift Node TLS Bootstrapping
For more details on the creation of the node certificates and bootstrap authentication, see the KCS article "Manually recreate OpenShift Node TLS bootstrapped certificates and kubeconfig files".

Review the journal of atomic-openshift-node for errors
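
One possible way to review the journal is sketched below. The filter pattern and the function name `filter_cert_errors` are assumptions chosen for illustration, not a definitive list of the messages the node service emits.

```shell
# Sketch: keep only journal lines that look related to certificate or CSR problems.
# The pattern is an assumption; widen or narrow it as needed.
filter_cert_errors() {
    # `|| true` keeps the exit status at 0 when nothing matches
    grep -iE 'x509|certificate|csr|error' || true
}

# Hypothetical usage on a node (commented out; requires systemd and the node service):
# journalctl -u atomic-openshift-node --since "1 hour ago" --no-pager | filter_cert_errors
```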

Determine the status of the node's client and server certificates

# ls -la /etc/origin/node/certificates/
-rw-------. 1 root root 1167 Nov  5 14:40 kubelet-client-2018-11-05-14-40-27.pem
lrwxrwxrwx. 1 root root   68 Nov  5 14:40 kubelet-client-current.pem -> /etc/origin/node/certificates/kubelet-client-2018-11-05-14-40-27.pem
-rw-------. 1 root root 1366 Nov  5 14:40 kubelet-server-2018-11-05-14-40-31.pem
lrwxrwxrwx. 1 root root   68 Nov  5 14:40 kubelet-server-current.pem -> /etc/origin/node/certificates/kubelet-server-2018-11-05-14-40-31.pem
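
The listing can also be checked mechanically. The sketch below reports which of the two `-current.pem` symlinks are missing or point at a file that no longer exists. The function name `missing_kubelet_certs` is hypothetical, and the default directory is taken from the listing above.

```shell
# Sketch: report which kubelet certificate symlinks are missing or dangling.
# Assumption: the certificates live in the directory shown in the listing,
# unless another directory is passed as the first argument.
missing_kubelet_certs() {
    dir=${1:-/etc/origin/node/certificates}
    for name in kubelet-client-current.pem kubelet-server-current.pem; do
        # -e follows symlinks, so a dangling link also counts as missing
        [ -e "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Hypothetical usage on a node (commented out):
# missing_kubelet_certs
# openssl x509 -noout -enddate -in /etc/origin/node/certificates/kubelet-client-current.pem
```

Empty output means both symlinks resolve to a file. The commented `openssl` call prints the expiry date of a certificate that is present, which may help with the "certificate has expired" symptom.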

If either the kubelet-client-current.pem or the kubelet-server-current.pem symlink is missing, check for pending CSRs, then review and approve them if necessary.
```
oc get csr
oc adm certificate approve csr-ABCDEF
```
If both the kubelet-client-current.pem and kubelet-server-current.pem symlinks are present, the check that proxies a request to the node's kubelet has likely failed due to external factors. The following command should indicate why it failed.
```
oc get --loglevel=9 --raw /api/v1/nodes/${NAME}/proxy/healthz
```
Review the API server logs for indications of failure
```
/usr/local/bin/master-logs api api
```
Increase the logging verbosity of Ansible by adding '-vvv', and use 'tee' to save the output to a log file while still displaying it on the console.
```
ansible-playbook -i INVENTORY PLAYBOOK_PATH  -vvv | tee ~/ansible.log
```

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
