Compliance Operator scans fail with ocp4-cis-node-worker-* pods in 1/2 NotReady state
Environment
- Red Hat OpenShift Container Platform 4.10
- Compliance Operator 0.1.59
Issue
- ocp4-cis-node-worker-* pods are in the 1/2 NotReady state. These pods are expected to run on all nodes.
- Compliance scans have been running for more than 24 hours, but they do not complete successfully.
- Deleting the result-server and result-client secrets didn't fix the issue, because these secrets are recreated with the old, expired certificates.
Resolution
The root-ca-ocp4-cis-node-worker secret contains an old, expired certificate. Delete this secret so that it is recreated with a new, valid certificate:
$ oc delete secret root-ca-ocp4-cis-node-worker -n openshift-compliance
After the secret is deleted, all ocp4-cis-node-worker-* pods move to the Running state and the compliance scans complete successfully.
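The resolution can be wrapped in a small script. This is a minimal sketch, assuming the Compliance Operator runs in the openshift-compliance namespace, that oc is logged in with rights to delete secrets there, and that a scan's CA secret follows the root-ca-<scan> naming seen above (only root-ca-ocp4-cis-node-worker is confirmed by this article). The helper names recreate_root_ca and pod_counts, the NAMESPACE and POLL_INTERVAL variables, and the 600-second default timeout are hypothetical choices for illustration. After deleting the secret, it polls oc get pods until at least one <scan>-* pod exists and none of them is NotReady.

```shell
#!/bin/sh
# Sketch only: recreate the expired root CA secret for a compliance scan and
# wait for the scan pods to leave the NotReady state.
# Assumptions (not from the article): the namespace is openshift-compliance
# unless NAMESPACE is set, and the secret is named root-ca-<scan>.
NAMESPACE=${NAMESPACE:-openshift-compliance}
POLL_INTERVAL=${POLL_INTERVAL:-10}

# pod_counts PREFIX: print "<matching pods> <NotReady pods>" for pods whose
# name starts with PREFIX. STATUS is the third column of `oc get pods`.
pod_counts() {
  oc get pods -n "$NAMESPACE" --no-headers |
    awk -v prefix="$1" 'index($1, prefix) == 1 {
      total++
      if ($3 == "NotReady") bad++
    } END { print total + 0, bad + 0 }'
}

# recreate_root_ca SCAN [TIMEOUT_SECONDS]: delete root-ca-SCAN, then poll until
# at least one SCAN-* pod exists and none is NotReady, or the timeout expires.
recreate_root_ca() {
  scan=$1
  timeout=${2:-600}
  waited=0
  oc delete secret "root-ca-${scan}" -n "$NAMESPACE" || return 1
  while :; do
    counts=$(pod_counts "${scan}-")
    total=${counts% *}
    bad=${counts#* }
    if [ "$total" -gt 0 ] && [ "$bad" -eq 0 ]; then
      echo "no ${scan}-* pods are NotReady"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s: ${bad} of ${total} ${scan}-* pods NotReady" >&2
      return 1
    fi
    sleep "$POLL_INTERVAL"
    waited=$((waited + POLL_INTERVAL))
  done
}

# Example usage (hypothetical invocation):
#   recreate_root_ca ocp4-cis-node-worker 900
```

The name-prefix match is deliberately loose, so it may also count other pods that belong to the scan (for example a result-server pod); treat the result as a rough readiness check rather than a guarantee that the scan finished.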
Root Cause
The log-collector container in each failing ocp4-cis-node-worker-* pod fails to upload scan results to the result server because of an expired certificate. The certificates for these pods are generated when the scan is initiated and expire within 24 hours, so a scan that is still running more than 24 hours later ends up with expired certificates.
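The 24-hour timing argument can be checked with a short sketch. It assumes GNU date (for date -d) and treats the creationTimestamp of the root-ca-ocp4-cis-node-worker secret as an approximation of when the scan certificates were generated; that proxy and the helper names age_seconds and cert_window_exceeded are assumptions, not documented Compliance Operator behavior. Because the article only says the certificates expire within 24 hours, the check uses 24 hours as an upper bound: a CA older than that is past its lifetime, while a younger one could still have expired earlier.

```shell
#!/bin/sh
# Sketch only: decide whether a scan has outlived the (at most) 24-hour
# certificate lifetime described above. Requires GNU date for `date -d`.
CERT_MAX_AGE_SECONDS=$((24 * 60 * 60))

# age_seconds CREATED [NOW]: seconds between two ISO-8601 timestamps
# (NOW defaults to the current time).
age_seconds() {
  created=$(date -u -d "$1" +%s) || return 2
  now=$(date -u -d "${2:-now}" +%s) || return 2
  echo $((now - created))
}

# cert_window_exceeded CREATED [NOW]: succeed (exit 0) when CREATED is at least
# 24 hours before NOW, i.e. certificates issued at CREATED are past their lifetime.
cert_window_exceeded() {
  age=$(age_seconds "$1" "${2:-now}") || return 2
  [ "$age" -ge "$CERT_MAX_AGE_SECONDS" ]
}

# Example usage against a cluster (assumes the secret name shown above):
#   created=$(oc get secret root-ca-ocp4-cis-node-worker -n openshift-compliance \
#     -o jsonpath='{.metadata.creationTimestamp}')
#   cert_window_exceeded "$created" && echo "root CA is older than 24 hours"
```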
Diagnostic Steps
- Check if all ocp4-cis-node-worker-* pods are in the NotReady state:
$ oc get pods -n openshift-compliance | grep -E 'ocp4-cis-node-worker-|^NAME'
NAME                                         READY   STATUS     RESTARTS        AGE
ocp4-cis-node-worker-infra01-xxxxxxx-xxxxx   1/2     NotReady   1 (3m24s ago)   9m1s
ocp4-cis-node-worker-master01-xxxxxxx-xxxx   1/2     NotReady   1 (4m31s ago)   9m2s
ocp4-cis-node-worker-worker01-xxxxxxx-xxxx   1/2     NotReady   1 (3m3s ago)    9m2s
ocp4-cis-node-worker-worker02-xxxxxxx-xxxx   1/2     NotReady   1 (3m3s ago)    9m2s
ocp4-cis-node-worker-worker03-xxxxxxx-xxxx   1/2     NotReady   1 (3m3s ago)    9m2s
- Check if the log-collector container of the failing ocp4-cis-node-worker-* pods shows the following message:
$ oc logs ocp4-cis-node-worker-worker01-xxxxxxx-xxxx -c log-collector -n openshift-compliance | grep 'certificate has expired'
"msg":"Failed to upload results to server","error":"Post \"https://ocp4-cis-node-worker-rs:8443/\": x509: certificate has expired or is not yet valid: current time 2023-01-02T16:23:58Z is after 2022-12-07T01:00:29Z"
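Both diagnostic checks can also be scripted. The sketch below uses hypothetical helper names (not_ready_pods and expired_cert_times) and reads the output of the two commands above on standard input: the first prints pods whose STATUS column is NotReady, optionally limited to a name prefix, and the second extracts the two timestamps from x509 expiry messages. It assumes the default oc get pods column layout and the message wording shown above; other log formats would need a different pattern.

```shell
#!/bin/sh
# Sketch only: helpers for the two diagnostic steps. Both read text on stdin.

# not_ready_pods [PREFIX]: print the NAME of each pod whose STATUS (third
# column of `oc get pods`) is NotReady, optionally only names starting with PREFIX.
not_ready_pods() {
  awk -v prefix="${1:-}" '
    (prefix == "" || index($1, prefix) == 1) && $3 == "NotReady" { print $1 }'
}

# expired_cert_times: for each "current time X is after Y" x509 error, print
# the certificate expiry (Y) and the time the upload was attempted (X).
expired_cert_times() {
  sed -n 's/.*current time \([^ ]*\) is after \([^" ]*\).*/not_after=\2 current=\1/p'
}

# Example usage:
#   oc get pods -n openshift-compliance | not_ready_pods ocp4-cis-node-worker-
#   oc logs ocp4-cis-node-worker-worker01-xxxxxxx-xxxx -c log-collector \
#     -n openshift-compliance | expired_cert_times
```

For the log line shown above, expired_cert_times would print not_after=2022-12-07T01:00:29Z current=2023-01-02T16:23:58Z, which makes it easy to see how long the certificate had already been expired when the upload failed.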
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.