Compliance Operator scans fail with ocp4-cis-node-worker-* pods in 1/2 NotReady state

Solution Verified - Updated 2024-06-13T19:48:27+00:00

Environment

Red Hat OpenShift Container Platform
- 4.10
Compliance Operator
- 0.1.59

Issue

The ocp4-cis-node-worker-* pods are stuck in the 1/2 NotReady state. These pods are expected to be running on all nodes.
Compliance scans have been running for more than 24 hours without completing successfully.
Deleting the result-server and result-client secrets does not fix the issue, because these secrets get recreated with the old, expired certificates.

Resolution

The secret root-ca-ocp4-cis-node-worker contains an old, expired certificate. Delete this secret so that it gets recreated with a new, valid certificate:

$ oc delete secret root-ca-ocp4-cis-node-worker -n openshift-compliance

After the secret is deleted, all ocp4-cis-node-worker-* pods move to the Running state and the compliance scans complete successfully.
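Before deleting the secret, it can help to confirm that it really holds an expired certificate. The following is a minimal sketch, not a Red Hat-provided tool: the function names `check_cert_file` and `check_secret_certs` are hypothetical, and it assumes `oc` and `openssl` are on your PATH, that you are logged in to the cluster, that the secret lives in the openshift-compliance namespace, and that its certificate keys end in `.crt` (the exact key names are an assumption, so the sketch checks every `*.crt` key it finds).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report the notAfter date of certificates stored in a Secret.
# Assumptions: oc and openssl are installed, oc is logged in, and certificate
# keys in the secret end in ".crt".

# Print "<file>: <notAfter> (expired|valid)" for one PEM certificate file.
# Returns 2 if the file cannot be parsed as a certificate.
check_cert_file() {
  local crt="$1"
  local end
  end=$(openssl x509 -noout -enddate -in "$crt" 2>/dev/null) || return 2
  # -checkend 0 exits 0 if the certificate is still valid right now.
  if openssl x509 -noout -checkend 0 -in "$crt" >/dev/null; then
    echo "$(basename "$crt"): ${end#notAfter=} (valid)"
  else
    echo "$(basename "$crt"): ${end#notAfter=} (expired)"
  fi
}

# Extract every key of a secret into a temporary directory and check each *.crt file.
check_secret_certs() {
  local secret="$1" ns="${2:-openshift-compliance}"
  local dir crt
  dir=$(mktemp -d)
  if ! oc extract "secret/${secret}" -n "$ns" --to="$dir" >/dev/null; then
    rm -rf "$dir"
    return 1
  fi
  for crt in "$dir"/*.crt; do
    # Skip the literal glob when the secret has no *.crt keys.
    [ -e "$crt" ] && check_cert_file "$crt"
  done
  rm -rf "$dir"
}
```

After sourcing the functions, run:

$ check_secret_certs root-ca-ocp4-cis-node-worker

An "(expired)" result for the CA certificate supports deleting the secret as described above; a "(valid)" result suggests the NotReady pods have a different cause.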

Root Cause

The log-collector container in each failing ocp4-cis-node-worker-* pod fails to upload results to the result server because of an expired certificate. These certificates are generated when the scan is initiated and expire within 24 hours, so a compliance scan that has been running for more than 24 hours ends up using expired certificates.

Diagnostic Steps

Check whether all ocp4-cis-node-worker-* pods are in the NotReady state:

$ oc get pods -n openshift-compliance | grep -E 'ocp4-cis-node-worker-|^NAME'
NAME                                                     READY   STATUS      RESTARTS        AGE
ocp4-cis-node-worker-infra01-xxxxxxx-xxxxx               1/2     NotReady    1 (3m24s ago)   9m1s
ocp4-cis-node-worker-master01-xxxxxxx-xxxx               1/2     NotReady    1 (4m31s ago)   9m2s
ocp4-cis-node-worker-worker01-xxxxxxx-xxxx               1/2     NotReady    1 (3m3s ago)    9m2s
ocp4-cis-node-worker-worker02-xxxxxxx-xxxx               1/2     NotReady    1 (3m3s ago)    9m2s
ocp4-cis-node-worker-worker03-xxxxxxx-xxxx               1/2     NotReady    1 (3m3s ago)    9m2s

Check whether the log-collector container of the failing ocp4-cis-node-worker-* pods shows the following message:

$ oc logs ocp4-cis-node-worker-worker01-xxxxxxx-xxxx -n openshift-compliance -c log-collector | grep 'certificate has expired'
"msg":"Failed to upload results to server","error":"Post \"https://ocp4-cis-node-worker-rs:8443/\": x509: certificate has expired or is not yet valid: current time 2023-01-02T16:23:58Z is after 2022-12-07T01:00:29Z"
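On clusters with many nodes, checking each pod by hand is tedious. The following minimal sketch loops over every matching pod and reports those whose log-collector logs contain the x509 expiry message. The function names `log_has_expired_cert` and `scan_pods_for_expired_certs` are hypothetical, and the sketch assumes `oc` is logged in and the scan pods run in the openshift-compliance namespace.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: find scan pods whose log-collector reports an expired certificate.
# Assumptions: oc is logged in; pods run in openshift-compliance.

# Succeeds if the given log text contains the x509 expiry message.
log_has_expired_cert() {
  grep -q 'x509: certificate has expired' <<<"$1"
}

# Print the name of every pod matching the prefix whose log-collector logs show the error.
scan_pods_for_expired_certs() {
  local prefix="${1:-ocp4-cis-node-worker-}" ns="${2:-openshift-compliance}"
  local pod logs
  for pod in $(oc get pods -n "$ns" -o name | grep "pod/${prefix}"); do
    logs=$(oc logs -n "$ns" "$pod" -c log-collector 2>/dev/null)
    if log_has_expired_cert "$logs"; then
      echo "${pod#pod/}: certificate expired"
    fi
  done
}
```

After sourcing the functions, run:

$ scan_pods_for_expired_certs

If every NotReady pod is listed, the expired certificate described in Root Cause is the likely explanation.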

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
