tekton-resource-pruner job is failing and eventually causing KubeJobFailed alert to fire on OpenShift Container Platform 4
Issue
- Every morning at about 08:15 UTC we are seeing the below Prometheus alert firing because some tekton-resource-pruner job seems to fail.
  alertname = KubeJobFailed
  condition = true
  container = kube-rbac-proxy-main
  endpoint = https-main
  job = kube-state-metrics
  job_name = tekton-resource-pruner-lvjmq-27537600
  namespace = openshift-pipelines
  openshift_io_alert_source = platform
  prometheus = openshift-monitoring/k8s
  service = kube-state-metrics
  severity = warning
  Annotations
  description = Job openshift-pipelines/tekton-resource-pruner-lvjmq-12345678 failed to complete. Removing failed job after investigation should clear this alert.
  runbook_url = https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubeJobFailed.md
  summary = Job failed to complete.
- The tekton-resource-pruner job is failing and eventually causing the KubeJobFailed alert to fire on OpenShift Container Platform 4
- Automatic pruning of task runs and pipeline runs is failing
Environment
- Red Hat OpenShift Container Platform 4
- Red Hat OpenShift Pipelines 1.6.2 up to 1.8