Monitoring ClusterOperator degraded due to frequent restart of Prometheus Pods in RHOCP4

Solution Verified - Updated 2024-06-13T20:05:18+00:00

Issue

- Prometheus pod is not running.
- Monitoring ClusterOperator is degraded.
- Prometheus pod fails its startup probe.
- Prometheus pod restarts frequently.

$ oc get pods prometheus-k8s-0 -n openshift-monitoring
NAME               READY   STATUS    RESTARTS      AGE
prometheus-k8s-0   5/6     Running   1 (10m ago)   29m
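
To spot pods whose containers are not all ready, the READY column of the output above can be filtered. The following is a minimal sketch, not part of the original solution: `not_ready_pods` is a hypothetical helper name, and it assumes the default `oc get pods` table layout with a header line.

```shell
# Hypothetical helper: read default `oc get pods` table output on stdin
# (header line included) and print the name and READY value of every pod
# whose ready-container count differs from its total container count.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1, $2 }'
}

# Example usage against a live cluster (requires a logged-in `oc` session):
#   oc get pods -n openshift-monitoring | not_ready_pods
```

Reading from stdin keeps the filter usable on saved command output as well as on a live cluster.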

$ oc get -o yaml clusteroperator monitoring

- lastTransitionTime: "2022-XX-XXTXX:XX:XXZ"
  message: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
  reason: UpdatingPrometheusK8SFailed
  status: "False"
  type: Available
- lastTransitionTime: "2022-XX-XXTXX:XX:XXZ"
  message: Rolling out the stack.
  reason: RollOutInProgress
  status: "True"
  type: Progressing
- lastTransitionTime: "2022-XX-XXTXX:XX:XXZ"
  message: 'Failed to rollout the stack. Error: updating prometheus-k8s: waiting
    for Prometheus object changes failed: waiting for Prometheus openshift-monitoring/k8s:
    expected 2 replicas, got 1 available replicas'
  reason: UpdatingPrometheusK8SFailed
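
The conditions above can also be condensed to one line per condition. This is a sketch under stated assumptions: the function names `monitoring_conditions` and `unhealthy_conditions` are hypothetical, and the rule used to flag a condition (Available not "True", or Degraded "True") is an illustrative choice rather than something stated in this solution.

```shell
# Hypothetical helper: print the monitoring ClusterOperator conditions as
# "type<TAB>status<TAB>reason", one per line. Requires a logged-in `oc` session.
monitoring_conditions() {
  oc get clusteroperator monitoring \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Hypothetical helper: read such lines on stdin and keep only those that
# suggest a problem (assumed rule: Available != True, or Degraded == True).
unhealthy_conditions() {
  awk -F '\t' '($1 == "Available" && $2 != "True") || ($1 == "Degraded" && $2 == "True")'
}

# Example usage:
#   monitoring_conditions | unhealthy_conditions
```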

$ oc logs prometheus-k8s-1 -c prometheus -n openshift-monitoring

2023-07-14T05:33:11.353103215Z ts=2023-07-14T05:33:11.352Z caller=head.go:488 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:513 level=error component=tsdb msg="Loading on-disk chunks failed" err="iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 24956"
2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:668 level=info component=tsdb msg="Deleting mmapped chunk files"
2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:671 level=info component=tsdb msg="Deletion of mmap chunk files failed, discarding chunk files completely" err="cannot handle error: iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 24956"
2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:522 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=439.405389ms
2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:528 level=info component=tsdb msg="Replaying WAL, this may take a while"
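
To isolate the TSDB errors from the rest of the startup log, the output can be filtered on the `level` and `component` fields seen above. A minimal sketch, assuming the logfmt-style lines shown in this article; `tsdb_errors` is a hypothetical helper name.

```shell
# Hypothetical helper: read Prometheus log lines on stdin and print only
# error-level messages emitted by the tsdb component. Exits 0 even when
# nothing matches, so it is safe in scripts that run with `set -e`.
tsdb_errors() {
  grep 'component=tsdb' | grep 'level=error' || true
}

# Example usage (pod and container names taken from the command above):
#   oc logs prometheus-k8s-1 -c prometheus -n openshift-monitoring | tsdb_errors
```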

Environment

Red Hat OpenShift Container Platform (RHOCP)
- 4
