Monitoring ClusterOperator degraded due to frequent restart of Prometheus Pods in RHOCP4
Issue
- Prometheus Pod not running.
- Monitoring ClusterOperator degraded.
- Prometheus Pod fails Startup Probe.
- Prometheus Pod restarting frequently.

    $ oc get pods prometheus-k8s-0
    NAME               READY   STATUS    RESTARTS      AGE
    prometheus-k8s-0   5/6     Running   1 (10m ago)   29m

    $ oc get -o yaml clusteroperator monitoring
      - lastTransitionTime: "2022-XX-XXTXX:XX:XXZ"
        message: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
        reason: UpdatingPrometheusK8SFailed
        status: "False"
        type: Available
      - lastTransitionTime: "2022-XX-XXTXX:XX:XXZ"
        message: Rolling out the stack.
        reason: RollOutInProgress
        status: "True"
        type: Progressing
      - lastTransitionTime: "2022-XX-XXTXX:XX:XXZ"
        message: 'Failed to rollout the stack. Error: updating prometheus-k8s: waiting for Prometheus object changes failed: waiting for Prometheus openshift-monitoring/k8s: expected 2 replicas, got 1 available replicas'
        reason: UpdatingPrometheusK8SFailed

    $ oc logs prometheus-k8s-1 -c prometheus -n openshift-monitoring
    2023-07-14T05:33:11.353103215Z ts=2023-07-14T05:33:11.352Z caller=head.go:488 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
    2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:513 level=error component=tsdb msg="Loading on-disk chunks failed" err="iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 24956"
    2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:668 level=info component=tsdb msg="Deleting mmapped chunk files"
    2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:671 level=info component=tsdb msg="Deletion of mmap chunk files failed, discarding chunk files completely" err="cannot handle error: iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 24956"
    2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:522 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=439.405389ms
    2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:528 level=info component=tsdb msg="Replaying WAL, this may take a while"
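To check whether a saved Prometheus container log shows the same symptom, the error-level `component=tsdb` entries can be filtered out of the output. The following is a minimal Python sketch and not part of the original solution: the helper names `parse_logfmt` and `tsdb_errors` are hypothetical, and it assumes the log lines use the `key=value` layout shown in the excerpt above.

```python
import re

# Matches key=value pairs where the value is either a double-quoted string
# (backslash escapes allowed) or a bare token without whitespace.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Return the key=value fields of one log line as a dict.

    Surrounding double quotes are stripped; escape sequences inside
    quoted values are left untouched.
    """
    fields = {}
    for key, value in _PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def tsdb_errors(lines):
    """Yield (ts, msg, err) for every error-level TSDB entry in lines."""
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error" and fields.get("component") == "tsdb":
            yield fields.get("ts"), fields.get("msg"), fields.get("err")
```

One possible use is to save the output of `oc logs prometheus-k8s-1 -c prometheus -n openshift-monitoring` to a file and pass the open file object to `tsdb_errors`; an `err` value containing `out of sequence m-mapped chunk` would match the symptom described in this article.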
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4