Monitoring ClusterOperator degraded due to frequent restart of Prometheus Pods in RHOCP4

Solution Verified - Updated -

Issue

  • The Prometheus pod is not running.
  • The monitoring ClusterOperator is degraded.
  • The Prometheus pod fails its startup probe.
  • The Prometheus pod restarts frequently:

    $ oc get pods prometheus-k8s-0 -n openshift-monitoring
    NAME               READY   STATUS    RESTARTS      AGE
    prometheus-k8s-0   5/6     Running   1 (10m ago)   29m
    
    $ oc get -o yaml clusteroperator monitoring
    
    - lastTransitionTime: "2022-XX-XXTXX:XX:XXZ"
      message: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
      reason: UpdatingPrometheusK8SFailed
      status: "False"
      type: Available
    - lastTransitionTime: "2022-XX-XXTXX:XX:XXZ"
      message: Rolling out the stack.
      reason: RollOutInProgress
      status: "True"
      type: Progressing
    - lastTransitionTime: "2022-XX-XXTXX:XX:XXZ"
      message: 'Failed to rollout the stack. Error: updating prometheus-k8s: waiting
        for Prometheus object changes failed: waiting for Prometheus openshift-monitoring/k8s:
        expected 2 replicas, got 1 available replicas'
      reason: UpdatingPrometheusK8SFailed
    
    $ oc logs prometheus-k8s-1 -c prometheus -n openshift-monitoring
    
    2023-07-14T05:33:11.353103215Z ts=2023-07-14T05:33:11.352Z caller=head.go:488 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
    2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:513 level=error component=tsdb msg="Loading on-disk chunks failed" err="iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 24956"
    2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:668 level=info component=tsdb msg="Deleting mmapped chunk files"
    2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:671 level=info component=tsdb msg="Deletion of mmap chunk files failed, discarding chunk files completely" err="cannot handle error: iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 24956"
    2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:522 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=439.405389ms
    2023-07-14T05:33:11.792886616Z ts=2023-07-14T05:33:11.792Z caller=head.go:528 level=info component=tsdb msg="Replaying WAL, this may take a while"
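
To check whether a pod shows the same log signature as above, a minimal sketch along these lines may help. The function name has_mmap_chunk_error is hypothetical and is not part of oc or Prometheus; it only searches log text on standard input for the error string seen in the output above.

```shell
# Hypothetical helper (not part of oc or Prometheus): reads container logs
# on stdin and succeeds if the m-mapped chunk error shown above is present.
has_mmap_chunk_error() {
  grep -q 'out of sequence m-mapped chunk'
}

# Example usage against a cluster, using the pod name from the output above:
#   oc logs prometheus-k8s-1 -c prometheus -n openshift-monitoring \
#     | has_mmap_chunk_error && echo "m-mapped chunk error found"
```

Because the pod restarts frequently, the error may only appear in the logs of the previous container instance, in which case adding --previous to the oc logs command may be needed.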
    

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
