Failing to evaluate ceph.rules with "many-to-many matching not allowed" in RHOCP 4

Issue

  • The Prometheus pods log the following rule evaluation warning:

    2024-12-04T21:31:30.125756093Z ts=2024-12-04T21:31:30.125Z caller=group.go:492 level=warn name=cluster:ceph_disk_latency:join_ceph_node_disk_irate1m index=1 component="rule manager" file=/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-storage-prometheus-ceph-rules-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.yaml group=ceph.rules msg="Evaluating rule failed" rule="record: cluster:ceph_disk_latency:join_ceph_node_disk_irate1m\nexpr: avg by (namespace) (topk by (ceph_daemon, namespace) (1, label_replace(label_replace(ceph_disk_occupation{job=\"rook-ceph-mgr\"},\n  \"instance\", \"$1\", \"exported_instance\", \"(.*)\"), \"device\", \"$1\", \"device\", \"/dev/(.*)\"))\n  * on (instance, device) group_right (ceph_daemon, namespace) topk by (instance,\n  device, namespace) (1, (irate(node_disk_read_time_seconds_total[1m]) + irate(node_disk_write_time_seconds_total[1m])\n  / (clamp_min(irate(node_disk_reads_completed_total[1m]), 1) + irate(node_disk_writes_completed_total[1m])))))\n" err="found duplicate series for the match group {device=\"sdb\"} on the left hand-side of the operation: [{__name__=\"ceph_disk_occupation\", ceph_daemon=\"osd.2\", container=\"mgr\", device=\"sdb\", device_ids=\"sdb=VMware_Virtual_disk_xxxx\", devices=\"sdb\", endpoint=\"http-metrics\", job=\"rook-ceph-mgr\", managedBy=\"ocs-storagecluster\", namespace=\"openshift-storage\", pod=\"rook-ceph-mgr-a-xxxxxxxxxx-xxxxx\", service=\"rook-ceph-mgr\"}, {__name__=\"ceph_disk_occupation\", ceph_daemon=\"osd.0\", container=\"mgr\", device=\"sdb\", device_ids=\"sdb=VMware_Virtual_disk_xxxx\", devices=\"sdb\", endpoint=\"http-metrics\", job=\"rook-ceph-mgr\", managedBy=\"ocs-storagecluster\", namespace=\"openshift-storage\", pod=\"rook-ceph-mgr-a-xxxxxxxxxx-xxxxx\", service=\"rook-ceph-mgr\"}];many-to-many matching not allowed: matching labels must be unique on one side"
    
  • The PrometheusRuleFailures alert fires for the rule_group: /etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-storage-prometheus-ceph-rules-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.yaml;ceph.rules
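
The error can be read directly from the log: the rule joins on (instance, device) with group_right, so each (instance, device) pair may appear only once on the left-hand side. Both ceph_disk_occupation series in the message carry device="sdb" and neither carries an instance or exported_instance label, so they fall into the same match group. The following Python sketch only illustrates that grouping logic; it is not how Prometheus is implemented, the function name duplicate_match_groups is hypothetical, and the label sets are abridged from the two series in the error message.

```python
from collections import defaultdict

# Labels abridged from the two left-hand-side series named in the error message.
left_series = [
    {"ceph_daemon": "osd.2", "device": "sdb", "namespace": "openshift-storage"},
    {"ceph_daemon": "osd.0", "device": "sdb", "namespace": "openshift-storage"},
]


def duplicate_match_groups(series, on_labels):
    """Group series by the `on (...)` labels; return groups with more than one member.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for labels in series:
        # A label that is absent is treated like an empty value when matching.
        key = tuple(labels.get(name, "") for name in on_labels)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


duplicates = duplicate_match_groups(left_series, ("instance", "device"))
for key, members in duplicates.items():
    daemons = sorted(m["ceph_daemon"] for m in members)
    print(f"match group {key} is shared by {daemons}")
```

With the abridged labels above, the only match group is ("", "sdb") and it contains both osd.0 and osd.2, which corresponds to the "found duplicate series for the match group {device="sdb"}" part of the message.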

Environment

Red Hat OpenShift Container Platform (OCP) 4
Red Hat OpenShift Data Foundation (ODF) 4
