Failing to evaluate ceph.rules with "many-to-many matching not allowed" in RHOCP 4
Issue
The Prometheus pods log the following warning when evaluating the ceph.rules group:
2024-12-04T21:31:30.125756093Z ts=2024-12-04T21:31:30.125Z caller=group.go:492 level=warn name=cluster:ceph_disk_latency:join_ceph_node_disk_irate1m index=1 component="rule manager" file=/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-storage-prometheus-ceph-rules-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.yaml group=ceph.rules msg="Evaluating rule failed" rule="record: cluster:ceph_disk_latency:join_ceph_node_disk_irate1m\nexpr: avg by (namespace) (topk by (ceph_daemon, namespace) (1, label_replace(label_replace(ceph_disk_occupation{job=\"rook-ceph-mgr\"},\n \"instance\", \"$1\", \"exported_instance\", \"(.*)\"), \"device\", \"$1\", \"device\", \"/dev/(.*)\"))\n * on (instance, device) group_right (ceph_daemon, namespace) topk by (instance,\n device, namespace) (1, (irate(node_disk_read_time_seconds_total[1m]) + irate(node_disk_write_time_seconds_total[1m])\n / (clamp_min(irate(node_disk_reads_completed_total[1m]), 1) + irate(node_disk_writes_completed_total[1m])))))\n" err="found duplicate series for the match group {device=\"sdb\"} on the left hand-side of the operation: [{__name__=\"ceph_disk_occupation\", ceph_daemon=\"osd.2\", container=\"mgr\", device=\"sdb\", device_ids=\"sdb=VMware_Virtual_disk_xxxx\", devices=\"sdb\", endpoint=\"http-metrics\", job=\"rook-ceph-mgr\", managedBy=\"ocs-storagecluster\", namespace=\"openshift-storage\", pod=\"rook-ceph-mgr-a-xxxxxxxxxx-xxxxx\", service=\"rook-ceph-mgr\"}, {__name__=\"ceph_disk_occupation\", ceph_daemon=\"osd.0\", container=\"mgr\", device=\"sdb\", device_ids=\"sdb=VMware_Virtual_disk_xxxx\", devices=\"sdb\", endpoint=\"http-metrics\", job=\"rook-ceph-mgr\", managedBy=\"ocs-storagecluster\", namespace=\"openshift-storage\", pod=\"rook-ceph-mgr-a-xxxxxxxxxx-xxxxx\", service=\"rook-ceph-mgr\"}];many-to-many matching not allowed: matching labels must be unique on one side"
The PrometheusRuleFailures alert is triggered for the rule_group: /etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-storage-prometheus-ceph-rules-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.yaml;ceph.rules
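The "duplicate series for the match group" part of the error can be illustrated with a small Python sketch. It is not Prometheus code; it only mimics how `on (instance, device)` builds a match key, under the assumption that a label which is absent or empty does not contribute to that key. The two series are trimmed copies of the ones named in the error message, neither of which carries an `instance` label. The function names and the `node-a`/`node-b` values are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Labels trimmed from the two ceph_disk_occupation series in the error message.
# Neither series has an "instance" label.
left_side = [
    {"ceph_daemon": "osd.2", "device": "sdb", "namespace": "openshift-storage"},
    {"ceph_daemon": "osd.0", "device": "sdb", "namespace": "openshift-storage"},
]


def match_key(labels, on=("instance", "device")):
    """Mimic the match group for `on (...)`: absent or empty labels are dropped."""
    return tuple((name, labels[name]) for name in on if labels.get(name))


def duplicate_match_groups(series, on=("instance", "device")):
    """Return the match groups shared by more than one series, with their daemons."""
    groups = defaultdict(list)
    for labels in series:
        groups[match_key(labels, on)].append(labels["ceph_daemon"])
    return {key: daemons for key, daemons in groups.items() if len(daemons) > 1}


# Both OSDs collapse into the single match group {device="sdb"}.
duplicates = duplicate_match_groups(left_side)
print(duplicates)

# Hypothetical: if each series carried a distinct instance label, the keys differ.
with_instance = [
    dict(left_side[0], instance="node-a"),
    dict(left_side[1], instance="node-b"),
]
print(duplicate_match_groups(with_instance))
```

In this sketch, osd.0 and osd.2 share the key {device="sdb"}, which matches the match group reported in the log. Because the rule uses `group_right`, the left-hand side must be unique per match group, so two series with the same key make the evaluation fail.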
Environment
Red Hat OpenShift Container Platform (OCP) 4
Red Hat OpenShift Data Foundation (ODF) 4