How to get more details about the cluster in an Alert sent from Alertmanager in OCP 3.11?
Issue
- Prometheus alerting rules are generic, so the alerts they generate should include more details about the affected component.
- The two examples below illustrate the issue:
- The alert is NodeDiskRunningFull. Its description specifies the pod name, but in a large cluster with hundreds or thousands of nodes, including the node name would make it easier to identify which node the alert was generated for.
Labels
  alertname = NodeDiskRunningFull
  cluster = <hostname>
  device = /dev/mapper/apvg01-ap1000
  namespace = openshift-monitoring
  pod = <node-exporter-pod-name>
  prometheus = openshift-monitoring/k8s
  severity = warning
Annotations
  message = Device /dev/mapper/apvg01-ap1000 of node-exporter <openshift-monitoring/node-exporter-pod-name> is running full within the next 24 hours.
- The alert is KubeAPIErrorsHigh. The error is specific to the API server, but the endpoint field displays only partial information.
Labels
  alertname = KubeAPIErrorsHigh
  client = openshift/v1.11.0+d4cacc0 (linux/amd64) kubernetes/d4cacc0/system:serviceaccount:openshift-infra:image-trigger-controller
  cluster = <hostname>
  code = 500
  contentType = application/vnd.kubernetes.protobuf
  endpoint = https
  job = apiserver
  namespace = default
  prometheus = openshift-monitoring/k8s
  resource = buildconfigs
  scope = namespace
  service = kubernetes
  severity = critical
  subresource = instantiate
  verb = POST
Annotations
  message = API server is erroring for 100% of requests.
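To make the gap in the NodeDiskRunningFull example concrete, the following minimal Python sketch checks an alert's label set for a label that would identify the node directly. The candidate label names (node, instance, nodename) and both helper functions are assumptions chosen for illustration only; they are not taken from the OCP 3.11 monitoring stack, and the labels available in a given cluster may differ.

```python
# Labels copied from the NodeDiskRunningFull example above
# (placeholders such as <hostname> are kept exactly as shown).
alert_labels = {
    "alertname": "NodeDiskRunningFull",
    "cluster": "<hostname>",
    "device": "/dev/mapper/apvg01-ap1000",
    "namespace": "openshift-monitoring",
    "pod": "<node-exporter-pod-name>",
    "prometheus": "openshift-monitoring/k8s",
    "severity": "warning",
}

# Hypothetical label names that would point at the node directly.
# These are assumptions for illustration, not a documented convention.
NODE_LABEL_CANDIDATES = ("node", "instance", "nodename")


def missing_node_labels(labels, candidates=NODE_LABEL_CANDIDATES):
    """Return the candidate node-identifying labels absent from the alert."""
    return [name for name in candidates if name not in labels]


def identifies_node(labels, candidates=NODE_LABEL_CANDIDATES):
    """Return True if at least one candidate node label is present."""
    return any(name in labels for name in candidates)


if __name__ == "__main__":
    print("identifies node:", identifies_node(alert_labels))
    print("missing labels:", missing_node_labels(alert_labels))
```

For the example alert, none of the candidate labels is present, so whoever receives the alert has to map the node-exporter pod back to its node by hand.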
Environment
- Red Hat OpenShift Container Platform 3.11