Chapter 10. Scaling Cluster Monitoring Operator
10.1. Overview
OpenShift Container Platform exposes metrics that the cluster-monitoring-operator collects and stores in a back end. As an OpenShift Container Platform administrator, you can view system resource, container, and component metrics in one dashboard interface, Grafana.
This topic provides information on scaling the cluster monitoring operator.
10.2. Recommendations for OpenShift Container Platform
- Use at least three infrastructure (infra) nodes.
- Use at least three openshift-container-storage nodes with non-volatile memory express (NVMe) drives.
- Use persistent storage when configuring openshift-monitoring. Set openshift_cluster_monitoring_operator_prometheus_storage_enabled=true.
- Use GlusterFS as storage on top of NVMe drives.
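For example, the persistent-storage recommendation above might appear in an Ansible inventory file like this (a sketch; the exact group and variable placement depend on your installation):

```
[OSEv3:vars]
# Enable persistent storage for the openshift-monitoring stack.
openshift_cluster_monitoring_operator_prometheus_storage_enabled=true
```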
10.3. Capacity Planning for Cluster Monitoring Operator
Various tests were performed for different scale sizes. The Prometheus database grew, as reflected in the table below.
Table 10.1. Prometheus Database storage requirements based on number of nodes/pods in the cluster
| Number of Nodes | Number of Pods | Prometheus storage growth per day | Prometheus storage growth per 15 days | RAM Space (per scale size) | Network (per tsdb chunk) |
|---|---|---|---|---|---|
| 50 | 1800 | 6.3 GB | 94 GB | 6 GB | 16 MB |
| 100 | 3600 | 13 GB | 195 GB | 10 GB | 26 MB |
| 150 | 5400 | 19 GB | 283 GB | 12 GB | 36 MB |
| 200 | 7200 | 25 GB | 375 GB | 14 GB | 46 MB |
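As a rough cross-check of the table, the 15-day column is approximately the daily growth multiplied by 15. For example, taking the 100-node / 3600-pod row:

```shell
# Illustrative arithmetic only: estimate 15-day Prometheus storage growth
# from the daily growth figure in Table 10.1 (100 nodes, 3600 pods).
daily_growth_gb=13
days=15
echo "$((daily_growth_gb * days)) GB"   # 195 GB, matching the table
```

The match is approximate for the other rows because the measured values include rounding.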
In the above calculation, approximately 20 percent of the expected size was added as overhead to ensure that the storage requirements do not exceed the calculated value.
The above calculation was developed for the default OpenShift Container Platform cluster-monitoring-operator. For higher scale, edit the openshift_cluster_monitoring_operator_prometheus_storage_capacity variable in the Ansible inventory file, which defaults to 50Gi.
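For a higher scale, the override could look like the following inventory fragment (values illustrative; pick the capacity from the table for your node count):

```
[OSEv3:vars]
# Default is 50Gi; raise for larger clusters, e.g. 94Gi for ~100 nodes.
openshift_cluster_monitoring_operator_prometheus_storage_capacity=94Gi
```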
CPU utilization has minor impact. The ratio is approximately 1 core out of 40 per 50 nodes and 1800 pods.
10.3.1. Lab Environment
All experiments were performed in an OpenShift Container Platform on OpenStack environment:
- Infra nodes (VMs) - 40 cores, 157 GB RAM.
- CNS nodes (VMs) - 16 cores, 62 GB RAM, NVMe drives.
10.3.2. Prerequisites
Based on your scale destination, compute and set the relevant PV size for the Prometheus data store. Since the default number of Prometheus pod replicas is 2, for 100 nodes with 3600 pods you need 188 GB.
For example:
94 GB (space per 15 days) * 2 (pods) = 188 GB free
Based on this equation, set openshift_cluster_monitoring_operator_prometheus_storage_capacity=94Gi.
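The sizing equation above can be sketched as a small script (variable names are illustrative) that multiplies per-replica storage by the replica count:

```shell
# Sketch: total free space needed for the Prometheus data store.
# per_replica_gb is the 15-day figure from Table 10.1 (100 nodes / 3600 pods);
# replicas is the default Prometheus pod replica count.
per_replica_gb=94
replicas=2
total_gb=$((per_replica_gb * replicas))
echo "Provision ${total_gb} GB free"    # 188 GB
```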
10.3.3. Scaling the Prometheus Components
To scale up the number of OpenShift Container Platform Prometheus pod replicas, run:
# oc scale -n openshift-monitoring --replicas=3 statefulset prometheus-k8s
- The default is 2 Prometheus pod replicas.
- If you add a new node to or remove an existing node from a Prometheus cluster, the data stored in the cluster rebalances across the cluster.
To scale down:
# oc scale -n openshift-monitoring --replicas=0 statefulset prometheus-k8s
