Chapter 5. Using operational features of Service Telemetry Framework
You can use the following operational features to provide additional functionality to the Service Telemetry Framework (STF):
- Configuring dashboards
- Configuring the metrics retention time period
- Configuring alerts
- Configuring SNMP traps
- Configuring high availability
- Configuring ephemeral storage
- Creating a route in Red Hat OpenShift Container Platform
- Monitoring the resource use of OpenStack services
- Monitoring container health and API status
5.1. Dashboards in Service Telemetry Framework
Use the third-party application, Grafana, to visualize system-level metrics that collectd and Ceilometer gather for each individual host node.
For more information about configuring collectd, see Section 4.1, “Deploying Red Hat OpenStack Platform overcloud for Service Telemetry Framework”.
You can use two dashboards to monitor a cloud:
- Infrastructure dashboard: Use the infrastructure dashboard to view metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
- Cloud view dashboard: Use the cloud view dashboard to view panels to monitor service resource usage, API stats, and cloud events. You must enable API health monitoring and service monitoring to provide the data for this dashboard. API health monitoring is enabled by default in the STF base configuration. For more information, see Section 4.1.2, “Creating the base configuration for STF”.
- For more information about API health monitoring, see Section 5.9, “Red Hat OpenStack Platform API status and containerized services health”.
- For more information about RHOSP service monitoring, see Section 5.8, “Resource usage of Red Hat OpenStack Platform services”.
5.1.1. Configuring Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io. When you use the Service Telemetry Operator to deploy Grafana, the result is a Grafana instance and the configuration of the default data sources for the local STF deployment.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:

    $ oc project service-telemetry
Deploy the Grafana Operator:

    $ oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: grafana-operator
      namespace: service-telemetry
    spec:
      channel: alpha
      installPlanApproval: Automatic
      name: grafana-operator
      source: operatorhubio-operators
      sourceNamespace: openshift-marketplace
    EOF
Verify that the Operator launched successfully. In the command output, if the value of the PHASE column is Succeeded, the Operator launched successfully:

    $ oc get csv --selector operators.coreos.com/grafana-operator.service-telemetry

    NAME                       DISPLAY            VERSION   REPLACES                   PHASE
    grafana-operator.v3.10.3   Grafana Operator   3.10.3    grafana-operator.v3.10.2   Succeeded
To launch a Grafana instance, create or modify the ServiceTelemetry object. Set graphing.enabled and graphing.grafana.ingressEnabled to true:

    $ oc edit stf default

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    ...
    spec:
      ...
      graphing:
        enabled: true
        grafana:
          ingressEnabled: true
Verify that the Grafana instance deployed:

    $ oc get pod -l app=grafana

    NAME                                  READY   STATUS    RESTARTS   AGE
    grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
Verify that the Grafana data sources installed correctly:

    $ oc get grafanadatasources

    NAME                  AGE
    default-datasources   20h
Verify that the Grafana route exists:

    $ oc get route grafana-route

    NAME            HOST/PORT                                           PATH   SERVICES          PORT   TERMINATION   WILDCARD
    grafana-route   grafana-route-service-telemetry.apps.infra.watch          grafana-service   3000   edge          None
5.1.2. Overriding the default Grafana container image
The dashboards in Service Telemetry Framework (STF) require features that are available only in Grafana version 8.1.0 and later. By default, the Service Telemetry Operator installs a compatible version. You can override the base Grafana image by specifying the image path to an image registry with the graphing.grafana.baseImage parameter.
Procedure
Ensure that you have the correct version of Grafana:

    $ oc get pod -l "app=grafana" -ojsonpath='{.items[0].spec.containers[0].image}'
    docker.io/grafana/grafana:7.3.10
If the running image is older than 8.1.0, patch the ServiceTelemetry object to update the image. The Service Telemetry Operator updates the Grafana manifest, which restarts the Grafana deployment:
$ oc patch stf/default --type merge -p '{"spec":{"graphing":{"grafana":{"baseImage":"docker.io/grafana/grafana:8.1.5"}}}}'
Verify that a new Grafana pod exists and has a STATUS value of Running:

    $ oc get pod -l "app=grafana"

    NAME                                 READY   STATUS    RESTARTS   AGE
    grafana-deployment-fb9799b58-j2hj2   1/1     Running   0          10s
Verify that the new instance is running the updated image:

    $ oc get pod -l "app=grafana" -ojsonpath='{.items[0].spec.containers[0].image}'
    docker.io/grafana/grafana:8.1.5
5.1.3. Importing dashboards
The Grafana Operator can import and manage dashboards by creating GrafanaDashboard objects. You can view example dashboards at https://github.com/infrawatch/dashboards.
Procedure
Import the infrastructure dashboard:
    $ oc apply -f https://raw.githubusercontent.com/infrawatch/dashboards/master/deploy/stf-1.3/rhos-dashboard.yaml
    grafanadashboard.integreatly.org/rhos-dashboard-1.3 created
Import the cloud dashboard:
Warning: For some panels in the cloud dashboard, you must set the value of the collectd virt plugin parameter hostname_format to name uuid hostname in the stf-connectors.yaml file (see the configuration sketch after this procedure). If you do not configure this parameter, affected dashboards remain empty. For more information about the virt plugin, see collectd plugins.

    $ oc apply -f https://raw.githubusercontent.com/infrawatch/dashboards/master/deploy/stf-1.3/rhos-cloud-dashboard.yaml
    grafanadashboard.integreatly.org/rhos-cloud-dashboard-1.3 created
Import the cloud events dashboard:
    $ oc apply -f https://raw.githubusercontent.com/infrawatch/dashboards/master/deploy/stf-1.3/rhos-cloudevents-dashboard.yaml
    grafanadashboard.integreatly.org/rhos-cloudevents-dashboard created
Import the virtual machine dashboard:
    $ oc apply -f https://raw.githubusercontent.com/infrawatch/dashboards/master/deploy/stf-1.3/virtual-machine-view.yaml
    grafanadashboard.integreatly.org/virtual-machine-view-1.3 configured
Import the memcached dashboard:
    $ oc apply -f https://raw.githubusercontent.com/infrawatch/dashboards/master/deploy/stf-1.3/memcached-dashboard.yaml
    grafanadashboard.integreatly.org/memcached-dashboard-1.3 created
Verify that the dashboards are available:
    $ oc get grafanadashboards

    NAME                         AGE
    memcached-dashboard-1.3      115s
    rhos-cloud-dashboard-1.3     2m12s
    rhos-cloudevents-dashboard   2m6s
    rhos-dashboard-1.3           2m17s
    virtual-machine-view-1.3     2m
Retrieve the Grafana route address:
    $ oc get route grafana-route -ojsonpath='{.spec.host}'
    grafana-route-service-telemetry.apps.infra.watch
- In a web browser, navigate to https://<grafana_route_address>. Replace <grafana_route_address> with the value that you retrieved in the previous step.
- To view the dashboard, click Dashboards and Manage.
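As noted in the warning in this procedure, some cloud dashboard panels require the collectd virt plugin parameter hostname_format to be set to name uuid hostname in the stf-connectors.yaml file. The following fragment is a minimal sketch; the collectd::plugin::virt::hostname_format hieradata key is an assumption based on the collectd Puppet module convention, so adapt it to how your deployment configures collectd:

    parameter_defaults:
      ExtraConfig:
        collectd::plugin::virt::hostname_format: "name uuid hostname"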
5.1.4. Retrieving and setting Grafana login credentials
Service Telemetry Framework (STF) sets default login credentials when Grafana is enabled. You can override the credentials in the ServiceTelemetry object.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:

    $ oc project service-telemetry
To retrieve the default username and password, describe the Grafana object:
$ oc describe grafana default
- To modify the default values of the Grafana administrator username and password through the ServiceTelemetry object, use the graphing.grafana.adminUser and graphing.grafana.adminPassword parameters.
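For example, a one-line patch that sets both values might look like the following sketch; the oc patch pattern mirrors the one used in Section 5.1.2, and the username and password values are placeholders:

    $ oc patch stf default --type merge -p '{"spec":{"graphing":{"grafana":{"adminUser":"admin","adminPassword":"<new_password>"}}}}'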
5.2. Metrics retention time period in Service Telemetry Framework
The default retention time for metrics stored in Service Telemetry Framework (STF) is 24 hours, which provides enough data for trends to develop for the purposes of alerting.
For long-term storage, use systems designed for long-term data retention, for example, Thanos.
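To estimate how much disk space a given retention period needs, you can apply the sizing guidance from the upstream Prometheus documentation: needed disk space is roughly the retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample. The ingestion rate and the bytes-per-sample figure in the following sketch are illustrative assumptions; measure your own ingestion rate before sizing storage:

    # Rough estimate for 7 days of retention at 10,000 samples/s and ~2 bytes per sample:
    $ echo $(( 7 * 24 * 3600 * 10000 * 2 ))
    12096000000    # approximately 12 GB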
Additional resources
- To adjust STF for additional metrics retention time, see Section 5.2.1, “Editing the metrics retention time period in Service Telemetry Framework”.
- For recommendations about Prometheus data storage and estimating storage space, see https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
- For more information about Thanos, see https://thanos.io/
5.2.1. Editing the metrics retention time period in Service Telemetry Framework
You can adjust Service Telemetry Framework (STF) for additional metrics retention time.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:
$ oc project service-telemetry
Edit the ServiceTelemetry object:
$ oc edit stf default
Add retention: 7d to the storage section of backends.metrics.prometheus.storage to increase the retention period to seven days:

Note: If you set a long retention period, retrieving data from heavily populated Prometheus systems can result in queries returning results slowly.

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: stf-default
      namespace: service-telemetry
    spec:
      ...
      backends:
        metrics:
          prometheus:
            enabled: true
            storage:
              strategy: persistent
              retention: 7d
      ...
- Save your changes and close the object.
Additional resources
- For more information about the metrics retention time, see Section 5.2, “Metrics retention time period in Service Telemetry Framework”.
5.3. Alerts in Service Telemetry Framework
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications by using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
- Create an alert rule in Prometheus. For more information, see Section 5.3.1, “Creating an alert rule in Prometheus”.
- Create an alert route in Alertmanager. There are two ways in which you can create an alert route:
  - Section 5.3.3, “Creating a standard alert route in Alertmanager”
  - Section 5.3.4, “Creating an alert route with templating in Alertmanager”
Additional resources
- For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/
- To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
5.3.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:

    $ oc project service-telemetry
Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

    $ oc apply -f - <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      creationTimestamp: null
      labels:
        prometheus: default
        role: alert-rules
      name: prometheus-alarm-rules
      namespace: service-telemetry
    spec:
      groups:
        - name: ./openstack.rules
          rules:
            - alert: Collectd metrics receive rate is zero
              expr: rate(sg_total_collectd_msg_received_count[1m]) == 0
    EOF

To change the rule, edit the value of the expr parameter.
To verify that the Operator loaded the rules into Prometheus, create a pod with access to curl:

    $ oc run curl --image=radial/busyboxplus:curl -i --tty
Run the curl command to access the prometheus-operated service to return the rules loaded into memory:

    [ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
    {"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"state":"inactive","name":"Collectd metrics receive rate is zero","query":"rate(sg_total_collectd_msg_received_count[1m]) == 0","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","evaluationTime":0.000525886,"lastEvaluation":"2022-02-01T17:42:52.161007803Z","type":"alerting"}],"interval":30,"limit":0,"evaluationTime":0.000541524,"lastEvaluation":"2022-02-01T17:42:52.161000138Z"}]}}
Verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules group, and then exit the pod:

    [ root@curl:/ ]$ exit
Clean up the environment by deleting the curl pod:

    $ oc delete pod curl

    pod "curl" deleted
Additional resources
- For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
5.3.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 5.3.1, “Creating an alert rule in Prometheus”.
Procedure
Use the oc edit command:

    $ oc edit prometheusrules prometheus-alarm-rules

- Edit the PrometheusRules manifest.
- Save and close the manifest.
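For example, a custom rule appended to the existing group might look like the following sketch; the second alert's name, threshold, and for duration are illustrative only and are not part of the default STF rules:

    spec:
      groups:
        - name: ./openstack.rules
          rules:
            - alert: Collectd metrics receive rate is zero
              expr: rate(sg_total_collectd_msg_received_count[1m]) == 0
            - alert: Collectd metrics receive rate is low
              # Illustrative threshold: fires if fewer than 100 messages per
              # second arrive for 10 minutes.
              expr: rate(sg_total_collectd_msg_received_count[1m]) < 100
              for: 10m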
Additional resources
- For more information about how to configure alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.
- For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
5.3.3. Creating a standard alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform secret. By default, Service Telemetry Framework (STF) deploys a basic configuration that results in no receivers:
    alertmanager.yaml: |-
      global:
        resolve_timeout: 5m
      route:
        group_by: ['job']
        group_wait: 30s
        group_interval: 5m
        repeat_interval: 12h
        receiver: 'null'
      receivers:
      - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator that results in an updated secret, managed by the Prometheus Operator.
If your alertmanagerConfigManifest contains a custom template to construct the title and text of the sent alert, deploy the contents of the alertmanagerConfigManifest by using a base64-encoded configuration. For more information, see Section 5.3.4, “Creating an alert route with templating in Alertmanager”.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:

    $ oc project service-telemetry
Edit the ServiceTelemetry object for your STF deployment:

    $ oc edit stf default
Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

Note: This step loads the default template that the Service Telemetry Operator manages. To verify that the changes are populating correctly, change a value, return the alertmanager-default secret, and verify that the new value is loaded into memory. For example, change the value of the parameter global.resolve_timeout from 5m to 10m. To deliver alerts to a real receiver instead of 'null', see the sketch after this procedure.

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        metrics:
          prometheus:
            enabled: true
      alertmanagerConfigManifest: |
        apiVersion: v1
        kind: Secret
        metadata:
          name: 'alertmanager-default'
          namespace: 'service-telemetry'
        type: Opaque
        stringData:
          alertmanager.yaml: |-
            global:
              resolve_timeout: 10m
            route:
              group_by: ['job']
              group_wait: 30s
              group_interval: 5m
              repeat_interval: 12h
              receiver: 'null'
            receivers:
            - name: 'null'
Verify that the configuration has been applied to the secret:

    $ oc get secret alertmanager-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

    global:
      resolve_timeout: 10m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
    receivers:
    - name: 'null'
To verify that the configuration is loaded into Alertmanager, create a pod with access to curl:

    $ oc run curl --image=radial/busyboxplus:curl -i --tty
Run the curl command against the alertmanager-operated service to retrieve the status and configYAML contents, and verify that the supplied configuration matches the configuration in Alertmanager:

    [ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

    {"status":"success","data":{"configYAML":"...",...}}
- Verify that the configYAML field contains the changes you expect.
- Exit the pod:

    [ root@curl:/ ]$ exit
To clean up the environment, delete the curl pod:

    $ oc delete pod curl

    pod "curl" deleted
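The procedure above keeps the default 'null' receiver, which discards alerts. To deliver alerts to a real destination, replace the receiver in alertmanager.yaml. The following fragment is a minimal sketch of an email receiver that uses standard Alertmanager options; every address and the SMTP host are placeholders:

    receivers:
    - name: 'email'
      email_configs:
      - to: 'ops@example.com'              # placeholder recipient
        from: 'alertmanager@example.com'   # placeholder sender
        smarthost: 'smtp.example.com:587'  # placeholder SMTP relay
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'email'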
Additional resources
- For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see Prometheus user guide on alerting.
5.3.4. Creating an alert route with templating in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform secret. By default, Service Telemetry Framework (STF) deploys a basic configuration that results in no receivers:
    alertmanager.yaml: |-
      global:
        resolve_timeout: 5m
      route:
        group_by: ['job']
        group_wait: 30s
        group_interval: 5m
        repeat_interval: 12h
        receiver: 'null'
      receivers:
      - name: 'null'
If the alertmanagerConfigManifest parameter contains a custom template, for example, to construct the title and text of the sent alert, deploy the contents of the alertmanagerConfigManifest by using a base64-encoded configuration.
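A minimal sketch of producing the encoded value, assuming that you save the Alertmanager configuration locally as alertmanager.yaml (the filename is illustrative):

    # Encode the configuration as a single base64 line for the data field of the Secret:
    $ base64 -w0 alertmanager.yaml

    # To check an existing encoded value, decode it:
    $ echo '<encoded_string>' | base64 -d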
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:

    $ oc project service-telemetry
Edit the ServiceTelemetry object for your STF deployment:

    $ oc edit stf default
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator that results in an updated secret that is managed by the Prometheus Operator:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        metrics:
          prometheus:
            enabled: true
      alertmanagerConfigManifest: |
        apiVersion: v1
        kind: Secret
        metadata:
          name: 'alertmanager-default'
          namespace: 'service-telemetry'
        type: Opaque
        data:
          alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogMTBtCiAgc2xhY2tfYXBpX3VybDogPHNsYWNrX2FwaV91cmw+CnJlY2VpdmVyczoKICAtIG5hbWU6IHNsYWNrCiAgICBzbGFja19jb25maWdzOgogICAgLSBjaGFubmVsOiAjc3RmLWFsZXJ0cwogICAgICB0aXRsZTogfC0KICAgICAgICAuLi4KICAgICAgdGV4dDogPi0KICAgICAgICAuLi4Kcm91dGU6CiAgZ3JvdXBfYnk6IFsnam9iJ10KICBncm91cF93YWl0OiAzMHMKICBncm91cF9pbnRlcnZhbDogNW0KICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJlY2VpdmVyOiAnc2xhY2snCg==
Verify that the configuration has been applied to the secret:

    $ oc get secret alertmanager-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

    global:
      resolve_timeout: 10m
      slack_api_url: <slack_api_url>
    receivers:
      - name: slack
        slack_configs:
        - channel: #stf-alerts
          title: |-
            ...
          text: >-
            ...
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'slack'
To verify that the configuration is loaded into Alertmanager, create a pod with access to the curl command:

    $ oc run curl --image=radial/busyboxplus:curl -i --tty
Run the curl command against the alertmanager-operated service to retrieve the status and configYAML contents, and verify that the supplied configuration matches the configuration in Alertmanager:

    [ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

    {"status":"success","data":{"configYAML":"...",...}}
- Verify that the configYAML field contains the changes you expect.
- Exit the pod:

    [ root@curl:/ ]$ exit
To clean up the environment, delete the curl pod:

    $ oc delete pod curl

    pod "curl" deleted
Additional resources
- For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see Prometheus user guide on alerting.
5.4. Configuring SNMP traps
You can integrate Service Telemetry Framework (STF) with an existing infrastructure monitoring platform that receives notifications through SNMP traps. To enable SNMP traps, modify the ServiceTelemetry object and configure the snmpTraps parameters.
For more information about configuring alerts, see Section 5.3, “Alerts in Service Telemetry Framework”.
Prerequisites
- Know the IP address or hostname of the SNMP trap receiver where you want to send the alerts
Procedure
To enable SNMP traps, modify the ServiceTelemetry object:

    $ oc edit stf default

Set the alerting.alertmanager.receivers.snmpTraps parameters:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    ...
    spec:
      ...
      alerting:
        alertmanager:
          receivers:
            snmpTraps:
              enabled: true
              target: 10.10.10.10
- Ensure that you set the value of target to the IP address or hostname of the SNMP trap receiver.
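Equivalently, you can apply the same change with a one-line patch. This sketch reuses the oc patch pattern from earlier in this chapter; the target address is a placeholder:

    $ oc patch stf default --type merge -p '{"spec":{"alerting":{"alertmanager":{"receivers":{"snmpTraps":{"enabled":true,"target":"10.10.10.10"}}}}}}'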
5.5. High availability
With high availability, Service Telemetry Framework (STF) can rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of a Red Hat OpenShift Container Platform node, deploy STF to a Red Hat OpenShift Container Platform cluster with three or more nodes.
STF is not yet a fully fault-tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:
- Three ElasticSearch pods run instead of the default one.
The following components run two pods instead of the default one:
- AMQ Interconnect
- Alertmanager
- Prometheus
- Events Smart Gateway
- Metrics Smart Gateway
- Recovery time from a lost pod in any of these services reduces to approximately 2 seconds.
5.5.1. Configuring high availability
To configure Service Telemetry Framework (STF) for high availability, add highAvailability.enabled: true
to the ServiceTelemetry object in Red Hat OpenShift Container Platform. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:

    $ oc project service-telemetry
Use the oc command to edit the ServiceTelemetry object:
$ oc edit stf default
Add highAvailability.enabled: true to the spec section:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    ...
    spec:
      ...
      highAvailability:
        enabled: true
- Save your changes and close the object.
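After the Service Telemetry Operator reconciles the change, you can confirm the effect by listing the pods and checking the replica counts. The pod names in this sketch are illustrative and vary by deployment:

    $ oc get pods
    # Expect two copies of the HA-enabled components, for example:
    #   alertmanager-default-0    alertmanager-default-1
    #   prometheus-default-0      prometheus-default-1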
5.6. Ephemeral storage
You can use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform cluster.
If you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, not for production environments.
5.6.1. Configuring ephemeral storage
To configure STF components for ephemeral storage, add ...storage.strategy: ephemeral to the corresponding parameter. For example, to enable ephemeral storage for the Prometheus back end, set backends.metrics.prometheus.storage.strategy: ephemeral. Components that support configuration of ephemeral storage include alerting.alertmanager, backends.metrics.prometheus, and backends.events.elasticsearch. You can add ephemeral storage configuration at installation time or, if you already deployed STF, complete the following steps:
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:

    $ oc project service-telemetry
Edit the ServiceTelemetry object:
$ oc edit stf default
Add the ...storage.strategy: ephemeral parameter to the spec section of the relevant component:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: stf-default
      namespace: service-telemetry
    spec:
      alerting:
        enabled: true
        alertmanager:
          storage:
            strategy: ephemeral
      backends:
        metrics:
          prometheus:
            enabled: true
            storage:
              strategy: ephemeral
        events:
          elasticsearch:
            enabled: true
            storage:
              strategy: ephemeral
- Save your changes and close the object.
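Because the ephemeral strategy does not request persistent volumes, a quick way to confirm that the configuration took effect is to check the persistent volume claims in the namespace. Note that PVCs left over from an earlier persistent deployment might still be listed:

    $ oc get pvc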
5.7. Creating a route in Red Hat OpenShift Container Platform
In Red Hat OpenShift Container Platform, you can expose applications to the external network through a route. For more information, see Configuring ingress cluster traffic.
In Service Telemetry Framework (STF), routes are not exposed by default to limit the attack surface of STF deployments. To access some services deployed in STF, you must expose the services in Red Hat OpenShift Container Platform for access.
A common service to expose in STF is Prometheus, as shown in the following example:
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:

    $ oc project service-telemetry
List the available services in the service-telemetry project:

    $ oc get services

    NAME                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                         AGE
    alertmanager-operated                    ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP                      93m
    default-cloud1-ceil-meter-smartgateway   ClusterIP   172.30.114.195   <none>        8081/TCP                                        93m
    default-cloud1-coll-meter-smartgateway   ClusterIP   172.30.133.180   <none>        8081/TCP                                        93m
    default-interconnect                     ClusterIP   172.30.3.241     <none>        5672/TCP,8672/TCP,55671/TCP,5671/TCP,5673/TCP   93m
    ibm-auditlogging-operator-metrics        ClusterIP   172.30.216.249   <none>        8383/TCP,8686/TCP                               11h
    prometheus-operated                      ClusterIP   None             <none>        9090/TCP                                        93m
    service-telemetry-operator-metrics       ClusterIP   172.30.11.66     <none>        8383/TCP,8686/TCP                               11h
    smart-gateway-operator-metrics           ClusterIP   172.30.145.199   <none>        8383/TCP,8686/TCP                               11h
- Take note of the port and service name that you want to expose as a route, for example, the prometheus-operated service and port 9090.
Expose the prometheus-operated service as an edge route and redirect insecure traffic to the secure endpoint of port 9090:

    $ oc create route edge metrics-store --service=prometheus-operated --insecure-policy="Redirect" --port=9090

    route.route.openshift.io/metrics-store created
To verify and find the exposed external DNS for the route, use the oc get route command:

    $ oc get route metrics-store -ogo-template='{{.spec.host}}'
    metrics-store-service-telemetry.apps.infra.watch
The prometheus-operated service is now available at the exposed DNS address, for example, https://metrics-store-service-telemetry.apps.infra.watch.

Note: The address of the route must be resolvable, and the configuration is environment specific.
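As a quick smoke test, you can query the Prometheus HTTP API through the new route. This is a sketch; substitute the hostname from your own oc get route output:

    $ curl -k "https://metrics-store-service-telemetry.apps.infra.watch/api/v1/query?query=up"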
Additional resources
- For more information about Red Hat OpenShift Container Platform networking, see Understanding networking
- For more information about route configuration, see Route configuration
- For more information about ingress cluster traffic, see Configuring ingress cluster traffic overview
5.8. Resource usage of Red Hat OpenStack Platform services
You can monitor the resource usage of the Red Hat OpenStack Platform (RHOSP) services, such as the APIs and other infrastructure processes, to identify bottlenecks in the overcloud by showing services that are running out of compute power. Resource usage monitoring is enabled by default.
Additional resources
- To disable resource usage monitoring, see Section 5.8.1, “Disabling resource usage monitoring of Red Hat OpenStack Platform services”.
5.8.1. Disabling resource usage monitoring of Red Hat OpenStack Platform services
To disable the monitoring of RHOSP containerized service resource usage, you must set the CollectdEnableLibpodstats parameter to false.
Prerequisites
- You have created the stf-connectors.yaml file. For more information, see Section 4.1, “Deploying Red Hat OpenStack Platform overcloud for Service Telemetry Framework”.
- You are using the most current version of Red Hat OpenStack Platform (RHOSP) 16.2.
Procedure
Open the stf-connectors.yaml file and add the CollectdEnableLibpodstats parameter to override the setting in enable-stf.yaml. Ensure that stf-connectors.yaml is called from the openstack overcloud deploy command after enable-stf.yaml (see the ordering sketch after this procedure):

    CollectdEnableLibpodstats: false
- Continue with the overcloud deployment procedure. For more information, see Section 4.1.4, “Deploying the overcloud”.
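For reference, the environment file ordering in the deploy command might look like the following sketch; the file paths are illustrative and depend on your deployment, but stf-connectors.yaml must come after enable-stf.yaml:

    $ openstack overcloud deploy --templates \
        -e <your_environment_files> \
        -e /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
        -e /home/stack/stf-connectors.yaml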
5.9. Red Hat OpenStack Platform API status and containerized services health
You can use the OCI (Open Container Initiative) standard to assess the container health status of each Red Hat OpenStack Platform (RHOSP) service by periodically running a health check script. Most RHOSP services implement a health check that logs issues and returns a binary status. For the RHOSP APIs, the health checks query the root endpoint and determine the health based on the response time.
Monitoring of RHOSP container health and API status is enabled by default.
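On an overcloud node, you can run a container's health check on demand to see the kind of status that the monitoring consumes. This is a sketch; the container name is illustrative, and podman healthcheck run is a standard Podman subcommand:

    $ sudo podman healthcheck run nova_api
    # An exit status of 0 indicates that the container reports healthy.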
Additional resources
- To disable RHOSP container health and API status monitoring, see Section 5.9.1, “Disabling container health and API status monitoring”.
5.9.1. Disabling container health and API status monitoring
To disable RHOSP containerized service health and API status monitoring, you must set the CollectdEnableSensubility parameter to false.
Prerequisites
- You have created the stf-connectors.yaml file in your templates directory. For more information, see Section 4.1, “Deploying Red Hat OpenStack Platform overcloud for Service Telemetry Framework”.
- You are using the most current version of Red Hat OpenStack Platform (RHOSP) 16.2.
Procedure
Open the stf-connectors.yaml file and add the CollectdEnableSensubility parameter to override the setting in enable-stf.yaml. Ensure that stf-connectors.yaml is called from the openstack overcloud deploy command after enable-stf.yaml:

    CollectdEnableSensubility: false
- Continue with the overcloud deployment procedure. For more information, see Section 4.1.4, “Deploying the overcloud”.
Additional resources
- For more information about multiple cloud addresses, see Section 4.4, “Configuring multiple clouds”.