Chapter 3. Installing the core components of Service Telemetry Framework
You can use Operators to load the Service Telemetry Framework (STF) components and objects. Operators manage each of the following STF core and community components:
- cert-manager
- AMQ Interconnect
- Smart Gateway
- Prometheus and AlertManager
- Elasticsearch
- Grafana
Prerequisites
- Red Hat OpenShift Container Platform version 4.10 through 4.12 is running.
- You have prepared your Red Hat OpenShift Container Platform environment and ensured that there is persistent storage and enough resources to run the STF components on top of the Red Hat OpenShift Container Platform environment. For more information, see Service Telemetry Framework Performance and Scaling.
- Your environment is fully connected. STF does not work in disconnected Red Hat OpenShift Container Platform environments or network proxy environments.
STF is compatible with Red Hat OpenShift Container Platform versions 4.10 through 4.12.
Additional resources
- For more information about Operators, see the Understanding Operators guide.
- For more information about Operator catalogs, see Red Hat-provided Operator catalogs.
3.1. Deploying Service Telemetry Framework to the Red Hat OpenShift Container Platform environment
Deploy Service Telemetry Framework (STF) to collect, store, and monitor events:
Procedure
Create a namespace to contain the STF components, for example, `service-telemetry`:

```
$ oc new-project service-telemetry
```
Create an OperatorGroup in the namespace so that you can schedule the Operator pods:
```
$ oc create -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
```
For more information, see OperatorGroups.
Create a namespace for the cert-manager Operator:
```
$ oc create -f - <<EOF
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: openshift-cert-manager-operator
spec:
  finalizers:
  - kubernetes
EOF
```
Create an OperatorGroup for the cert-manager Operator:
```
$ oc create -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-cert-manager-operator
  namespace: openshift-cert-manager-operator
spec: {}
EOF
```
Subscribe to the cert-manager Operator by using the redhat-operators CatalogSource:
```
$ oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-cert-manager-operator
  namespace: openshift-cert-manager-operator
spec:
  channel: tech-preview
  installPlanApproval: Automatic
  name: openshift-cert-manager-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
```
Validate your ClusterServiceVersion. Ensure that the cert-manager Operator displays a phase of `Succeeded`:

```
$ oc get csv --namespace openshift-cert-manager-operator --selector=operators.coreos.com/openshift-cert-manager-operator.openshift-cert-manager-operator

NAME                            DISPLAY                                       VERSION   REPLACES   PHASE
openshift-cert-manager.v1.7.1   cert-manager Operator for Red Hat OpenShift   1.7.1-1              Succeeded
```
Subscribe to the AMQ Interconnect Operator by using the redhat-operators CatalogSource:
```
$ oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-interconnect-operator
  namespace: service-telemetry
spec:
  channel: 1.10.x
  installPlanApproval: Automatic
  name: amq7-interconnect-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
```
Validate your ClusterServiceVersion. Ensure that `amq7-interconnect-operator.v1.10.x` displays a phase of `Succeeded`:

```
$ oc get csv --selector=operators.coreos.com/amq7-interconnect-operator.service-telemetry

NAME                                  DISPLAY                                  VERSION   REPLACES                             PHASE
amq7-interconnect-operator.v1.10.15   Red Hat Integration - AMQ Interconnect   1.10.15   amq7-interconnect-operator.v1.10.4   Succeeded
```
To store metrics in Prometheus, you must enable the Prometheus Operator by using the community-operators CatalogSource:
Warning: Community Operators are Operators that have not been vetted or verified by Red Hat. Use community Operators with caution because their stability is unknown. Red Hat provides no support for community Operators.
Learn more about Red Hat’s third party software support policy
```
$ oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: prometheus-operator
  namespace: service-telemetry
spec:
  channel: beta
  installPlanApproval: Automatic
  name: prometheus
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF
```
Verify that the ClusterServiceVersion for Prometheus displays a phase of `Succeeded`:

```
$ oc get csv --selector=operators.coreos.com/prometheus.service-telemetry

NAME                        DISPLAY               VERSION   REPLACES                    PHASE
prometheusoperator.0.56.3   Prometheus Operator   0.56.3    prometheusoperator.0.47.0   Succeeded
```
To store events in Elasticsearch, you must enable the Elastic Cloud on Kubernetes (ECK) Operator by using the certified-operators CatalogSource:
Warning: Certified Operators are Operators from leading independent software vendors (ISVs). Red Hat partners with ISVs to package and ship, but not support, the certified Operators. Support is provided by the ISV.
Learn more about Red Hat’s third party software support policy
```
$ oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elasticsearch-eck-operator-certified
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elasticsearch-eck-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF
```
Verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes displays a phase of `Succeeded`:

```
$ oc get csv --selector=operators.coreos.com/elasticsearch-eck-operator-certified.service-telemetry

NAME                                          DISPLAY                        VERSION   REPLACES                                      PHASE
elasticsearch-eck-operator-certified.v2.8.0   Elasticsearch (ECK) Operator   2.8.0     elasticsearch-eck-operator-certified.v2.7.0   Succeeded
```
Create the Service Telemetry Operator subscription to manage the STF instances:
```
$ oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: service-telemetry-operator
  namespace: service-telemetry
spec:
  channel: stable-1.5
  installPlanApproval: Automatic
  name: service-telemetry-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
```
Validate that the Service Telemetry Operator and the dependent Operators have a phase of `Succeeded`:
```
$ oc get csv --namespace service-telemetry

NAME                                          DISPLAY                                       VERSION          REPLACES                                      PHASE
amq7-interconnect-operator.v1.10.15           Red Hat Integration - AMQ Interconnect        1.10.15          amq7-interconnect-operator.v1.10.4            Succeeded
elasticsearch-eck-operator-certified.v2.8.0   Elasticsearch (ECK) Operator                  2.8.0            elasticsearch-eck-operator-certified.v2.7.0   Succeeded
openshift-cert-manager.v1.7.1                 cert-manager Operator for Red Hat OpenShift   1.7.1-1                                                        Succeeded
prometheusoperator.0.56.3                     Prometheus Operator                           0.56.3           prometheusoperator.0.47.0                     Succeeded
service-telemetry-operator.v1.5.1680516659    Service Telemetry Operator                    1.5.1680516659                                                 Succeeded
smart-gateway-operator.v5.0.1680516659        Smart Gateway Operator                        5.0.1680516659                                                 Succeeded
```
3.2. Creating a ServiceTelemetry object in Red Hat OpenShift Container Platform
Create a `ServiceTelemetry` object in Red Hat OpenShift Container Platform so that the Service Telemetry Operator creates the supporting components for a Service Telemetry Framework (STF) deployment. For more information, see Section 3.2.1, “Primary parameters of the ServiceTelemetry object”.
Procedure
To create a `ServiceTelemetry` object that results in an STF deployment that uses the default values, create a `ServiceTelemetry` object with an empty `spec` parameter:

```
$ oc apply -f - <<EOF
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec: {}
EOF
```
Creating a `ServiceTelemetry` object with an empty `spec` parameter results in an STF deployment with the following default settings:

```
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  alerting:
    alertmanager:
      receivers:
        snmpTraps:
          alertOidLabel: oid
          community: public
          enabled: false
          port: 162
          retries: 5
          target: 192.168.24.254
          timeout: 1
          trapDefaultOid: 1.3.6.1.4.1.50495.15.1.2.1
          trapDefaultSeverity: ''
          trapOidPrefix: 1.3.6.1.4.1.50495.15
      storage:
        persistent:
          pvcStorageRequest: 20G
        strategy: persistent
    enabled: true
  backends:
    events:
      elasticsearch:
        certificates:
          caCertDuration: 70080h
          endpointCertDuration: 70080h
        storage:
          persistent:
            pvcStorageRequest: 20Gi
          strategy: persistent
        enabled: false
        version: 7.16.1
    logs:
      loki:
        storage:
          objectStorageSecret: test
          storageClass: standard
        enabled: false
        flavor: 1x.extra-small
        replicationFactor: 1
    metrics:
      prometheus:
        storage:
          persistent:
            pvcStorageRequest: 20G
          retention: 24h
          strategy: persistent
        enabled: true
        scrapeInterval: 10s
  clouds:
  - events:
      collectors:
      - bridge:
          ringBufferCount: 15000
          ringBufferSize: 16384
          verbose: false
        collectorType: collectd
        debugEnabled: false
        subscriptionAddress: collectd/cloud1-notify
      - bridge:
          ringBufferCount: 15000
          ringBufferSize: 16384
          verbose: false
        collectorType: ceilometer
        debugEnabled: false
        subscriptionAddress: anycast/ceilometer/cloud1-event.sample
    metrics:
      collectors:
      - bridge:
          ringBufferCount: 15000
          ringBufferSize: 16384
          verbose: false
        collectorType: collectd
        debugEnabled: false
        subscriptionAddress: collectd/cloud1-telemetry
      - bridge:
          ringBufferCount: 15000
          ringBufferSize: 16384
          verbose: false
        collectorType: ceilometer
        debugEnabled: false
        subscriptionAddress: anycast/ceilometer/cloud1-metering.sample
      - bridge:
          ringBufferCount: 15000
          ringBufferSize: 16384
          verbose: false
        collectorType: sensubility
        debugEnabled: false
        subscriptionAddress: sensubility/cloud1-telemetry
    name: cloud1
  graphing:
    grafana:
      adminPassword: secret
      adminUser: root
      disableSignoutMenu: false
      ingressEnabled: false
    enabled: false
  highAvailability:
    enabled: false
  transports:
    qdr:
      certificates:
        caCertDuration: 70080h
        endpointCertDuration: 70080h
      web:
        enabled: false
      enabled: true
  observabilityStrategy: use_community
```
To override these defaults, add the configuration to the `spec` parameter.

View the STF deployment logs in the Service Telemetry Operator:

```
$ oc logs --selector name=service-telemetry-operator

...
--------------------------- Ansible Task Status Event StdOut -----------------

PLAY RECAP *********************************************************************
localhost                  : ok=90   changed=0    unreachable=0    failed=0    skipped=26   rescued=0    ignored=0
```
Verification
To determine that all workloads are operating correctly, view the pods and the status of each pod.
Note: If you set the `backends.events.elasticsearch.enabled` parameter to `true`, the notification Smart Gateways report `Error` and `CrashLoopBackOff` error messages for a period of time before Elasticsearch starts.

```
$ oc get pods

NAME                                                      READY   STATUS    RESTARTS   AGE
alertmanager-default-0                                    3/3     Running   0          4m7s
default-cloud1-ceil-meter-smartgateway-669c6cdcf9-xvdvx   3/3     Running   0          3m46s
default-cloud1-coll-meter-smartgateway-585855c59d-858rf   3/3     Running   0          3m46s
default-cloud1-sens-meter-smartgateway-6f8dffb645-hhgkw   3/3     Running   0          3m46s
default-interconnect-6994ff546-fx7jn                      1/1     Running   0          4m18s
elastic-operator-9f44cdf6c-csvjq                          1/1     Running   0          19m
interconnect-operator-646bfc886c-gx55n                    1/1     Running   0          25m
prometheus-default-0                                      3/3     Running   0          3m33s
prometheus-operator-54d644d8d7-wzdlh                      1/1     Running   0          20m
service-telemetry-operator-54f6f7b6d-nfhwx                1/1     Running   0          18m
smart-gateway-operator-9bbd7c56c-76w67                    1/1     Running   0          18m
```
3.2.1. Primary parameters of the ServiceTelemetry object
The `ServiceTelemetry` object comprises the following primary configuration parameters:
- `alerting`
- `backends`
- `clouds`
- `graphing`
- `highAvailability`
- `transports`
You can configure each of these configuration parameters to provide different features in an STF deployment.
The backends parameter
Use the `backends` parameter to control which storage back ends are available for storage of metrics and events, and to control the enablement of Smart Gateways that the `clouds` parameter defines. For more information, see the section called “The clouds parameter”.
You can use Prometheus as the metrics storage back end and Elasticsearch as the events storage back end. You can use the Service Telemetry Operator to create other custom resource objects that the Prometheus Operator and Elastic Cloud on Kubernetes Operator watch to create Prometheus and Elasticsearch workloads.
Enabling Prometheus as a storage back end for metrics
To enable Prometheus as a storage back end for metrics, you must configure the `ServiceTelemetry` object.
Procedure
Edit the `ServiceTelemetry` object:

```
$ oc edit stf default
```
Set the value of the `backends.metrics.prometheus.enabled` parameter to `true`:

```
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  [...]
  backends:
    metrics:
      prometheus:
        enabled: true
```
Configuring persistent storage for Prometheus
Use the additional parameters that are defined in `backends.metrics.prometheus.storage.persistent` to configure persistent storage options for Prometheus, such as storage class and volume size.

Use `storageClass` to define the back end storage class. If you do not set this parameter, the Service Telemetry Operator uses the default storage class for the Red Hat OpenShift Container Platform cluster.

Use the `pvcStorageRequest` parameter to define the minimum required volume size to satisfy the storage request. If volumes are statically defined, it is possible that a volume size larger than requested is used. By default, the Service Telemetry Operator requests a volume size of `20G` (20 gigabytes).
Procedure
List the available storage classes:
```
$ oc get storageclasses

NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
csi-manila-ceph      manila.csi.openstack.org   Delete          Immediate              false                  20h
standard (default)   kubernetes.io/cinder       Delete          WaitForFirstConsumer   true                   20h
standard-csi         cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   20h
```
Edit the `ServiceTelemetry` object:

```
$ oc edit stf default
```
Set the value of the `backends.metrics.prometheus.enabled` parameter to `true` and the value of `backends.metrics.prometheus.storage.strategy` to `persistent`:

```
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  [...]
  backends:
    metrics:
      prometheus:
        enabled: true
        storage:
          strategy: persistent
          persistent:
            storageClass: standard-csi
            pvcStorageRequest: 50G
```
Enabling Elasticsearch as a storage back end for events
To enable Elasticsearch as a storage back end for events, you must configure the `ServiceTelemetry` object.
Procedure
Edit the `ServiceTelemetry` object:

```
$ oc edit stf default
```
Set the value of the `backends.events.elasticsearch.enabled` parameter to `true`:

```
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  [...]
  backends:
    events:
      elasticsearch:
        enabled: true
```
Configuring persistent storage for Elasticsearch
Use the additional parameters defined in `backends.events.elasticsearch.storage.persistent` to configure persistent storage options for Elasticsearch, such as storage class and volume size.

Use `storageClass` to define the back end storage class. If you do not set this parameter, the Service Telemetry Operator uses the default storage class for the Red Hat OpenShift Container Platform cluster.

Use the `pvcStorageRequest` parameter to define the minimum required volume size to satisfy the storage request. If volumes are statically defined, it is possible that a volume size larger than requested is used. By default, the Service Telemetry Operator requests a volume size of `20Gi` (20 gibibytes).
Procedure
List the available storage classes:
```
$ oc get storageclasses

NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
csi-manila-ceph      manila.csi.openstack.org   Delete          Immediate              false                  20h
standard (default)   kubernetes.io/cinder       Delete          WaitForFirstConsumer   true                   20h
standard-csi         cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   20h
```
Edit the `ServiceTelemetry` object:

```
$ oc edit stf default
```
Set the value of the `backends.events.elasticsearch.enabled` parameter to `true` and the value of `backends.events.elasticsearch.storage.strategy` to `persistent`:

```
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  [...]
  backends:
    events:
      elasticsearch:
        enabled: true
        version: 7.16.1
        storage:
          strategy: persistent
          persistent:
            storageClass: standard-csi
            pvcStorageRequest: 50G
```
The clouds parameter
Use the `clouds` parameter to define which Smart Gateway objects deploy, thereby providing the interface for multiple monitored cloud environments to connect to an instance of STF. If a supporting back end is available, metrics and events Smart Gateways for the default cloud configuration are created. By default, the Service Telemetry Operator creates Smart Gateways for `cloud1`.
You can create a list of cloud objects to control which Smart Gateways are created for the defined clouds. Each cloud consists of data types and collectors. Data types are `metrics` or `events`. Each data type consists of a list of collectors, the message bus subscription address, and a parameter to enable debugging. Available collectors for metrics are `collectd`, `ceilometer`, and `sensubility`. Available collectors for events are `collectd` and `ceilometer`. Ensure that the subscription address for each of these collectors is unique for every cloud, data type, and collector combination.
The default `cloud1` configuration is represented by the following `ServiceTelemetry` object, which provides subscriptions and data storage of metrics and events for the collectd, Ceilometer, and Sensubility data collectors for a particular cloud instance:

```
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  clouds:
    - name: cloud1
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/cloud1-telemetry
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/cloud1-metering.sample
          - collectorType: sensubility
            subscriptionAddress: sensubility/cloud1-telemetry
            debugEnabled: false
      events:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/cloud1-notify
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/cloud1-event.sample
```
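To illustrate the uniqueness requirement for subscription addresses, the following sketch adds a hypothetical second cloud named `cloud2` alongside the default. Because the cloud name is embedded in every subscription address, no two Smart Gateways share an address:

```
# Sketch only: cloud2 and its addresses are illustrative, not defaults.
spec:
  clouds:
    - name: cloud1
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/cloud1-telemetry
    - name: cloud2
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/cloud2-telemetry
```

For the full procedure, see Section 4.3, “Configuring multiple clouds”.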
Each item of the `clouds` parameter represents a cloud instance. A cloud instance consists of three top-level parameters: `name`, `metrics`, and `events`. The `metrics` and `events` parameters represent the corresponding back end for storage of that data type. The `collectors` parameter specifies a list of objects made up of two required parameters, `collectorType` and `subscriptionAddress`, which represent an instance of the Smart Gateway. The `collectorType` parameter specifies data collected by either collectd, Ceilometer, or Sensubility. The `subscriptionAddress` parameter provides the AMQ Interconnect address to which a Smart Gateway subscribes.
You can use the optional Boolean parameter `debugEnabled` within the `collectors` parameter to enable additional console debugging in the running Smart Gateway pod.
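For example, a minimal sketch that turns on debugging for the collectd metrics collector of the default cloud:

```
clouds:
  - name: cloud1
    metrics:
      collectors:
        - collectorType: collectd
          subscriptionAddress: collectd/cloud1-telemetry
          debugEnabled: true   # extra console output in the Smart Gateway pod
```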
Additional resources
- For more information about deleting default Smart Gateways, see Section 4.3.3, “Deleting the default Smart Gateways”.
- For more information about how to configure multiple clouds, see Section 4.3, “Configuring multiple clouds”.
The alerting parameter
Use the `alerting` parameter to control the creation of an Alertmanager instance and the configuration of the storage back end. By default, `alerting` is enabled. For more information, see Section 5.3, “Alerts in Service Telemetry Framework”.
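For example, to disable alerting, set `alerting.enabled` to `false` — a minimal sketch that follows the default parameter layout shown earlier:

```
spec:
  alerting:
    enabled: false
```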
The graphing parameter
Use the `graphing` parameter to control the creation of a Grafana instance. By default, `graphing` is disabled. For more information, see Section 5.1, “Dashboards in Service Telemetry Framework”.
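For example, a minimal sketch that enables Grafana, following the default parameter layout shown earlier; setting `ingressEnabled: true` additionally exposes a route for the Grafana web interface:

```
spec:
  graphing:
    enabled: true
    grafana:
      ingressEnabled: true
```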
The highAvailability parameter
Use the `highAvailability` parameter to control the instantiation of multiple copies of STF components to reduce the recovery time of components that fail or are rescheduled. By default, `highAvailability` is disabled. For more information, see Section 5.6, “High availability”.
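For example, a minimal sketch that enables high availability:

```
spec:
  highAvailability:
    enabled: true
```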
The transports parameter
Use the `transports` parameter to control the enablement of the message bus for an STF deployment. The only transport currently supported is AMQ Interconnect. By default, the `qdr` transport is enabled.
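For example, a sketch that matches the default transport layout shown earlier, with the `qdr` transport enabled and its web console disabled:

```
spec:
  transports:
    qdr:
      enabled: true
      web:
        enabled: false
```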
3.3. Accessing user interfaces for STF components
In Red Hat OpenShift Container Platform, applications are exposed to the external network through a route. For more information about routes, see Configuring ingress cluster traffic.
In Service Telemetry Framework (STF), HTTPS routes are exposed for each service that has a web-based interface. These routes are protected by Red Hat OpenShift Container Platform RBAC, and any user that has a `ClusterRoleBinding` that enables them to view Red Hat OpenShift Container Platform namespaces can log in. For more information about RBAC, see Using RBAC to define and apply permissions.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the `service-telemetry` namespace:

```
$ oc project service-telemetry
```
List the available web UI routes in the `service-telemetry` project:

```
$ oc get routes | grep web
default-alertmanager-proxy   default-alertmanager-proxy-service-telemetry.apps.infra.watch   default-alertmanager-proxy   web   reencrypt/Redirect   None
default-prometheus-proxy     default-prometheus-proxy-service-telemetry.apps.infra.watch     default-prometheus-proxy     web   reencrypt/Redirect   None
```
- In a web browser, navigate to https://<route_address> to access the web interface for the corresponding service.
3.4. Configuring an alternate observability strategy
To configure STF to skip the deployment of storage, visualization, and alerting back ends, add `observabilityStrategy: none` to the `ServiceTelemetry` spec. In this mode, only AMQ Interconnect routers and metrics Smart Gateways are deployed, and you must configure an external Prometheus-compatible system to collect metrics from the STF Smart Gateways.
Currently, only metrics are supported when you set `observabilityStrategy` to `none`. Events Smart Gateways are not deployed.
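As a sketch of the external collection side, an external Prometheus-compatible system might scrape the Smart Gateways with a configuration similar to the following. The target Service name and port here are assumptions for illustration only; verify the actual Smart Gateway Service names and metrics port in your cluster before use:

```
# Sketch only: the Service name and port below are assumptions.
scrape_configs:
  - job_name: stf-smart-gateways
    metrics_path: /metrics
    static_configs:
      - targets:
          - default-cloud1-coll-meter-smartgateway.service-telemetry.svc:8081  # assumed port
```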
Procedure
Create a `ServiceTelemetry` object with the property `observabilityStrategy: none` in the `spec` parameter. The manifest results in a default deployment of STF that is suitable for receiving telemetry from a single cloud with all metrics collector types.

```
$ oc apply -f - <<EOF
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  observabilityStrategy: none
EOF
```
Delete the leftover objects that are managed by community Operators:

```
$ for o in alertmanager/default prometheus/default elasticsearch/elasticsearch grafana/default; do oc delete $o; done
```
To verify that all workloads are operating correctly, view the pods and the status of each pod:
```
$ oc get pods

NAME                                                      READY   STATUS    RESTARTS   AGE
default-cloud1-ceil-meter-smartgateway-59c845d65b-gzhcs   3/3     Running   0          132m
default-cloud1-coll-meter-smartgateway-75bbd948b9-d5phm   3/3     Running   0          132m
default-cloud1-sens-meter-smartgateway-7fdbb57b6d-dh2g9   3/3     Running   0          132m
default-interconnect-668d5bbcd6-57b2l                     1/1     Running   0          132m
interconnect-operator-b8f5bb647-tlp5t                     1/1     Running   0          47h
service-telemetry-operator-566b9dd695-wkvjq               1/1     Running   0          156m
smart-gateway-operator-58d77dcf7-6xsq7                    1/1     Running   0          47h
```
Additional resources
- For more information about configuring additional clouds or to change the set of supported collectors, see Section 4.3.2, “Deploying Smart Gateways”.