Chapter 3. Installing the core components of Service Telemetry Framework

You can use Operators to load the Service Telemetry Framework (STF) components and objects. Operators manage each of the following STF core components:

  • Certificate Management
  • AMQ Interconnect
  • Smart Gateways
  • Prometheus and Alertmanager

Service Telemetry Framework (STF) uses other supporting Operators as part of the deployment. STF can resolve most dependencies automatically, but you need to pre-install some Operators, such as Cluster Observability Operator, which provides an instance of Prometheus and Alertmanager, and cert-manager for Red Hat OpenShift, which provides management of certificates.

Prerequisites

  • A Red Hat OpenShift Container Platform Extended Update Support (EUS) release, version 4.12 or 4.14, is running.
  • You have prepared your Red Hat OpenShift Container Platform environment and ensured that there is persistent storage and enough resources to run the STF components on top of the Red Hat OpenShift Container Platform environment. For more information about STF performance, see the Red Hat Knowledge Base article Service Telemetry Framework Performance and Scaling.
  • Your environment is fully connected or Red Hat OpenShift Container Platform-disconnected. STF is not available in network proxy environments.
Important

STF is compatible with Red Hat OpenShift Container Platform versions 4.12 and 4.14.

3.1. Deploying Service Telemetry Framework to the Red Hat OpenShift Container Platform environment

Deploy Service Telemetry Framework (STF) to collect and store Red Hat OpenStack Platform (RHOSP) telemetry.

3.1.1. Deploying Cluster Observability Operator

You must install the Cluster Observability Operator (COO) before you create an instance of Service Telemetry Framework (STF) if the observabilityStrategy is set to use_redhat and the backends.metrics.prometheus.enabled is set to true in the ServiceTelemetry object. For more information about COO, see Cluster Observability Operator overview in the OpenShift Container Platform Documentation.

Procedure

  1. Log in to your Red Hat OpenShift Container Platform environment where STF is hosted.
  2. To store metrics in Prometheus, enable the Cluster Observability Operator by using the redhat-operators CatalogSource:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: cluster-observability-operator
      namespace: openshift-operators
    spec:
      channel: development
      installPlanApproval: Automatic
      name: cluster-observability-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  3. Verify that the ClusterServiceVersion for Cluster Observability Operator has a status of Succeeded:

    $ oc wait --for jsonpath="{.status.phase}"=Succeeded csv --namespace=openshift-operators -l operators.coreos.com/cluster-observability-operator.openshift-operators
    
    clusterserviceversion.operators.coreos.com/observability-operator.v0.0.26 condition met

3.1.2. Deploying cert-manager for Red Hat OpenShift

The cert-manager for Red Hat OpenShift (cert-manager) Operator must be pre-installed before creating an instance of Service Telemetry Framework (STF). For more information about cert-manager, see cert-manager for Red Hat OpenShift overview.

In earlier versions of STF, the only available cert-manager channel was tech-preview, which is available only up to Red Hat OpenShift Container Platform 4.12. On Red Hat OpenShift Container Platform 4.14 and later, you must install cert-manager from the stable-v1 channel. For new installations of STF, install cert-manager from the stable-v1 channel.

Warning

Only one deployment of cert-manager can be installed per Red Hat OpenShift Container Platform cluster. Subscribing to cert-manager in more than one project causes the deployments to conflict with each other.

Procedure

  1. Log in to your Red Hat OpenShift Container Platform environment where STF is hosted.
  2. Verify that cert-manager is not already installed on the Red Hat OpenShift Container Platform cluster. If the command returns any results, do not install another instance of cert-manager:

    $ oc get sub --all-namespaces -o json | jq '.items[] | select(.metadata.name | match("cert-manager")) | .metadata.name'
  3. Create a namespace for the cert-manager Operator:

    $ oc create -f - <<EOF
    apiVersion: project.openshift.io/v1
    kind: Project
    metadata:
      name: cert-manager-operator
    spec:
      finalizers:
      - kubernetes
    EOF
  4. Create an OperatorGroup for the cert-manager Operator:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: cert-manager-operator
      namespace: cert-manager-operator
    spec:
      targetNamespaces:
      - cert-manager-operator
      upgradeStrategy: Default
    EOF
  5. Subscribe to the cert-manager Operator by using the redhat-operators CatalogSource:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: openshift-cert-manager-operator
      namespace: cert-manager-operator
      labels:
        operators.coreos.com/openshift-cert-manager-operator.cert-manager-operator: ""
    spec:
      channel: stable-v1
      installPlanApproval: Automatic
      name: openshift-cert-manager-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  6. Validate your ClusterServiceVersion. Ensure that cert-manager Operator displays a phase of Succeeded:

    $ oc wait --for jsonpath="{.status.phase}"=Succeeded csv --namespace=cert-manager-operator --selector=operators.coreos.com/openshift-cert-manager-operator.cert-manager-operator
    
    clusterserviceversion.operators.coreos.com/cert-manager-operator.v1.12.1 condition met
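If you want to see exactly what the pre-install check in step 2 matches, you can exercise the same jq filter offline. The following is a minimal sketch; the sample JSON is hypothetical and reduced to the only field that the filter inspects:

```shell
# Hypothetical, reduced sample of `oc get sub --all-namespaces -o json` output.
sample='{"items":[{"metadata":{"name":"openshift-cert-manager-operator"}},{"metadata":{"name":"service-telemetry-operator"}}]}'

# The same filter as the pre-install check: print the name of any
# Subscription whose name matches "cert-manager".
matches=$(echo "$sample" | jq -r '.items[] | select(.metadata.name | match("cert-manager")) | .metadata.name')
echo "$matches"
```

With this sample, only the openshift-cert-manager-operator Subscription is printed, which indicates that another instance of cert-manager is already installed.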

3.1.3. Deploying Service Telemetry Operator

Deploy Service Telemetry Operator on Red Hat OpenShift Container Platform to provide the supporting Operators and interface for creating an instance of Service Telemetry Framework (STF) to monitor Red Hat OpenStack Platform (RHOSP) cloud platforms.

Procedure

  1. Create a namespace to contain the STF components, for example, service-telemetry:

    $ oc new-project service-telemetry
  2. Create an OperatorGroup in the namespace so that you can schedule the Operator pods:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: service-telemetry-operator-group
      namespace: service-telemetry
    spec:
      targetNamespaces:
      - service-telemetry
    EOF

    For more information, see OperatorGroups.

  3. Create the Service Telemetry Operator subscription to manage the STF instances:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: service-telemetry-operator
      namespace: service-telemetry
    spec:
      channel: stable-1.5
      installPlanApproval: Automatic
      name: service-telemetry-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  4. Validate that the Service Telemetry Operator and the dependent Operators have a phase of Succeeded:

    $ oc wait --for jsonpath="{.status.phase}"=Succeeded csv --namespace=service-telemetry -l operators.coreos.com/service-telemetry-operator.service-telemetry ; oc get csv --namespace service-telemetry
    
    clusterserviceversion.operators.coreos.com/service-telemetry-operator.v1.5.1700688542 condition met
    
    NAME                                         DISPLAY                                  VERSION          REPLACES                             PHASE
    amq7-interconnect-operator.v1.10.17          Red Hat Integration - AMQ Interconnect   1.10.17          amq7-interconnect-operator.v1.10.4   Succeeded
    observability-operator.v0.0.26               Cluster Observability Operator           0.1.0                                                 Succeeded
    service-telemetry-operator.v1.5.1700688542   Service Telemetry Operator               1.5.1700688542                                        Succeeded
    smart-gateway-operator.v5.0.1700688539       Smart Gateway Operator                   5.0.1700688539                                        Succeeded

3.2. Creating a ServiceTelemetry object in Red Hat OpenShift Container Platform

Create a ServiceTelemetry object in Red Hat OpenShift Container Platform so that the Service Telemetry Operator creates the supporting components for a Service Telemetry Framework (STF) deployment. For more information, see Section 3.2.1, “Primary parameters of the ServiceTelemetry object”.

Procedure

  1. Log in to your Red Hat OpenShift Container Platform environment where STF is hosted.
  2. To deploy STF with the core components for metrics delivery configured, create a ServiceTelemetry object:

    $ oc apply -f - <<EOF
    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      alerting:
        alertmanager:
          storage:
            persistent:
              pvcStorageRequest: 20G
            strategy: persistent
        enabled: true
      backends:
        metrics:
          prometheus:
            enabled: true
            scrapeInterval: 30s
            storage:
              persistent:
                pvcStorageRequest: 20G
              retention: 24h
              strategy: persistent
      clouds:
      - metrics:
          collectors:
          - bridge:
              ringBufferCount: 15000
              ringBufferSize: 16384
              verbose: false
            collectorType: collectd
            debugEnabled: false
            subscriptionAddress: collectd/cloud1-telemetry
          - bridge:
              ringBufferCount: 15000
              ringBufferSize: 16384
              verbose: false
            collectorType: ceilometer
            debugEnabled: false
            subscriptionAddress: anycast/ceilometer/cloud1-metering.sample
          - bridge:
              ringBufferCount: 15000
              ringBufferSize: 65535
              verbose: false
            collectorType: sensubility
            debugEnabled: false
            subscriptionAddress: sensubility/cloud1-telemetry
        name: cloud1
      observabilityStrategy: use_redhat
      transports:
        qdr:
          auth: basic
          certificates:
            caCertDuration: 70080h
            endpointCertDuration: 70080h
          enabled: true
          web:
            enabled: false
    EOF

    To override these defaults, add the configuration to the spec parameter.

  3. View the STF deployment logs in the Service Telemetry Operator:

    $ oc logs --selector name=service-telemetry-operator
    
    ...
    --------------------------- Ansible Task Status Event StdOut  -----------------
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=90   changed=0    unreachable=0    failed=0    skipped=26   rescued=0    ignored=0

Verification

  • To determine that all workloads are operating correctly, view the pods and the status of each pod.

    $ oc get pods
    
    NAME                                                        READY   STATUS    RESTARTS   AGE
    alertmanager-default-0                                      3/3     Running   0          123m
    default-cloud1-ceil-meter-smartgateway-7dfb95fcb6-bs6jl     3/3     Running   0          122m
    default-cloud1-coll-meter-smartgateway-674d88d8fc-858jk     3/3     Running   0          122m
    default-cloud1-sens-meter-smartgateway-9b869695d-xcssf      3/3     Running   0          122m
    default-interconnect-6cbf65d797-hk7l6                       1/1     Running   0          123m
    interconnect-operator-7bb99c5ff4-l6xc2                      1/1     Running   0          138m
    prometheus-default-0                                        3/3     Running   0          122m
    service-telemetry-operator-7966cf57f-g4tx4                  1/1     Running   0          138m
    smart-gateway-operator-7d557cb7b7-9ppls                     1/1     Running   0          138m

3.2.1. Primary parameters of the ServiceTelemetry object

You can set the following primary configuration parameters of the ServiceTelemetry object to configure your STF deployment:

  • alerting
  • backends
  • clouds
  • graphing
  • highAvailability
  • transports
The backends parameter

Set the value of the backends parameter to allocate the storage back ends for metrics and events, and to enable the Smart Gateways that the clouds parameter defines. For more information, see the section called “The clouds parameter”.

You can use Prometheus as the metrics storage back end and Elasticsearch as the events storage back end. The Service Telemetry Operator can create custom resource objects that the Prometheus Operator watches to create a Prometheus workload. You need an external deployment of Elasticsearch to store events.

Enabling Prometheus as a storage back end for metrics

To enable Prometheus as a storage back end for metrics, you must configure the ServiceTelemetry object.

Procedure

  1. Edit the ServiceTelemetry object:

    $ oc edit stf default
  2. Set the value of the backends.metrics.prometheus.enabled parameter to true:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      [...]
      backends:
        metrics:
          prometheus:
            enabled: true
Configuring persistent storage for Prometheus

Set the additional parameters in backends.metrics.prometheus.storage.persistent to configure persistent storage options for Prometheus, such as storage class and volume size.

Define the back end storage class with the storageClass parameter. If you do not set this parameter, the Service Telemetry Operator uses the default storage class for the Red Hat OpenShift Container Platform cluster.

Define the minimum required volume size for the storage request with the pvcStorageRequest parameter. By default, Service Telemetry Operator requests a volume size of 20G (20 Gigabytes).

Procedure

  1. List the available storage classes:

    $ oc get storageclasses
    NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    csi-manila-ceph      manila.csi.openstack.org   Delete          Immediate              false                  20h
    standard (default)   kubernetes.io/cinder       Delete          WaitForFirstConsumer   true                   20h
    standard-csi         cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   20h
  2. Edit the ServiceTelemetry object:

    $ oc edit stf default
  3. Set the value of the backends.metrics.prometheus.enabled parameter to true and the value of backends.metrics.prometheus.storage.strategy to persistent:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      [...]
      backends:
        metrics:
          prometheus:
            enabled: true
            storage:
              strategy: persistent
              persistent:
                storageClass: standard-csi
                pvcStorageRequest: 50G
Enabling Elasticsearch as a storage back end for events
Note

Previous versions of STF managed Elasticsearch objects for the community supported Elastic Cloud on Kubernetes Operator (ECK). Elasticsearch management functionality is deprecated in STF 1.5.3. You can still forward to an existing Elasticsearch instance that you deploy and manage with ECK, but you cannot manage the creation of Elasticsearch objects. When you upgrade your STF deployment, existing Elasticsearch objects and deployments remain, but are no longer managed by STF.

For more information about using Elasticsearch with STF, see the Red Hat Knowledge Base article Using Service Telemetry Framework with Elasticsearch.

To enable events forwarding to Elasticsearch as a storage back end, you must configure the ServiceTelemetry object.

Procedure

  1. Edit the ServiceTelemetry object:

    $ oc edit stf default
  2. Set the value of the backends.events.elasticsearch.enabled parameter to true and configure the hostUrl with the relevant Elasticsearch instance:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      [...]
      backends:
        events:
          elasticsearch:
            enabled: true
            forwarding:
              hostUrl: https://external-elastic-http.domain:9200
              tlsServerName: ""
              tlsSecretName: elasticsearch-es-cert
              userSecretName: elasticsearch-es-elastic-user
              useBasicAuth: true
              useTls: true
  3. Create the secret named in the userSecretName parameter to store the basic auth credentials:

    $ oc create secret generic elasticsearch-es-elastic-user --from-literal=elastic='<PASSWORD>'
  4. Copy the CA certificate into a file named EXTERNAL-ES-CA.pem, and then create the secret named in the tlsSecretName parameter to make it available to STF:

    $ cat EXTERNAL-ES-CA.pem
    -----BEGIN CERTIFICATE-----
    [...]
    -----END CERTIFICATE-----
    
    $ oc create secret generic elasticsearch-es-cert --from-file=ca.crt=EXTERNAL-ES-CA.pem
The clouds parameter

Configure the clouds parameter to define which Smart Gateway objects are deployed, providing the interface for monitored cloud environments to connect to an instance of STF. If a supporting back end is available, metrics and events Smart Gateways for the default cloud configuration are created. By default, the Service Telemetry Operator creates Smart Gateways for cloud1.

You can create a list of cloud objects to control which Smart Gateways are created for the defined clouds. Each cloud consists of data types and collectors. Data types are metrics or events. Each data type consists of a list of collectors, the message bus subscription address, and a parameter to enable debugging. Available collectors for metrics are collectd, ceilometer, and sensubility. Available collectors for events are collectd and ceilometer. Ensure that the subscription address for each of these collectors is unique for every cloud, data type, and collector combination.

The default cloud1 configuration is represented by the following ServiceTelemetry object, which provides subscriptions and data storage of metrics and events for collectd, Ceilometer, and Sensubility data collectors for a particular cloud instance:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  clouds:
    - name: cloud1
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/cloud1-telemetry
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/cloud1-metering.sample
          - collectorType: sensubility
            subscriptionAddress: sensubility/cloud1-telemetry
            debugEnabled: false
      events:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/cloud1-notify
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/cloud1-event.sample

Each item of the clouds parameter represents a cloud instance. A cloud instance consists of three top-level parameters: name, metrics, and events. The metrics and events parameters represent the corresponding back end for storage of that data type. The collectors parameter specifies a list of objects made up of two required parameters, collectorType and subscriptionAddress, and these represent an instance of the Smart Gateway. The collectorType parameter specifies data collected by either collectd, Ceilometer, or Sensubility. The subscriptionAddress parameter provides the AMQ Interconnect address to which a Smart Gateway subscribes.

You can use the optional Boolean parameter debugEnabled within the collectors parameter to enable additional console debugging in the running Smart Gateway pod.
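Because every subscription address must be unique for each cloud, data type, and collector combination, it can help to check a manifest for duplicates before you apply it. The following sketch scans a hypothetical clouds fragment with awk; against a real deployment you could instead feed it your manifest file or the output of `oc get stf default -o yaml`:

```shell
# Hypothetical clouds fragment; addresses must be unique for every
# cloud, data type, and collector combination.
manifest='
- collectorType: collectd
  subscriptionAddress: collectd/cloud1-telemetry
- collectorType: ceilometer
  subscriptionAddress: anycast/ceilometer/cloud1-metering.sample
- collectorType: sensubility
  subscriptionAddress: sensubility/cloud1-telemetry
'

# Extract the subscriptionAddress values and report any duplicates;
# empty output from `uniq -d` means all addresses are unique.
dupes=$(echo "$manifest" | awk '/subscriptionAddress:/ {print $2}' | sort | uniq -d)
echo "${dupes:-no duplicate subscription addresses}"
```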

The alerting parameter

Set the alerting parameter to create an Alertmanager instance and a storage back end. By default, alerting is enabled. For more information, see Section 6.3, “Alerts in Service Telemetry Framework”.

The graphing parameter

Set the graphing parameter to create a Grafana instance. By default, graphing is disabled. For more information, see Section 6.1, “Dashboards in Service Telemetry Framework”.

The highAvailability parameter
Warning

STF high availability (HA) mode is deprecated and is not supported in production environments. Red Hat OpenShift Container Platform is a highly-available platform, and enabling STF HA mode can cause issues and complicate debugging.

Set the highAvailability parameter to instantiate multiple copies of STF components to reduce recovery time of components that fail or are rescheduled. By default, highAvailability is disabled. For more information, see Section 6.5, “High availability”.

The transports parameter

Set the transports parameter to enable the message bus for an STF deployment. The only currently supported transport is AMQ Interconnect. By default, the qdr transport is enabled.

3.3. Accessing user interfaces for STF components

In Red Hat OpenShift Container Platform, applications are exposed to the external network through a route. For more information about routes, see Configuring ingress cluster traffic.

In Service Telemetry Framework (STF), HTTPS routes are exposed for each service that has a web-based interface, and the routes are protected by Red Hat OpenShift Container Platform role-based access control (RBAC).

You need the following permissions to access the corresponding component UIs:

{"namespace":"service-telemetry", "resource":"grafana", "group":"grafana.integreatly.org", "verb":"get"}
{"namespace":"service-telemetry", "resource":"prometheus", "group":"monitoring.rhobs", "verb":"get"}
{"namespace":"service-telemetry", "resource":"alertmanager", "group":"monitoring.rhobs", "verb":"get"}

For more information about RBAC, see Using RBAC to define and apply permissions.
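For example, a namespaced role that grants all three permissions could look like the following sketch. The role name stf-ui-access is hypothetical; bind the role to users or groups with a corresponding RoleBinding:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: stf-ui-access   # hypothetical name
  namespace: service-telemetry
rules:
- apiGroups: ["grafana.integreatly.org"]
  resources: ["grafana"]
  verbs: ["get"]
- apiGroups: ["monitoring.rhobs"]
  resources: ["prometheus", "alertmanager"]
  verbs: ["get"]
```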

Procedure

  1. Log in to Red Hat OpenShift Container Platform.
  2. Change to the service-telemetry namespace:

    $ oc project service-telemetry
  3. List the available web UI routes in the service-telemetry project:

    $ oc get routes | grep web
    default-alertmanager-proxy   default-alertmanager-proxy-service-telemetry.apps.infra.watch          default-alertmanager-proxy   web     reencrypt/Redirect   None
    default-prometheus-proxy     default-prometheus-proxy-service-telemetry.apps.infra.watch            default-prometheus-proxy     web     reencrypt/Redirect   None
  4. In a web browser, navigate to https://<route_address> to access the web interface for the corresponding service.
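The URL to open is the route host prefixed with https://. As an offline illustration, the following sketch pulls the host column out of sample `oc get routes` output; with a live cluster you could instead query the host directly, for example with `oc get route default-prometheus-proxy -o jsonpath='{.spec.host}'`:

```shell
# Hypothetical `oc get routes | grep web` output, matching the procedure.
routes='default-alertmanager-proxy   default-alertmanager-proxy-service-telemetry.apps.infra.watch   default-alertmanager-proxy   web   reencrypt/Redirect   None
default-prometheus-proxy   default-prometheus-proxy-service-telemetry.apps.infra.watch   default-prometheus-proxy   web   reencrypt/Redirect   None'

# The second column is the route host; prefix it to form the UI URL.
url=$(echo "$routes" | awk '/^default-prometheus-proxy/ {print "https://" $2}')
echo "$url"
```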

3.4. Configuring an alternate observability strategy

To skip the deployment of storage, visualization, and alerting backends, add observabilityStrategy: none to the ServiceTelemetry spec. In this mode, you only deploy AMQ Interconnect routers and Smart Gateways, and you must configure an external Prometheus-compatible system to collect metrics from the STF Smart Gateways, and an external Elasticsearch to receive the forwarded events.

Procedure

  1. Create a ServiceTelemetry object with the property observabilityStrategy: none in the spec parameter. This manifest results in a default deployment of STF that is suitable for receiving telemetry from a single cloud with all metrics collector types.

    $ oc apply -f - <<EOF
    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      observabilityStrategy: none
    EOF
  2. Delete the remaining objects that are managed by community operators:

    $ for o in alertmanagers.monitoring.rhobs/default prometheuses.monitoring.rhobs/default elasticsearch/elasticsearch grafana/default-grafana; do oc delete $o; done
  3. To verify that all workloads are operating correctly, view the pods and the status of each pod:

    $ oc get pods
    NAME                                                      READY   STATUS    RESTARTS   AGE
    default-cloud1-ceil-event-smartgateway-6f8547df6c-p2db5   3/3     Running   0          132m
    default-cloud1-ceil-meter-smartgateway-59c845d65b-gzhcs   3/3     Running   0          132m
    default-cloud1-coll-event-smartgateway-bf859f8d77-tzb66   3/3     Running   0          132m
    default-cloud1-coll-meter-smartgateway-75bbd948b9-d5phm   3/3     Running   0          132m
    default-cloud1-sens-meter-smartgateway-7fdbb57b6d-dh2g9   3/3     Running   0          132m
    default-interconnect-668d5bbcd6-57b2l                     1/1     Running   0          132m
    interconnect-operator-b8f5bb647-tlp5t                     1/1     Running   0          47h
    service-telemetry-operator-566b9dd695-wkvjq               1/1     Running   0          156m
    smart-gateway-operator-58d77dcf7-6xsq7                    1/1     Running   0          47h
