Chapter 3. Installing the core components of Service Telemetry Framework

You can use Operators to load the Service Telemetry Framework (STF) components and objects. Operators manage each of the following STF core and community components:

  • cert-manager
  • AMQ Interconnect
  • Smart Gateway
  • Prometheus and AlertManager
  • Elasticsearch
  • Grafana

Prerequisites

  • Red Hat OpenShift Container Platform version 4.10 through 4.12 is running.
  • You have prepared your Red Hat OpenShift Container Platform environment and ensured that there is persistent storage and enough resources to run the STF components on top of the Red Hat OpenShift Container Platform environment. For more information, see Service Telemetry Framework Performance and Scaling.
  • Your environment is fully connected. STF does not work in Red Hat OpenShift Container Platform disconnected environments or network proxy environments.
Important

STF is compatible with Red Hat OpenShift Container Platform version 4.10 through 4.12.

3.1. Deploying Service Telemetry Framework to the Red Hat OpenShift Container Platform environment

Deploy Service Telemetry Framework (STF) to collect, store, and monitor metrics and events:

Procedure

  1. Create a namespace to contain the STF components, for example, service-telemetry:

    $ oc new-project service-telemetry
  2. Create an OperatorGroup in the namespace so that you can schedule the Operator pods:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: service-telemetry-operator-group
      namespace: service-telemetry
    spec:
      targetNamespaces:
      - service-telemetry
    EOF

    For more information, see OperatorGroups.

  3. Create a namespace for the cert-manager Operator:

    $ oc create -f - <<EOF
    apiVersion: project.openshift.io/v1
    kind: Project
    metadata:
      name: openshift-cert-manager-operator
    spec:
      finalizers:
      - kubernetes
    EOF
  4. Create an OperatorGroup for the cert-manager Operator:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: openshift-cert-manager-operator
      namespace: openshift-cert-manager-operator
    spec: {}
    EOF
  5. Subscribe to the cert-manager Operator by using the redhat-operators CatalogSource:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: openshift-cert-manager-operator
      namespace: openshift-cert-manager-operator
    spec:
      channel: tech-preview
      installPlanApproval: Automatic
      name: openshift-cert-manager-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  6. Validate your ClusterServiceVersion. Ensure that the cert-manager Operator displays a phase of Succeeded:

    $ oc get csv --namespace openshift-cert-manager-operator --selector=operators.coreos.com/openshift-cert-manager-operator.openshift-cert-manager-operator
    
    NAME                            DISPLAY                                       VERSION   REPLACES   PHASE
    openshift-cert-manager.v1.7.1   cert-manager Operator for Red Hat OpenShift   1.7.1-1              Succeeded
  7. Subscribe to the AMQ Interconnect Operator by using the redhat-operators CatalogSource:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: amq7-interconnect-operator
      namespace: service-telemetry
    spec:
      channel: 1.10.x
      installPlanApproval: Automatic
      name: amq7-interconnect-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  8. Validate your ClusterServiceVersion. Ensure that amq7-interconnect-operator.v1.10.x displays a phase of Succeeded:

    $ oc get csv --selector=operators.coreos.com/amq7-interconnect-operator.service-telemetry
    
    NAME                                  DISPLAY                                  VERSION   REPLACES                             PHASE
    amq7-interconnect-operator.v1.10.15   Red Hat Integration - AMQ Interconnect   1.10.15   amq7-interconnect-operator.v1.10.4   Succeeded
  9. To store metrics in Prometheus, you must enable the Prometheus Operator by using the community-operators CatalogSource:

    Warning

    Community Operators are Operators that have not been vetted or verified by Red Hat. Use Community Operators with caution because their stability is unknown. Red Hat provides no support for Community Operators.

    Learn more about Red Hat’s third party software support policy

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: prometheus-operator
      namespace: service-telemetry
    spec:
      channel: beta
      installPlanApproval: Automatic
      name: prometheus
      source: community-operators
      sourceNamespace: openshift-marketplace
    EOF
  10. Verify that the ClusterServiceVersion for Prometheus displays a phase of Succeeded:

    $ oc get csv --selector=operators.coreos.com/prometheus.service-telemetry
    
    NAME                        DISPLAY               VERSION   REPLACES                    PHASE
    prometheusoperator.0.56.3   Prometheus Operator   0.56.3    prometheusoperator.0.47.0   Succeeded
  11. To store events in Elasticsearch, you must enable the Elastic Cloud on Kubernetes (ECK) Operator by using the certified-operators CatalogSource:

    Warning

    Certified Operators are Operators from leading independent software vendors (ISVs). Red Hat partners with ISVs to package and ship, but not support, the certified Operators. Support is provided by the ISV.

    Learn more about Red Hat’s third party software support policy

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: elasticsearch-eck-operator-certified
      namespace: service-telemetry
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: elasticsearch-eck-operator-certified
      source: certified-operators
      sourceNamespace: openshift-marketplace
    EOF
  12. Verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes displays a phase of Succeeded:

    $ oc get csv --selector=operators.coreos.com/elasticsearch-eck-operator-certified.service-telemetry
    
    NAME                                          DISPLAY                        VERSION   REPLACES                                      PHASE
    elasticsearch-eck-operator-certified.v2.8.0   Elasticsearch (ECK) Operator   2.8.0     elasticsearch-eck-operator-certified.v2.7.0   Succeeded
  13. Create the Service Telemetry Operator subscription to manage the STF instances:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: service-telemetry-operator
      namespace: service-telemetry
    spec:
      channel: stable-1.5
      installPlanApproval: Automatic
      name: service-telemetry-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  14. Validate that the Service Telemetry Operator and the dependent Operators display a phase of Succeeded:

    $ oc get csv --namespace service-telemetry
    
    NAME                                          DISPLAY                                       VERSION          REPLACES                                      PHASE
    amq7-interconnect-operator.v1.10.15           Red Hat Integration - AMQ Interconnect        1.10.15          amq7-interconnect-operator.v1.10.4            Succeeded
    elasticsearch-eck-operator-certified.v2.8.0   Elasticsearch (ECK) Operator                  2.8.0            elasticsearch-eck-operator-certified.v2.7.0   Succeeded
    openshift-cert-manager.v1.7.1                 cert-manager Operator for Red Hat OpenShift   1.7.1-1                                                        Succeeded
    prometheusoperator.0.56.3                     Prometheus Operator                           0.56.3           prometheusoperator.0.47.0                     Succeeded
    service-telemetry-operator.v1.5.1680516659    Service Telemetry Operator                    1.5.1680516659                                                 Succeeded
    smart-gateway-operator.v5.0.1680516659        Smart Gateway Operator                        5.0.1680516659                                                 Succeeded

3.2. Creating a ServiceTelemetry object in Red Hat OpenShift Container Platform

Create a ServiceTelemetry object in Red Hat OpenShift Container Platform so that the Service Telemetry Operator creates the supporting components for a Service Telemetry Framework (STF) deployment. For more information, see Section 3.2.1, “Primary parameters of the ServiceTelemetry object”.

Procedure

  1. To create a ServiceTelemetry object that results in an STF deployment that uses the default values, create a ServiceTelemetry object with an empty spec parameter:

    $ oc apply -f - <<EOF
    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec: {}
    EOF

    Creating a ServiceTelemetry object with an empty spec parameter results in an STF deployment with the following default settings:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      alerting:
        alertmanager:
          receivers:
            snmpTraps:
              alertOidLabel: oid
              community: public
              enabled: false
              port: 162
              retries: 5
              target: 192.168.24.254
              timeout: 1
              trapDefaultOid: 1.3.6.1.4.1.50495.15.1.2.1
              trapDefaultSeverity: ''
              trapOidPrefix: 1.3.6.1.4.1.50495.15
          storage:
            persistent:
              pvcStorageRequest: 20G
            strategy: persistent
        enabled: true
      backends:
        events:
          elasticsearch:
            certificates:
              caCertDuration: 70080h
              endpointCertDuration: 70080h
            storage:
              persistent:
                pvcStorageRequest: 20Gi
              strategy: persistent
            enabled: false
            version: 7.16.1
        logs:
          loki:
            storage:
              objectStorageSecret: test
              storageClass: standard
            enabled: false
            flavor: 1x.extra-small
            replicationFactor: 1
        metrics:
          prometheus:
            storage:
              persistent:
                pvcStorageRequest: 20G
              retention: 24h
              strategy: persistent
            enabled: true
            scrapeInterval: 10s
      clouds:
        - events:
            collectors:
              - bridge:
                  ringBufferCount: 15000
                  ringBufferSize: 16384
                  verbose: false
                collectorType: collectd
                debugEnabled: false
                subscriptionAddress: collectd/cloud1-notify
              - bridge:
                  ringBufferCount: 15000
                  ringBufferSize: 16384
                  verbose: false
                collectorType: ceilometer
                debugEnabled: false
                subscriptionAddress: anycast/ceilometer/cloud1-event.sample
          metrics:
            collectors:
              - bridge:
                  ringBufferCount: 15000
                  ringBufferSize: 16384
                  verbose: false
                collectorType: collectd
                debugEnabled: false
                subscriptionAddress: collectd/cloud1-telemetry
              - bridge:
                  ringBufferCount: 15000
                  ringBufferSize: 16384
                  verbose: false
                collectorType: ceilometer
                debugEnabled: false
                subscriptionAddress: anycast/ceilometer/cloud1-metering.sample
              - bridge:
                  ringBufferCount: 15000
                  ringBufferSize: 16384
                  verbose: false
                collectorType: sensubility
                debugEnabled: false
                subscriptionAddress: sensubility/cloud1-telemetry
          name: cloud1
      graphing:
        grafana:
          adminPassword: secret
          adminUser: root
          disableSignoutMenu: false
          ingressEnabled: false
        enabled: false
      highAvailability:
        enabled: false
      transports:
        qdr:
          certificates:
            caCertDuration: 70080h
            endpointCertDuration: 70080h
          web:
            enabled: false
          enabled: true
      observabilityStrategy: use_community

    To override these defaults, add the configuration to the spec parameter.
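
    For example, the following command accepts all other defaults but lengthens the Prometheus metrics retention period. The 7d value is illustrative:

    $ oc apply -f - <<EOF
    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        metrics:
          prometheus:
            storage:
              retention: 7d
    EOF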

  2. View the STF deployment logs in the Service Telemetry Operator:

    $ oc logs --selector name=service-telemetry-operator
    
    ...
    --------------------------- Ansible Task Status Event StdOut  -----------------
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=90   changed=0    unreachable=0    failed=0    skipped=26   rescued=0    ignored=0
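
    If the deployment is still in progress, you can follow the log output instead of re-running the command; the -f flag streams new log lines as the Ansible runs complete:

    $ oc logs -f --selector name=service-telemetry-operator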

Verification

  • To verify that all workloads are operating correctly, view the pods and the status of each pod.

    Note

    If you set the backends.events.elasticsearch.enabled parameter to true, the notification Smart Gateways report Error and CrashLoopBackOff error messages for a period of time before Elasticsearch starts.

    $ oc get pods
    
    NAME                                                      READY   STATUS    RESTARTS   AGE
    alertmanager-default-0                                    3/3     Running   0          4m7s
    default-cloud1-ceil-meter-smartgateway-669c6cdcf9-xvdvx   3/3     Running   0          3m46s
    default-cloud1-coll-meter-smartgateway-585855c59d-858rf   3/3     Running   0          3m46s
    default-cloud1-sens-meter-smartgateway-6f8dffb645-hhgkw   3/3     Running   0          3m46s
    default-interconnect-6994ff546-fx7jn                      1/1     Running   0          4m18s
    elastic-operator-9f44cdf6c-csvjq                          1/1     Running   0          19m
    interconnect-operator-646bfc886c-gx55n                    1/1     Running   0          25m
    prometheus-default-0                                      3/3     Running   0          3m33s
    prometheus-operator-54d644d8d7-wzdlh                      1/1     Running   0          20m
    service-telemetry-operator-54f6f7b6d-nfhwx                1/1     Running   0          18m
    smart-gateway-operator-9bbd7c56c-76w67                    1/1     Running   0          18m

3.2.1. Primary parameters of the ServiceTelemetry object

The ServiceTelemetry object comprises the following primary configuration parameters:

  • alerting
  • backends
  • clouds
  • graphing
  • highAvailability
  • transports

You can configure each of these parameters to provide different features in an STF deployment.

The backends parameter

Use the backends parameter to control which storage back ends are available for storage of metrics and events, and to control the enablement of Smart Gateways that the clouds parameter defines. For more information, see the section called “The clouds parameter”.

You can use Prometheus as the metrics storage back end and Elasticsearch as the events storage back end. When you enable a back end, the Service Telemetry Operator creates the custom resource objects that the Prometheus Operator and the Elastic Cloud on Kubernetes (ECK) Operator watch to create the Prometheus and Elasticsearch workloads.

Enabling Prometheus as a storage back end for metrics

To enable Prometheus as a storage back end for metrics, you must configure the ServiceTelemetry object.

Procedure

  1. Edit the ServiceTelemetry object:

    $ oc edit stf default
  2. Set the value of the backends.metrics.prometheus.enabled parameter to true:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      [...]
      backends:
        metrics:
          prometheus:
            enabled: true
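
    After the Service Telemetry Operator reconciles the change, you can confirm that the corresponding Prometheus custom resource exists. The default object name shown here matches the prometheus/default object referenced in Section 3.4:

    $ oc get prometheus default --namespace service-telemetry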
Configuring persistent storage for Prometheus

Use the additional parameters that are defined in backends.metrics.prometheus.storage.persistent to configure persistent storage options for Prometheus, such as storage class and volume size.

Use storageClass to define the back end storage class. If you do not set this parameter, the Service Telemetry Operator uses the default storage class for the Red Hat OpenShift Container Platform cluster.

Use the pvcStorageRequest parameter to define the minimum required volume size to satisfy the storage request. If volumes are statically defined, it is possible that a volume size larger than requested is used. By default, the Service Telemetry Operator requests a volume size of 20G (20 gigabytes).

Procedure

  1. List the available storage classes:

    $ oc get storageclasses
    NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    csi-manila-ceph      manila.csi.openstack.org   Delete          Immediate              false                  20h
    standard (default)   kubernetes.io/cinder       Delete          WaitForFirstConsumer   true                   20h
    standard-csi         cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   20h
  2. Edit the ServiceTelemetry object:

    $ oc edit stf default
  3. Set the value of the backends.metrics.prometheus.enabled parameter to true and the value of backends.metrics.prometheus.storage.strategy to persistent:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      [...]
      backends:
        metrics:
          prometheus:
            enabled: true
            storage:
              strategy: persistent
              persistent:
                storageClass: standard-csi
                pvcStorageRequest: 50G
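
    To confirm that the resulting persistent volume claim is created and bound, list the PVCs in the namespace. The exact claim names vary with the deployment:

    $ oc get pvc --namespace service-telemetry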
Enabling Elasticsearch as a storage back end for events

To enable Elasticsearch as a storage back end for events, you must configure the ServiceTelemetry object.

Procedure

  1. Edit the ServiceTelemetry object:

    $ oc edit stf default
  2. Set the value of the backends.events.elasticsearch.enabled parameter to true:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      [...]
      backends:
        events:
          elasticsearch:
            enabled: true
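
    You can confirm that the Service Telemetry Operator created the Elasticsearch custom resource that the ECK Operator watches. In a default deployment the object is named elasticsearch, matching the elasticsearch/elasticsearch object referenced in Section 3.4:

    $ oc get elasticsearch --namespace service-telemetry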
Configuring persistent storage for Elasticsearch

Use the additional parameters defined in backends.events.elasticsearch.storage.persistent to configure persistent storage options for Elasticsearch, such as storage class and volume size.

Use storageClass to define the back end storage class. If you do not set this parameter, the Service Telemetry Operator uses the default storage class for the Red Hat OpenShift Container Platform cluster.

Use the pvcStorageRequest parameter to define the minimum required volume size to satisfy the storage request. If volumes are statically defined, it is possible that a volume size larger than requested is used. By default, the Service Telemetry Operator requests a volume size of 20Gi (20 gibibytes).

Procedure

  1. List the available storage classes:

    $ oc get storageclasses
    NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    csi-manila-ceph      manila.csi.openstack.org   Delete          Immediate              false                  20h
    standard (default)   kubernetes.io/cinder       Delete          WaitForFirstConsumer   true                   20h
    standard-csi         cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   20h
  2. Edit the ServiceTelemetry object:

    $ oc edit stf default
  3. Set the value of the backends.events.elasticsearch.enabled parameter to true and the value of backends.events.elasticsearch.storage.strategy to persistent:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      [...]
      backends:
        events:
          elasticsearch:
            enabled: true
            version: 7.16.1
            storage:
              strategy: persistent
              persistent:
                storageClass: standard-csi
                pvcStorageRequest: 50G
The clouds parameter

Use the clouds parameter to define which Smart Gateway objects are deployed, providing the interface for multiple monitored cloud environments to connect to an instance of STF. If a supporting back end is available, metrics and events Smart Gateways for the default cloud configuration are created. By default, the Service Telemetry Operator creates Smart Gateways for cloud1.

You can create a list of cloud objects to control which Smart Gateways are created for the defined clouds. Each cloud consists of data types and collectors. Data types are metrics or events. Each data type consists of a list of collectors, the message bus subscription address, and a parameter to enable debugging. Available collectors for metrics are collectd, ceilometer, and sensubility. Available collectors for events are collectd and ceilometer. Ensure that the subscription address for each of these collectors is unique for every cloud, data type, and collector combination.

The default cloud1 configuration is represented by the following ServiceTelemetry object, which provides subscriptions and data storage of metrics and events for collectd, Ceilometer, and Sensubility data collectors for a particular cloud instance:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  clouds:
    - name: cloud1
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/cloud1-telemetry
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/cloud1-metering.sample
          - collectorType: sensubility
            subscriptionAddress: sensubility/cloud1-telemetry
            debugEnabled: false
      events:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/cloud1-notify
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/cloud1-event.sample

Each item of the clouds parameter represents a cloud instance. A cloud instance consists of three top-level parameters: name, metrics, and events. The metrics and events parameters represent the corresponding back end for storage of that data type. The collectors parameter specifies a list of objects made up of two required parameters, collectorType and subscriptionAddress, and these represent an instance of the Smart Gateway. The collectorType parameter specifies data collected by either collectd, Ceilometer, or Sensubility. The subscriptionAddress parameter provides the AMQ Interconnect address to which a Smart Gateway subscribes.

You can use the optional Boolean parameter debugEnabled within the collectors parameter to enable additional console debugging in the running Smart Gateway pod.
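
For example, the following fragment, shown with only the relevant fields, enables console debugging for the collectd metrics Smart Gateway of cloud1:

spec:
  clouds:
    - name: cloud1
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/cloud1-telemetry
            debugEnabled: true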

The alerting parameter

Use the alerting parameter to control creation of an Alertmanager instance and the configuration of the storage back end. By default, alerting is enabled. For more information, see Section 5.3, “Alerts in Service Telemetry Framework”.

The graphing parameter

Use the graphing parameter to control the creation of a Grafana instance. By default, graphing is disabled. For more information, see Section 5.1, “Dashboards in Service Telemetry Framework”.

The highAvailability parameter

Use the highAvailability parameter to control the instantiation of multiple copies of STF components to reduce recovery time of components that fail or are rescheduled. By default, highAvailability is disabled. For more information, see Section 5.6, “High availability”.

The transports parameter

Use the transports parameter to control the enablement of the message bus for an STF deployment. The only transport currently supported is AMQ Interconnect. By default, the qdr transport is enabled.
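
The following spec fragment shows how these top-level toggles combine in a single ServiceTelemetry object. The values differ from the defaults only in that graphing and high availability are switched on, for illustration:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  alerting:
    enabled: true
  graphing:
    enabled: true
  highAvailability:
    enabled: true
  transports:
    qdr:
      enabled: true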

3.3. Accessing user interfaces for STF components

In Red Hat OpenShift Container Platform, applications are exposed to the external network through a route. For more information about routes, see Configuring ingress cluster traffic.

In Service Telemetry Framework (STF), HTTPS routes are exposed for each service that has a web-based interface. These routes are protected by Red Hat OpenShift Container Platform RBAC: any user who has a ClusterRoleBinding that enables them to view Red Hat OpenShift Container Platform namespaces can log in. For more information about RBAC, see Using RBAC to define and apply permissions.
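
For example, the following commands create such a binding for a single user. The namespace-viewer and stf-ui-access names and the <username> placeholder are illustrative, and your cluster policy might already provide an equivalent role:

$ oc create clusterrole namespace-viewer --verb=get,list --resource=namespaces
$ oc create clusterrolebinding stf-ui-access --clusterrole=namespace-viewer --user=<username>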

Procedure

  1. Log in to Red Hat OpenShift Container Platform.
  2. Change to the service-telemetry namespace:

    $ oc project service-telemetry
  3. List the available web UI routes in the service-telemetry project:

    $ oc get routes | grep web
    default-alertmanager-proxy   default-alertmanager-proxy-service-telemetry.apps.infra.watch          default-alertmanager-proxy   web     reencrypt/Redirect   None
    default-prometheus-proxy     default-prometheus-proxy-service-telemetry.apps.infra.watch            default-prometheus-proxy     web     reencrypt/Redirect   None
  4. In a web browser, navigate to https://<route_address> to access the web interface for the corresponding service.
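
    For example, to print only the host name of the Prometheus route so that you can open it directly, you can use a JSONPath query against the route listed in the previous step:

    $ oc get route default-prometheus-proxy -o jsonpath='{.spec.host}'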

3.4. Configuring an alternate observability strategy

To configure STF to skip the deployment of storage, visualization, and alerting backends, add observabilityStrategy: none to the ServiceTelemetry spec. In this mode, only AMQ Interconnect routers and metrics Smart Gateways are deployed, and you must configure an external Prometheus-compatible system to collect metrics from the STF Smart Gateways.

Note

Currently, only metrics are supported when you set observabilityStrategy to none. Events Smart Gateways are not deployed.
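
If you use an external Prometheus-compatible system, the following scrape configuration is a minimal sketch of discovering the Smart Gateways through Kubernetes service discovery. It assumes that the external system can authenticate to the cluster API and reach the pod network, and you must adjust the discovery role, credentials, and port selection to match the Services that your deployment exposes:

scrape_configs:
  - job_name: stf-smart-gateways
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - service-telemetry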

Procedure

  1. Create a ServiceTelemetry object with the property observabilityStrategy: none in the spec parameter. This manifest results in a default deployment of STF that is suitable for receiving telemetry from a single cloud with all metrics collector types.

    $ oc apply -f - <<EOF
    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      observabilityStrategy: none
    EOF
  2. Delete the leftover objects that are managed by community Operators:

    $ for o in alertmanager/default prometheus/default elasticsearch/elasticsearch grafana/default; do oc delete $o; done
  3. To verify that all workloads are operating correctly, view the pods and the status of each pod:

    $ oc get pods
    NAME                                                      READY   STATUS    RESTARTS   AGE
    default-cloud1-ceil-meter-smartgateway-59c845d65b-gzhcs   3/3     Running   0          132m
    default-cloud1-coll-meter-smartgateway-75bbd948b9-d5phm   3/3     Running   0          132m
    default-cloud1-sens-meter-smartgateway-7fdbb57b6d-dh2g9   3/3     Running   0          132m
    default-interconnect-668d5bbcd6-57b2l                     1/1     Running   0          132m
    interconnect-operator-b8f5bb647-tlp5t                     1/1     Running   0          47h
    service-telemetry-operator-566b9dd695-wkvjq               1/1     Running   0          156m
    smart-gateway-operator-58d77dcf7-6xsq7                    1/1     Running   0          47h

Additional resources

For more information about configuring additional clouds or to change the set of supported collectors, see Section 4.3.2, “Deploying Smart Gateways”.