Chapter 3. Installing the core components of Service Telemetry Framework

You can use Operators to load the Service Telemetry Framework (STF) components and objects. Operators manage each of the following STF core and community components:

  • AMQ Interconnect
  • Smart Gateway
  • Prometheus and Alertmanager
  • ElasticSearch
  • Grafana

Prerequisites

  • A Red Hat OpenShift Container Platform cluster, version 4.7 or 4.8, is running.
  • You have prepared your Red Hat OpenShift Container Platform environment and ensured that there is persistent storage and enough resources to run the STF components on top of the Red Hat OpenShift Container Platform environment. For more information, see Service Telemetry Framework Performance and Scaling.
Important

STF is compatible with Red Hat OpenShift Container Platform versions 4.7 and 4.8.

3.1. Deploying Service Telemetry Framework to the Red Hat OpenShift Container Platform environment

Deploy Service Telemetry Framework (STF) to collect, store, and monitor events.

Procedure

  1. Create a namespace to contain the STF components, for example, service-telemetry:

    $ oc new-project service-telemetry
  2. Create an OperatorGroup in the namespace so that you can schedule the Operator pods:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: service-telemetry-operator-group
      namespace: service-telemetry
    spec:
      targetNamespaces:
      - service-telemetry
    EOF

    For more information, see OperatorGroups.

  3. Enable the OperatorHub.io Community Catalog Source to install data storage and visualization Operators:

    Warning

    Red Hat supports the core Operators and workloads, including AMQ Interconnect, AMQ Certificate Manager, Service Telemetry Operator, and Smart Gateway Operator. Red Hat does not support the community Operators or workload components, including ElasticSearch, Prometheus, Alertmanager, Grafana, and their Operators.

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: operatorhubio-operators
      namespace: openshift-marketplace
    spec:
      sourceType: grpc
      image: quay.io/operatorhubio/catalog:latest
      displayName: OperatorHub.io Operators
      publisher: OperatorHub.io
    EOF
  4. Subscribe to the AMQ Certificate Manager Operator by using the redhat-operators CatalogSource:

    Note

    The AMQ Certificate Manager deploys to the openshift-operators namespace and is then available to all namespaces across the cluster. As a result, on clusters with a large number of namespaces, it can take several minutes for the Operator to be available in the service-telemetry namespace. The AMQ Certificate Manager Operator is not compatible with the dependency management of Operator Lifecycle Manager when you use it with other namespace-scoped operators.

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: amq7-cert-manager-operator
      namespace: openshift-operators
    spec:
      channel: 1.x
      installPlanApproval: Automatic
      name: amq7-cert-manager-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  5. Validate your ClusterServiceVersion. Ensure that the AMQ Certificate Manager ClusterServiceVersion, amq7-cert-manager.v1.0.3 in this example, displays a phase of Succeeded:

    $ oc get --namespace openshift-operators csv
    
    NAME                       DISPLAY                                         VERSION   REPLACES                   PHASE
    amq7-cert-manager.v1.0.3   Red Hat Integration - AMQ Certificate Manager   1.0.3     amq7-cert-manager.v1.0.2   Succeeded
  6. If you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes (ECK) Operator. To enable the ECK Operator, create the following manifest in your Red Hat OpenShift Container Platform environment:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: elasticsearch-eck-operator-certified
      namespace: service-telemetry
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: elasticsearch-eck-operator-certified
      source: certified-operators
      sourceNamespace: openshift-marketplace
    EOF
  7. Verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes displays a phase of Succeeded:

    $ oc get csv
    
    NAME                                         DISPLAY                        VERSION   REPLACES   PHASE
    ...
    elasticsearch-eck-operator-certified.1.9.1   Elasticsearch (ECK) Operator   1.9.1                Succeeded
    ...
  8. Create the Smart Gateway Operator subscription to manage the Smart Gateway instances:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: smart-gateway-operator
      namespace: service-telemetry
    spec:
      channel: stable-1.3
      installPlanApproval: Automatic
      name: smart-gateway-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  9. Create the Service Telemetry Operator subscription to manage the STF instances:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: service-telemetry-operator
      namespace: service-telemetry
    spec:
      channel: stable-1.3
      installPlanApproval: Automatic
      name: service-telemetry-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  10. Validate that the Service Telemetry Operator and its dependent Operators display a phase of Succeeded:

    $ oc get csv --namespace service-telemetry
    
    NAME                                         DISPLAY                                         VERSION          REPLACES                             PHASE
    amq7-cert-manager.v1.0.3                     Red Hat Integration - AMQ Certificate Manager   1.0.3            amq7-cert-manager.v1.0.2             Succeeded
    amq7-interconnect-operator.v1.10.5           Red Hat Integration - AMQ Interconnect          1.10.5           amq7-interconnect-operator.v1.10.4   Succeeded
    elasticsearch-eck-operator-certified.1.9.1   Elasticsearch (ECK) Operator                    1.9.1                                                 Succeeded
    prometheusoperator.0.47.0                    Prometheus Operator                             0.47.0           prometheusoperator.0.37.0            Succeeded
    service-telemetry-operator.v1.3.1635451892   Service Telemetry Operator                      1.3.1635451892                                        Succeeded
    smart-gateway-operator.v3.0.1635451893       Smart Gateway Operator                          3.0.1635451893                                        Succeeded
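
Checking the PHASE column by eye becomes error-prone as the list of Operators grows. The following shell function is a minimal sketch, not a tool that ships with STF: `csvs_not_succeeded` is a hypothetical name, and the function assumes the default table layout of `oc get csv`, in which PHASE is the last column. It reads that table on standard input, prints the name and phase of every ClusterServiceVersion that has not reached Succeeded, and exits with status 1 if any remain, or 2 if the input contains no rows.

```shell
# Hypothetical helper, not part of STF.
# Reads the table printed by `oc get csv` on stdin and prints the NAME and
# PHASE of every ClusterServiceVersion whose PHASE is not "Succeeded".
# Assumes PHASE is the last whitespace-separated column.
# Exit status: 0 = all succeeded, 1 = at least one did not, 2 = no rows read.
csvs_not_succeeded() {
  awk '
    BEGIN { bad = 0; rows = 0 }
    # Skip blank lines, the header row, and elided "..." rows.
    NF == 0 || $1 == "NAME" || $1 == "..." { next }
    { rows++ }
    $NF != "Succeeded" { print $1, $NF; bad = 1 }
    END {
      if (rows == 0) exit 2
      exit bad
    }
  '
}
```

For example, `oc get csv --namespace service-telemetry | csvs_not_succeeded` prints nothing and exits with status 0 when every Operator in the list has succeeded. Because some Operators can take several minutes to become available, you can poll with a loop such as `until oc get csv --namespace service-telemetry | csvs_not_succeeded; do sleep 10; done`.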

3.2. Creating a ServiceTelemetry object in Red Hat OpenShift Container Platform

Create a ServiceTelemetry object in Red Hat OpenShift Container Platform so that the Service Telemetry Operator creates the supporting components for a Service Telemetry Framework (STF) deployment. For more information, see Section 3.2.1, “Primary parameters of the ServiceTelemetry object”.

Procedure

  1. To create a ServiceTelemetry object that results in an STF deployment that uses the default values, create a ServiceTelemetry object with an empty spec parameter:

    $ oc apply -f - <<EOF
    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec: {}
    EOF

    To override a default value, define the parameter that you want to override. In this example, enable ElasticSearch by setting enabled to true:

    $ oc apply -f - <<EOF
    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        events:
          elasticsearch:
            enabled: true
    EOF

    Creating a ServiceTelemetry object with an empty spec parameter results in an STF deployment with the following default settings:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
    spec:
      alerting:
        alertmanager:
          storage:
            persistent:
              pvcStorageRequest: 20G
              storageSelector: {}
            strategy: persistent
          receivers:
            snmpTraps:
              enabled: false
              target: 192.168.24.254
        enabled: true
      backends:
        events:
          elasticsearch:
            enabled: false
            storage:
              persistent:
                pvcStorageRequest: 20Gi
                storageSelector: {}
              strategy: persistent
        metrics:
          prometheus:
            enabled: true
            scrapeInterval: 10s
            storage:
              persistent:
                pvcStorageRequest: 20G
                storageSelector: {}
              retention: 24h
              strategy: persistent
      graphing:
        enabled: false
        grafana:
          adminPassword: secret
          adminUser: root
          disableSignoutMenu: false
          ingressEnabled: false
          baseImage: docker.io/grafana/grafana:8.1.2
      highAvailability:
        enabled: false
      transports:
        qdr:
          enabled: true
          web:
            enabled: false
      clouds:
        - name: cloud1
          metrics:
            collectors:
              - collectorType: collectd
                subscriptionAddress: collectd/telemetry
                debugEnabled: false
              - collectorType: ceilometer
                subscriptionAddress: anycast/ceilometer/metering.sample
                debugEnabled: false
              - collectorType: sensubility
                subscriptionAddress: sensubility/telemetry
                debugEnabled: false
          events:
            collectors:
              - collectorType: collectd
                subscriptionAddress: collectd/notify
                debugEnabled: false
              - collectorType: ceilometer
                subscriptionAddress: anycast/ceilometer/event.sample
                debugEnabled: false

    To override these defaults, add the configuration to the spec parameter.

  2. View the STF deployment logs in the Service Telemetry Operator:

    $ oc logs --selector name=service-telemetry-operator
    
    ...
    --------------------------- Ansible Task Status Event StdOut  -----------------
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=57   changed=0    unreachable=0    failed=0    skipped=20   rescued=0    ignored=0
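
The PLAY RECAP line summarizes the most recent Ansible run of the Service Telemetry Operator; the run completed without task failures when `failed` and `unreachable` are both 0. The following sketch extracts those two counts from the last recap in the log. The function name `last_recap_ok` is hypothetical, and the parsing assumes the standard Ansible recap format shown in the preceding output.

```shell
# Hypothetical helper, not part of STF.
# Reads Service Telemetry Operator logs on stdin, finds the last Ansible
# "PLAY RECAP" section, and prints the failed and unreachable counts from
# the first host line that follows it.
# Exit status: 0 = both counts are 0, 1 = a count is non-zero,
# 2 = no PLAY RECAP host line was found.
last_recap_ok() {
  awk '
    /PLAY RECAP/ { in_recap = 1; next }
    in_recap && /failed=/ {
      failed = ""; unreachable = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^failed=/) failed = substr($i, 8)
        if ($i ~ /^unreachable=/) unreachable = substr($i, 13)
      }
      in_recap = 0
      seen = 1
    }
    END {
      if (!seen) { print "no PLAY RECAP found"; exit 2 }
      print "failed=" failed " unreachable=" unreachable
      if (failed + 0 == 0 && unreachable + 0 == 0) exit 0
      exit 1
    }
  '
}
```

For example, run `oc logs --selector name=service-telemetry-operator --tail=-1 | last_recap_ok`. The `--tail=-1` option is included because, when you use a label selector, `oc logs` shows only the last 10 lines of each container log by default, which might not include the recap.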

Verification

  • To determine that all workloads are operating correctly, view the pods and the status of each pod.

    Note

    If you set the backends.events.elasticsearch.enabled parameter to true, the notification Smart Gateways report Error and CrashLoopBackOff error messages for a period of time before ElasticSearch starts.

    $ oc get pods
    
    NAME                                                      READY   STATUS    RESTARTS   AGE
    alertmanager-default-0                                    2/2     Running   0          17m
    default-cloud1-ceil-meter-smartgateway-6484b98b68-vd48z   2/2     Running   0          17m
    default-cloud1-coll-meter-smartgateway-799f687658-4gxpn   2/2     Running   0          17m
    default-cloud1-sens-meter-smartgateway-c7f4f7fc8-c57b4    2/2     Running   0          17m
    default-interconnect-54658f5d4-pzrpt                      1/1     Running   0          17m
    elastic-operator-66b7bc49c4-sxkc2                         1/1     Running   0          52m
    interconnect-operator-69df6b9cb6-7hhp9                    1/1     Running   0          50m
    prometheus-default-0                                      2/2     Running   1          17m
    prometheus-operator-6458b74d86-wbdqp                      1/1     Running   0          51m
    service-telemetry-operator-864646787c-hd9pm               1/1     Running   0          51m
    smart-gateway-operator-79778cf548-mz5z7                   1/1     Running   0          51m
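
To spot problem pods in a longer listing, the following sketch reads the output of `oc get pods` on standard input and prints every pod whose STATUS is not Running or whose READY value is incomplete, such as 1/2. Pods in the Completed state are skipped. The function name `pods_not_ready` is hypothetical, and the sketch assumes the default column order NAME, READY, STATUS shown in the preceding output.

```shell
# Hypothetical helper, not part of STF.
# Reads the table printed by `oc get pods` on stdin and prints every pod
# whose STATUS is not Running, or whose READY value (for example 1/2)
# shows fewer ready containers than expected. Completed pods are skipped.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
# Exit status: 0 = all pods ready, 1 = at least one is not, 2 = no rows read.
pods_not_ready() {
  awk '
    BEGIN { bad = 0; rows = 0 }
    NF == 0 || $1 == "NAME" { next }
    {
      rows++
      if ($3 == "Completed") next
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1, $2, $3
        bad = 1
      }
    }
    END {
      if (rows == 0) exit 2
      exit bad
    }
  '
}
```

For example, run `oc get pods --namespace service-telemetry | pods_not_ready`. If you enabled ElasticSearch, expect the notification Smart Gateways to appear in this list until ElasticSearch starts.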

3.2.1. Primary parameters of the ServiceTelemetry object

The ServiceTelemetry object comprises the following primary configuration parameters:

  • alerting
  • backends
  • clouds
  • graphing
  • highAvailability
  • transports

You can configure each of these configuration parameters to provide different features in an STF deployment.

Important

Support for servicetelemetry.infra.watch/v1alpha1 was removed from STF 1.3.

The backends parameter

Use the backends parameter to control which storage back ends are available for storage of metrics and events, and to control the enablement of Smart Gateways that the clouds parameter defines. For more information, see the section called “The clouds parameter”.

Currently, you can use Prometheus as the metrics storage back end and ElasticSearch as the events storage back end.

Enabling Prometheus as a storage back end for metrics

To enable Prometheus as a storage back end for metrics, you must configure the ServiceTelemetry object.

Procedure

  • Configure the ServiceTelemetry object:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        metrics:
          prometheus:
            enabled: true

Configuring persistent storage for Prometheus

Use the additional parameters that are defined in backends.metrics.prometheus.storage.persistent to configure persistent storage options for Prometheus, such as storage class and volume size.

Use storageClass to define the back end storage class. If you do not set this parameter, the Service Telemetry Operator uses the default storage class for the Red Hat OpenShift Container Platform cluster.

Use the pvcStorageRequest parameter to define the minimum required volume size to satisfy the storage request. If volumes are statically defined, it is possible that a volume size larger than requested is used. By default, the Service Telemetry Operator requests a volume size of 20G (20 gigabytes).

Procedure

  • List the available storage classes:

    $ oc get storageclasses
    NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    csi-manila-ceph      manila.csi.openstack.org   Delete          Immediate              false                  20h
    standard (default)   kubernetes.io/cinder       Delete          WaitForFirstConsumer   true                   20h
    standard-csi         cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   20h
  • Configure the ServiceTelemetry object:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        metrics:
          prometheus:
            enabled: true
            storage:
              strategy: persistent
              persistent:
                storageClass: standard-csi
                pvcStorageRequest: 50G
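
If you omit storageClass, the Service Telemetry Operator uses the cluster default, which `oc get storageclasses` marks with `(default)`. The following sketch, with the hypothetical function names `default_storage_class` and `has_storage_class`, reads that table on standard input to print the default class or to confirm that a class you intend to set in storageClass exists. It assumes the default column layout shown in the preceding output.

```shell
# Hypothetical helpers, not part of STF.
# Both read the table printed by `oc get storageclasses` on stdin and skip
# the header row.

# Print the name of the storage class marked "(default)".
# Exit status: 0 if a default class was found, 1 otherwise.
default_storage_class() {
  awk '
    NR > 1 && $2 == "(default)" { print $1; found = 1 }
    END { if (found) exit 0; exit 1 }
  '
}

# Exit with status 0 if a storage class named $1 is listed, 1 otherwise.
has_storage_class() {
  awk -v want="$1" '
    NR > 1 && $1 == want { found = 1 }
    END { if (found) exit 0; exit 1 }
  '
}
```

For example, `oc get storageclasses | has_storage_class standard-csi && echo found` confirms that the class used in the preceding ServiceTelemetry object exists before you apply it.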

Enabling ElasticSearch as a storage back end for events

To enable ElasticSearch as a storage back end for events, you must configure the ServiceTelemetry object.

Procedure

  • Configure the ServiceTelemetry object:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        events:
          elasticsearch:
            enabled: true

Configuring persistent storage for ElasticSearch

Use the additional parameters defined in backends.events.elasticsearch.storage.persistent to configure persistent storage options for ElasticSearch, such as storage class and volume size.

Use storageClass to define the back end storage class. If you do not set this parameter, the Service Telemetry Operator uses the default storage class for the Red Hat OpenShift Container Platform cluster.

Use the pvcStorageRequest parameter to define the minimum required volume size to satisfy the storage request. If volumes are statically defined, it is possible that a volume size larger than requested is used. By default, the Service Telemetry Operator requests a volume size of 20Gi (20 gibibytes).
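
The Prometheus and ElasticSearch defaults are not the same size: 20G uses a decimal suffix (10^9 bytes), while 20Gi uses a binary suffix (2^30 bytes), following Kubernetes resource quantity notation. The following sketch converts such quantities to bytes so that you can compare requests. The function name `quantity_to_bytes` is hypothetical, and it handles only whole numbers with the k, M, G, T, Ki, Mi, Gi, and Ti suffixes.

```shell
# Hypothetical helper, not part of STF.
# Converts a whole-number Kubernetes-style quantity such as 20G or 20Gi
# to a byte count. Decimal suffixes (k, M, G, T) are powers of 1000;
# binary suffixes (Ki, Mi, Gi, Ti) are powers of 1024.
# Fractional values and unknown suffixes are rejected with status 1.
quantity_to_bytes() {
  q=$1
  case $q in
    *Ki) n=${q%Ki}; m=1024 ;;
    *Mi) n=${q%Mi}; m=$((1024 * 1024)) ;;
    *Gi) n=${q%Gi}; m=$((1024 * 1024 * 1024)) ;;
    *Ti) n=${q%Ti}; m=$((1024 * 1024 * 1024 * 1024)) ;;
    *k)  n=${q%k};  m=1000 ;;
    *M)  n=${q%M};  m=1000000 ;;
    *G)  n=${q%G};  m=1000000000 ;;
    *T)  n=${q%T};  m=1000000000000 ;;
    *)   n=$q;      m=1 ;;
  esac
  # Accept only plain digits without leading zeros (the shell would read
  # a leading zero as octal).
  case $n in
    ''|*[!0-9]*|0?*) echo "unsupported quantity: $q" >&2; return 1 ;;
  esac
  echo $((n * m))
}
```

For example, `quantity_to_bytes 20G` prints 20000000000 and `quantity_to_bytes 20Gi` prints 21474836480, so a 20Gi request is roughly 7% larger than a 20G request.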

Procedure

  • List the available storage classes:

    $ oc get storageclasses
    NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    csi-manila-ceph      manila.csi.openstack.org   Delete          Immediate              false                  20h
    standard (default)   kubernetes.io/cinder       Delete          WaitForFirstConsumer   true                   20h
    standard-csi         cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   20h
  • Configure the ServiceTelemetry object:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        events:
          elasticsearch:
            enabled: true
            version: 7.16.1
            storage:
              strategy: persistent
              persistent:
                storageClass: standard-csi
                pvcStorageRequest: 50G

The clouds parameter

Use the clouds parameter to define which Smart Gateway objects deploy, thereby providing the interface for multiple monitored cloud environments to connect to an instance of STF. If a supporting back end is available, then metrics and events Smart Gateways for the default cloud configuration are created. By default, the Service Telemetry Operator creates Smart Gateways for cloud1.

You can create a list of cloud objects to control which Smart Gateways are created for the defined clouds. Each cloud consists of data types and collectors. Data types are metrics or events. Each data type consists of a list of collectors, the message bus subscription address, and a parameter to enable debugging. Available collectors for metrics are collectd, ceilometer, and sensubility. Available collectors for events are collectd and ceilometer. Ensure that the subscription address for each of these collectors is unique for every cloud, data type, and collector combination.
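
Because each subscription address must be unique, a quick check can catch copy-and-paste mistakes when you add clouds to a manifest. The following sketch scans a ServiceTelemetry manifest on standard input for subscriptionAddress values and prints any value that appears more than once. The function name `duplicate_subscription_addresses` is hypothetical, and the sketch is a line-based text scan rather than a YAML parser: it assumes one `subscriptionAddress: value` pair per line and strips only double quotes.

```shell
# Hypothetical helper, not part of STF.
# Scans a ServiceTelemetry manifest on stdin line by line (this is not a YAML
# parser), collects every subscriptionAddress value, and prints each value
# that appears more than once.
# Exit status: 0 = all addresses unique, 1 = duplicates were printed.
duplicate_subscription_addresses() {
  _dups=$(awk '
    # Match "subscriptionAddress: x" and "- subscriptionAddress: x".
    $1 == "subscriptionAddress:" || ($1 == "-" && $2 == "subscriptionAddress:") {
      value = $NF
      gsub(/"/, "", value)    # drop double quotes around the value
      print value
    }
  ' | sort | uniq -d)
  if [ -n "$_dups" ]; then
    printf '%s\n' "$_dups"
    return 1
  fi
  return 0
}
```

For example, `duplicate_subscription_addresses < stf.yaml`, where stf.yaml is a hypothetical file that holds your ServiceTelemetry manifest, prints nothing and exits with status 0 when all addresses are unique.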

The default cloud1 configuration is represented by the following ServiceTelemetry object, which provides subscriptions and data storage of metrics and events for collectd, Ceilometer, and Sensubility data collectors for a particular cloud instance:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  clouds:
    - name: cloud1
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/telemetry
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/metering.sample
          - collectorType: sensubility
            subscriptionAddress: sensubility/telemetry
            debugEnabled: false
      events:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/notify
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/event.sample

Each item of the clouds parameter represents a cloud instance. A cloud instance consists of three top-level parameters: name, metrics, and events. The metrics and events parameters represent the corresponding back end for storage of that data type. Each of them contains a collectors parameter, which specifies a list of objects; each object represents an instance of the Smart Gateway and has two required parameters, collectorType and subscriptionAddress. The collectorType parameter specifies whether the data is collected by collectd, Ceilometer, or Sensubility. The subscriptionAddress parameter provides the AMQ Interconnect address to which a Smart Gateway subscribes.

You can use the optional Boolean parameter debugEnabled within the collectors parameter to enable additional console debugging in the running Smart Gateway pod.

The alerting parameter

Use the alerting parameter to control creation of an Alertmanager instance and the configuration of the storage back end. By default, alerting is enabled. For more information, see Section 5.3, “Alerts in Service Telemetry Framework”.

The graphing parameter

Use the graphing parameter to control the creation of a Grafana instance. By default, graphing is disabled. For more information, see Section 5.1, “Dashboards in Service Telemetry Framework”.

The highAvailability parameter

Use the highAvailability parameter to control the instantiation of multiple copies of STF components to reduce recovery time of components that fail or are rescheduled. By default, highAvailability is disabled. For more information, see Section 5.5, “High availability”.

The transports parameter

Use the transports parameter to control the enablement of the message bus for an STF deployment. The only transport currently supported is AMQ Interconnect. By default, the qdr transport is enabled.

3.3. Removing Service Telemetry Framework from the Red Hat OpenShift Container Platform environment

Remove Service Telemetry Framework (STF) from a Red Hat OpenShift Container Platform environment if you no longer require the STF functionality.

3.3.1. Deleting the namespace

To remove the operational resources for STF from Red Hat OpenShift Container Platform, delete the namespace.

Procedure

  1. Run the oc delete command:

    $ oc delete project service-telemetry
  2. Verify that the resources have been deleted from the namespace:

    $ oc get all
    No resources found.

3.3.2. Removing the CatalogSource

If you do not expect to install Service Telemetry Framework (STF) again, delete the CatalogSource. When you remove the CatalogSource, PackageManifests related to STF are automatically removed from the Operator Lifecycle Manager catalog.

Procedure

  1. If you enabled the OperatorHub.io Community Catalog Source during the installation process and you no longer need this catalog source, delete it:

    $ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
    catalogsource.operators.coreos.com "operatorhubio-operators" deleted

Additional resources

For more information about the OperatorHub.io Community Catalog Source, see Section 3.1, “Deploying Service Telemetry Framework to the Red Hat OpenShift Container Platform environment”.