Chapter 2. Installing the core components of Service Telemetry Framework

Before you install Service Telemetry Framework (STF), ensure that Red Hat OpenShift Container Platform (OCP) version 4.5 is running and that you understand the core components of the framework. As part of the OCP installation planning process, ensure that the administrator provides persistent storage and enough resources to run the STF components on top of the OCP environment.

Important

Red Hat OpenShift Container Platform version 4.5 is currently required for a successful installation of STF. To upgrade to later versions of STF, you must migrate installations of STF 1.0 that use an OperatorSource to a CatalogSource. For more information about migrating, see Migrating Service Telemetry Framework v1.0 from OperatorSource to CatalogSource.

2.1. The core components of STF

The following STF core components are managed by Operators:

  • Prometheus and Alertmanager
  • ElasticSearch
  • Smart Gateway
  • AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various application components and objects.

Additional resources

For more information about Operators, see the Understanding Operators guide.

2.2. Preparing your OCP environment for STF

As you prepare your OCP environment for STF, you must plan for persistent storage, adequate resources, and event storage:

2.2.1. Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events. When persistent storage is enabled through the Service Telemetry Operator, the Persistent Volume Claims requested in an STF deployment result in an access mode of RWO (ReadWriteOnce). If your environment contains pre-provisioned persistent volumes, ensure that volumes with the RWO access mode are available in the default configured storageClass of OCP. For more information about recommended configurable storage technology in Red Hat OpenShift Container Platform, see Recommended configurable storage technology.
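If you pre-provision persistent volumes, each volume must be able to satisfy the RWO claims that STF creates. The following fragment is an illustrative sketch only; the volume name, capacity, storage class name, and NFS backing store are hypothetical and must be adapted to your environment:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: stf-data-volume          # hypothetical name
spec:
  capacity:
    storage: 20Gi                # large enough for the pvcStorageRequest of the claim
  accessModes:
    - ReadWriteOnce              # RWO, as requested by STF
  storageClassName: standard     # must match the OCP default configured storageClass
  nfs:                           # hypothetical backing store
    path: /exports/stf
    server: nfs.example.com
```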

Additional resources

For more information about configuring persistent storage for OCP, see Understanding persistent storage.

2.2.1.1. Using ephemeral storage

Warning

You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.

Additional resources

For more information about enabling ephemeral storage for STF, see Section 4.7.1, “Configuring ephemeral storage”.

2.2.2. Resource allocation

To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.

The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.

2.2.3. Node Tuning Operator

STF uses ElasticSearch to store events, which requires a larger-than-normal value of the vm.max_map_count kernel setting. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.

Tip

If your host platform is a typical Red Hat OpenShift Container Platform 4 environment, do not make any adjustments. The default Node Tuning Operator configuration accounts for ElasticSearch workloads.

If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually by using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the Node Tuning Operator. For more information, see Using the Node Tuning Operator.
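As a sketch of the mechanism, a custom Tuned object that raises vm.max_map_count might look like the following. This is illustrative only; the profile name, priority, and sysctl value are assumptions, and in a typical OCP 4 environment the default profiles already cover ElasticSearch workloads:

```yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: custom-elasticsearch            # hypothetical name
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
    - name: custom-elasticsearch
      data: |
        [main]
        summary=Raise vm.max_map_count for ElasticSearch workloads
        [sysctl]
        vm.max_map_count=262144
  recommend:
    - profile: custom-elasticsearch
      priority: 20                      # hypothetical priority
      match:
        - label: tuned.openshift.io/elasticsearch
          type: pod                     # apply on nodes running labeled pods
```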

In an OCP deployment, the default Node Tuning Operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:

$ oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator

The output of the default specification is documented at Default profiles set on a cluster. You can manage the assignment of profiles in the recommend section where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:

  • openshift-control-plane-es
  • openshift-node-es

When you schedule an ElasticSearch pod, a label that matches tuned.openshift.io/elasticsearch must be present. If the label is present, one of the two profiles is assigned to the pod. If you use the recommended Operator for ElasticSearch, no administrator action is required. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods.
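For a custom-deployed ElasticSearch, adding the label to the pod metadata is sufficient to trigger profile assignment. The following fragment is a hedged sketch; the pod name and container image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-elasticsearch                     # hypothetical pod name
  labels:
    tuned.openshift.io/elasticsearch: ""     # presence of this label triggers the *-es profiles
spec:
  containers:
    - name: elasticsearch
      image: registry.example.com/elasticsearch:latest   # hypothetical image
```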

2.3. Deploying STF to the OCP environment

You can deploy STF to the OCP environment in one of two ways:

2.3.1. Deploying STF to the OCP environment with ElasticSearch

Complete all of the tasks in Section 2.3.3 through Section 2.3.11, including the ElasticSearch-related tasks in Section 2.3.5, “Enabling the OperatorHub.io Community Catalog Source” and Section 2.3.8, “Subscribing to the Elastic Cloud on Kubernetes Operator”.

2.3.2. Deploying STF to the OCP environment without ElasticSearch

Complete the tasks in Section 2.3.3 through Section 2.3.11, but omit the ElasticSearch-related tasks in Section 2.3.5, “Enabling the OperatorHub.io Community Catalog Source” and Section 2.3.8, “Subscribing to the Elastic Cloud on Kubernetes Operator”.

2.3.3. Creating a namespace

Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation:

Procedure

  • Enter the following command:

    $ oc new-project service-telemetry

2.3.4. Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods.

Procedure

  • Enter the following command:

    $ oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: service-telemetry-operator-group
      namespace: service-telemetry
    spec:
      targetNamespaces:
      - service-telemetry
    EOF

Additional resources

For more information, see OperatorGroups.

2.3.5. Enabling the OperatorHub.io Community Catalog Source

Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source:

Procedure

  • Enter the following command:

    $ oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: operatorhubio-operators
      namespace: openshift-marketplace
    spec:
      sourceType: grpc
      image: quay.io/operator-framework/upstream-community-operators:latest
      displayName: OperatorHub.io Operators
      publisher: OperatorHub.io
    EOF

2.3.6. Enabling Red Hat STF Catalog Source

Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the catalog source.

Procedure

  1. Install a CatalogSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

    $ oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: redhat-operators-stf
      namespace: openshift-marketplace
    spec:
      displayName: Red Hat STF Operators
      image: quay.io/redhat-operators-stf/stf-catalog:latest
      publisher: Red Hat
      sourceType: grpc
      updateStrategy:
        registryPoll:
          interval: 30m
    EOF
  2. To validate the creation of your CatalogSource, use the oc get catalogsources command:

    $ oc get --namespace openshift-marketplace catalogsource redhat-operators-stf
    
    NAME                   DISPLAY                 TYPE   PUBLISHER   AGE
    redhat-operators-stf   Red Hat STF Operators   grpc   Red Hat     62m
  3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

    $ oc get packagemanifests | grep "Red Hat STF"
    
    smart-gateway-operator                       Red Hat STF Operators      63m
    service-telemetry-operator                   Red Hat STF Operators      63m

2.3.7. Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components because the AMQ Certificate Manager Operator is globally scoped. The AMQ Certificate Manager Operator is not compatible with the dependency management of Operator Lifecycle Manager when you use it with other namespace-scoped Operators.

Procedure

  1. Create a subscription to the AMQ Certificate Manager Operator:

    Note

    The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. Your amq7-cert-manager.v1.0.0 ClusterServiceVersion might not appear in the service-telemetry namespace for a few minutes, until processing executes against the namespace.

    $ oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: amq7-cert-manager
      namespace: openshift-operators
    spec:
      channel: alpha
      installPlanApproval: Automatic
      name: amq7-cert-manager
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  2. To validate your ClusterServiceVersion, use the oc get csv command:

    $ oc get --namespace openshift-operators csv
    
    NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
    amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded

    Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded.

2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes (ECK) Operator.

Procedure

  1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

    $ oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: elastic-cloud-eck
      namespace: service-telemetry
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: elastic-cloud-eck
      source: operatorhubio-operators
      sourceNamespace: openshift-marketplace
    EOF
  2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:

    $ oc get csv
    
    NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
    elastic-cloud-eck.v1.2.1   Elastic Cloud on Kubernetes                     1.2.1                Succeeded

2.3.9. Subscribing to the Service Telemetry Operator

You must subscribe to the Service Telemetry Operator, which manages the STF instances.

Procedure

  1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

    $ oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: service-telemetry-operator
      namespace: service-telemetry
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: service-telemetry-operator
      source: redhat-operators-stf
      sourceNamespace: openshift-marketplace
    EOF
  2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

    $ oc get csv --namespace service-telemetry
    NAME                                DISPLAY                                         VERSION   REPLACES   PHASE
    amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
    amq7-interconnect-operator.v1.2.1   Red Hat Integration - AMQ Interconnect          1.2.1                Succeeded
    elastic-cloud-eck.v1.2.1            Elastic Cloud on Kubernetes                     1.2.1                Succeeded
    prometheusoperator.0.37.0           Prometheus Operator                             0.37.0               Succeeded
    service-telemetry-operator.v1.1.0   Service Telemetry Operator                      1.1.0                Succeeded
    smart-gateway-operator.v2.1.0       Smart Gateway Operator                          2.1.0                Succeeded

2.3.10. Overview of the ServiceTelemetry object

Important

Versions of Service Telemetry Operator prior to v1.1.0 used a flat API interface (servicetelemetry.infra.watch/v1alpha1) for creating the ServiceTelemetry object. Service Telemetry Operator v1.1.0 introduces a dictionary-based API interface (servicetelemetry.infra.watch/v1beta1) that allows better control of STF deployments, including native management of multi-cluster deployments and support for additional storage backends in the future. Ensure that any previously created ServiceTelemetry objects are updated to the new interface.

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. The ServiceTelemetry object is made up of the following major configuration parameters:

  • alerting
  • backends
  • clouds
  • graphing
  • highAvailability
  • transports

Each of these top-level configuration parameters provides various controls for a Service Telemetry Framework deployment.

2.3.10.1. backends

Use the backends parameter to control which storage backends are available for storage of metrics and events, and to control the enablement of Smart Gateways, as defined by the clouds parameter. For more information, see Section 2.3.10.2, “clouds”.

Currently, you can use Prometheus as the metrics backend, and ElasticSearch as the events backend.

2.3.10.1.1. Enabling Prometheus as a storage backend for metrics

Procedure

  • To enable Prometheus as a storage backend for metrics, configure the ServiceTelemetry object:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        metrics:
          prometheus:
            enabled: true
2.3.10.1.2. Enabling ElasticSearch as a storage backend for events

To enable events support in STF, you must enable the Elastic Cloud for Kubernetes Operator. For more information, see Section 2.3.8, “Subscribing to the Elastic Cloud on Kubernetes Operator”.

By default, ElasticSearch storage of events is disabled. For more information, see Section 2.3.2, “Deploying STF to the OCP environment without ElasticSearch”.
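After the ECK Operator is available, you enable ElasticSearch in the ServiceTelemetry object in the same way as the Prometheus metrics backend:

```yaml
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  backends:
    events:
      elasticsearch:
        enabled: true
```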

2.3.10.2. clouds

Use the clouds parameter to control which Smart Gateway objects are deployed, thereby providing the interface for multiple monitored cloud environments to connect to an instance of STF. If a supporting backend is available, then metrics and events Smart Gateways for the default cloud configuration are created. By default, the Service Telemetry Operator creates Smart Gateways for cloud1.

You can create a list of cloud objects to control which Smart Gateways are created for each cloud defined. Each cloud is made up of data types and collectors. Data types are metrics or events. Each data type is made up of a list of collectors and the message bus subscription address. Available collectors are collectd and ceilometer. Ensure that the subscription address for each of these collectors is unique for every cloud, data type, and collector combination.

The default cloud1 configuration is represented by the following ServiceTelemetry object, providing subscriptions and data storage of metrics and events for both collectd and Ceilometer data collectors for a particular cloud instance:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  clouds:
    - name: cloud1
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/telemetry
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/metering.sample
      events:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/notify
          - collectorType: ceilometer
            subscriptionAddress: anycast/ceilometer/event.sample

Each item of the clouds parameter represents a cloud instance. The cloud instances are made up of 3 top-level parameters: name, metrics, and events. The metrics and events parameters represent the corresponding backend for storage of that data type. The collectors parameter then specifies a list of objects made up of two parameters, collectorType and subscriptionAddress, and these represent an instance of the Smart Gateway. The collectorType specifies data collected by either collectd or Ceilometer. The subscriptionAddress parameter provides the AMQ Interconnect address that a Smart Gateway instance should subscribe to.
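For example, to monitor a second cloud alongside the default, extend the clouds list with an additional entry. The cloud2 name and its subscription addresses below are hypothetical, but they follow the uniqueness requirement described above: every cloud, data type, and collector combination has its own address:

```yaml
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  clouds:
    - name: cloud1
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/telemetry
    - name: cloud2                                         # hypothetical second cloud
      metrics:
        collectors:
          - collectorType: collectd
            subscriptionAddress: collectd/cloud2-telemetry # unique per cloud
```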

2.3.10.3. alerting

Use the alerting parameter to control creation of an Alertmanager instance and the configuration of the storage backend. By default, alerting is enabled. For more information, see Section 4.2, “Alerts”.

2.3.10.4. graphing

Use the graphing parameter to control the creation of a Grafana instance. By default, graphing is disabled. For more information, see Section 4.5, “Dashboards”.
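For example, a minimal ServiceTelemetry fragment that enables graphing (following the same structure as the other top-level parameters) looks like this:

```yaml
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  graphing:
    enabled: true
```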

2.3.10.5. highAvailability

Use the highAvailability parameter to control the instantiation of multiple copies of STF components to reduce recovery time of components that fail or are rescheduled. By default, highAvailability is disabled. For more information, see Section 4.4, “High availability”.

2.3.10.6. transports

Use the transports parameter to control the enablement of the message bus for an STF deployment. The only transport currently supported is AMQ Interconnect. Ensure that it is enabled for proper operation of STF. By default, the qdr transport is enabled.

2.3.11. Creating a ServiceTelemetry object in OCP

Create a ServiceTelemetry object in OCP to instantiate the supporting components of a Service Telemetry Framework deployment. For more information, see Section 2.3.10, “Overview of the ServiceTelemetry object”.

Procedure

  1. To create a ServiceTelemetry object that results in a default STF deployment, create a ServiceTelemetry object with an empty spec object:

    $ oc apply -f - <<EOF
    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec: {}
    EOF

    Creating a default ServiceTelemetry object results in an STF deployment with the following defaults:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
    spec:
      alerting:
        enabled: true
        alertmanager:
          storage:
            strategy: persistent
            persistent:
              storageSelector: {}
              pvcStorageRequest: 20G
      backends:
        metrics:
          prometheus:
            enabled: true
            scrapeInterval: 10s
            storage:
              strategy: persistent
              persistent:
                storageSelector: {}
                pvcStorageRequest: 20G
        events:
          elasticsearch:
            enabled: false
            storage:
              strategy: persistent
              persistent:
                pvcStorageRequest: 20Gi
      graphing:
        enabled: false
        grafana:
          ingressEnabled: false
          adminPassword: secret
          adminUser: root
          disableSignoutMenu: false
      transports:
        qdr:
          enabled: true
      highAvailability:
        enabled: false
      clouds:
        - name: cloud1
          metrics:
            collectors:
              - collectorType: collectd
                subscriptionAddress: collectd/telemetry
              - collectorType: ceilometer
                subscriptionAddress: anycast/ceilometer/metering.sample
          events:
            collectors:
              - collectorType: collectd
                subscriptionAddress: collectd/notify
              - collectorType: ceilometer
                subscriptionAddress: anycast/ceilometer/event.sample
  2. Optional: To create a ServiceTelemetry object that results in collection and storage of events for the default cloud, enable the ElasticSearch backend:

    $ oc apply -f - <<EOF
    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        events:
          elasticsearch:
            enabled: true
    EOF
  3. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

    $ oc logs --selector name=service-telemetry-operator -c ansible
    PLAY RECAP ***
    localhost                  : ok=55   changed=0    unreachable=0    failed=0    skipped=16   rescued=0    ignored=0
  4. View the pods and the status of each pod to determine that all workloads are operating nominally:

    Note

    If you set backends.events.elasticsearch.enabled: true, the notification Smart Gateways report Error and CrashLoopBackOff error messages for a period of time before ElasticSearch starts.

    $ oc get pods
    
    NAME                                                      READY   STATUS    RESTARTS   AGE
    alertmanager-default-0                                    2/2     Running   0          38s
    default-cloud1-ceil-meter-smartgateway-58d8876857-lbf9d   1/1     Running   0          159m
    default-cloud1-coll-meter-smartgateway-8645d64f5f-rxfpb   2/2     Running   0          159m
    default-interconnect-79d9994b5-xnfvv                      1/1     Running   0          167m
    elastic-operator-746f86c956-jkvcq                         1/1     Running   0          6h23m
    interconnect-operator-5b474bdddc-sztsj                    1/1     Running   0          6h19m
    prometheus-default-0                                      3/3     Running   1          5m39s
    prometheus-operator-7dfb478c8b-bfd4j                      1/1     Running   0          6h19m
    service-telemetry-operator-656fc8ccb6-4w8x4               2/2     Running   0          98m
    smart-gateway-operator-7f49676d5d-nqzmp                   2/2     Running   0          6h21m

2.4. Removing STF from the OCP environment

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.

Procedure

  1. Run the oc delete command:

    $ oc delete project service-telemetry
  2. Verify that the resources have been deleted from the namespace:

    $ oc get all
    No resources found.

2.4.2. Removing the CatalogSource

If you do not expect to install Service Telemetry Framework again, delete the CatalogSource. When you remove the CatalogSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.

Procedure

  1. Delete the CatalogSource:

    $ oc delete --namespace=openshift-marketplace catalogsource redhat-operators-stf
    catalogsource.operators.coreos.com "redhat-operators-stf" deleted
  2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

    $ oc get packagemanifests | grep "Red Hat STF"
  3. If you enabled the OperatorHub.io Community Catalog Source during the installation process and you no longer need this catalog source, delete it:

    $ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
    catalogsource.operators.coreos.com "operatorhubio-operators" deleted

Additional resources

For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, “Deploying STF to the OCP environment”.