Chapter 2. Installing the core components of Service Telemetry Framework

Before you install Service Telemetry Framework (STF), ensure that Red Hat OpenShift Container Platform (OCP) version 4.x is running and that you understand the core components of the framework. As part of the OCP installation planning process, ensure that the administrator provides persistent storage and enough resources to run the STF components on top of the OCP environment.

Warning

Red Hat OpenShift Container Platform version 4.3 or later is currently required for a successful installation of STF.

2.1. The core components of STF

The following STF core components are managed by Operators:

  • Prometheus and AlertManager
  • ElasticSearch
  • Smart Gateway
  • AMQ Interconnect

Each component has a corresponding Operator that you can use to load the various application components and objects.

Additional resources

For more information about Operators, see the Understanding Operators guide.

2.2. Preparing your OCP environment for STF

As you prepare your OCP environment for STF, you must plan for persistent storage, adequate resources, and event storage:

2.2.1. Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events.

Additional resources

For more information about configuring persistent storage for OCP, see Understanding persistent storage.

2.2.1.1. Using ephemeral storage

Warning

You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.

Procedure

  • To enable ephemeral storage for STF, set storageEphemeralEnabled: true in your ServiceTelemetry manifest.
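As a minimal sketch of the setting above, the following shell function prints a ServiceTelemetry manifest with ephemeral storage enabled. The function name ephemeral_stf_manifest is hypothetical. The object name stf-default, the service-telemetry namespace, and the infra.watch/v1alpha1 API version are taken from the examples later in this chapter, and metricsEnabled is shown at its documented default.

```shell
# Print a ServiceTelemetry manifest that enables ephemeral storage.
# Intended for development or test clusters only: data can be lost
# when a pod is restarted, updated, or rescheduled.
ephemeral_stf_manifest() {
  cat <<'EOF'
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  storageEphemeralEnabled: true
EOF
}

# To apply the manifest to the cluster you are logged in to:
#   ephemeral_stf_manifest | oc apply -f -
```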

Additional resources

For more information about enabling ephemeral storage for STF, see Section 4.6.1, “Configuring ephemeral storage”.

2.2.2. Resource allocation

To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.

The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.
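If you suspect that pods cannot be scheduled because of insufficient resources, you can look for pods in the Pending phase and then inspect their events. The following is a sketch only. The function names are hypothetical, and the default namespace assumes the service-telemetry namespace that is used throughout this chapter.

```shell
# List pods that the scheduler has not yet placed on a node.
# $1: namespace to inspect (default: service-telemetry).
list_pending_pods() {
  oc get pods --namespace "${1:-service-telemetry}" \
    --field-selector=status.phase=Pending
}

# Describe one pod. The Events section at the end of the output
# usually states which resource request could not be satisfied.
# $1: pod name, $2: namespace (default: service-telemetry).
explain_pod() {
  oc describe pod "$1" --namespace "${2:-service-telemetry}"
}
```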

Additional resources

For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241.

For information about sizing requirements for ElasticSearch, see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html

2.2.3. Node tuning operator

STF uses ElasticSearch to store events, which requires a larger-than-normal vm.max_map_count value. Red Hat OpenShift Container Platform sets the required vm.max_map_count value by default.

If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the node tuning operator. For more information, see Using the Node Tuning Operator.
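To check the value that is currently in effect on a node without changing it, you can read the sysctl from a debug pod and compare it with the minimum that ElasticSearch expects. The following sketch assumes a minimum of 262144, which is the value that the Elastic documentation has commonly cited. Confirm the current figure in the Elastic page listed under Additional resources. The function name and the node name worker-0 are hypothetical.

```shell
# Assumed minimum vm.max_map_count for ElasticSearch. Verify this
# against the Elastic documentation for your ElasticSearch version.
ES_MIN_MAP_COUNT=262144

# Read a line such as "vm.max_map_count = 262144" from stdin and
# succeed only if the value is at least ES_MIN_MAP_COUNT.
map_count_sufficient() {
  value=$(awk -F'= *' '/vm.max_map_count/ { print $2 }')
  [ -n "$value" ] && [ "$value" -ge "$ES_MIN_MAP_COUNT" ]
}

# Example against a node (worker-0 is a placeholder node name):
#   oc debug node/worker-0 -- chroot /host sysctl vm.max_map_count \
#     | map_count_sufficient && echo "vm.max_map_count is sufficient"
```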

In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:

oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator

The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:

  • openshift-control-plane-es
  • openshift-node-es

For a profile to be applied, an ElasticSearch pod must carry the tuned.openshift.io/elasticsearch label. If the label is present, one of the two profiles is applied to the node that hosts the pod. If you use the recommended Operator for ElasticSearch, no action is required by the administrator. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods.
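For a custom-deployed ElasticSearch, you can check which pods already carry the label and, as a stopgap, label a running pod. This is a sketch with hypothetical function names. A label that you add to a running pod is lost when the pod is recreated, so for a lasting result set the label in the pod template of your ElasticSearch workload instead.

```shell
# List every pod that carries the tuned.openshift.io/elasticsearch
# label, together with the node that hosts it.
list_es_tuned_pods() {
  oc get pods --all-namespaces \
    --selector tuned.openshift.io/elasticsearch --output wide
}

# Add the label, with an empty value, to one running pod.
# $1: pod name, $2: namespace.
label_es_pod() {
  oc label pod "$1" --namespace "$2" tuned.openshift.io/elasticsearch=
}
```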

Additional resources

For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html

For more information about how the profiles are applied to nodes, see Custom tuning specification.

2.3. Deploying STF to the OCP environment

You can deploy STF to the OCP environment in one of two ways:

2.3.1. Deploying STF to the OCP environment with ElasticSearch

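Complete the following tasks:

  1. Section 2.3.3, “Creating a namespace”
  2. Section 2.3.4, “Creating an OperatorGroup”
  3. Section 2.3.5, “Enabling the OperatorHub.io Community Catalog Source”
  4. Section 2.3.6, “Enabling Red Hat STF Operator Source”
  5. Section 2.3.7, “Subscribing to the AMQ Certificate Manager Operator”
  6. Section 2.3.8, “Subscribing to the Elastic Cloud on Kubernetes Operator”
  7. Section 2.3.9, “Subscribing to the Service Telemetry Operator”
  8. Section 2.3.10, “Creating a ServiceTelemetry object in OCP”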

2.3.2. Deploying STF to the OCP environment without ElasticSearch

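Complete the following tasks:

  1. Section 2.3.3, “Creating a namespace”
  2. Section 2.3.4, “Creating an OperatorGroup”
  3. Section 2.3.6, “Enabling Red Hat STF Operator Source”
  4. Section 2.3.7, “Subscribing to the AMQ Certificate Manager Operator”
  5. Section 2.3.9, “Subscribing to the Service Telemetry Operator”
  6. Section 2.3.10, “Creating a ServiceTelemetry object in OCP”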

2.3.3. Creating a namespace

Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation:

Procedure

  • Enter the following command:

    oc new-project service-telemetry

2.3.4. Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods.

Procedure

  • Enter the following command:

    oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: service-telemetry-operator-group
      namespace: service-telemetry
    spec:
      targetNamespaces:
      - service-telemetry
    EOF

Additional resources

For more information, see OperatorGroups.

2.3.5. Enabling the OperatorHub.io Community Catalog Source

Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source:

Procedure

  • Enter the following command:

    oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: operatorhubio-operators
      namespace: openshift-marketplace
    spec:
      sourceType: grpc
      image: quay.io/operator-framework/upstream-community-operators:latest
      displayName: OperatorHub.io Operators
      publisher: OperatorHub.io
    EOF
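This section does not include a verification step. One possible check, sketched below, is to confirm that the CatalogSource object exists and that the elastic-cloud-eck package, which is the package name used by the subscription later in this chapter, is listed among the package manifests. The function name is hypothetical, and the package can take a few minutes to appear after you create the catalog source.

```shell
# Succeed only if the OperatorHub.io catalog source exists and the
# elastic-cloud-eck package is available from the catalog.
check_operatorhubio_catalog() {
  oc get catalogsource operatorhubio-operators \
    --namespace openshift-marketplace &&
    oc get packagemanifests | grep elastic-cloud-eck
}
```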

2.3.6. Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.

Procedure

  1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

    oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1
    kind: OperatorSource
    metadata:
      labels:
        opsrc-provider: redhat-operators-stf
      name: redhat-operators-stf
      namespace: openshift-marketplace
    spec:
      authorizationToken: {}
      displayName: Red Hat STF Operators
      endpoint: https://quay.io/cnr
      publisher: Red Hat
      registryNamespace: redhat-operators-stf
      type: appregistry
    EOF
  2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled.

    $ oc get -nopenshift-marketplace operatorsource redhat-operators-stf
    
    NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
    redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled
  3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

    $ oc get packagemanifests | grep "Red Hat STF"
    
    smartgateway-operator                        Red Hat STF Operators      2m50s
    servicetelemetry-operator                    Red Hat STF Operators      2m50s

2.3.7. Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components. The AMQ Certificate Manager Operator runs with global scope and is not compatible with the dependency management of Operator Lifecycle Manager when it is used with namespace-scoped Operators.

Procedure

  1. To subscribe to the AMQ Certificate Manager Operator, create the following Subscription:

    Note

    The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. It can take a few minutes before the amq7-cert-manager.v1.0.0 ClusterServiceVersion appears in the service-telemetry namespace.

    oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: amq7-cert-manager
      namespace: openshift-operators
    spec:
      channel: alpha
      installPlanApproval: Automatic
      name: amq7-cert-manager
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded.

    $ oc get --namespace openshift-operators csv
    
    NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
    amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
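Because a ClusterServiceVersion can take some time to reach the Succeeded phase, you might prefer to poll for it in a script rather than rerun oc get csv by hand. The following is a sketch of one way to do that. The function name, the number of attempts, and the delay are assumptions that you can adjust.

```shell
# Poll a ClusterServiceVersion until its phase is Succeeded.
# $1: CSV name, $2: namespace,
# $3: number of attempts (default 30), $4: seconds between attempts (default 10).
wait_for_csv() {
  attempts=${3:-30}
  delay=${4:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # An empty phase means that the CSV does not exist yet.
    phase=$(oc get csv "$1" --namespace "$2" \
      --output jsonpath='{.status.phase}' 2>/dev/null) || true
    if [ "$phase" = "Succeeded" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "CSV $1 did not reach Succeeded (last phase: ${phase:-unknown})" >&2
  return 1
}

# Example:
#   wait_for_csv amq7-cert-manager.v1.0.0 openshift-operators
```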

2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

If you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator before you install the Service Telemetry Operator.

Procedure

  1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

    oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: elastic-cloud-eck
      namespace: service-telemetry
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: elastic-cloud-eck
      source: operatorhubio-operators
      sourceNamespace: openshift-marketplace
    EOF
  2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:

    $ oc get csv
    
    NAME                       DISPLAY                                         VERSION   REPLACES                   PHASE
    elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1   Succeeded

2.3.9. Subscribing to the Service Telemetry Operator

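Subscribe to the Service Telemetry Operator so that it can manage STF instances. After the subscription succeeds, you create a ServiceTelemetry object, and the Service Telemetry Operator creates the environment. For more information, see Section 2.3.10, “Creating a ServiceTelemetry object in OCP”.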

Procedure

  1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

    oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: servicetelemetry-operator
      namespace: service-telemetry
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: servicetelemetry-operator
      source: redhat-operators-stf
      sourceNamespace: openshift-marketplace
    EOF
  2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

    $ oc get csv --namespace service-telemetry
    NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
    amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
    amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
    elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
    prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
    service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
    smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded
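With several Operators in the list, it can be easier to filter the output for anything that has not yet succeeded. The following sketch reads the table that oc get csv prints and reports every row whose PHASE column, which is the last column, is not Succeeded. The function name is hypothetical.

```shell
# Read `oc get csv` output on stdin. Print the NAME and PHASE of each
# ClusterServiceVersion that is not in the Succeeded phase, and return
# a non-zero status if any such row exists.
unready_csvs() {
  awk 'NR > 1 && NF > 0 && $NF != "Succeeded" { print $1, $NF; bad = 1 }
       END { exit (bad ? 1 : 0) }'
}

# Example:
#   oc get csv --namespace service-telemetry | unready_csvs \
#     && echo "All Operators are ready"
```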

2.3.10. Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, “Deploying STF to the OCP environment without ElasticSearch”.

The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1. Core parameters for a ServiceTelemetry manifest

Parameter | Description | Default value

eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, “Subscribing to the Elastic Cloud on Kubernetes Operator”. | false

metricsEnabled | Enable metrics support in STF. | true

highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, “High availability”. | false

storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, “Ephemeral storage”. | false

Procedure

  1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

    oc apply -f - <<EOF
    apiVersion: infra.watch/v1alpha1
    kind: ServiceTelemetry
    metadata:
      name: stf-default
      namespace: service-telemetry
    spec:
      eventsEnabled: true
      metricsEnabled: true
    EOF
  2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

    $ oc logs $(oc get pod --selector='name=service-telemetry-operator' -oname) -c ansible
    PLAY RECAP ***
    localhost                  : ok=37   changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
  3. View the pods and the status of each pod to confirm that all workloads are running:

    Note

    If you set eventsEnabled: true, the notification Smart Gateways report Error and CrashLoopBackOff states for a period of time before ElasticSearch starts.

    $ oc get pods
    
    NAME                                                              READY   STATUS             RESTARTS   AGE
    alertmanager-stf-default-0                                        2/2     Running            0          26m
    elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running            0          88m
    elasticsearch-es-default-0                                        1/1     Running            0          26m
    interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running            0          46m
    prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running            0          46m
    prometheus-stf-default-0                                          3/3     Running            0          26m
    service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running            0          46m
    smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running            0          46m
    stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running            0          26m
    stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running            0          26m
    stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running            0          26m
    stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running            0          26m
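The procedure above shows the manifest for a deployment that stores events. For the path without ElasticSearch that is described in Section 2.3.2, the same manifest can be used with eventsEnabled set to false. The following sketch wraps both variants in one hypothetical shell function, stf_manifest, so that the only difference between the two paths is a single argument.

```shell
# Print a ServiceTelemetry manifest.
# $1: value for eventsEnabled, either "true" or "false" (default: false).
stf_manifest() {
  events=${1:-false}
  case "$events" in
    true|false) ;;
    *) echo "eventsEnabled must be true or false" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: ${events}
  metricsEnabled: true
EOF
}

# Metrics only, without ElasticSearch:
#   stf_manifest false | oc apply -f -
# Metrics and events:
#   stf_manifest true | oc apply -f -
```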

2.4. Removing STF from the OCP environment

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.

Procedure

  1. Run the oc delete command:

    oc delete project service-telemetry
  2. Verify that the resources have been deleted from the namespace:

    $ oc get all
    No resources found.
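Project deletion is asynchronous, so the namespace can remain in a terminating state for a short time after oc delete returns. The following sketch polls until the project can no longer be retrieved. The function name and timing defaults are assumptions. Note that oc get also fails when you are logged out or the cluster is unreachable, so confirm your session first.

```shell
# Wait until a project no longer exists.
# $1: project name (default: service-telemetry),
# $2: number of attempts (default 30), $3: seconds between attempts (default 10).
wait_for_project_deleted() {
  project=${1:-service-telemetry}
  attempts=${2:-30}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # A failing lookup is treated as "the project is gone".
    if ! oc get project "$project" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Project $project still exists after $attempts attempts" >&2
  return 1
}
```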

2.4.2. Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.

Procedure

  1. Delete the OperatorSource:

    $ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
    operatorsource.operators.coreos.com "redhat-operators-stf" deleted
  2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

    $ oc get packagemanifests | grep "Red Hat STF"
  3. If you enabled the OperatorHub.io Community Catalog Source during the installation process and you no longer need this catalog source, delete it:

    $ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
    catalogsource.operators.coreos.com "operatorhubio-operators" deleted

Additional resources

For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, “Deploying STF to the OCP environment”.