Chapter 3. Completing the Service Telemetry Framework configuration

3.1. Connecting Red Hat OpenStack Platform to Service Telemetry Framework

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, “Deploying to non-standard network topologies”.

3.2. Deploying to non-standard network topologies

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.1 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes similar to those for the spine-leaf and DCN network configurations. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files.

  CephStorageExtraConfig:
      tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
      tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
      tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.
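
For example, by analogy with the CephStorageExtraConfig example above, and assuming that the Block and Object storage roles also connect through the Storage network, the configuration might look like the following sketch:

  BlockStorageExtraConfig:
      tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
      tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"

  ObjectStorageExtraConfig:
      tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
      tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"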

The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.

Procedure

  1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
  2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name. For an illustration of how these names derive from the network definition, see the sketch after this procedure.

    For the ComputeLeaf0 leaf role, specify extra configuration that performs a hiera lookup to determine which network interface to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter.

    ComputeLeaf0ExtraConfig:
        tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
        tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

    Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf.

    ComputeLeaf1ExtraConfig:
        tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
        tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

    The following example shows a similar configuration on a CephStorage leaf role:

    CephStorageLeaf0ExtraConfig:
        tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
        tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
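
    For reference, the leaf-specific hieradata keys in these examples come from your network definitions. The following is a minimal, illustrative network_data.yaml entry; the network name, subnet names, and address ranges are assumptions that depend on your deployment:

    - name: InternalApi
      name_lower: internal_api          # Leaf 0 hiera key: internal_api_subnet
      vip: true
      ip_subnet: '172.16.16.0/24'
      subnets:
        internal_api_leaf1:             # Leaf 1 hiera key: internal_api_leaf1
          ip_subnet: '172.16.17.0/24'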

3.3. Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

  • Section 3.3.1, “Retrieving the AMQ Interconnect route address”
  • Section 3.3.2, “Configuring the STF connection for the overcloud”
  • Section 3.3.3, “Validating client-side installation”

Additional resources

  • To collect data through AMQ Interconnect, see The amqp1 plug-in in the Monitoring Tools Configuration guide.

3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.

Procedure

  1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.
  2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

    $ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
    default-interconnect-5671-service-telemetry.apps.infra.watch
    Note

    If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.

3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the configuration that connects the AMQ Interconnect on the overcloud to the STF deployment. You also enable the collection of events and their storage in STF, and then deploy the overcloud.

Procedure

  1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.
  2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.

    Important

    The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.6, “Multiple cloud configuration”.

    Additionally, setting EventPipelinePublishers and MetricPipelinePublishers to empty lists results in no metric or event data passing to Red Hat OpenStack Platform legacy telemetry components, such as Gnocchi or Panko. If you need to send data to additional pipelines, the Ceilometer polling interval of 5 seconds, as specified in ExtraConfig, might overwhelm the legacy components. If you configure a longer polling interval, you must also modify STF to avoid stale metrics, which appear in Prometheus as missing data.

    If you need to adjust the polling interval, modify the backends.metrics.prometheus.scrapeInterval parameter of the ServiceTelemetry object from the default value of 10s to double the polling interval of the data collectors. For example, if CollectdAmqpInterval and ceilometer::agent::polling::polling_interval are adjusted to 30, set backends.metrics.prometheus.scrapeInterval to 60s.
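
    For example, with collector intervals of 30 seconds, the relevant portion of the ServiceTelemetry object might look like the following sketch. The object name default and the service-telemetry namespace match the defaults used elsewhere in this guide; verify both against your installation:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        metrics:
          prometheus:
            enabled: true
            # Twice the 30-second collector interval
            scrapeInterval: 60s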

  3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.

    • Add the CeilometerQdrPublishMetrics: true parameter to enable collection and transport of Ceilometer metrics to STF.
    • Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.
    • Add the EventPipelinePublishers: [] and MetricPipelinePublishers: [] parameters to avoid writing data to Gnocchi and Panko.
    • Add the ManagePolling: true and ManagePipeline: true parameters to allow full control of Ceilometer polling and pipeline configuration.
    • Add the ExtraConfig parameter ceilometer::agent::polling::polling_interval to set the polling interval of Ceilometer to be compatible with the default STF scrape interval.
    • Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, “Retrieving the AMQ Interconnect route address”:

      parameter_defaults:
          EventPipelinePublishers: []
          MetricPipelinePublishers: []
          CeilometerQdrPublishEvents: true
          CeilometerQdrPublishMetrics: true
          ManagePolling: true
          ManagePipeline: true
          MetricsQdrConnectors:
          - host: default-interconnect-5671-service-telemetry.apps.infra.watch
            port: 443
            role: edge
            sslProfile: sslProfile
            verifyHostname: false
          ExtraConfig:
            ceilometer::agent::polling::polling_interval: 5
  4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

    • the stf-connectors.yaml environment file
    • the enable-stf.yaml environment file that ensures that the STF environment configuration is applied during the overcloud deployment
    • the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

      openstack overcloud deploy <other arguments> \
        --templates /usr/share/openstack-tripleo-heat-templates \
        --environment-file <...other-environment-files...> \
        --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
        --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
        --environment-file /home/stack/stf-connectors.yaml
  5. Deploy the Red Hat OpenStack Platform overcloud.

3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the node by using SSH.

Tip

Some telemetry data is only available when Red Hat OpenStack Platform has active workloads.

Procedure

  1. Log in to an overcloud node, for example, controller-0.
  2. Ensure that the metrics_qdr container is running on the node:

    $ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
    
    running
  3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

    $ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf
    
    listener {
        host: 172.17.1.44
        port: 5666
        authenticatePeer: no
        saslMechanisms: ANONYMOUS
    }
  4. Return a list of connections to the local AMQ Interconnect:

    $ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections
    
    Connections
      id   host                                                                  container                                                                                                  role    dir  security                            authentication  tenant
      ============================================================================================================================================================================================================================================================================================
      1    default-interconnect-5671-service-telemetry.apps.infra.watch:443      default-interconnect-7458fd4d69-bgzfb                                                                      edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
      12   172.17.1.44:60290                                                     openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                         no-auth
      16   172.17.1.44:36408                                                     metrics                                                                                                    normal  in   no-security                         anonymous-user
      899  172.17.1.44:39500                                                     10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                         no-auth

    There are four connections:

    • Outbound connection to STF
    • Inbound connection from ceilometer
    • Inbound connection from collectd
    • Inbound connection from the local qdstat client

      The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

  5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

    $ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
    Router Links
      type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel     mod  delay  rate
      ===========================================================================================================================================================
      endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  0        0       0    0    2979926 0    0      0
      endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0       0    0      0
      endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0       0    0      0
      endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0       0    0      0
      endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0       0    0      0
      endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0       0    911    0
      endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0       0    0      0
      endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0       0    0      0
      endpoint  in   16       41                                             250  0    0      0       2979924  0        0       0    0    2979924 0    0      0
      endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0       0    0      0
      endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0       0    0      0
  6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

    $ oc get pods -l application=default-interconnect
    
    NAME                                    READY   STATUS    RESTARTS   AGE
    default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
  7. Connect to the pod and run the qdstat --connections command to list the known connections:

    $ oc exec -it default-interconnect-7458fd4d69-bgzfb -- qdstat --connections
    
    2020-04-21 18:25:47.243852 UTC
    default-interconnect-7458fd4d69-bgzfb
    
    Connections
      id  host               container                                                      role    dir  security                                authentication  tenant  last dlv      uptime
      ===============================================================================================================================================================================================
      5   10.129.0.110:48498  bridge-3f5                                                    edge    in   no-security                             anonymous-user          000:00:00:02  000:17:36:29
      6   10.129.0.111:43254  rcv[default-cloud1-ceil-meter-smartgateway-58f885c76d-xmxwn]  edge    in   no-security                             anonymous-user          000:00:00:02  000:17:36:20
      7   10.130.0.109:50518  rcv[default-cloud1-coll-event-smartgateway-58fbbd4485-rl9bd]  normal  in   no-security                             anonymous-user          -             000:17:36:11
      8   10.130.0.110:33802  rcv[default-cloud1-ceil-event-smartgateway-6cfb65478c-g5q82]  normal  in   no-security                             anonymous-user          000:01:26:18  000:17:36:05
      22  10.128.0.1:51948   Router.ceph-0.redhat.local                                     edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
      23  10.128.0.1:51950   Router.compute-0.redhat.local                                  edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
      24  10.128.0.1:52082   Router.controller-0.redhat.local                               edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
      27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                           normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

    In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection IDs 22, 23, and 24.

  8. To view the number of messages delivered by the network, use each address with the oc exec command:

    $ oc exec -it default-interconnect-7458fd4d69-bgzfb -- qdstat --address
    
    2020-04-21 18:20:10.293258 UTC
    default-interconnect-7458fd4d69-bgzfb
    
    Router Addresses
      class   addr                                phs  distrib    pri  local  remote  in           out          thru  fallback
      ==========================================================================================================================
      mobile  anycast/ceilometer/event.sample     0    balanced   -    1      0       970          970          0     0
      mobile  anycast/ceilometer/metering.sample  0    balanced   -    1      0       2,344,833    2,344,833    0     0
      mobile  collectd/notify                     0    multicast  -    1      0       70           70           0     0
      mobile  collectd/telemetry                  0    multicast  -    1      0       216,128,890  216,128,890  0     0