Service Telemetry Framework Release Notes 1.3

Red Hat OpenStack Platform 16.2

Release details for Service Telemetry Framework 1.3

OpenStack Documentation Team

Red Hat Customer Content Services

Abstract

This document outlines the major features, enhancements, and known issues in this release of Service Telemetry Framework.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see our CTO Chris Wright’s message.

Chapter 1. Introduction to Service Telemetry Framework release

This release of Service Telemetry Framework (STF) provides new features and resolved issues specific to STF.

STF uses components from other Red Hat products. For specific information pertaining to the support of these components, see https://access.redhat.com/site/support/policy/updates/openstack/platform/ and https://access.redhat.com/support/policy/updates/openshift/.

STF 1.3 is compatible with OpenShift Container Platform (OCP) version 4.6 as the deployment platform.

1.1. Product support

The Red Hat Customer Portal offers resources to guide you through the installation and configuration of Service Telemetry Framework. The following types of documentation are available through the Customer Portal:

  • Product documentation
  • Knowledge base articles and solutions
  • Technical briefs
  • Support case management

You can access the Customer Portal at https://access.redhat.com/.

Chapter 2. Top new features

The following features are new to Service Telemetry Framework (STF):

Smart Gateway Operator interface
Support for the legacy Smart Gateway is dropped from STF going forward, and a new pluggable architecture has been implemented in the sg-core application. As an administrator, you can use the Smart Gateway Operator to make better use of sg-core through a more flexible API interface.

Chapter 3. Service Telemetry Framework release information

Notes for updates released during the supported lifecycle of this Service Telemetry Framework (STF) release appear in the advisory text associated with each update.

3.1. Service Telemetry Framework 1.3

These release notes highlight technology preview items, recommended practices, known issues, and deprecated functionality to be taken into consideration when you install this release of Service Telemetry Framework (STF).

Note

Service Telemetry Framework version 1.1 support ended on June 15, 2021.

This release includes the following advisories:

RHEA-2021:2424-01
Release of components for Service Telemetry Framework - RPMs
RHEA-2021:2425-02
Release of components for Service Telemetry Framework - Container Images
RHBA-2021:2478-02
Release of components for Service Telemetry Framework - Container Images
RHBA-2021:2477-02
Release of common components for Service Telemetry Framework - Container Images
RHBA-2021:2442
Service Telemetry Framework version 1.1 support ended on June 15, 2021

3.1.1. Enhancements

This release of STF features the following enhancements:

BZ#1959594
With this update, the Smart Gateway Operator interface can support additional functionality in sg-core. As an administrator, you can use the Smart Gateway Operator to make better use of the sg-core through a more flexible API interface.

3.1.2. Release notes

This section outlines important details about the release, including recommended practices and notable changes to STF. You must take this information into account to ensure the best possible outcomes for your installation.

BZ#1960025

STF 1.3 does not support the infra.watch/v1alpha1 Custom Resource Definition. Only infra.watch/v1beta1 is supported.

In STF 1.2, the infra.watch/v1alpha1 interface was deprecated and the Service Telemetry Operator supported a translation to infra.watch/v1beta1 dynamically. As of STF 1.3, this support has been removed and only infra.watch/v1beta1 is supported. Ensure that you migrate to infra.watch/v1beta1 before you upgrade from STF 1.2 to STF 1.3.
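
A pre-upgrade check along the following lines could flag manifests that still use the removed interface. This is a minimal sketch and not part of STF: the helper name and the sample manifests are hypothetical, and it assumes that the manifest has already been loaded into a Python dictionary.

```python
# Hypothetical helper: report whether a ServiceTelemetry manifest must be
# migrated before upgrading from STF 1.2 to STF 1.3.
REMOVED_API = "infra.watch/v1alpha1"
SUPPORTED_API = "infra.watch/v1beta1"


def needs_migration(manifest: dict) -> bool:
    """Return True when the manifest still uses the removed apiVersion."""
    return manifest.get("apiVersion") == REMOVED_API


# Sample manifests (illustrative only; real manifests carry more fields).
old_manifest = {"apiVersion": "infra.watch/v1alpha1", "kind": "ServiceTelemetry"}
new_manifest = {"apiVersion": "infra.watch/v1beta1", "kind": "ServiceTelemetry"}

print(needs_migration(old_manifest))  # True
print(needs_migration(new_manifest))  # False
```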

BZ#1952188

Ceilometer metrics are distributed internally within Red Hat OpenStack Platform (RHOSP) via the RabbitMQ bus, collected via the Ceilometer agents, and transported to STF for storage in Prometheus via sg-core.

Before this release, if you set up the RHOSP environment in high-availability mode, each controller collected and sent metrics with a publisher label containing the controller name. As a result, Ceilometer metrics written to Prometheus appeared to be broken up across multiple publishers.

This update drops the publisher label on Ceilometer metrics to collapse the metrics to a single set of labels. As a result, metrics from Ceilometer no longer appear to be broken up across multiple publishers.

Previous queries that relied on the publisher label might not work. You can override the default ServiceMonitor object with the servicemonitorManifest parameter in the ServiceTelemetry object.
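
The effect of dropping the label can be illustrated with a short sketch. This is not the sg-core implementation: the function name, the resource label, and the controller names are hypothetical, and only the publisher label comes from the description above.

```python
from collections import defaultdict


def collapse_label(samples, label="publisher"):
    """Group sample values by label set after removing one label."""
    collapsed = defaultdict(list)
    for labels, value in samples:
        # Build a hashable key from every label except the one being dropped.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != label))
        collapsed[key].append(value)
    return dict(collapsed)


# Hypothetical samples: the same resource reported by two controllers.
samples = [
    ({"resource": "instance-1", "publisher": "controller-0"}, 1.0),
    ({"resource": "instance-1", "publisher": "controller-1"}, 2.0),
]

with_publisher = collapse_label(samples, label=None)  # keep every label
without_publisher = collapse_label(samples)           # drop publisher

print(len(with_publisher))     # 2 series while the publisher label is kept
print(len(without_publisher))  # 1 series once the publisher label is dropped
```

A query that selected or grouped on publisher would match nothing in the collapsed form, which is why such queries might need to be revised.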

BZ#1954722
You need the caCertFile parameter in RHOSP 13 to allow connections from RHOSP to STF. To configure RHOSP 13 to support the caCertFile parameter in tripleo-heat-templates (THT) environment files, see Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework in the Service Telemetry Framework 1.3 guide.

3.1.3. Deprecated functionality

These features have been deprecated:

BZ#1965464
With this release, delivery of alerts through SNMP using prometheus-webhook-snmp is deprecated.

3.1.4. Removed functionality

The following functionality has been removed from this release of STF:

BZ#1983662
Previously, the use of EnableSTF was part of the OpenStack configuration for STF. In this release, configuration of STF is now done through the base configuration, and use of EnableSTF has been removed. For more information about the base configuration, see https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/service_telemetry_framework_1.3/index#creating-the-base-configuration-for-stf_assembly-completing-the-stf-configuration

3.2. Service Telemetry Framework 1.3.1 Maintenance Release - July 19, 2021

These release notes highlight bug fixes and enhancements to be taken into consideration when you install this release of Service Telemetry Framework (STF).

This release includes the following advisories:

RHBA-2021:2771
Release of components for Service Telemetry Framework v1.3.1

3.2.1. Bug fixes

These bugs were fixed in this release of STF:

BZ#1979637

Before this update, Ceilometer metrics exposed by sg-core resulted in virtual machine instances having their ID exposed as a label value host. As a result, using the label host overloaded the drop-down menu in the dashboards with virtual machine instances in addition to the node instances.

With this update, Ceilometer virtual machine instance metrics use the vm_instance label to expose the instance ID of a virtual machine so that virtual machine instance IDs are not listed in the STF dashboard node instance drop-down menu.

BZ#1976981
Before this update, port 5672 was not enabled for AMQ Interconnect when deploying an Interconnect cluster with Service Telemetry Framework (STF) 1.3. As a result, administrators were not able to query with qdstat for connections to validate their deployment. With this update, port 5672 was added to the list of listeners in the Interconnect object that is managed by the AMQ Interconnect Operator. Administrators can now use qdstat to validate and debug AMQ Interconnect.
BZ#1979378

Before this release, the documentation stated that you could set clouds: {} to pass an empty object so that no Smart Gateways were deployed. As a result, Smart Gateways did not clear and the following error was seen in the logs of the Service Telemetry Operator:

"Invalid data passed to loop, it requires a list, got this instead: {}. Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup."

Documentation now states that clouds: [] is the correct format, resulting in an empty list being passed rather than an empty object. As a result, no Smart Gateways are defined.
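
The difference between the two forms can be shown with a small Python analogy. This is only a sketch of a loop that requires a list, not the Service Telemetry Operator code: the function name and the error wording are hypothetical.

```python
def smart_gateway_names(clouds):
    """Return the names of the clouds to deploy Smart Gateways for.

    Mirrors a loop that requires a list: any other type is rejected.
    """
    if not isinstance(clouds, list):
        raise TypeError(f"loop requires a list, got this instead: {clouds!r}")
    return [cloud["name"] for cloud in clouds]


# An empty list is valid input and results in no Smart Gateways.
print(smart_gateway_names([]))  # []

# An empty object (a dict in Python) is not a list, so the loop fails.
try:
    smart_gateway_names({})
except TypeError as error:
    print(error)
```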

3.2.2. Enhancements

This release of STF features the following enhancements:

BZ#1975792
With this update, you can now install Service Telemetry Framework (STF) 1.3 on Red Hat OpenShift Container Platform (OCP) 4.6 and 4.7.

3.2.3. Release notes

This section outlines important details about the release, including recommended practices and notable changes to Service Telemetry Framework (STF). You must take this information into account to ensure the best possible outcomes for your deployment.

BZ#1940181
The dashboards for STF 1.3 have been reworked to be synchronized with the data provided by the new base configuration for RHOSP. The location of these dashboards is different from that of the STF 1.2 dashboards and is noted in the documentation. The dashboards for Cloud View and Infrastructure View are designed for a single cloud environment.

3.3. Service Telemetry Framework 1.3.2 Maintenance Release - October 5, 2021

These release notes highlight bug fixes and enhancements to be taken into consideration when you install this release of Service Telemetry Framework (STF).

This release includes the following advisory:

RHBA-2021:3721
Release of components for Service Telemetry Framework 1.3.2 - Container Images

3.3.1. Bug fixes

These bugs were fixed in this release of STF:

BZ#1979637
The documentation provides a procedure in which you can verify that the version of Grafana deployed by the Grafana Operator is compatible with the latest dashboard updates. Administrators can now use the new graphing.grafana.baseImage parameter to run Grafana 8.1.0 or later, which is required by the latest example dashboards.
BZ#2008338
The configuration overview of the clouds parameter for the ServiceTelemetry manifest was previously missing the configuration example for Sensubility, resulting in the corresponding Smart Gateway not being deployed. The documentation has been updated to reflect the configuration for Sensubility support.

3.3.2. Enhancements

This release of STF features the following enhancements:

BZ#1979642
The dashboards referred to in the documentation are now compatible with multiple clouds, making visualization of individual clouds easier for administrators.
BZ#1958934
Previously, the vm-view.json file in the infrawatch/dashboard repository did not work with STF 1.3, and it was not possible to install the dashboard as a GrafanaDashboard object for management by the Grafana Operator. With this update, the virtual machine view dashboard works with STF 1.3.

3.3.3. Release notes

This section outlines important details about the release, including recommended practices and notable changes to Service Telemetry Framework (STF). You must take this information into account to ensure the best possible outcomes for your deployment.

BZ#1989660
The latest release of STF 1.3 has been verified to work with RHOSP 16.2.1.

3.3.4. Deprecated functionality

The items in this section are either no longer supported or will no longer be supported in a future release:

BZ#2002711
Previously, the Elastic Cloud on Kubernetes (ECK) Operator was installed from the OperatorHub.io CatalogSource. In Service Telemetry Framework v1.3.2, the documentation was updated to use the ECK Operator from the Certified Operators CatalogSource.
BZ#2002714

Use of the redhat-operators-stf CatalogSource has been removed from the documentation of Service Telemetry Framework v1.3. It was used to install a copy of the AMQ Certificate Manager Operator as a workaround because the Operator was unavailable in OpenShift Container Platform versions later than v4.5.

The AMQ Certificate Manager Operator is again available in the redhat-operators CatalogSource from OpenShift Container Platform v4.7, so the additional CatalogSource is no longer necessary.

To migrate to the built-in AMQ Certificate Manager v1.0.1, complete the following steps:

  1. Uninstall the existing AMQ Certificate Manager provided by the redhat-operators-stf CatalogSource.
  2. Subscribe to the new AMQ Certificate Manager provided by the redhat-operators CatalogSource that is documented in Service Telemetry Framework v1.3.
  3. Remove the redhat-operators-stf CatalogSource.

3.3.5. Removed functionality

The following functionality has been removed from this release of STF:

BZ#2004142
Content about the Node Tuning Operator has been removed from the documentation because an administrator does not need to take any action. OpenShift Container Platform already handles node tuning correctly when it schedules Elasticsearch as defined by the Service Telemetry Operator.

3.4. Service Telemetry Framework 1.3.3 Maintenance Release - November 10, 2021

These release notes highlight bug fixes and enhancements to be taken into consideration when you install this release of Service Telemetry Framework (STF).

This release includes the following advisory:

RHBA-2021:4582
Release of components for Service Telemetry Framework 1.3.3 - Container Images

3.4.1. Bug fixes

These bugs were fixed in this release of STF:

BZ#2011603
With this update, the servicetelemetrys.infra.watch CRD includes a validation that limits clouds[].name to 10 alphanumeric characters. This avoids issues caused by extra characters in the cloud name and by names that are too long.
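
A name can be checked against this rule before you apply a manifest. The following is a minimal sketch under stated assumptions: the exact pattern used by the CRD is not quoted in these notes, so the sketch assumes 1 to 10 ASCII letters or digits, and the function name is hypothetical.

```python
import re

# Assumed rule: 1 to 10 ASCII letters or digits. The exact pattern used by
# the CRD is not quoted in these notes.
CLOUD_NAME_PATTERN = re.compile(r"[A-Za-z0-9]{1,10}")


def is_valid_cloud_name(name: str) -> bool:
    """Return True when the whole name matches the assumed rule."""
    return CLOUD_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_cloud_name("cloud1"))           # True
print(is_valid_cloud_name("my-cloud"))         # False: contains a hyphen
print(is_valid_cloud_name("averylongcloud1"))  # False: longer than 10 characters
```
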
BZ#1959166

Previously, when you installed STF without having the Elastic Cloud on Kubernetes (ECK) Operator installed, the following error message was returned: "Failed to find exact match for elasticsearch.k8s.elastic.co/v1beta1.Elasticsearch". The error occurred because the Service Telemetry Operator tried to look up information from a non-existent API interface.

With this update, the Service Telemetry Operator verifies that the API exists before it attempts to make requests to the API interface that is provided by ECK.

BZ#1875854

Before this update, the query for high CPU usage in alerts.yaml was invalid. Additionally, the rhos-dashboard showed pending alerts in the global alerts panels, which resulted in false positives. As a result, high CPU warning and critical alarms could trigger falsely, or fail to trigger when high CPU usage was detected.

With this release, the high CPU alert query uses the new recording rules: it calculates the sum of CPU usage grouped by host and divides those values by the number of cores for that host, resulting in a CPU usage value between 0% and 100%.
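
The arithmetic behind the corrected query can be sketched as follows. This is an illustration in Python rather than the recording rules themselves: the function name, host names, and usage values are hypothetical, and the sketch assumes that each core reports usage as a percentage and that every host has at least one core.

```python
def cpu_usage_percent(per_core_usage):
    """Sum per-core usage for each host and divide by that host's core count."""
    result = {}
    for host, cores in per_core_usage.items():
        result[host] = sum(cores) / len(cores)
    return result


# Hypothetical per-core usage percentages, grouped by host.
per_core_usage = {
    "compute-0": [50.0, 100.0, 0.0, 50.0],  # 4 cores, sum 200
    "compute-1": [100.0, 100.0],            # 2 cores, sum 200
}

print(cpu_usage_percent(per_core_usage))
# {'compute-0': 50.0, 'compute-1': 100.0}
```

Without the division by the core count, both hosts would report 200, which is outside the 0% to 100% range that a threshold-based alert expects.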

BZ#2011145
Commands used to verify creation of Alertmanager rules made use of oc CLI parameters that are no longer valid. The documentation has been updated to no longer use the deprecated parameters.

3.4.2. Release notes

This section outlines important details about the release, including recommended practices and notable changes to Service Telemetry Framework (STF). You must take this information into account to ensure the best possible outcomes for your deployment.

BZ#2013268
STF 1.3.3 now supports OpenShift Container Platform 4.8 as an installation platform.
BZ#2023763
With some alertmanager.yaml configurations, the Ansible Operator attempts to parse the configuration inline, which can cause the Operator to fail. In this release, the documentation includes additional information about how to define complicated Alertmanager route configurations by using a base64-encoded configuration.
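
The encoding step itself can be sketched as follows. This is a minimal example and not the documented procedure: the helper names and the sample configuration are hypothetical, and how the encoded value is supplied to STF is described in the product documentation rather than here.

```python
import base64

# Hypothetical minimal Alertmanager configuration used only for illustration.
ALERTMANAGER_YAML = """\
route:
  receiver: "null"
receivers:
  - name: "null"
"""


def encode_config(text: str) -> str:
    """Return the configuration as a base64 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_config(encoded: str) -> str:
    """Reverse encode_config to confirm nothing was altered."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_config(ALERTMANAGER_YAML)
print(encoded)
print(decode_config(encoded) == ALERTMANAGER_YAML)  # True
```

Because the encoded value contains only base64 characters, it carries no YAML or template syntax that could be parsed inline.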

3.5. Service Telemetry Framework 1.3.4 Maintenance Release - February 22, 2022

These release notes highlight bug fixes and enhancements to be taken into consideration when you install this release of Service Telemetry Framework (STF).

3.5.1. Bug fixes

These bugs were fixed in this release of STF:

BZ#2016460
In some cases, Ceilometer metrics were not handled properly by sg-core. This resulted in some Ceilometer metrics not being stored in Prometheus. In this release, the processing of metrics has been enhanced to be more robust. While the sg-core has been enhanced to support larger messages from Ceilometer, an additional change is required to support passing the larger messages through the sg-bridge ring buffer. The changes required to fully support this functionality are being tracked in RHBZ#2053683.
BZ#2047932
Previously, there was an invalid configuration in the example Custom Resource and the x-descriptor of the ClusterServiceVersion resource. Installation of a ServiceTelemetry manifest from the OpenShift UI resulted in an invalid CustomResource manifest, which could cause issues upgrading to STF 1.4. In this update, the CustomResource example and the ClusterServiceVersion x-descriptors for the alertmanager parameters were fixed. Now, when you deploy STF 1.3 with a default UI configuration, STF successfully upgrades from 1.3 to 1.4.
BZ#2046538
The release of Service Telemetry Framework (STF) 1.4 results in an improper Smart Gateway Operator channel being used for dependency installation. As a result, new STF 1.3 installations use the wrong Smart Gateway Operator channel and the wrong Smart Gateway Operator version. With this update, the documentation includes a step to use the appropriate channel in the Smart Gateway Operator subscription.

3.5.2. Enhancements

This release of STF features the following enhancements:

BZ#2032661
The STF 1.3.4 release adds the backends.events.elasticsearch.version parameter to the ServiceTelemetry manifest so that the Service Telemetry Framework can request an Elasticsearch version for installation to the Elastic Cloud on Kubernetes (ECK) Operator.
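
The dotted parameter name describes a nested path in the manifest. The following sketch shows how the path expands into nested keys. The helper name and the version string are hypothetical, and the sketch does not show where the path sits within the complete ServiceTelemetry manifest.

```python
def set_by_path(config: dict, dotted_path: str, value) -> dict:
    """Set a value in a nested dictionary, creating intermediate keys."""
    keys = dotted_path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


# The version string below is a placeholder, not a recommended value.
settings = set_by_path({}, "backends.events.elasticsearch.version", "7.16.1")
print(settings)
# {'backends': {'events': {'elasticsearch': {'version': '7.16.1'}}}}
```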