Chapter 6. Upgrading Service Telemetry Framework to version 1.3

To migrate from Service Telemetry Framework (STF) 1.2 to STF 1.3, you must replace the ClusterServiceVersion and Subscription objects in the service-telemetry namespace on your Red Hat OpenShift Container Platform environment.

Prerequisites

  • You have upgraded your Red Hat OpenShift Container Platform environment to 4.7. STF 1.3 does not run on Red Hat OpenShift Container Platform 4.5 and lower. STF 1.2 does not run on Red Hat OpenShift Container Platform 4.7 and higher.
  • You have backed up your data before any upgrade of the environment. When you upgrade STF 1.2 to 1.3, there is a brief outage while the Smart Gateways are upgraded. Additionally, changes to the ServiceTelemetry and SmartGateway objects do not have any effect while the Operators are being replaced.
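
The version prerequisite can be checked from a script before you start. The following is a minimal sketch, not part of the documented procedure. The function name is_ocp_4_7 is hypothetical, and the sketch assumes that the prerequisite means a 4.7.z release as reported by the ClusterVersion object.

```shell
# Hypothetical helper: succeeds only if the given version string is a 4.7 release.
# Assumption: "upgraded to 4.7" means 4.7 or any 4.7.z patch release.
is_ocp_4_7() {
  case "$1" in
    4.7|4.7.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires a logged-in oc session):
#   version=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
#   is_ocp_4_7 "$version" || echo "Upgrade the cluster to 4.7 first" >&2
```

The check is kept separate from the oc call so that you can run it against any version string, for example one copied from the web console.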

To upgrade from STF 1.2 to 1.3, complete the following procedures:

6.1. Removing Service Telemetry Framework 1.2 Operators

Remove the STF 1.2 Operators: the Smart Gateway Operator and the Service Telemetry Operator.

Warning

You must temporarily remove the clouds parameter because of changes in the API. This removes all Smart Gateways until the upgrade is complete, so metrics and events cannot be delivered during the upgrade.

Procedure

  1. Retrieve the current ServiceTelemetry object and note its contents, in particular the clouds parameter, because you must remove this parameter before you upgrade the Operators.

    $ oc get stf default -oyaml
  2. Modify the ServiceTelemetry object to set the clouds parameter to an empty list. Set cloudsRemoveOnMissing to true to remove all Smart Gateways.

    Warning

    This command stops all monitoring functions until the upgrade is complete and the clouds parameter is redefined. If you use the default clouds configuration, the clouds parameter is not defined in your ServiceTelemetry object.

    $ oc patch stf default --patch $'spec:\n  clouds: []\n  cloudsRemoveOnMissing: true' --type=merge
  3. Monitor the Smart Gateway pods until they are fully terminated and removed:

    $ oc get pods --selector app=smart-gateway --watch
    
    NAME                                                      READY   STATUS    RESTARTS   AGE
    default-cloud1-ceil-meter-smartgateway-58cc854f4-hgk92    1/1     Running   0          2m42s
    default-cloud1-coll-meter-smartgateway-6c76f9786d-crn9b   2/2     Running   0          2m55s
    default-cloud1-coll-meter-smartgateway-6c76f9786d-crn9b   2/2     Terminating   0          3m12s
    default-cloud1-ceil-meter-smartgateway-58cc854f4-hgk92    1/1     Terminating   0          3m
    ...
  4. Retrieve the Subscription name of the Smart Gateway Operator:

    $ oc get sub smart-gateway-operator-stable-1.2-redhat-operators-openshift-marketplace
    
    NAME                                                                       PACKAGE                  SOURCE             CHANNEL
    smart-gateway-operator-stable-1.2-redhat-operators-openshift-marketplace   smart-gateway-operator   redhat-operators   stable-1.2
  5. Delete the Smart Gateway Operator subscription:

    $ oc delete sub smart-gateway-operator-stable-1.2-redhat-operators-openshift-marketplace
    
    subscription.operators.coreos.com "smart-gateway-operator-stable-1.2-redhat-operators-openshift-marketplace" deleted
  6. Retrieve the Smart Gateway Operator ClusterServiceVersion:

    $ oc get csv -o name | grep -E 'smart-gateway'
    
    clusterserviceversion.operators.coreos.com/smart-gateway-operator.v2.2.1623675667
  7. Delete the Smart Gateway Operator ClusterServiceVersion:

    $ oc delete clusterserviceversion.operators.coreos.com/smart-gateway-operator.v2.2.1623675667
    
    clusterserviceversion.operators.coreos.com "smart-gateway-operator.v2.2.1623675667" deleted
  8. Delete the SmartGateway Custom Resource Definition:

    $ oc delete crd smartgateways.smartgateway.infra.watch
    
    customresourcedefinition.apiextensions.k8s.io "smartgateways.smartgateway.infra.watch" deleted
  9. Patch the Service Telemetry Operator Subscription to use the stable-1.3 channel:

    $ oc patch sub service-telemetry-operator --patch $'spec:\n  channel: stable-1.3' --type=merge
    
    subscription.operators.coreos.com/service-telemetry-operator patched
  10. Monitor the output of the oc get csv command until the Smart Gateway Operator phase is Succeeded and the Service Telemetry Operator phase is Pending for both version 1.2 and version 1.3:

    $ oc get csv
    
    NAME                                         DISPLAY                                         VERSION          REPLACES                                     PHASE
    amq7-cert-manager.v1.0.0                     Red Hat Integration - AMQ Certificate Manager   1.0.0                                                         Succeeded
    amq7-interconnect-operator.v1.2.4            Red Hat Integration - AMQ Interconnect          1.2.4            amq7-interconnect-operator.v1.2.3            Succeeded
    elastic-cloud-eck.v1.6.0                     Elasticsearch (ECK) Operator                    1.6.0            elastic-cloud-eck.v1.5.0                     Succeeded
    prometheusoperator.0.47.0                    Prometheus Operator                             0.47.0           prometheusoperator.0.37.0                    Succeeded
    service-telemetry-operator.v1.2.1623675667   Service Telemetry Operator                      1.2.1623675667                                                Pending
    service-telemetry-operator.v1.3.1622734200   Service Telemetry Operator                      1.3.1622734200   service-telemetry-operator.v1.2.1623675667   Pending
    smart-gateway-operator.v3.0.1622734308       Smart Gateway Operator                          3.0.1622734308                                                Succeeded
  11. Delete the Service Telemetry Operator v1.2 ClusterServiceVersion:

    $ oc delete csv service-telemetry-operator.v1.2.1623675667
    
    clusterserviceversion.operators.coreos.com "service-telemetry-operator.v1.2.1623675667" deleted
  12. Edit the ServiceTelemetry object and insert the contents of your previously noted clouds parameter. If the clouds parameter was not previously defined because you used the default Smart Gateway instances, remove the clouds: [] parameter.

    $ oc edit stf default
  13. Validate that the Smart Gateways are restored:

    $ oc get pods --selector app=smart-gateway
    
    NAME                                                      READY   STATUS    RESTARTS   AGE
    default-cloud1-ceil-meter-smartgateway-6484b98b68-sl7mb   2/2     Running   0          5m56s
    default-cloud1-coll-meter-smartgateway-799f687658-nfzr6   2/2     Running   0          6m6s
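
Steps 1 and 3 of the procedure above lend themselves to scripting, because the saved clouds parameter is needed again in step 12 and the Smart Gateway pods must be gone before you continue. The following is a minimal sketch under stated assumptions: the function names backup_stf and wait_for_smart_gateways_removed, the default backup file name, and the polling defaults are all hypothetical and are not part of the documented procedure.

```shell
# Hypothetical helper for step 1: save the ServiceTelemetry object to a file
# so that the clouds parameter can be restored in step 12.
# Argument: output file (the default name is an assumption).
backup_stf() {
  oc get stf default -o yaml > "${1:-stf-default-backup.yaml}"
}

# Hypothetical helper for step 3: poll until no Smart Gateway pods remain.
# Arguments: maximum number of polls (default 60), seconds between polls (default 5).
# Returns 0 when the pods are gone, 1 on timeout, 2 if the oc command fails.
wait_for_smart_gateways_removed() {
  max_polls="${1:-60}"
  interval="${2:-5}"
  i=0
  while [ "$i" -lt "$max_polls" ]; do
    # With -o name, oc prints one line per pod and nothing on stdout when no pod matches.
    remaining=$(oc get pods --selector app=smart-gateway -o name) || return 2
    if [ -z "$remaining" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage (requires a logged-in oc session in the service-telemetry namespace):
#   backup_stf stf-before-upgrade.yaml
#   ... run the oc patch command from step 2 ...
#   wait_for_smart_gateways_removed 60 5 || echo "Smart Gateways still present" >&2
```

Polling with a bounded number of attempts is used here instead of --watch so that a script can stop with an error if the pods never terminate.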

6.2. Subscribing to the Service Telemetry Operator

You must subscribe to the Service Telemetry Operator, which manages the STF instances.

Procedure

  1. Create the Service Telemetry Operator subscription:

    $ oc create -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: service-telemetry-operator
      namespace: service-telemetry
    spec:
      channel: stable-1.3
      installPlanApproval: Automatic
      name: service-telemetry-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  2. Validate the Service Telemetry Operator and the dependent operators:

    $ oc get csv --namespace service-telemetry
    
    NAME                                         DISPLAY                                         VERSION          REPLACES                            PHASE
    amq7-cert-manager.v1.0.0                     Red Hat Integration - AMQ Certificate Manager   1.0.0                                                Succeeded
    amq7-interconnect-operator.v1.2.3            Red Hat Integration - AMQ Interconnect          1.2.3            amq7-interconnect-operator.v1.2.2   Succeeded
    elastic-cloud-eck.v1.6.0                     Elasticsearch (ECK) Operator                    1.6.0            elastic-cloud-eck.v1.5.0            Succeeded
    prometheusoperator.0.47.0                    Prometheus Operator                             0.47.0           prometheusoperator.0.37.0           Succeeded
    service-telemetry-operator.v1.3.1622734200   Service Telemetry Operator                      1.3.1622734200                                       Succeeded
    smart-gateway-operator.v3.0.1622734308       Smart Gateway Operator                          3.0.1622734308                                       Succeeded
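
The validation in step 2 can be reduced to a list of the ClusterServiceVersions that have not yet reached the Succeeded phase. The following is a minimal sketch. The function name csvs_not_succeeded is hypothetical, and the tab-separated input format is an assumption that matches the JSONPath template shown in the usage comment.

```shell
# Hypothetical helper: reads "NAME<TAB>PHASE" lines on stdin and prints
# the names whose phase is anything other than Succeeded.
csvs_not_succeeded() {
  awk -F '\t' 'NF >= 2 && $2 != "Succeeded" { print $1 }'
}

# Usage (requires a logged-in oc session):
#   oc get csv --namespace service-telemetry \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' \
#     | csvs_not_succeeded
```

Empty output means that every listed Operator reports Succeeded. Any name that is printed identifies an Operator to inspect further.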

When the new Operators start, they reconcile the existing ServiceTelemetry and SmartGateway objects, which restarts the Smart Gateway containers.

  • Check the state of the Smart Gateway containers:

    $ oc get pods
    
    NAME                                                      READY   STATUS        RESTARTS   AGE
    ...
    default-cloud1-ceil-meter-smartgateway-5849c4cdb5-xgl42   1/1     Running       0          35s
    default-cloud1-coll-meter-smartgateway-749674f75c-k7pm7   2/2     Terminating   0          56m
    default-cloud1-coll-meter-smartgateway-868476456b-ksh9b   2/2     Running       0          26s
    ...
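
While the Smart Gateway containers restart, it can help to filter the pod list down to the pods that are not yet settled. The following is a minimal sketch. The function name pods_not_ready is hypothetical, and the sketch assumes the default oc get pods table layout shown above (NAME, READY, STATUS as the first three columns).

```shell
# Hypothetical helper: reads `oc get pods` table output on stdin and prints the
# names of pods that are not in the Running status with all containers ready.
pods_not_ready() {
  awk '$1 != "NAME" && NF >= 3 {
    # READY has the form ready/total, for example 2/2.
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage (requires a logged-in oc session):
#   oc get pods --selector app=smart-gateway | pods_not_ready
```

With the sample output above, only default-cloud1-coll-meter-smartgateway-749674f75c-k7pm7 is printed, because it is still Terminating.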