Chapter 13. Managing AMQ Streams

This chapter covers tasks to maintain a deployment of AMQ Streams.

13.1. Discovering services using labels and annotations

Service discovery makes it easier for client applications running in the same OpenShift cluster as AMQ Streams to interact with a Kafka cluster.

A service discovery label and annotation is generated for services used to access the Kafka cluster:

  • Internal Kafka bootstrap service
  • HTTP Bridge service

The label helps to make the service discoverable, and the annotation provides connection details that a client application can use to make the connection.

The service discovery label, strimzi.io/discovery, is set as true for the Service resources. The service discovery annotation has the same key, providing connection details in JSON format for each service.

Example internal Kafka bootstrap service

apiVersion: v1
kind: Service
metadata:
  annotations:
    strimzi.io/discovery: |-
      [ {
        "port" : 9092,
        "tls" : false,
        "protocol" : "kafka",
        "auth" : "scram-sha-512"
      }, {
        "port" : 9093,
        "tls" : true,
        "protocol" : "kafka",
        "auth" : "tls"
      } ]
  labels:
    strimzi.io/cluster: my-cluster
    strimzi.io/discovery: "true"
    strimzi.io/kind: Kafka
    strimzi.io/name: my-cluster-kafka-bootstrap
  name: my-cluster-kafka-bootstrap
spec:
  #...

Example HTTP Bridge service

apiVersion: v1
kind: Service
metadata:
  annotations:
    strimzi.io/discovery: |-
      [ {
        "port" : 8080,
        "tls" : false,
        "auth" : "none",
        "protocol" : "http"
      } ]
  labels:
    strimzi.io/cluster: my-bridge
    strimzi.io/discovery: "true"
    strimzi.io/kind: KafkaBridge
    strimzi.io/name: my-bridge-bridge-service

13.1.1. Returning connection details on services

You can find the services by specifying the discovery label when fetching services from the command line or a corresponding API call.

oc get service -l strimzi.io/discovery=true

The connection details are returned when retrieving the service discovery label.

13.2. Checking the status of a custom resource

The status property of a AMQ Streams custom resource publishes information about the resource to users and tools that need it.

13.2.1. AMQ Streams custom resource status information

Several resources have a status property, as described in the following table.

AMQ Streams resourceSchema referencePublishes status information on…​

Kafka

Section B.70, “KafkaStatus schema reference”

The Kafka cluster.

KafkaConnect

Section B.88, “KafkaConnectStatus schema reference”

The Kafka Connect cluster, if deployed.

KafkaConnectS2I

Section B.92, “KafkaConnectS2IStatus schema reference”

The Kafka Connect cluster with Source-to-Image support, if deployed.

KafkaConnector

Section B.126, “KafkaConnectorStatus schema reference”

KafkaConnector resources, if deployed.

KafkaMirrorMaker

Section B.114, “KafkaMirrorMakerStatus schema reference”

The Kafka MirrorMaker tool, if deployed.

KafkaTopic

Section B.95, “KafkaTopicStatus schema reference”

Kafka topics in your Kafka cluster.

KafkaUser

Section B.107, “KafkaUserStatus schema reference”

Kafka users in your Kafka cluster.

KafkaBridge

Section B.123, “KafkaBridgeStatus schema reference”

The AMQ Streams Kafka Bridge, if deployed.

The status property of a resource provides information on the resource’s:

  • Current state, in the status.conditions property
  • Last observed generation, in the status.observedGeneration property

The status property also provides resource-specific information. For example:

  • KafkaConnectStatus provides the REST API endpoint for Kafka Connect connectors.
  • KafkaUserStatus provides the user name of the Kafka user and the Secret in which their credentials are stored.
  • KafkaBridgeStatus provides the HTTP address at which external client applications can access the Bridge service.

A resource’s current state is useful for tracking progress related to the resource achieving its desired state, as defined by the spec property. The status conditions provide the time and reason the state of the resource changed and details of events preventing or delaying the operator from realizing the resource’s desired state.

The last observed generation is the generation of the resource that was last reconciled by the Cluster Operator. If the value of observedGeneration is different from the value of metadata.generation, the operator has not yet processed the latest update to the resource. If these values are the same, the status information reflects the most recent changes to the resource.

AMQ Streams creates and maintains the status of custom resources, periodically evaluating the current state of the custom resource and updating its status accordingly. When performing an update on a custom resource using oc edit, for example, its status is not editable. Moreover, changing the status would not affect the configuration of the Kafka cluster.

Here we see the status property specified for a Kafka custom resource.

Kafka custom resource with status

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
spec:
  # ...
status:
  conditions: 1
  - lastTransitionTime: 2019-07-23T23:46:57+0000
    status: "True"
    type: Ready 2
  observedGeneration: 4 3
  listeners: 4
  - addresses:
    - host: my-cluster-kafka-bootstrap.myproject.svc
      port: 9092
    type: plain
  - addresses:
    - host: my-cluster-kafka-bootstrap.myproject.svc
      port: 9093
    certificates:
    - |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
    type: tls
  - addresses:
    - host: 172.29.49.180
      port: 9094
    certificates:
    - |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
    type: external
    # ...

1
Status conditions describe criteria related to the status that cannot be deduced from the existing resource information, or are specific to the instance of a resource.
2
The Ready condition indicates whether the Cluster Operator currently considers the Kafka cluster able to handle traffic.
3
The observedGeneration indicates the generation of the Kafka custom resource that was last reconciled by the Cluster Operator.
4
The listeners describe the current Kafka bootstrap addresses by type.
Important

The address in the custom resource status for external listeners with type nodeport is currently not supported.

Note

The Kafka bootstrap addresses listed in the status do not signify that those endpoints or the Kafka cluster is in a ready state.

Accessing status information

You can access status information for a resource from the command line. For more information, see Section 13.2.2, “Finding the status of a custom resource”.

13.2.2. Finding the status of a custom resource

This procedure describes how to find the status of a custom resource.

Prerequisites

  • An OpenShift cluster.
  • The Cluster Operator is running.

Procedure

  • Specify the custom resource and use the -o jsonpath option to apply a standard JSONPath expression to select the status property:

    oc get kafka <kafka_resource_name> -o jsonpath='{.status}'

    This expression returns all the status information for the specified custom resource. You can use dot notation, such as status.listeners or status.observedGeneration, to fine-tune the status information you wish to see.

Additional resources

13.3. Recovering a cluster from persistent volumes

You can recover a Kafka cluster from persistent volumes (PVs) if they are still present.

You might want to do this, for example, after:

  • A namespace was deleted unintentionally
  • A whole OpenShift cluster is lost, but the PVs remain in the infrastructure

13.3.1. Recovery from namespace deletion

Recovery from namespace deletion is possible because of the relationship between persistent volumes and namespaces. A PersistentVolume (PV) is a storage resource that lives outside of a namespace. A PV is mounted into a Kafka pod using a PersistentVolumeClaim (PVC), which lives inside a namespace.

The reclaim policy for a PV tells a cluster how to act when a namespace is deleted. If the reclaim policy is set as:

  • Delete (default), PVs are deleted when PVCs are deleted within a namespace
  • Retain, PVs are not deleted when a namespace is deleted

To ensure that you can recover from a PV if a namespace is deleted unintentionally, the policy must be reset from Delete to Retain in the PV specification using the persistentVolumeReclaimPolicy property:

apiVersion: v1
kind: PersistentVolume
# ...
spec:
  # ...
  persistentVolumeReclaimPolicy: Retain

Alternatively, PVs can inherit the reclaim policy of an associated storage class. Storage classes are used for dynamic volume allocation.

By configuring the reclaimPolicy property for the storage class, PVs that use the storage class are created with the appropriate reclaim policy. The storage class is configured for the PV using the storageClassName property.

apiVersion: v1
kind: StorageClass
metadata:
  name: gp2-retain
parameters:
  # ...
# ...
reclaimPolicy: Retain
apiVersion: v1
kind: PersistentVolume
# ...
spec:
  # ...
  storageClassName: gp2-retain
Note

If you are using Retain as the reclaim policy, but you want to delete an entire cluster, you need to delete the PVs manually. Otherwise they will not be deleted, and may cause unnecessary expenditure on resources.

13.3.2. Recovery from loss of an OpenShift cluster

When a cluster is lost, you can use the data from disks/volumes to recover the cluster if they were preserved within the infrastructure. The recovery procedure is the same as with namespace deletion, assuming PVs can be recovered and they were created manually.

13.3.3. Recovering a deleted cluster from persistent volumes

This procedure describes how to recover a deleted cluster from persistent volumes (PVs).

In this situation, the Topic Operator identifies that topics exist in Kafka, but the KafkaTopic resources do not exist.

When you get to the step to recreate your cluster, you have two options:

  1. Use Option 1 when you can recover all KafkaTopic resources.

    The KafkaTopic resources must therefore be recovered before the cluster is started so that the corresponding topics are not deleted by the Topic Operator.

  2. Use Option 2 when you are unable to recover all KafkaTopic resources.

    This time you deploy your cluster without the Topic Operator, delete the Topic Operator data in ZooKeeper, and then redeploy it so that the Topic Operator can recreate the KafkaTopic resources from the corresponding topics.

Note

If the Topic Operator is not deployed, you only need to recover the PersistentVolumeClaim (PVC) resources.

Before you begin

In this procedure, it is essential that PVs are mounted into the correct PVC to avoid data corruption. A volumeName is specified for the PVC and this must match the name of the PV.

For more information, see:

Note

The procedure does not include recovery of KafkaUser resources, which must be recreated manually. If passwords and certificates need to be retained, secrets must be recreated before creating the KafkaUser resources.

Procedure

  1. Check information on the PVs in the cluster:

    oc get pv

    Information is presented for PVs with data.

    Example output showing columns important to this procedure:

    NAME                                         RECLAIMPOLICY CLAIM
    pvc-5e9c5c7f-3317-11ea-a650-06e1eadd9a4c ... Retain ...    myproject/data-my-cluster-zookeeper-1
    pvc-5e9cc72d-3317-11ea-97b0-0aef8816c7ea ... Retain ...    myproject/data-my-cluster-zookeeper-0
    pvc-5ead43d1-3317-11ea-97b0-0aef8816c7ea ... Retain ...    myproject/data-my-cluster-zookeeper-2
    pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c ... Retain ...    myproject/data-0-my-cluster-kafka-0
    pvc-7e21042e-3317-11ea-9786-02deaf9aa87e ... Retain ...    myproject/data-0-my-cluster-kafka-1
    pvc-7e226978-3317-11ea-97b0-0aef8816c7ea ... Retain ...    myproject/data-0-my-cluster-kafka-2
    • NAME shows the name of each PV.
    • RECLAIM POLICY shows that PVs are retained.
    • CLAIM shows the link to the original PVCs.
  2. Recreate the original namespace:

    oc create namespace myproject
  3. Recreate the original PVC resource specifications, linking the PVCs to the appropriate PV:

    For example:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-0-my-cluster-kafka-0
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: gp2-retain
      volumeMode: Filesystem
      volumeName: pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c
  4. Edit the PV specifications to delete the claimRef properties that bound the original PVC.

    For example:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      annotations:
        kubernetes.io/createdby: aws-ebs-dynamic-provisioner
        pv.kubernetes.io/bound-by-controller: "yes"
        pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
      creationTimestamp: "<date>"
      finalizers:
      - kubernetes.io/pv-protection
      labels:
        failure-domain.beta.kubernetes.io/region: eu-west-1
        failure-domain.beta.kubernetes.io/zone: eu-west-1c
      name: pvc-7e226978-3317-11ea-97b0-0aef8816c7ea
      resourceVersion: "39431"
      selfLink: /api/v1/persistentvolumes/pvc-7e226978-3317-11ea-97b0-0aef8816c7ea
      uid: 7efe6b0d-3317-11ea-a650-06e1eadd9a4c
    spec:
      accessModes:
      - ReadWriteOnce
      awsElasticBlockStore:
        fsType: xfs
        volumeID: aws://eu-west-1c/vol-09db3141656d1c258
      capacity:
        storage: 100Gi
      claimRef:
        apiVersion: v1
        kind: PersistentVolumeClaim
        name: data-0-my-cluster-kafka-2
        namespace: myproject
        resourceVersion: "39113"
        uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
              - eu-west-1c
            - key: failure-domain.beta.kubernetes.io/region
              operator: In
              values:
              - eu-west-1
      persistentVolumeReclaimPolicy: Retain
      storageClassName: gp2-retain
      volumeMode: Filesystem

    In the example, the following properties are deleted:

    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: data-0-my-cluster-kafka-2
      namespace: myproject
      resourceVersion: "39113"
      uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea
  5. Deploy the Cluster Operator.

    oc apply -f install/cluster-operator -n my-project
  6. Recreate your cluster.

    Follow the steps depending on whether or not you have all the KafkaTopic resources needed to recreate your cluster.

    Option 1: If you have all the KafkaTopic resources that existed before you lost your cluster, including internal topics such as committed offsets from __consumer_offsets:

    1. Recreate all KafkaTopic resources.

      It is essential that you recreate the resources before deploying the cluster, or the Topic Operator will delete the topics.

    2. Deploy the Kafka cluster.

      For example:

      oc apply -f kafka.yaml

    Option 2: If you do not have all the KafkaTopic resources that existed before you lost your cluster:

    1. Deploy the Kafka cluster, as with the first option, but without the Topic Operator by removing the topicOperator property from the Kafka resource before deploying.

      If you include the Topic Operator in the deployment, the Topic Operator will delete all the topics.

    2. Run an exec command to one of the Kafka broker pods to open the ZooKeeper shell script.

      For example, where my-cluster-kafka-0 is the name of the broker pod:

      oc exec my-cluster-kafka-0 bin/zookeeper-shell.sh localhost:2181
    3. Delete the whole /strimzi path to remove the Topic Operator storage:

      deleteall /strimzi
    4. Enable the Topic Operator by redeploying the Kafka cluster with the topicOperator property to recreate the KafkaTopic resources.

      For example:

      apiVersion: kafka.strimzi.io/v1beta1
      kind: Kafka
      metadata:
        name: my-cluster
      spec:
        #...
        entityOperator:
          topicOperator: {} 1
          #...
    1
    Here we show the default configuration, which has no additional properties. You specify the required configuration using the properties described in Section B.61, “EntityTopicOperatorSpec schema reference”.
  7. Verify the recovery by listing the KafkaTopic resources:

    oc get KafkaTopic

13.4. Uninstalling AMQ Streams

This procedure describes how to uninstall AMQ Streams and remove resources related to the deployment.

Prerequisites

In order to perform this procedure, identify resources created specifically for a deployment and referenced from the AMQ Streams resource.

Such resources include:

  • Secrets (Custom CAs and certificates, Kafka Connect secrets, and other Kafka secrets)
  • Logging ConfigMaps (of type external)

These are resources referenced by Kafka, KafkaConnect, KafkaConnectS2I, KafkaMirrorMaker, or KafkaBridge configuration.

Procedure

  1. Delete the Cluster Operator Deployment, related CustomResourceDefinitions, and RBAC resources:

    oc delete -f install/cluster-operator
    Warning

    Deleting CustomResourceDefinitions results in the garbage collection of the corresponding custom resources (Kafka, KafkaConnect, KafkaConnectS2I, KafkaMirrorMaker, or KafkaBridge) and the resources dependent on them (Deployments, StatefulSets, and other dependent resources).

  2. Delete the resources you identified in the prerequisites.