Monitoring metrics in Red Hat OpenShift Streams for Apache Kafka

Guide
  • Red Hat OpenShift Streams for Apache Kafka 1
  • Updated 09 August 2022
  • Published 21 December 2021


As a developer or administrator, you can view metrics in OpenShift Streams for Apache Kafka to visualize the performance and data usage for Kafka instances and topics that you have access to. You can view metrics directly in the OpenShift Streams for Apache Kafka web console, or use the metrics API endpoint provided by OpenShift Streams for Apache Kafka to import the data into your own metrics monitoring tool, such as Prometheus.

Supported metrics in OpenShift Streams for Apache Kafka

OpenShift Streams for Apache Kafka supports the following metrics for Kafka instances and topics. In the OpenShift Streams for Apache Kafka web console, the Dashboard page of a Kafka instance displays a subset of these metrics. To learn more about the limits associated with both trial and production Kafka instance types, refer to Red Hat OpenShift Streams for Apache Kafka Service Limits.

Cluster metrics
  • kafka_namespace:haproxy_server_bytes_in_total:rate5m: Number of incoming bytes per second for the cluster in the last five minutes. This ingress metric represents all the data that producers are sending to topics in the cluster. The Kafka instance type determines the maximum incoming byte rate.

  • kafka_namespace:haproxy_server_bytes_out_total:rate5m: Number of outgoing bytes per second for the cluster in the last five minutes. This egress metric represents all the data that consumers are receiving from topics in the cluster. The Kafka instance type determines the maximum outgoing byte rate.

  • kafka_namespace:kafka_server_socket_server_metrics_connection_count:sum: Number of current client connections to the cluster. Kafka clients use persistent connections to interact with brokers in the cluster. For example, a consumer holds a connection to each broker it is receiving data from and a connection to its group coordinator. The Kafka instance type determines the maximum number of active connections.

  • kafka_namespace:kafka_server_socket_server_metrics_connection_creation_rate:sum: Number of client connection creations per second for the cluster. Kafka clients use persistent connections to interact with brokers in the cluster. A constant high number of connection creations might indicate a client issue. The Kafka instance type determines the maximum connection creation rate.

  • kafka_topic:kafka_topic_partitions:count: Number of topics in the cluster. This does not include internal Kafka topics, such as __consumer_offsets and __transaction_state.

  • kafka_topic:kafka_topic_partitions:sum: Number of partitions across all topics in the cluster. This does not include partitions from internal Kafka topics, such as __consumer_offsets and __transaction_state. The Kafka instance type determines the maximum number of partitions.
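The metrics API endpoint described later in this guide returns these metrics in the Prometheus text exposition format. As an illustration of how the un-labeled cluster gauges can be read, here is a minimal Python sketch using a hypothetical sample payload (real output carries labels and many more series):

```python
# Hypothetical sample of Prometheus text-exposition output; real payloads
# from the federate endpoint include labels, HELP lines, and more series.
SAMPLE = """\
# TYPE kafka_topic:kafka_topic_partitions:sum gauge
kafka_topic:kafka_topic_partitions:sum 12
kafka_namespace:kafka_server_socket_server_metrics_connection_count:sum 7
"""

def parse_gauges(text: str) -> dict:
    """Return {metric_name: value} for un-labeled gauge samples."""
    gauges = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and TYPE/HELP metadata
        name, value = line.rsplit(" ", 1)
        gauges[name] = float(value)
    return gauges

print(parse_gauges(SAMPLE)["kafka_topic:kafka_topic_partitions:sum"])  # 12.0
```

In practice you would feed this kind of parsing to whatever tooling consumes the metrics; a full-featured parser (such as the one bundled with Prometheus client libraries) also handles labeled series and counter types.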

Broker metrics
  • kafka_broker_quota_softlimitbytes: Maximum amount of storage, in bytes, for this broker before producers are throttled. When this limit is reached, the broker starts throttling producers to prevent them from sending additional data. The Kafka instance type determines the maximum storage in the broker.

  • kafka_broker_quota_totalstorageusedbytes: Amount of storage, in bytes, that is currently used by partitions in the broker. The storage usage depends on the number and retention configurations of the partitions. This metric must stay below the kafka_broker_quota_softlimitbytes metric setting.

  • kafka_controller_kafkacontroller_global_partition_count: Number of partitions in the cluster. Only the broker that is the current controller in the cluster reports this metric. Any other brokers report 0. This count includes partitions from internal Kafka topics, such as __consumer_offsets and __transaction_state. This metric is similar to the kafka_topic:kafka_topic_partitions:sum cluster metric. The Kafka instance type determines the maximum number of partitions.

  • kafka_controller_kafkacontroller_offline_partitions_count: Number of partitions in the cluster that are currently offline. Offline partitions cannot be used by clients for producing or consuming data. Only the broker that is the current controller in the cluster reports this metric. Any other brokers report 0.

  • kubelet_volume_stats_available_bytes: Amount of disk space, in bytes, that is available in the broker.

  • kubelet_volume_stats_used_bytes: Amount of disk space, in bytes, that is currently used in the broker. This metric is similar to the kafka_broker_quota_totalstorageusedbytes broker metric.
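The first two broker metrics are most useful together: comparing kafka_broker_quota_totalstorageusedbytes against kafka_broker_quota_softlimitbytes shows how close a broker is to throttling producers. A minimal sketch of that calculation (the 50 GiB limit in the example is illustrative, not the limit of any particular instance type):

```python
def storage_utilization(used_bytes: float, soft_limit_bytes: float) -> float:
    """Fraction of the broker storage quota currently in use (0.0 to 1.0)."""
    if soft_limit_bytes <= 0:
        raise ValueError("soft limit must be positive")
    return used_bytes / soft_limit_bytes

# Example: 40 GiB used against an illustrative 50 GiB soft limit.
print(storage_utilization(40 * 2**30, 50 * 2**30))  # 0.8
```

A ratio approaching 1.0 means the broker is about to start throttling producers, so it is a natural input for an alert threshold.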

Topic metrics
  • kafka_server_brokertopicmetrics_bytes_in_total: Number of incoming bytes to topics in the instance.

  • kafka_server_brokertopicmetrics_bytes_out_total: Number of outgoing bytes from topics in the instance.

  • kafka_server_brokertopicmetrics_messages_in_total: Number of messages received by one or more topics in the instance.

  • kafka_topic:kafka_server_brokertopicmetrics_bytes_in_total:rate5m: Number of incoming bytes per second to topics in the instance in the last five minutes.

  • kafka_topic:kafka_server_brokertopicmetrics_bytes_out_total:rate5m: Number of outgoing bytes per second from topics in the instance in the last five minutes.

  • kafka_topic:kafka_server_brokertopicmetrics_messages_in_total:rate5m: Number of messages per second received by one or more topics in the instance in the last five minutes.

  • kafka_topic:kafka_log_log_size:sum: Log size of each topic and replica, in bytes, across all brokers in the cluster.
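The :rate5m variants above are derived from the corresponding *_total counters. Conceptually, a per-second rate over a five-minute window is the counter delta divided by the window length; the following simplified sketch shows the idea (it ignores counter resets, which Prometheus's rate() function handles for you):

```python
def rate_per_second(total_start: float, total_end: float,
                    window_seconds: float = 300) -> float:
    """Approximate a Prometheus rate() from two counter samples taken
    window_seconds apart. Simplified: assumes no counter reset occurred."""
    return (total_end - total_start) / window_seconds

# 150,000 bytes produced over a 5-minute window -> 500 bytes/second.
print(rate_per_second(1_000_000, 1_150_000))  # 500.0
```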

Viewing metrics for a Kafka instance in OpenShift Streams for Apache Kafka

After you produce and consume messages in your services using methods such as Kafka scripts, Kafkacat, or a Quarkus application, you can return to the Kafka instance in the web console and use the Dashboard page to view metrics for the instance and topics. The metrics help you understand the performance and data usage for your Kafka instance and topics.

Prerequisites
  • You're logged in to the OpenShift Streams for Apache Kafka web console.
Procedure
  • In the Kafka Instances page of the web console, click the name of the Kafka instance and select the Dashboard tab.

    When you create a Kafka instance and add new topics, the Dashboard page is initially empty. After you start producing and consuming messages in your services, you can return to this page to view related metrics. For example, to use Kafka scripts to produce and consume messages, see Configuring and connecting Kafka scripts with Red Hat OpenShift Streams for Apache Kafka.

In some cases, after you start producing and consuming messages, you might need to wait several minutes for the latest metrics to appear. You might also need to wait until your instance and topics contain enough data for metrics to appear.

Configuring metrics monitoring for a Kafka instance in Prometheus

As an alternative to viewing metrics for a Kafka instance in the OpenShift Streams for Apache Kafka web console, you can export your metrics to Prometheus and integrate the metrics with your own metrics monitoring platform. OpenShift Streams for Apache Kafka provides a kafkas/{id}/metrics/federate API endpoint that you can configure as a scrape target for Prometheus to use to collect and store metrics. You can then access the metrics in the Prometheus expression browser or in a data-graphing tool such as Grafana.

This procedure follows the Configuration File method defined by Prometheus for integrating third-party metrics. If you use the Prometheus Operator in your monitoring environment, you can also follow the Additional Scrape Configuration method.

Prerequisites
  • You have access to a Kafka instance that contains topics in OpenShift Streams for Apache Kafka. For more information about access management in OpenShift Streams for Apache Kafka, see Managing account access in Red Hat OpenShift Streams for Apache Kafka.

  • You have the ID and the SASL/OAUTHBEARER token endpoint for the Kafka instance. To locate the Kafka instance ID and the token endpoint, select your Kafka instance in the OpenShift Streams for Apache Kafka web console, select the options menu (three vertical dots), and click Connection.

  • You have the generated credentials for your service account that has access to the Kafka instance. To reset the credentials, use the Service Accounts page in the Application Services section of the Red Hat Hybrid Cloud Console.

  • You’ve installed a Prometheus instance in your monitoring environment. For installation instructions, see Getting Started in the Prometheus documentation.

Procedure
  1. In your Prometheus configuration file, add the following information. Replace the variable values with your own Kafka instance and service account information.

    The <kafka_instance_id> is the ID of the Kafka instance. The <client_id> and <client_secret> are the generated credentials for your service account that you copied previously. The <token_url> is the SASL/OAUTHBEARER token endpoint for the Kafka instance.

    Required information for Prometheus configuration file
    - job_name: "kafka-federate"
      static_configs:
      - targets: ["api.openshift.com"]
      scheme: "https"
      metrics_path: "/api/kafkas_mgmt/v1/kafkas/<kafka_instance_id>/metrics/federate"
      oauth2:
        client_id: "<client_id>"
        client_secret: "<client_secret>"
        token_url: "<token_url>"

    The new scrape target becomes available after the configuration has reloaded.

  2. View your collected metrics in the Prometheus expression browser at http://<host>:<port>/graph, or integrate your Prometheus data source with a data-graphing tool such as Grafana. For information about Prometheus metrics in Grafana, see Grafana Support for Prometheus in the Grafana documentation.

    If you use Grafana with your Prometheus instance, you can import the predefined Red Hat OpenShift Streams for Apache Kafka Grafana dashboard to set up your metrics display. For import instructions, see Importing a dashboard in the Grafana documentation.
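As a quick check that the scrape target is working, you can also query Prometheus programmatically through its HTTP API (/api/v1/query). The following sketch builds an instant-query URL and parses an abridged sample response; the localhost address and the sample values are placeholders:

```python
import json
from urllib.parse import urlencode

def instant_query_url(base: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL."""
    return f"{base}/api/v1/query?" + urlencode({"query": promql})

url = instant_query_url(
    "http://localhost:9090",  # placeholder Prometheus address
    "kafka_namespace:haproxy_server_bytes_in_total:rate5m",
)

# Abridged shape of a successful instant-query response (sample values):
sample = json.loads(
    '{"status": "success", "data": {"resultType": "vector",'
    ' "result": [{"metric": {}, "value": [1660000000, "2048"]}]}}'
)
bytes_in_per_sec = float(sample["data"]["result"][0]["value"][1])
print(bytes_in_per_sec)  # 2048.0
```

In a real check you would fetch the URL (for example with urllib.request) and inspect the "status" field before reading the result vector.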

When you create a Kafka instance and add new topics, the metrics are initially empty. After you start producing and consuming messages in your services, you can return to your monitoring tool to view related metrics. For example, to use Kafka scripts to produce and consume messages, see Configuring and connecting Kafka scripts with Red Hat OpenShift Streams for Apache Kafka.

In some cases, after you start producing and consuming messages, you might need to wait several minutes for the latest metrics to appear. You might also need to wait until your instance and topics contain enough data for metrics to appear.

If you use the Prometheus Operator in your monitoring environment, you can alternatively create a kafka-federate.yaml file and reference it as an additional scrape configuration in your Prometheus custom resource, as shown in the following examples. For more information about this method, see Additional Scrape Configuration in the Prometheus documentation.

Example kafka-federate.yaml file
- job_name: "kafka-federate"
  static_configs:
  - targets: ["api.openshift.com"]
  scheme: "https"
  metrics_path: "/api/kafkas_mgmt/v1/kafkas/<kafka_instance_id>/metrics/federate"
  oauth2:
    client_id: "<client_id>"
    client_secret: "<client_secret>"
    token_url: "<token_url>"
Example command to create and apply a Kubernetes secret
kubectl create secret generic additional-scrape-configs --from-file=kafka-federate.yaml --dry-run=client -o yaml | \
kubectl apply -f - -n <namespace>
Example Prometheus custom resource with new secret
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
    ...
spec:
    ...
    additionalScrapeConfigs:
        name: additional-scrape-configs
        key: kafka-federate.yaml

Configuring Prometheus alerts for Kafka instance limits

Prerequisites
  • You have successfully configured metrics monitoring for a Kafka instance in Prometheus.

  • You use the Prometheus Operator in your monitoring environment.

  • You can define alerting rules in Prometheus and deploy an Alertmanager cluster by using the Prometheus Operator.

Procedure
  1. Create a PrometheusRule custom resource with alerts defined for the capacity of your Kafka instance.

  2. Apply the PrometheusRule to the cluster that you are federating the metrics to.

Example PrometheusRule custom resource for a Kafka broker storage limit alert
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
spec:
  groups:
    - name: limits
      rules:
        - alert: KafkaBrokerStorageFillingUp
          expr: predict_linear(kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"data-(.+)-kafka-[0-9]+"}[1h], 4 * 24 * 3600) < 0
          labels:
            severity: <SOME_SEVERITY>
          annotations:
            summary: 'Broker PersistentVolume is filling up.'
            description: 'Based on recent sampling, the Broker PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} is expected to fill up within four days.'
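The predict_linear() function in the rule above performs a least-squares linear extrapolation over the sampled window and fires when the projected free space over the coming four days drops below zero. A minimal Python sketch of that extrapolation, using synthetic samples of a volume losing roughly 10 MiB per minute:

```python
def predict_linear(samples, horizon_seconds):
    """Least-squares linear extrapolation, analogous to PromQL's
    predict_linear(): fit a line through (timestamp, value) samples and
    project the value horizon_seconds past the last sample."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = (
        sum((t - mean_t) * (v - mean_v) for t, v in samples)
        / sum((t - mean_t) ** 2 for t, _ in samples)
    )
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + horizon_seconds)

# Synthetic data: one sample per minute for an hour, with available
# storage shrinking by 10 MiB per minute from an initial 10 GiB.
samples = [(60 * i, 10 * 2**30 - i * 10 * 2**20) for i in range(60)]

# Projected four days ahead, free space goes negative, so the alert fires.
print(predict_linear(samples, 4 * 24 * 3600) < 0)  # True
```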