Chapter 7. Monitoring

Monitoring data allows you to monitor the performance and health of AMQ Streams. You can configure your deployment to capture metrics data for analysis and notifications.

Metrics data is useful when investigating issues with connectivity and data delivery. For example, metrics data can identify under-replicated partitions or the rate at which messages are consumed. Alerting rules can provide time-critical notifications on such metrics through a specified communications channel. Monitoring visualizations present real-time metrics data to help determine when and how to update the configuration of your deployment. Example metrics configuration files are provided with AMQ Streams.

Distributed tracing complements the gathering of metrics data by providing a facility for end-to-end tracking of messages through AMQ Streams.

Cruise Control provides support for rebalancing of Kafka clusters, based on workload data.

Metrics and monitoring tools

AMQ Streams can employ the following tools for metrics and monitoring:

  • Prometheus pulls metrics from Kafka, ZooKeeper and Kafka Connect clusters. The Prometheus Alertmanager plugin handles alerts and routes them to a notification service.
  • Kafka Exporter adds additional Prometheus metrics
  • Grafana provides dashboard visualizations of Prometheus metrics
  • Jaeger provides distributed tracing support to track transactions between applications
  • Cruise Control balances data across a Kafka cluster

7.1. Prometheus

Prometheus can extract metrics data from Kafka components and the AMQ Streams Operators.

To use Prometheus to obtain metrics data and provide alerts, Prometheus and the Prometheus Alertmanager plugin must be deployed. Kafka resources must also be deployed or redeployed with metrics configuration to expose the metrics data.

Prometheus scrapes the exposed metrics data for monitoring. Alertmanager issues alerts when conditions indicate potential problems, based on pre-defined alerting rules.

Sample metrics and alerting rules configuration files are provided with AMQ Streams. The sample alerting mechanism provided with AMQ Streams is configured to send notifications to a Slack channel.

7.2. Grafana

Grafana uses the metrics data exposed by Prometheus to present dashboard visualizations for monitoring.

A deployment of Grafana is required, with Prometheus added as a data source. Example dashboards, supplied with AMQ Streams as JSON files, are imported through the Grafana interface to present monitoring data.

7.3. Kafka Exporter

Kafka Exporter is an open source project to enhance monitoring of Apache Kafka brokers and clients. Kafka Exporter is deployed with a Kafka cluster to extract additional Prometheus metrics data from Kafka brokers related to offsets, consumer groups, consumer lag, and topics. You can use the Grafana dashboard provided to visualize the data collected by Prometheus from Kafka Exporter.

A sample configuration file, alerting rules and Grafana dashboard for Kafka Exporter are provided with AMQ Streams.

7.4. Distributed tracing

Within a Kafka deployment, distributed tracing using Jaeger is supported for:

  • MirrorMaker to trace messages from a source cluster to a target cluster
  • Kafka Connect to trace messages consumed and produced by Kafka Connect
  • Kafka Bridge to trace messages consumed and produced by Kafka Bridge, and HTTP requests from client applications

Template configuration properties are set for the Kafka resources, which describe tracing environment variables.

Tracing for Kafka clients

Client applications, such as Kafka producers and consumers, can also be set up so that transactions are monitored. Clients are configured with a tracing profile, and a tracer is initialized for the client application to use.

7.5. Cruise Control

Cruise Control is an open source project for simplifying the monitoring and balancing of data across a Kafka cluster. Cruise Control is deployed alongside a Kafka cluster to monitor its traffic, propose more balanced partition assignments, and trigger partition reassignments based on those proposals.

Cruise Control collects resource utilization information to model and analyze the workload of the Kafka cluster. Based on optimization goals that have been defined, Cruise Control generates optimization proposals outlining how the cluster can be effectively rebalanced. When an optimization proposal is approved, Cruise Control applies the rebalancing outlined in the proposal.

Prometheus can extract Cruise Control metrics data, including data related to optimization proposals and rebalancing operations. A sample configuration file and Grafana dashboard for Cruise Control are provided with AMQ Streams.