Chapter 9. Using Prometheus and Grafana to monitor the router network

Prometheus is container-native software built for storing historical data and for monitoring large, scalable systems such as AMQ Interconnect. It gathers data over an extended time, rather than just for the currently running session.

You use Prometheus and Alertmanager to monitor and store AMQ Interconnect data so that you can use a graphical tool, such as Grafana, to visualize and run queries on the data.

9.1. Setting up Prometheus and Grafana

Before you can view AMQ Interconnect dashboards, you must deploy and configure Prometheus, Alertmanager, and Grafana in the OpenShift project in which AMQ Interconnect is deployed. All of the required configuration files are provided in a GitHub repository.

Procedure

  1. Clone the qdr-monitoring GitHub repository.

    This repository contains example configuration files needed to set up Prometheus and Grafana to monitor AMQ Interconnect.

    $ git clone https://github.com/interconnectedcloud/qdr-monitoring
  2. Set the NAMESPACE environment variable to the name of the project where you deployed AMQ Interconnect.

    For example, if you deployed AMQ Interconnect in the example project, set the NAMESPACE environment variable as follows:

    $ export NAMESPACE=example
  3. Run the deploy-monitoring.sh script.

    This script creates and configures the OpenShift resources needed to deploy Prometheus, Alertmanager, and Grafana in your OpenShift project. It also configures two dashboards that provide metrics for the router network.

    $ ./deploy-monitoring.sh

    Alternatively, you can specify the target project as a parameter when you run the script. For example:

    $ ./deploy-monitoring.sh example
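The three steps above can also be combined into a single shell function. This is a minimal sketch and is not part of the qdr-monitoring repository. The function name deploy_qdr_monitoring is hypothetical. The sketch assumes that git and the oc CLI are installed, that you are logged in to the cluster, and that deploy-monitoring.sh is at the top level of the cloned repository.

```shell
# Hypothetical helper that bundles the clone, namespace, and deploy steps.
# Assumptions: git and oc are installed, you are logged in to the cluster,
# and deploy-monitoring.sh is at the top level of the cloned repository.
# The body runs in a subshell ( ... ) so the cd does not change your shell.
deploy_qdr_monitoring() (
  # Target project: first argument, otherwise the NAMESPACE variable.
  project="${1:-${NAMESPACE:-}}"
  if [ -z "$project" ]; then
    echo "usage: deploy_qdr_monitoring <project> (or export NAMESPACE)" >&2
    exit 1
  fi
  # Clone the repository only if it is not already present.
  if [ ! -d qdr-monitoring ]; then
    git clone https://github.com/interconnectedcloud/qdr-monitoring || exit 1
  fi
  cd qdr-monitoring || exit 1
  # Pass the project as a parameter instead of relying on NAMESPACE.
  ./deploy-monitoring.sh "$project" || exit 1
  # List the pods so you can check that Prometheus, Alertmanager, and Grafana start.
  oc get pods -n "$project"
)
```

For example, `deploy_qdr_monitoring example` deploys the monitoring stack into the example project. Because the function body runs in a subshell, your current directory is unchanged afterward.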


9.2. Viewing AMQ Interconnect dashboards in Grafana

After setting up Prometheus and Grafana, you can visualize the AMQ Interconnect data on the following Grafana dashboards:

Qpid Dispatch Router

Shows metrics for:

  • Deliveries ingress
  • Deliveries egress
  • Deliveries ingress route container
  • Deliveries egress route container
  • Deliveries redirected to fallback destination
  • Dropped presettled deliveries
  • Presettled deliveries
  • Auto links
  • Link routes
  • Address count
  • Connection count
  • Link count
Qpid Dispatch Router - Delayed Deliveries

Shows metrics for:

  • Cumulative delayed 10 seconds
  • Cumulative delayed 1 second
  • Rate of new delayed deliveries

For more information about these metrics, see Section 9.3, “Router metrics”.
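The dashboard panels are built on the counters listed in Section 9.3. The three Delayed Deliveries panels correspond roughly to the PromQL expressions that the following sketch prints. These are not the queries that ship with the dashboard. The helper name qdr_delayed_queries, the 5m default window, and the use of the one-second counter for the rate panel are all assumptions.

```shell
# Hypothetical helper: prints PromQL expressions that approximate the three
# Delayed Deliveries panels. These are NOT the queries shipped in the dashboard.
# Assumes the qdr_*_total metrics are Prometheus counters.
qdr_delayed_queries() {
  local window="${1:-5m}"  # rate() look-back window, 5m by default
  # Cumulative panels: the raw counters, summed over all matching series.
  printf 'cumulative_10s\t%s\n' 'sum(qdr_deliveries_delayed_10sec_total)'
  printf 'cumulative_1s\t%s\n' 'sum(qdr_deliveries_delayed_1sec_total)'
  # Rate panel: per-second increase of the one-second counter over the window.
  printf 'rate_new\tsum(rate(qdr_deliveries_delayed_1sec_total[%s]))\n' "$window"
}
```

You can paste any of the printed expressions into the Prometheus expression browser or a Grafana panel. A raw counter value answers "how many so far," while `rate()` answers "how many new per second." This distinction is the difference between the cumulative panels and the rate panel.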

Procedure

  1. In the OpenShift web console, switch to Networking → Routes, and click the URL for the grafana Route.

    The Grafana Log In page appears.

  2. Enter your user name and password, and then click Log In.

    The default Grafana user name and password are both admin. After logging in for the first time, you can change the password.

  3. On the top header, click the dashboard drop-down menu, and then select the Qpid Dispatch Router or Qpid Dispatch Router - Delayed Deliveries dashboard.

    Figure 9.1. Delayed Deliveries dashboard

    Grafana dashboard showing delayed deliveries
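If you prefer to use the command line instead of the web console for step 1, the oc CLI can print the address of the same route. The following is a minimal sketch. It assumes that you are logged in with oc and that the route is named grafana, as in the steps above. The function name grafana_route_url is hypothetical.

```shell
# Hypothetical helper: prints the URL of the grafana route in a project.
# Assumes the oc CLI is installed and you are logged in to the cluster.
grafana_route_url() {
  local ns host tls
  ns="${1:-${NAMESPACE:-}}"
  if [ -z "$ns" ]; then
    echo "usage: grafana_route_url <project> (or export NAMESPACE)" >&2
    return 1
  fi
  # The public host name assigned to the route.
  host="$(oc get route grafana -n "$ns" -o jsonpath='{.spec.host}')" || return 1
  # A TLS termination type (edge, passthrough, reencrypt) means the route serves HTTPS.
  tls="$(oc get route grafana -n "$ns" -o jsonpath='{.spec.tls.termination}')" || return 1
  if [ -n "$tls" ]; then
    echo "https://${host}"
  else
    echo "http://${host}"
  fi
}
```

For example, `grafana_route_url example` prints the address to open in your browser. From there, you log in as described in step 2.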

9.3. Router metrics

The following metrics are available in Prometheus:

qdr_connections_total
The total number of network connections to the router. This includes connections from and to any AMQP route container.
qdr_links_total
The total number of incoming and outgoing links attached to the router.
qdr_addresses_total
The total number of addresses known to the router.
qdr_routers_total
The total number of routers known to the router.
qdr_link_routes_total
The total number of active and inactive link routes configured for the router. See Understanding link routing for more details.
qdr_auto_links_total
The total number of incoming and outgoing auto links configured for the router. See Configuring brokered messaging for more details about autolinks.
qdr_presettled_deliveries_total
The total number of presettled deliveries arriving at the router. The router settles the incoming deliveries and propagates the settlement to the message destination. This delivery mode is also known as fire and forget.
qdr_dropped_presettled_deliveries_total
The total number of presettled deliveries that the router dropped due to congestion. The router settles the incoming deliveries and propagates the settlement to the message destination. This delivery mode is also known as fire and forget.
qdr_accepted_deliveries_total
The total number of deliveries accepted at the router. See Understanding message routing for more information on accepted deliveries.
qdr_released_deliveries_total
The total number of deliveries released at the router. See Understanding message routing for more information on released deliveries.
qdr_rejected_deliveries_total
The total number of deliveries rejected at the router. See Understanding message routing for more information on rejected deliveries.
qdr_modified_deliveries_total
The total number of deliveries modified at the router. See Understanding message routing for more information on modified deliveries.
qdr_deliveries_ingress_total
The total number of messages delivered to the router from clients. This includes management messages, but not route control messages.
qdr_deliveries_egress_total
The total number of messages sent from the router to clients. This includes management messages, but not route control messages.
qdr_deliveries_transit_total
The total number of messages passing through the router for delivery to a different router.
qdr_deliveries_ingress_route_container_total
The total number of deliveries received by the router from AMQP route containers.
qdr_deliveries_egress_route_container_total
The total number of deliveries sent to AMQP route containers from the router. This includes messages to an AMQ Broker instance and management messages, but not route control messages.
qdr_deliveries_delayed_1sec_total
The total number of deliveries forwarded by the router that were unsettled for more than one second.
qdr_deliveries_delayed_10sec_total
The total number of deliveries forwarded by the router that were unsettled for more than ten seconds.
qdr_deliveries_stuck_total
The total number of deliveries that cannot be delivered. Typically, deliveries cannot be delivered due to lack of credit, as described in Message routing flow control.
qdr_links_blocked_total
The total number of links that are blocked.
qdr_deliveries_redirected_to_fallback_total
The total number of deliveries that were forwarded to a fallback destination. See Handling undeliverable messages for more information.
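You can query these metrics outside Grafana by calling the Prometheus HTTP API directly. The following is a minimal sketch that uses curl with the standard `GET /api/v1/query` endpoint. PROMETHEUS_URL and prom_query are hypothetical names. This chapter does not say how Prometheus is exposed, so you must set PROMETHEUS_URL to whatever address reaches Prometheus in your project, for example a route.

```shell
# Minimal sketch: run an instant PromQL query through the Prometheus HTTP API
# (GET /api/v1/query) and print the JSON response.
# PROMETHEUS_URL is a hypothetical variable: set it to the base URL at which
# Prometheus is reachable from your machine, for example through a route.
prom_query() {
  local expr="${1:-}"
  if [ -z "${PROMETHEUS_URL:-}" ] || [ -z "$expr" ]; then
    echo "usage: PROMETHEUS_URL=<url> prom_query '<PromQL expression>'" >&2
    return 1
  fi
  # -G appends the --data-urlencode value to the URL as a query string,
  # URL-encoding PromQL characters such as [ ] { } and spaces.
  curl -sS -G "${PROMETHEUS_URL%/}/api/v1/query" --data-urlencode "query=${expr}"
}
```

For example, `prom_query 'sum(rate(qdr_deliveries_redirected_to_fallback_total[5m]))'` returns, as JSON, the per-second rate of deliveries forwarded to a fallback destination, summed over all matching series. Using `-G` with `--data-urlencode` keeps the PromQL expression intact in the query string, so you do not have to escape it by hand.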