Chapter 3. Customized metrics for your Quarkus applications

Micrometer provides an API that allows you to construct your own custom metrics. The most common types of meters supported by monitoring systems are gauges, counters, and summaries. The following sections build an example endpoint, and observes endpoint behavior using these basic meter types.

To register meters, you need a reference to a MeterRegistry, which is configured and maintained by the Micrometer extension. The MeterRegistry can be injected into your application as follows:

package org.acme.micrometer;

import io.micrometer.core.instrument.MeterRegistry;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;

@Path("/")
@Produces("text/plain")
public class ExampleResource {

    private final MeterRegistry registry;

    ExampleResource(MeterRegistry registry) {
        this.registry = registry;
    }
}

Micrometer does have conventions, such as meters must be created and named using dots to separate segments, for example, a.name.like.this. Micrometer then translates that name into the format that the selected registry prefers. Prometheus uses underscores, which means the previous name will appear as a_name_like_this in Prometheus-formatted metrics output.

Micrometer maintains an internal mapping between unique metric identifier and tag combinations and specific meter instances. Using register, counter, or other methods to increment counters or record values does not create a new instance of a meter unless that combination of identifier and tag or label value hasn’t been seen before.

Gauges

Gauges measure a value that can increase or decrease over time, like the speedometer on a car. Gauges can be useful when monitoring the statistics for a cache or collection. Consider the following simple example that observes the size of a list:

    LinkedList<Long> list = new LinkedList<>();

    // Update the constructor to create the gauge
    ExampleResource(MeterRegistry registry) {
        this.registry = registry;
        registry.gaugeCollectionSize("example.list.size", Tags.empty(), list);
    }

    @GET
    @Path("gauge/{number}")
    public Long checkListSize(@PathParam("number") long number) {
        if (number == 2 || number % 2 == 0) {
            // add even numbers to the list
            list.add(number);
        } else {
            // remove items from the list for odd numbers
            try {
                number = list.removeFirst();
            } catch (NoSuchElementException nse) {
                number = 0;
            }
        }
        return number;
    }

When using Prometheus, the value of the created gauge and the size of the list is observed when the Prometheus endpoint is visited. It is important to note that gauges are sampled rather than set; there is no record of how the value associated with a gauge might have changed between measurements.

Micrometer provides a few additional mechanisms for creating gauges. Note that Micrometer does not create strong references to the objects it observes by default. Depending on the registry, Micrometer either omits gauges that observe objects that have been garbage-collected entirely or uses NaN (not a number) as the observed value.

Never gauge something you can count. Gauges can be less straight-forward to use than counters. If what you are measuring can be counted (because the value always increments), use a counter instead.

Counters

Counters are used to measure values that only increase. In the example below, you will count the number of times you test a number to see if it is prime:

    @GET
    @Path("prime/{number}")
    public String checkIfPrime(@PathParam("number") long number) {
        if (number < 1) {
            return "Only natural numbers can be prime numbers.";
        }
        if (number == 1 || number == 2 || number % 2 == 0) {
            return number + " is not prime.";
        }

        if ( testPrimeNumber(number) ) {
            return number + " is prime.";
        } else {
            return number + " is not prime.";
        }
    }

    protected boolean testPrimeNumber(long number) {
        // Count the number of times we test for a prime number
        registry.counter("example.prime.number").increment();
        for (int i = 3; i < Math.floor(Math.sqrt(number)) + 1; i = i + 2) {
            if (number % i == 0) {
                return false;
            }
        }
        return true;
    }

It might be tempting to add a label or tag to the counter indicating what value was checked. Remember that each unique combination of metric name (testPrimeNumber) and label value produces a unique time series. Using an unbounded set of data as label values can lead to a "cardinality explosion", an exponential increase in the creation of new time series.

It is possible to add a label that would convey a little more information, however. In the example below, adjustments were made and the counter was moved to add some labels.

    @GET
    @Path("prime/{number}")
    public String checkIfPrime(@PathParam("number") long number) {
        if (number < 1) {
            registry.counter("example.prime.number", "type", "not-natural").increment();
            return "Only natural numbers can be prime numbers.";
        }
        if (number == 1 ) {
            registry.counter("example.prime.number", "type", "one").increment();
            return number + " is not prime.";
        }
        if (number == 2 || number % 2 == 0) {
            registry.counter("example.prime.number", "type", "even").increment();
            return number + " is not prime.";
        }

        if ( testPrimeNumber(number) ) {
            registry.counter("example.prime.number", "type", "prime").increment();
            return number + " is prime.";
        } else {
            registry.counter("example.prime.number", "type", "not-prime").increment();
            return number + " is not prime.";
        }
    }

    protected boolean testPrimeNumber(long number) {
        for (int i = 3; i < Math.floor(Math.sqrt(number)) + 1; i = i + 2) {
            if (number % i == 0) {
                return false;
            }
        }
        return true;
    }

Looking at the data produced by this counter, you can tell how often a negative number was checked, or the number one, or an even number, and so on. Try the following sequence and look for example_prime_number_total in the plain text output. Note that the _total suffix is added when Micrometer applies Prometheus naming conventions to example.prime.number, the originally specified counter name.

# If you did not leave quarkus running in dev mode, start it again:
./mvnw compile quarkus:dev

curl http://localhost:8080/example/prime/-1
curl http://localhost:8080/example/prime/0
curl http://localhost:8080/example/prime/1
curl http://localhost:8080/example/prime/2
curl http://localhost:8080/example/prime/3
curl http://localhost:8080/example/prime/15
curl http://localhost:8080/q/metrics

Never count something you can time or summarize. Counters only record a count, which might be all that is needed. However, if you want to understand more about how a value is changing, a timer (when the base unit of measurement is time) or a distribution summary might be more appropriate.

Summaries and Timers

Timers and distribution summaries in Micrometer are very similar. Both allow you to record an observed value, which will be aggregated with other recorded values and stored as a sum. Micrometer also increments a counter to indicate the number of measurements that have been recorded and tracks the maximum observed value within a specified interval of time.

Distribution summaries are populated by calling the record method to record observed values, while timers provide additional capabilities specific to working with time and measuring durations. For example, we can use a timer to measure how long it takes to calculate prime numbers using one of the record methods that wraps the invocation of a Supplier function:

    protected boolean testPrimeNumber(long number) {
        Timer timer = registry.timer("example.prime.number.test");
        return timer.record(() -> {
            for (int i = 3; i < Math.floor(Math.sqrt(number)) + 1; i = i + 2) {
                if (number % i == 0) {
                    return false;
                }
            }
            return true;
        });
    }

Micrometer will apply Prometheus conventions when emitting metrics for this timer. Prometheus measures time in seconds. Micrometer converts measured durations into seconds and includes the unit in the metric name, per convention. After visiting the prime endpoint a few more times, look in the plain text output for the following three entries: example_prime_number_test_seconds_count, example_prime_number_test_seconds_sum, and example_prime_number_test_seconds_max.

# If you did not leave quarkus running in dev mode, start it again:
./mvnw compile quarkus:dev

curl http://localhost:8080/example/prime/256
curl http://localhost:8080/q/metrics
curl http://localhost:8080/example/prime/7919
curl http://localhost:8080/q/metrics

Both timers and distribution summaries can be configured to emit additional statistics, like histogram data, precomputed percentiles, or service level objective (SLO) boundaries. Note that the count, sum, and histogram data can be re-aggregated across dimensions (or across a series of instances), while precomputed percentile values cannot.