Chapter 3. Configuring the Time Series Database (Gnocchi) for Telemetry

The Time series database (Gnocchi) is a multi-tenant metrics and resource database. It is designed to store metrics at a very large scale while giving operators and users access to metric and resource information.

3.1. Understanding the Time Series Database

This section defines commonly used terms for the Time series database (Gnocchi).

Aggregation method
A function used to aggregate multiple measures into an aggregate. For example, the min aggregation method aggregates the values of different measures to the minimum value of all the measures in the time range.
Aggregate
A data point tuple generated from several measures according to the archive policy. An aggregate is composed of a time stamp and a value.
Archive policy
An aggregate storage policy attached to a metric. An archive policy determines how long aggregates are kept in a metric and how they are computed from measures (the aggregation method).
Granularity
The time between two aggregates in an aggregated time series of a metric.
Measure
An incoming data point tuple sent to the Time series database through the API. A measure is composed of a time stamp and a value.
Metric
An entity storing aggregates, identified by a UUID. A metric can be attached to a resource using a name. The archive policy associated with a metric defines how the metric stores its aggregates.
Resource
An entity representing anything in your infrastructure that you associate a metric with. A resource is identified by a unique ID and can contain attributes.
Time series
A list of aggregates ordered by time.
Timespan
The time period for which a metric keeps its aggregates. It is used in the context of archive policies.
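To make the relationship between measures, granularity, aggregation methods, and timespan more concrete, the following Python sketch groups raw measures into aggregates. It is a simplified model, not Gnocchi's implementation: the helpers aggregate and apply_timespan are hypothetical, and the sketch assumes that buckets are aligned by flooring each time stamp to a multiple of the granularity.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean


def aggregate(measures, granularity, method):
    """Hypothetical helper: group (timestamp, value) measures into buckets
    `granularity` wide and reduce each bucket with `method`."""
    step = granularity.total_seconds()
    buckets = defaultdict(list)
    for ts, value in measures:
        # Floor each time stamp to the start of its granularity bucket.
        start = ts.timestamp() // step * step
        buckets[start].append(value)
    # A time series is a list of aggregates ordered by time.
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), method(values))
        for start, values in sorted(buckets.items())
    ]


def apply_timespan(series, timespan):
    """Hypothetical helper: drop aggregates older than `timespan`
    relative to the newest aggregate in the series."""
    if not series:
        return []
    newest = series[-1][0]
    return [(ts, value) for ts, value in series if ts > newest - timespan]


# Three measures that fall into two 5-minute buckets.
base = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
measures = [
    (base + timedelta(minutes=1), 4.0),
    (base + timedelta(minutes=3), 2.0),
    (base + timedelta(minutes=6), 7.0),
]
five_min = timedelta(minutes=5)
print(aggregate(measures, five_min, min))   # 12:00 -> 2.0, 12:05 -> 7.0
print(aggregate(measures, five_min, mean))  # 12:00 -> 3.0, 12:05 -> 7.0
```

Because the number of aggregates kept is the timespan divided by the granularity, an archive policy definition with a 5-minute granularity and a 1-hour timespan keeps 60 / 5 = 12 aggregates for each aggregation method.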

3.2. Metrics

The Time series database (Gnocchi) stores metrics from Telemetry. A metric designates anything that can be measured, for example, the CPU usage of a server, the temperature of a room, or the number of bytes sent by a network interface.

A metric has the following properties:

  • UUID to identify the metric
  • Metric name
  • Archive policy used to store and aggregate the measures

The Time series database stores the following metrics by default, as defined in the /etc/ceilometer/polling.yaml file of the ceilometer_agent_central container:

[root@controller-0 ~]# podman exec -ti ceilometer_agent_central cat /etc/ceilometer/polling.yaml
---
sources:
    - name: some_pollsters
      interval: 300
      meters:
        - cpu
        - memory.usage
        - network.incoming.bytes
        - network.incoming.packets
        - network.outgoing.bytes
        - network.outgoing.packets
        - disk.read.bytes
        - disk.read.requests
        - disk.write.bytes
        - disk.write.requests
        - hardware.cpu.util
        - hardware.memory.used
        - hardware.memory.total
        - hardware.memory.buffer
        - hardware.memory.cached
        - hardware.memory.swap.avail
        - hardware.memory.swap.total
        - hardware.system_stats.io.outgoing.blocks
        - hardware.system_stats.io.incoming.blocks
        - hardware.network.ip.incoming.datagrams
        - hardware.network.ip.outgoing.datagrams

The polling.yaml file also specifies the default polling interval of 300 seconds (5 minutes).
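The polling interval bounds how many measures Gnocchi receives for each meter, which in turn indicates which archive policy granularities are useful. The following Python sketch does that arithmetic. The function names are hypothetical, and the sketch assumes that polling happens exactly once per interval.

```python
from datetime import timedelta

# Interval taken from the polling.yaml source shown above (interval: 300).
POLLING_INTERVAL = timedelta(seconds=300)
DAY = timedelta(days=1)


def measures_per_day(interval):
    """Measures that one meter produces for one resource per day."""
    return int(DAY / interval)


def empty_bucket_ratio(interval, granularity):
    """Share of aggregation buckets that receive no measure when the
    archive policy granularity is finer than the polling interval
    (assumes exactly one poll per interval)."""
    if granularity >= interval:
        return 0.0
    return 1 - granularity / interval


print(measures_per_day(POLLING_INTERVAL))  # 288
print(empty_bucket_ratio(POLLING_INTERVAL, timedelta(seconds=60)))
```

With a 300-second interval, a 60-second granularity leaves roughly four out of every five buckets empty, so a granularity finer than the polling interval adds storage without adding data.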

3.3. Time Series Database Components

Gnocchi uses the Identity service (Keystone) for authentication and Redis to store incoming measures. To store the aggregated measures, Gnocchi relies on either Object Storage (Swift) or Ceph. Gnocchi also uses MySQL to store the index of resources and metrics.

The Time series database provides a statsd daemon (gnocchi-statsd) that is compatible with the statsd protocol and listens for metrics sent over the network. To enable statsd support in Gnocchi, configure the [statsd] section of the configuration file. This section takes a resource ID, which identifies the generic resource to which all the metrics are attached; a user ID and project ID, which are associated with the resource and its metrics; and an archive policy name, which is used to create the metrics.

Metrics are created dynamically as they are sent to gnocchi-statsd, and each one is attached, under the name it was sent with, to the resource ID you configured.
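As an illustration of the statsd protocol, the following Python sketch formats and sends a single statsd line over UDP. The address is an assumption (gnocchi-statsd on the local host, UDP port 8125); check the host and port options in the [statsd] section of your gnocchi.conf. The helper names and the metric name are hypothetical.

```python
import socket

# Assumption: gnocchi-statsd listens on UDP port 8125 on this host.
STATSD_ADDR = ("127.0.0.1", 8125)

# statsd metric types: counter, gauge, and timer in milliseconds.
VALID_TYPES = {"c", "g", "ms"}


def format_statsd(name, value, metric_type):
    """Build one statsd line of the form <name>:<value>|<type>."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported statsd type: {metric_type}")
    return f"{name}:{value}|{metric_type}".encode("utf-8")


def send_statsd(name, value, metric_type, addr=STATSD_ADDR):
    """Send one statsd datagram. UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_statsd(name, value, metric_type), addr)
```

For example, send_statsd("room.temperature", 21.5, "g") sends the line room.temperature:21.5|g. The first time gnocchi-statsd receives that name, it creates the metric and attaches it to the configured resource ID.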

3.4. Running the Time Series Database

Start the Time series database by running the HTTP API server and the metric processing daemon:

# gnocchi-api
# gnocchi-metricd

3.5. Running as a WSGI Application

The Gnocchi API tier runs as a WSGI application, so you can serve it with Apache httpd and mod_wsgi or with another WSGI server such as uWSGI. The gnocchi/rest/app.wsgi file provided with Gnocchi lets you run Gnocchi as a WSGI application.

Configure the number of processes and threads according to the number of CPUs on the host, usually around 1.5 × the number of CPUs. If one server is not enough, you can scale Gnocchi out by starting any number of additional API servers, including on different machines.
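The following Python sketch applies the 1.5 × CPUs rule of thumb. It reads the rule as a total budget of processes × threads, which is an interpretation rather than something the rule states, and the helper names are hypothetical.

```python
import math
import os


def api_worker_budget(cpu_count=None, factor=1.5):
    """Total WSGI processes x threads for the Gnocchi API, using the
    ~1.5 x CPUs rule of thumb. Defaults to this host's CPU count."""
    cpus = cpu_count or os.cpu_count() or 1
    return max(1, math.ceil(cpus * factor))


def split_budget(budget, threads_per_process):
    """Split a worker budget into (processes, threads) with a fixed number
    of threads per process, rounding the process count up."""
    processes = max(1, math.ceil(budget / threads_per_process))
    return processes, threads_per_process


budget = api_worker_budget(8)
print(budget, split_budget(budget, 2))  # 12 (6, 2)
```

On an 8-CPU host, the budget is 12, which you could express as processes=6 and threads=2 in a mod_wsgi WSGIDaemonProcess directive. How you divide the budget between processes and threads is a judgment call.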

3.6. metricd Workers

By default, the gnocchi-metricd daemon uses all available CPUs to maximize CPU utilization when computing metric aggregates. You can use the gnocchi status command to query the HTTP API and get the cluster status for metric processing. This command displays the number of metrics and measures waiting to be processed, known as the processing backlog of gnocchi-metricd. As long as this backlog is not continuously increasing, gnocchi-metricd is keeping up with the volume of measures being sent. If the number of measures to process is continuously increasing, you might need to increase the number of gnocchi-metricd daemons, possibly only temporarily. You can run any number of metricd daemons on any number of servers.
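One way to turn "the backlog is continuously increasing" into an automated check is to sample the backlog at regular intervals, for example from the output of gnocchi status, and test whether the most recent samples rise monotonically. The function below is a hypothetical helper, and its default window of three samples is arbitrary; tune it to your sampling interval.

```python
def backlog_is_growing(samples, window=3):
    """Return True if the last `window` backlog samples strictly increase.

    `samples` holds measures-to-process counts collected at regular
    intervals, oldest first."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    return all(a < b for a, b in zip(recent, recent[1:]))


print(backlog_is_growing([10, 50, 40, 60, 90]))  # True
print(backlog_is_growing([100, 80, 120, 110]))   # False
```

A strictly increasing window is a deliberately simple signal. A backlog that spikes and then drains back down is normal and does not, on its own, call for more metricd workers.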

For director-based deployments, you can adjust certain metric processing parameters in your environment file:

  • MetricProcessingDelay - Adjusts the delay period between iterations of metric processing.
  • GnocchiMetricdWorkers - Configures the number of metricd workers.

3.7. Monitoring the Time Series Database

The /v1/status endpoint of the HTTP API returns information such as the number of measures waiting to be processed (the measures backlog), which you can monitor. If the HTTP server and the gnocchi-metricd daemon are running and are not writing errors or warnings to their logs, the overall system is in good health.
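The following Python sketch shows how you might poll /v1/status from a monitoring script using only the standard library. The endpoint URL and token are placeholders, not real values. The sketch also assumes that the response contains storage.summary.metrics and storage.summary.measures; verify the layout against the response of your Gnocchi version before relying on it.

```python
import json
import urllib.request

# Placeholders: point these at your Gnocchi API endpoint and a valid
# Identity service (Keystone) token.
GNOCCHI_URL = "http://gnocchi.example.com:8041"
TOKEN = "replace-with-a-keystone-token"


def fetch_status(base_url=GNOCCHI_URL, token=TOKEN, timeout=10):
    """GET /v1/status and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{base_url}/v1/status",
        headers={"X-Auth-Token": token, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def measures_backlog(status):
    """Return (metrics, measures) waiting to be processed, assuming the
    storage.summary layout described above."""
    summary = status["storage"]["summary"]
    return summary["metrics"], summary["measures"]
```

A monitoring job could call measures_backlog(fetch_status()) on a schedule and feed the measures count into a trend check, alerting only when the backlog keeps growing.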

3.8. Backing up and Restoring the Time Series Database

To recover from data loss or a failure, you must back up both the index and the storage. This means creating a database dump (PostgreSQL or MySQL) and taking snapshots or copies of your data storage (Ceph, Swift, or your file system). To restore, restore your index and storage backups, reinstall Gnocchi if necessary, and restart it.