Chapter 3. Configuring the Time Series Database (Gnocchi) for Telemetry

Time series database (Gnocchi) is a multi-project, metrics, and resource database. It is designed to store metrics at a very large scale while providing access to metrics and resources information to operators and users.

3.1. Understanding the Time Series Database

This section defines the commonly used terms for the Time series database (Gnocchi)features.

Aggregation method
A function used to aggregate multiple measures into an aggregate. For example, the min aggregation method aggregates the values of different measures to the minimum value of all the measures in the time range.
Aggregate
A data point tuple generated from several measures according to the archive policy. An aggregate is composed of a time stamp and a value.
Archive policy
An aggregate storage policy attached to a metric. An archive policy determines how long aggregates are kept in a metric and how aggregates are aggregated (the aggregation method).
Granularity
The time between two aggregates in an aggregated time series of a metric.
Measure
An incoming data point tuple sent to the Time series database by the API. A measure is composed of a time stamp and a value.
Metric
An entity storing aggregates identified by an UUID. A metric can be attached to a resource using a name. How a metric stores its aggregates is defined by the archive policy that the metric is associated to.
Resource
An entity representing anything in your infrastructure that you associate a metric with. A resource is identified by a unique ID and can contain attributes.
Time series
A list of aggregates ordered by time.
Timespan
The time period for which a metric keeps its aggregates. It is used in the context of archive policy.

3.2. Metrics

The Time series database (Gnocchi) stores metrics from Telemetry that designate anything that can be measured, for example, the CPU usage of a server, the temperature of a room or the number of bytes sent by a network interface.

A metric has the following properties:

  • UUID to identify the metric
  • Metric name
  • Archive policy used to store and aggregate the measures

The Time series database stores the following metrics by default, as defined in the etc/ceilometer/polling.yaml file:

[root@controller-0 ~]# podman exec -ti ceilometer_agent_central cat /etc/ceilometer/polling.yaml
---
sources:
    - name: some_pollsters
      interval: 300
      meters:
        - cpu
        - memory.usage
        - network.incoming.bytes
        - network.incoming.packets
        - network.outgoing.bytes
        - network.outgoing.packets
        - disk.read.bytes
        - disk.read.requests
        - disk.write.bytes
        - disk.write.requests
        - hardware.cpu.util
        - hardware.memory.used
        - hardware.memory.total
        - hardware.memory.buffer
        - hardware.memory.cached
        - hardware.memory.swap.avail
        - hardware.memory.swap.total
        - hardware.system_stats.io.outgoing.blocks
        - hardware.system_stats.io.incoming.blocks
        - hardware.network.ip.incoming.datagrams
        - hardware.network.ip.outgoing.datagrams

The polling.yaml file also specifies the default polling interval of 300 seconds (5 minutes).

3.3. Time Series Database Components

Currently, Gnocchi uses the Identity service for authentication and Redis for incoming measure storage. To store the aggregated measures, Gnocchi relies on either Swift or Ceph (Object Storage). Gnocchi also leverages MySQL to store the index of resources and metrics.

The time series database provides the statsd deamon (gnocchi-statsd) that is compatible with the statsd protocol and can listen to the metrics sent over the network. To enable statsd support in Gnocchi, configure the [statsd] option in the configuration file. The resource ID parameter is used as the main generic resource where all the metrics are attached, a user and project ID that are associated with the resource and metrics, and an archive policy name that is used to create the metrics.

All the metrics are created dynamically as the metrics are sent to gnocchi-statsd, and attached with the provided name to the resource ID you configured.

3.4. Running the Time Series Database

Run the time series database by running the HTTP server and metric daemon:

# gnocchi-api
# gnocchi-metricd

3.5. Running As A WSGI Application

You can run Gnocchi through a WSGI service such as mod_wsgi or any other WSGI application. You can use the gnocchi/rest/app.wsgi file, which is provided with Gnocchi, to enable Gnocchi as a WSGI application.

The Gnocchi API tier runs using WSGI. This means it can be run using Apache httpd and mod_wsgi, or another HTTP daemon such as uwsgi. Configure the number of processes and threads according to the number of CPUs you have, usually around 1.5 × number of CPUs. If one server is not enough, you can spawn any number of new API servers to scale Gnocchi out, even on different machines.

3.6. metricd Workers

By default, the gnocchi-metricd daemon spans all your CPU power to maximize CPU utilization when computing metric aggregation. You can use the gnocchi status command to query the HTTP API and get the cluster status for metric processing. This command displays the number of metrics to process, known as the processing backlog for the gnocchi-metricd. As long as this backlog is not continuously increasing, that means that gnocchi-metricd can cope with the amount of metric that are being sent. If the number of measure to process is continuously increasing, you might need to temporarily increase the number of the gnocchi-metricd daemons. You can run any number of metricd daemons on any number of servers.

For director-based deployments, you can adjust certain metric processing parameters in your environment file:

  • MetricProcessingDelay - Adjusts the delay period between iterations of metric processing.
  • GnocchiMetricdWorkers - Configure the number of metricd workers.

3.7. Monitoring the Time Series Database

The /v1/status endpoint of the HTTP API returns various information, such as the number of measures to process (measures backlog), which you can easily monitor. To verify good health of the overall system, ensure that the HTTP server and the gnocchi-metricd daemon are running and are not writing errors in their log files.

3.8. Backing up and Restoring the Time Series Database

To recover from an unfortunate event, backup both the index and the storage. You must create a database dump (PostgreSQL or MySQL), and create snapshots or copies of your data storage (Ceph, Swift or your file system). The procedure to restore is: restore your index and storage backups, re-install Gnocchi if necessary, and restart it.

3.9. Batch deleting old resources from Gnocchi

To remove outdated measures, create the archive policy to suit your requirements. To batch delete resources, metrics and measures, use the CLI or REST API. For example, to delete resources and all their associated metrics that were terminated 30 days ago, run the following command:

openstack metric resource batch delete "ended_at < '-30days'"