Chapter 3. Installing the client-side tools

Before you deploy the overcloud, you need to determine the configuration settings to apply to each client. Copy the example environment files from the heat template collection and modify the files to suit your environment.

3.1. Setting centralized logging client parameters

For more information, see Enabling centralized logging during deployment in the Logging, Monitoring, and Troubleshooting guide.

3.2. Setting monitoring client parameters

The monitoring solution collects system information periodically and provides a mechanism to store and monitor the values in a variety of ways using a data collecting agent. Red Hat supports collectd as a collection agent. Collectd-sensubility is an extension of collectd that communicates with the Sensu server side through RabbitMQ. You can use Service Telemetry Framework (STF) to store the data and, in turn, monitor systems, find performance bottlenecks, and predict future system load. For more information about Service Telemetry Framework, see the Service Telemetry Framework guide.

To configure collectd and collectd-sensubility, complete the following steps:

  1. Create config.yaml in your home directory, for example, /home/templates/custom, and configure the MetricsQdrConnectors parameter to point to the STF server side:

        MetricsQdrConnectors:
          - host: qdr-normal-sa-telemetry.apps.remote.tld
            port: 443
            role: inter-router
            sslProfile: sslProfile
            verifyHostname: false

        MetricsQdrSSLProfiles:
          - name: sslProfile
  2. In the config.yaml file, list the plugins that you want to use under the CollectdExtraPlugins parameter. By default, collectd comes with the cpu, df, disk, hugepages, interface, load, memory, processes, tcpconns, unixsock, and uptime plugins, and you can add more with the CollectdExtraPlugins parameter. You can also provide additional configuration information for the plugins in the ExtraConfig section. For example, to enable the virt plugin, and configure the connection string and the hostname format, use the following syntax:

        CollectdExtraPlugins:
          - disk
          - df
          - virt

        ExtraConfig:
          collectd::plugin::virt::connection: "qemu:///system"
          collectd::plugin::virt::hostname_format: "hostname uuid"

    Do not remove the unixsock plugin. If you remove it, the collectd container is permanently marked as unhealthy.

  3. Optional: To collect metric and event data through AMQ Interconnect, add the line MetricsQdrExternalEndpoint: true to the config.yaml file:

        MetricsQdrExternalEndpoint: true
  4. To enable collectd-sensubility, add the following environment configuration to the config.yaml file:

      CollectdEnableSensubility: true
      # Set this parameter if access for your checks is restricted and requires the sudo command.
      # A rule is created in /etc/sudoers.d so that the sensubility executor can call restricted commands.
      CollectdSensubilityExecSudoRule: "collectd ALL = NOPASSWD: <some command or ALL for all commands>"
      # Connection URL to Sensu server side for reporting check results.
      CollectdSensubilityConnection: "amqp://sensu:sensu@<sensu server side IP>:5672//sensu"
      # Interval in seconds for sending keepalive messages to Sensu server side.
      CollectdSensubilityKeepaliveInterval: 20
      # Path to temporary directory where the check scripts are created.
      CollectdSensubilityTmpDir: /var/tmp/collectd-sensubility-checks
      # Path to shell used for executing check scripts.
      CollectdSensubilityShellPath: /usr/bin/sh
      # To improve the check execution rate, use this parameter to change the number of goroutines spawned for executing check scripts.
      CollectdSensubilityWorkerCount: 2
      # JSON-formatted definition of standalone checks to be scheduled on the client side. If you need to schedule
      # checks on overcloud nodes instead of the Sensu server, use this parameter. The configuration is compatible
      # with the Sensu check definition, but sensubility ignores some configuration options, such as: extension,
      # publish, cron, stdin, hooks.
      CollectdSensubilityChecks:
        example:
          command: "ping -c1 -W1"
          interval: 30
      # The following parameters are used to modify standard, standalone checks for monitoring container health on overcloud nodes.
      # Do not modify these parameters.
      # CollectdEnableContainerHealthCheck: true
      # CollectdContainerHealthCheckCommand: <snip>
      # CollectdContainerHealthCheckInterval: 10
      # The Sensu server side event handler to use for events created by the container health check.
      # CollectdContainerHealthCheckHandlers:
      #   - handle-container-health-check
      # CollectdContainerHealthCheckOccurrences: 3
      # CollectdContainerHealthCheckRefresh: 90
  5. Deploy the overcloud. Include config.yaml, collectd-write-qdr.yaml, and one of the qdr-*.yaml files in your overcloud deploy command:

    $ openstack overcloud deploy \
    -e /home/templates/custom/config.yaml \
    -e tripleo-heat-templates/environments/metrics/collectd-write-qdr.yaml \
    -e tripleo-heat-templates/environments/metrics/qdr-form-controller-mesh.yaml
  6. Optional: To enable overcloud RabbitMQ monitoring, include the collectd-read-rabbitmq.yaml file in the overcloud deploy command.
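Putting the preceding steps together, a complete config.yaml might look like the following minimal sketch. This is an illustration, not a definitive file: the host name and plugin selection are placeholders from the examples above, and it assumes that these parameters live under parameter_defaults, as is usual for heat environment files. Adjust the values for your environment:

```yaml
parameter_defaults:
  MetricsQdrConnectors:
    - host: qdr-normal-sa-telemetry.apps.remote.tld
      port: 443
      role: inter-router
      sslProfile: sslProfile
      verifyHostname: false

  CollectdExtraPlugins:
    - disk
    - df
    - virt

  ExtraConfig:
    collectd::plugin::virt::connection: "qemu:///system"
    collectd::plugin::virt::hostname_format: "hostname uuid"

  MetricsQdrExternalEndpoint: true
  CollectdEnableSensubility: true
```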

3.3. Collecting data through AMQ Interconnect

To subscribe to the available AMQ Interconnect addresses for metric and event data consumption, create an environment file to expose AMQ Interconnect for client connections, and deploy the overcloud.


The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Configuring multiple clouds in the Service Telemetry Framework guide.


  1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.
  2. Create a configuration file called data-collection.yaml in the /home/stack directory.
  3. To enable external endpoints, add the MetricsQdrExternalEndpoint: true parameter to the data-collection.yaml file:

        MetricsQdrExternalEndpoint: true
  4. To enable collectd and AMQ Interconnect, add the following files to your Red Hat OpenStack Platform director deployment:

    • the data-collection.yaml environment file
    • the qdr-form-controller-mesh.yaml file that enables the client-side AMQ Interconnect to connect to the external endpoints

      openstack overcloud deploy <other arguments> \
        --templates /usr/share/openstack-tripleo-heat-templates \
        --environment-file <...other-environment-files...> \
        --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/qdr-form-controller-mesh.yaml \
        --environment-file /home/stack/data-collection.yaml
  5. Optional: To collect Ceilometer and collectd events, include the ceilometer-write-qdr.yaml and collectd-write-qdr.yaml files in your overcloud deploy command.
  6. Deploy the overcloud.

3.4. Collectd plugin configurations

Red Hat OpenStack Platform director offers many configuration possibilities. You can configure multiple collectd plugins to suit your environment. Each documented plugin has a description and an example configuration. Some plugins also have a table of metrics that you can query from Grafana or Prometheus, and a list of configurable options, if available.

Additional resources

To view a complete list of collectd plugin options, see collectd plugins in the Service Telemetry Framework guide.

3.4.1. amqp1

Use the amqp1 plugin to write values to an amqp1 message bus, for example, AMQ Interconnect.

Example configuration

      CollectdExtraPlugins:
        - amqp1

      ExtraConfig:
        collectd::plugin::amqp1::send_queue_limit: 50

3.4.2. cpu

Use the cpu plugin to monitor the amount of time spent by the CPU in various states, for example, executing user code, executing system code, waiting for I/O operations, and being idle. The cpu plugin does not collect percentages. It collects jiffies, which are units of scheduling. On many Linux systems, there are approximately 100 jiffies in one second, but this does not mean that you get a percentage value. Depending on system load, hardware, whether or not the system is virtualized, and other factors, there can be more or fewer than 100 jiffies in one second. There is no guarantee that all states add up to 100, which is a requirement for percentages.
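To illustrate why jiffy counters must be normalized by the observed total rather than assumed to sum to 100, the following sketch converts two samples of per-state jiffy counters into percentages. This is illustrative code, not part of collectd; the state names and values are made-up examples:

```python
# Illustrative sketch: convert two samples of per-state CPU jiffy counters
# into percentages. It demonstrates why you normalize by the observed total
# delta rather than assume 100 jiffies per second.

def cpu_percentages(before, after):
    """Compute the share of each CPU state between two jiffy samples."""
    deltas = {state: after[state] - before[state] for state in before}
    total = sum(deltas.values())
    if total == 0:
        return {state: 0.0 for state in before}
    return {state: 100.0 * d / total for state, d in deltas.items()}

# Hypothetical samples taken one interval apart.
sample_t0 = {"user": 4000, "system": 1000, "idle": 15000, "wait": 200}
sample_t1 = {"user": 4060, "system": 1020, "idle": 15110, "wait": 210}

print(cpu_percentages(sample_t0, sample_t1))
# → {'user': 30.0, 'system': 10.0, 'idle': 55.0, 'wait': 5.0}
```

The percentages are well defined regardless of how many jiffies elapsed in the interval, which is exactly the normalization a consumer of the raw cpu metrics has to perform.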

Table 3.1. cpu metrics

  idle        Amount of idle time
  interrupt   CPU blocked by interrupts
  nice        Amount of time running low priority processes
  softirq     Amount of cycles spent in servicing interrupt requests
  steal       The percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor
  system      Amount of time spent on the system level (kernel)
  user        Jiffies used by user processes
  wait        CPU waiting on an outstanding I/O request


  • collectd::plugin::cpu::reportbystate
  • collectd::plugin::cpu::valuespercentage
  • collectd::plugin::cpu::reportbycpu
  • collectd::plugin::cpu::reportnumcpu
  • collectd::plugin::cpu::reportgueststate
  • collectd::plugin::cpu::subtractgueststate
  • collectd::plugin::cpu::interval

Example configuration

      CollectdExtraPlugins:
        - cpu

      ExtraConfig:
        collectd::plugin::cpu::reportbystate: true

Additional resources

For more information about configuring the cpu plugin, see the plugin manpages.

3.4.3. ovs_stats

Use the ovs_stats plugin to collect statistics of OVS connected interfaces. This plugin uses the OVSDB management protocol (RFC7047) monitor mechanism to get statistics from OVSDB.
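For background, the RFC 7047 monitor mechanism that the plugin relies on is a JSON-RPC call that asks OVSDB to report initial contents and subsequent changes of selected columns. The following sketch builds such a request; the table and column names are examples, not the plugin's exact selection:

```python
import json

# Illustrative sketch of the OVSDB (RFC 7047) "monitor" request that a
# client such as the ovs_stats plugin issues to receive statistics updates.

def build_monitor_request(request_id, tables):
    """Build a JSON-RPC monitor call for the Open_vSwitch database."""
    monitor_requests = {table: {"columns": columns} for table, columns in tables.items()}
    return {
        "method": "monitor",
        "params": ["Open_vSwitch", request_id, monitor_requests],
        "id": request_id,
    }

# Example: watch the name and statistics columns of the Interface table.
req = build_monitor_request(1, {"Interface": ["name", "statistics"]})
print(json.dumps(req))
```

The server replies with the current rows and then streams "update" notifications as the monitored columns change, which is how the plugin obtains statistics without polling each interface.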


  • collectd::plugin::ovs_stats::address
  • collectd::plugin::ovs_stats::bridges
  • collectd::plugin::ovs_stats::port
  • collectd::plugin::ovs_stats::socket

Example configuration

      CollectdExtraPlugins:
        - ovs_stats

3.4.4. mcelog

Use the mcelog plugin to send notifications and statistics relevant to Machine Check Exceptions when they occur. Configure mcelog to run on the platform in daemon mode and ensure that logging capabilities are enabled.

Example configuration

        CollectdExtraPlugins:
          - mcelog

        CollectdEnableMcelog: true

3.4.5. pcie_errors

Use the pcie_errors plugin to poll PCI config space for baseline and Advanced Error Reporting (AER) errors, and to parse syslog for AER events. Errors are reported through notifications.
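To make the syslog-parsing half of this concrete, the following sketch extracts the reporting device and severity from a kernel AER log line. This is illustrative only: the sample line format approximates typical kernel AER messages and is not the plugin's actual parser:

```python
import re

# Illustrative sketch of pulling AER events out of syslog, similar in
# spirit to what the pcie_errors plugin does with AER log lines.

AER_RE = re.compile(
    r"(?P<device>\d{4}:[0-9a-f]{2}:[0-9a-f]{2}\.\d).*AER:\s*"
    r"(?P<severity>Corrected|Uncorrected) error"
)

def parse_aer_line(line):
    """Return (device, severity) if the line is an AER event, else None."""
    match = AER_RE.search(line)
    if match is None:
        return None
    return match.group("device"), match.group("severity")

# Approximation of a kernel AER message (not a real captured log line).
line = "pcieport 0000:00:1c.0: AER: Corrected error received: 0000:01:00.0"
print(parse_aer_line(line))
# → ('0000:00:1c.0', 'Corrected')
```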


  • collectd::plugin::pcie_errors::reportbystate
  • collectd::plugin::pcie_errors::source
  • collectd::plugin::pcie_errors::access
  • collectd::plugin::pcie_errors::reportmasked
  • collectd::plugin::pcie_errors::persistentnotifications

Example configuration

      CollectdExtraPlugins:
        - pcie_errors

3.4.6. virt

Use the virt plugin to collect CPU, disk, network load, and other metrics for virtual machines on the host. Metrics are collected through the libvirt API.


  • collectd::plugin::virt::connection
  • collectd::plugin::virt::refresh_interval
  • collectd::plugin::virt::domain
  • collectd::plugin::virt::block_device
  • collectd::plugin::virt::interface_device
  • collectd::plugin::virt::ignore_selected
  • collectd::plugin::virt::plugin_instance_format
  • collectd::plugin::virt::hostname_format
  • collectd::plugin::virt::interface_format
  • collectd::plugin::virt::extra_stats
  • collectd::plugin::virt::interval

Example configuration

      CollectdExtraPlugins:
        - virt

      ExtraConfig:
        collectd::plugin::virt::plugin_instance_format: name

Additional resources

For more information about configuring the virt plugin, see the plugin manpages.

3.4.7. write_http

Use the write_http output plugin to submit values to an HTTP server with POST requests, encoding metrics either as JSON or with the PUTVAL command.
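To show what a consumer on the receiving end can expect, the following sketch builds a body in the shape of the collectd JSON format: a list of value lists, each carrying the values together with identity and metadata fields. It is a simplified illustration (the dstypes and dsnames here are fixed to a single gauge, and optional fields such as plugin_instance and type_instance are omitted), not the plugin's actual serializer:

```python
import json

# Illustrative sketch of a JSON body in the shape that collectd's
# write_http plugin POSTs when the format is JSON: a list of value lists.

def make_value_list(host, plugin, type_, values, interval=10.0, time=0.0):
    """Build one simplified collectd-style JSON value list (single gauge)."""
    return {
        "values": values,
        "dstypes": ["gauge"] * len(values),
        "dsnames": ["value"] * len(values),
        "time": time,
        "interval": interval,
        "host": host,
        "plugin": plugin,
        "type": type_,
    }

# Hypothetical metric from an overcloud compute node.
payload = json.dumps([make_value_list("overcloud-compute-0", "cpu", "percent", [42.0])])
print(payload)
```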


  • collectd::plugin::write_http::url
  • collectd::plugin::write_http::password
  • collectd::plugin::write_http::username
  • collectd::plugin::write_http::verifypeer
  • collectd::plugin::write_http::verifyhost
  • collectd::plugin::write_http::cacert
  • collectd::plugin::write_http::capath
  • collectd::plugin::write_http::clientkey
  • collectd::plugin::write_http::clientcert
  • collectd::plugin::write_http::clientkeypass
  • collectd::plugin::write_http::header
  • collectd::plugin::write_http::sslversion
  • collectd::plugin::write_http::format
  • collectd::plugin::write_http::attribute
  • collectd::plugin::write_http::ttl
  • collectd::plugin::write_http::prefix
  • collectd::plugin::write_http::metrics
  • collectd::plugin::write_http::notifications
  • collectd::plugin::write_http::storerates
  • collectd::plugin::write_http::buffersize
  • collectd::plugin::write_http::lowspeedlimit
  • collectd::plugin::write_http::timeout
  • collectd::plugin::write_http::loghttperror

Example configuration

      CollectdExtraPlugins:
        - write_http

      ExtraConfig:
        collectd::plugin::write_http::nodes:
          collectd:
            url: ""
            metrics: true
            header: "X-Custom-Header: custom_value"

3.5. YAML files

You can include the following YAML files in your overcloud deploy command when you configure collectd:

  • collectd-read-rabbitmq.yaml: Enables and configures python-collectd-rabbitmq to monitor the overcloud RabbitMQ instance.
  • collectd-write-qdr.yaml: Enables collectd to send telemetry and notification data through AMQ Interconnect.
  • qdr-edge-only.yaml: Enables deployment of AMQ Interconnect. Each overcloud node has one local qdrouterd service running in edge mode, sending received data straight to the defined MetricsQdrConnectors.
  • qdr-form-controller-mesh.yaml: Enables deployment of AMQ Interconnect. Each overcloud node has one local qdrouterd service forming a mesh topology: AMQ Interconnect routers on controllers operate in interior router mode, with connections to the defined MetricsQdrConnectors, and AMQ Interconnect routers on other node types connect in edge mode to the interior routers running on the controllers.

Additional resources

For more information about configuring collectd, see Section 3.2, “Setting monitoring client parameters”.