
Chapter 4. Gathering Information About the Environment

4.1. Monitoring and observability

This chapter describes several ways to monitor and obtain metrics and logs from your Red Hat Virtualization system. These methods include:

  • Using Data Warehouse and Grafana to monitor RHV
  • Sending metrics to a remote instance of Elasticsearch
  • Deploying Insights in Red Hat Virtualization Manager

4.1.1. Using Data Warehouse and Grafana to monitor RHV

4.1.1.1. Grafana overview

Grafana is a web-based UI tool used to display reports based on data collected from the oVirt Data Warehouse PostgreSQL database (database name ovirt_engine_history). For details of the available report dashboards, see Grafana dashboards and Grafana website - dashboards.

Data from the Manager is collected every minute and aggregated hourly and daily. The data is retained according to the scale setting defined in the Data Warehouse configuration during engine-setup (Basic or Full):

  • Basic (default) - sample data saved for 24 hours, hourly data saved for 1 month, no daily aggregations saved.
  • Full (recommended) - sample data saved for 24 hours, hourly data saved for 2 months, daily aggregations saved for 5 years.

The Full scale setting may require migrating the Data Warehouse to a separate virtual machine.

Note

Red Hat only supports installing the Data Warehouse database, the Data Warehouse service, and Grafana when all three are installed on the same machine, even though it is technically possible to install each of these components on a separate machine.

4.1.1.2. Installation

Grafana integration is enabled and installed by default when you run the Red Hat Virtualization Manager engine-setup in a standalone Manager installation and in a self-hosted engine installation.

Note

Grafana is not installed by default in some scenarios, such as after upgrading from an earlier version of RHV, restoring a backup, or migrating the Data Warehouse to a separate machine. In these cases, you need to install it manually.

To enable Grafana integration manually:

  1. Put the environment in global maintenance mode:

    # hosted-engine --set-maintenance --mode=global
  2. Log in to the machine where you want to install Grafana. This should be the same machine where the Data Warehouse is configured; usually the Manager machine.
  3. Run the engine-setup command as follows:

    # engine-setup --reconfigure-optional-components
  4. Answer Yes to install Grafana on this machine:

    Configure Grafana on this host (Yes, No) [Yes]:
  5. Disable global maintenance mode:

    # hosted-engine --set-maintenance --mode=none

To access the Grafana dashboards:

  • Go to https://<engine FQDN or IP address>/ovirt-engine-grafana

or

  • Click Monitoring Portal in the web administration welcome page for the Administration Portal.
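To confirm the portal responds, you can check the endpoint from the command line. This is a minimal sketch; engine.example.com is a placeholder for your Manager's FQDN:

```shell
# Placeholder FQDN; replace with your Manager's address.
ENGINE_FQDN="engine.example.com"
GRAFANA_URL="https://${ENGINE_FQDN}/ovirt-engine-grafana"
# Print the HTTP status code; use --insecure only if the engine CA
# certificate is not in your local trust store.
curl --silent --max-time 5 --insecure -o /dev/null \
    -w '%{http_code}\n' "${GRAFANA_URL}" || true
```

A 200 or 302 status code indicates the Grafana portal is reachable.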

4.1.1.2.1. Configuring Grafana for Single Sign-on

The Manager engine-setup automatically configures Grafana to allow existing Manager users to log in with SSO from the Administration Portal, but it does not automatically create Grafana users. You need to create each new user (using Invite in the Grafana UI) and confirm the new user; after that, the user can log in.

  1. Set an email address for the user in the Manager, if it is not already defined.
  2. Log in to Grafana with an existing admin user (the initially configured admin).
  3. Go to Configuration → Users and select Invite.
  4. Input the email address and name, and select a Role.
  5. Send the invitation using one of these options:

    • Select Send invite mail and click Submit. For this option, you need an operational local mail server configured on the Grafana machine.

      or

    • Select Pending Invites

      • Locate the entry you want
      • Select Copy invite
      • Copy and use this link to create the account by pasting it directly into a browser address bar, or by sending it to another user.

If you use the Pending Invites option, no email is sent, and the email address does not actually need to exist: any valid-looking address works, as long as it is configured as the email address of a Manager user.

To log in with this account:

  1. Log in to the Red Hat Virtualization web administration welcome page using the account that has this email address.
  2. Select Monitoring Portal to open the Grafana dashboard.
  3. Select Sign in with oVirt Engine Auth.

4.1.1.3. Built-in Grafana dashboards

The following dashboards are available in the initial Grafana setup to report Data Center, Cluster, Host, and Virtual Machine data:

Table 4.1. Built-in Grafana dashboards

Dashboard type | Content

Executive dashboards

  • System dashboard - resource usage and up-time for hosts and storage domains in the system, according to the latest configurations.
  • Data Center dashboard - resource usage, peaks, and up-time for clusters, hosts, and storage domains in a selected data center, according to the latest configurations.
  • Cluster dashboard - resource usage, peaks, over-commit, and up-time for hosts and virtual machines in a selected cluster, according to the latest configurations.
  • Host dashboard - latest and historical configuration details and resource usage metrics of a selected host over a selected period.
  • Virtual Machine dashboard - latest and historical configuration details and resource usage metrics of a selected virtual machine over a selected period.
  • Executive dashboard - user resource usage and number of operating systems for hosts and virtual machines in selected clusters over a selected period.

Inventory dashboards

  • Inventory dashboard - number of hosts, virtual machines, and running virtual machines, resource usage, and over-commit rates for selected data centers, according to the latest configurations.
  • Hosts Inventory dashboard - FQDN, VDSM version, operating system, CPU model, CPU cores, memory size, create date, delete date, and hardware details for selected hosts, according to the latest configurations.
  • Storage Domains Inventory dashboard - domain type, storage type, available disk size, used disk size, total disk size, creation date, and delete date for selected storage domains over a selected period.
  • Virtual Machines Inventory dashboard - template name, operating system, CPU cores, memory size, create date, and delete date for selected virtual machines, according to the latest configurations.

Service Level dashboards

  • Uptime dashboard - planned downtime, unplanned downtime, and total time for the hosts, high availability virtual machines, and all virtual machines in selected clusters in a selected period.
  • Hosts Uptime dashboard - the uptime, planned downtime, and unplanned downtime for selected hosts in a selected period.
  • Virtual Machines Uptime dashboard - the uptime, planned downtime, and unplanned downtime for selected virtual machines in a selected period.
  • Cluster Quality of Service

    • Hosts dashboard - the time selected hosts have performed above and below the CPU and memory threshold in a selected period.
    • Virtual Machines dashboard - the time selected virtual machines have performed above and below the CPU and memory threshold in a selected period.

Trend dashboards

  • Trend dashboard - usage rates for the 5 most and least utilized virtual machines and hosts by memory and by CPU in selected clusters over a selected period.
  • Hosts Trend dashboard - resource usage (number of virtual machines, CPU, memory, and network Tx/Rx) for selected hosts over a selected period.
  • Virtual Machines Trend dashboard - resource usage (CPU, memory, network Tx/Rx, disk I/O) for selected virtual machines over a selected period.
  • Hosts Resource Usage dashboard - daily and hourly resource usage (number of virtual machines, CPU, memory, network Tx/Rx) for selected hosts in a selected period.
  • Virtual Machines Resource Usage dashboard - daily and hourly resource usage (CPU, memory, network Tx/Rx, disk I/O) for selected virtual machines in a selected period.

Note

The Grafana dashboards include direct links to the Red Hat Virtualization Administration Portal, allowing you to quickly view additional details for your clusters, hosts, and virtual machines.

4.1.1.4. Customized Grafana dashboards

You can create customized dashboards or copy and modify existing dashboards according to your reporting needs.

Note

Built-in dashboards cannot be customized.

4.1.2. Sending metrics and logs to a remote instance of Elasticsearch

Note

Red Hat does not own or maintain Elasticsearch. You need to have a working familiarity with Elasticsearch setup and maintenance to deploy this option.

You can configure the Red Hat Virtualization Manager and hosts to send metrics data and logs to your existing Elasticsearch instance.

To do this, run the Ansible role that configures collectd and rsyslog on the Manager and all hosts to collect engine.log, vdsm.log, and collectd metrics, and send them to your Elasticsearch instance.

For more information, including a full list and explanation of the available metrics schema, see Sending RHV monitoring data to a remote Elasticsearch instance.

4.1.2.1. Installing collectd and rsyslog

Deploy collectd and rsyslog on the hosts to collect logs and metrics.

Note

You do not need to repeat this procedure for new hosts. Every new host that is added is automatically configured by the Manager to send the data to Elasticsearch during host-deploy.

Procedure

  1. Log in to the Manager machine using SSH.
  2. Copy /etc/ovirt-engine-metrics/config.yml.example to create /etc/ovirt-engine-metrics/config.yml.d/config.yml:

    # cp /etc/ovirt-engine-metrics/config.yml.example /etc/ovirt-engine-metrics/config.yml.d/config.yml
  3. Edit the ovirt_env_name and elasticsearch_host parameters in config.yml and save the file. The following additional parameters can be added to the file:

    use_omelasticsearch_cert: false
    rsyslog_elasticsearch_usehttps_metrics: !!str off
    rsyslog_elasticsearch_usehttps_logs: !!str off
    • When using certificates, set use_omelasticsearch_cert to true.
    • To enable or disable HTTPS for metrics or logs, set the rsyslog_elasticsearch_usehttps_metrics and/or rsyslog_elasticsearch_usehttps_logs parameters.
  4. Deploy collectd and rsyslog on the hosts:

    # /usr/share/ovirt-engine-metrics/setup/ansible/configure_ovirt_machines_for_metrics.sh

    The configure_ovirt_machines_for_metrics.sh script runs an Ansible role that includes linux-system-roles (see Administration and configuration tasks using System Roles in RHEL) and uses it to deploy and configure rsyslog on the host. rsyslog collects metrics from collectd and sends them to Elasticsearch.
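Step 3 above edits /etc/ovirt-engine-metrics/config.yml.d/config.yml. A minimal sketch of the resulting file, assuming a hypothetical environment name and Elasticsearch host (replace both with your own values):

```yaml
# Name identifying this RHV environment in Elasticsearch (hypothetical value).
ovirt_env_name: production-rhv
# Elasticsearch host that receives the metrics and logs (hypothetical value).
elasticsearch_host: elasticsearch.example.com
# Optional parameters, as described above:
use_omelasticsearch_cert: false
rsyslog_elasticsearch_usehttps_metrics: !!str off
rsyslog_elasticsearch_usehttps_logs: !!str off
```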

4.1.2.2. Logging schema and analyzing logs

Use the Discover page to interactively explore data collected from RHV. Each set of results that is collected is referred to as a document. Documents are collected from the following log files:

  • engine.log - contains all oVirt Engine UI crashes, Active Directory lookups, database issues, and other events.
  • vdsm.log - the log file for VDSM, the Manager’s agent on the virtualization hosts; it contains host-related events.

The following fields are available:

Parameter | Description

_id

The unique ID of the document

_index

The ID of the index to which the document belongs. The index with the project.ovirt-logs prefix is the only relevant index in the Discover page.

hostname

For the engine.log this is the hostname of the Manager. For the vdsm.log this is the hostname of the host.

level

The log record severity: TRACE, DEBUG, INFO, WARN, ERROR, FATAL.

message

The body of the document message.

ovirt.class

The name of a Java class that produced this log.

ovirt.correlationid

For the engine.log only. This ID is used to correlate the multiple parts of a single task performed by the Manager.

ovirt.thread

The name of a Java thread inside which the log record was produced.

tag

Predefined sets of metadata that can be used to filter the data.

@timestamp

The time that the record was issued.

_score

N/A

_type

N/A

ipaddr4

The machine’s IP address.

ovirt.cluster_name

For the vdsm.log only. The name of the cluster to which the host belongs.

ovirt.engine_fqdn

The Manager’s FQDN.

ovirt.module_lineno

The file and line number within the file that ran the command defined in ovirt.class.
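With these fields, you can query the logging indices directly. A minimal sketch using curl, assuming a hypothetical Elasticsearch endpoint (elasticsearch.example.com:9200) reachable over plain HTTP; adjust the host, port, and scheme to your deployment:

```shell
# Hypothetical Elasticsearch endpoint; replace with your own.
ES_HOST="elasticsearch.example.com:9200"
# Search the project.ovirt-logs indices for the ten newest ERROR records,
# sorted by the @timestamp field described above.
QUERY='{"query":{"match":{"level":"ERROR"}},"sort":[{"@timestamp":{"order":"desc"}}],"size":10}'
curl --silent --max-time 5 -H 'Content-Type: application/json' \
    "http://${ES_HOST}/project.ovirt-logs*/_search" -d "${QUERY}" || true
```

The same query body works in the Kibana Discover page's query bar or Dev Tools console.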

4.1.3. Deploying Insights

To deploy Red Hat Insights on an existing Red Hat Enterprise Linux (RHEL) system with Red Hat Virtualization Manager installed, complete these tasks:

  • Register the system to the Red Hat Insights application.
  • Enable data collection from the Red Hat Virtualization environment.

4.1.3.1. Register the system to Red Hat Insights

Register the system to communicate with the Red Hat Insights service and to view results displayed in the Red Hat Insights console.

[root@server ~]# insights-client --register

4.1.3.2. Enable data collection from the Red Hat Virtualization environment

Modify the /etc/ovirt-engine/rhv-log-collector-analyzer/rhv-log-collector-analyzer.conf file to include the following line:

upload-json=True

4.1.3.3. View your Insights results in the Insights Console

System and infrastructure results can be viewed in the Insights console. The Overview tab provides a dashboard view of current risks to your infrastructure. From this starting point, you can investigate how a specific rule is affecting your system, or take a system-based approach to view all the rule matches that pose a risk to the system.

Procedure

  1. Select Rule hits by severity to view rules by the Total Risk they pose to your infrastructure (Critical, Important, Moderate, or Low), or select Rule hits by category to see the type of risk they pose to your infrastructure (Availability, Stability, Performance, or Security).
  2. Search for a specific rule by name, or scroll through the list of rules to see high-level information about risk, systems exposed, and the availability of an Ansible Playbook to automate remediation.
  3. Click a rule to see a description of the rule, learn more from relevant knowledge base articles, and view a list of systems that are affected.
  4. Click a system to see specific information about detected issues and steps to resolve the issue.