Red Hat Training

A Red Hat training course is available for Red Hat Ceph Storage

Chapter 5. Monitoring Ceph Clusters Running in Containers with the Red Hat Ceph Storage Dashboard

The Red Hat Ceph Storage Dashboard provides a monitoring dashboard to visualize the state of a Ceph Storage Cluster. Also, the Red Hat Ceph Storage Dashboard architecture provides a framework for additional modules to add functionality to the storage cluster.

Prerequisites

  • A Red Hat Ceph Storage cluster running in containers

5.1. The Red Hat Ceph Storage Dashboard

The Red Hat Ceph Storage Dashboard provides a monitoring dashboard for Ceph clusters to visualize the storage cluster state. The dashboard is accessible from a web browser and provides a number of metrics and graphs about the state of the cluster, Monitors, OSDs, Pools, or the network.

With the previous releases of Red Hat Ceph Storage, monitoring data was sourced through a collectd plugin, which sent the data to an instance of the Graphite monitoring utility. Starting with Red Hat Ceph Storage 3.3, monitoring data is sourced directly from the ceph-mgr daemon, using the ceph-mgr Prometheus plugin.

The introduction of Prometheus as the monitoring data source simplifies deployment and operational management of the Red Hat Ceph Storage Dashboard solution, along with reducing the overall hardware requirements. By sourcing the Ceph monitoring data directly, the Red Hat Ceph Storage Dashboard solution is better able to support Ceph clusters deployed in containers.

Note

With this change in architecture, there is no migration path for monitoring data from Red Hat Ceph Storage 2.x and 3.0 to Red Hat Ceph Storage 3.3.

The Red Hat Ceph Storage Dashboard uses the following utilities:

  • The Ansible automation application for deployment.
  • The embedded Prometheus ceph-mgr plugin.
  • The Prometheus node-exporter daemon, running on each node of the storage cluster.
  • The Grafana platform to provide a user interface and alerting.

The Red Hat Ceph Storage Dashboard supports the following features:

General Features
  • Support for Red Hat Ceph Storage 3.1 and higher
  • SELinux support
  • Support for FileStore and BlueStore OSD back ends
  • Support for encrypted and non-encrypted OSDs
  • Support for Monitor, OSD, the Ceph Object Gateway, and iSCSI roles
  • Initial support for the Metadata Servers (MDS)
  • Drill down and dashboard links
  • 15 second granularity
  • Support for Hard Disk Drives (HDD), Solid-state Drives (SSD), Non-volatile Memory Express (NVMe) interface, and Intel® Cache Acceleration Software (Intel® CAS)
Node Metrics
  • CPU and RAM usage
  • Network load
Configurable Alerts
  • Out-of-Band (OOB) alerts and triggers
  • Notification channel is automatically defined during the installation
  • The Ceph Health Summary dashboard created by default

    See the Red Hat Ceph Storage Dashboard Alerts section for details.

Cluster Summary
  • OSD configuration summary
  • OSD FileStore and BlueStore summary
  • Cluster versions breakdown by role
  • Disk size summary
  • Host size by capacity and disk count
  • Placement Groups (PGs) status breakdown
  • Pool counts
  • Device class summary, HDD vs. SSD
Cluster Details
  • Cluster flags status (noout, nodown, and others)
  • OSD or Ceph Object Gateway hosts up and down status
  • Per pool capacity usage
  • Raw capacity utilization
  • Indicators for active scrub and recovery processes
  • Growth tracking and forecast (raw capacity)
  • Information about OSDs that are down or near full, including the OSD host and disk
  • Distribution of PGs per OSD
  • OSDs by PG counts, highlighting the over or under utilized OSDs
OSD Performance
  • Information about I/O operations per second (IOPS) and throughput by pool
  • OSD performance indicators
  • Disk statistics per OSD
  • Cluster wide disk throughput
  • Read/write ratio (client IOPS)
  • Disk utilization heat map
  • Network load by Ceph role
The Ceph Object Gateway Details
  • Aggregated load view
  • Per host latency and throughput
  • Workload breakdown by HTTP operations
The Ceph iSCSI Gateway Details
  • Aggregated views
  • Configuration
  • Performance
  • Per Gateway resource utilization
  • Per client load and configuration
  • Per Ceph Block Device image performance

5.2. Installing the Red Hat Ceph Storage Dashboard

The Red Hat Ceph Storage Dashboard provides a visual dashboard to monitor various metrics in a running Ceph Storage Cluster.

Note

For information on upgrading the Red Hat Ceph Storage Dashboard see Upgrading Red Hat Ceph Storage Dashboard in the Installation Guide for Red Hat Enterprise Linux.

Prerequisites

  • A Ceph Storage cluster running in containers deployed with the Ansible automation application.
  • The storage cluster nodes use Red Hat Enterprise Linux 7.

    For details, see Section 1.1.1, “Registering Red Hat Ceph Storage Nodes to the CDN and Attaching Subscriptions”.

  • A separate node, the Red Hat Ceph Storage Dashboard node, for receiving data from the cluster nodes and providing the Red Hat Ceph Storage Dashboard.
  • Prepare the Red Hat Ceph Storage Dashboard node:

    • Register the system with the Red Hat Content Delivery Network (CDN), attach subscriptions, and enable Red Hat Enterprise Linux repositories. For details, see Section 1.1.1, “Registering Red Hat Ceph Storage Nodes to the CDN and Attaching Subscriptions”.
    • Enable the Tools repository on all nodes.

      For details, see the Enabling the Red Hat Ceph Storage Repositories section in the Red Hat Ceph Storage 3 Installation Guide for Red Hat Enterprise Linux.

    • If using a firewall, then ensure that the following TCP ports are open:

      Table 5.1. TCP Port Requirements

      PortUseWhere?

      3000

      Grafana

      The Red Hat Ceph Storage Dashboard node.

      9090

      Basic Prometheus graphs

      The Red Hat Ceph Storage Dashboard node.

      9100

      Prometheus' node-exporter daemon

      All storage cluster nodes.

      9283

      Gathering Ceph data

      All ceph-mgr nodes.

      9287

      Ceph iSCSI gateway data

      All Ceph iSCSI gateway nodes.

      For more details see the Using Firewalls chapter in the Security Guide for Red Hat Enterprise Linux 7.

Procedure

Run the following commands on the Ansible administration node as the root user.

  1. Install the cephmetrics-ansible package.

    [root@admin ~]# yum install cephmetrics-ansible
  2. Using the Ceph Ansible inventory as a base, add the Red Hat Ceph Storage Dashboard node under the [ceph-grafana] section of the Ansible inventory file, by default located at /etc/ansible/hosts.

    [ceph-grafana]
    $HOST_NAME

    Replace:

    • $HOST_NAME with the name of the Red Hat Ceph Storage Dashboard node

    For example:

    [ceph-grafana]
    node0
  3. Change to the /usr/share/cephmetrics-ansible/ directory.

    [root@admin ~]# cd /usr/share/cephmetrics-ansible
  4. Run the Ansible playbook.

    [root@admin cephmetrics-ansible]# ansible-playbook -v playbook.yml
    Important

    Every time you update the cluster configuration, for example, you add or remove a MON or OSD node, you must re-run the cephmetrics Ansible playbook.

    Note

    The cephmetrics Ansible playbook does the following actions:

    • Updates the ceph-mgr instance to enable the prometheus plugin and opens TCP port 9283.
    • Deploys the Prometheus node-exporter daemon to each node in the storage cluster.

      • Opens TCP port 9100.
      • Starts the node-exporter daemon.
    • Deploys Grafana and Prometheus containers under Docker/systemd on the Red Hat Ceph Storage Dashboard node.

      • Prometheus is configured to gather data from the ceph-mgr nodes and the node-exporters running on each ceph host
      • Opens TCP port 3000.
      • The dashboards, themes and user accounts are all created in Grafana.
      • Outputs the URL of Grafana for the administrator.

5.3. Accessing the Red Hat Ceph Storage Dashboard

Accessing the Red Hat Ceph Storage Dashboard gives you access to the web-based management tool for administrating Red Hat Ceph Storage clusters.

Prerequisites

Procedure

  1. Enter the following URL to a web browser:

    http://$HOST_NAME:3000

    Replace:

    • $HOST_NAME with the name of the Red Hat Ceph Storage Dashboard node

    For example:

    http://cephmetrics:3000
  2. Enter the password for the admin user. If you did not set the password during the installation, use admin, which is the default password.

    Once logged in, you are automatically placed on the Ceph At a Glance dashboard. The Ceph At a Glance dashboard provides a high-level overviews of capacity, performance, and node-level performance information.

    Example

    RHCS Dashboard Grafana Ceph At a Glance page

Additional Resources

5.4. Changing the default Red Hat Ceph Storage dashboard password

The default user name and password for accessing the Red Hat Ceph Storage Dashboard is set to admin and admin. For security reasons, you might want to change the password after the installation.

Note

To prevent the password from resetting to the default value, update the custom password in the /usr/share/cephmetrics-ansible/group_vars/all.yml file.

Procedure

  1. Click the Grafana icon in the upper-left corner.
  2. Hover over the user name you want to modify the password for. In this case admin.
  3. Click Profile.
  4. Click Change Password.
  5. Enter the new password twice and click Change Password.

Additional Resource

5.5. The Prometheus plugin for Red Hat Ceph Storage

As a storage administrator, you can gather performance data, export that data using the Prometheus plugin module for the Red Hat Ceph Storage Dashboard, and then perform queries on this data. The Prometheus module allows ceph-mgr to expose Ceph related state and performance data to a Prometheus server.

5.5.1. Prerequisites

  • Running Red Hat Ceph Storage 3.1 or higher.
  • Installation of the Red Hat Ceph Storage Dashboard.

5.5.2. The Prometheus plugin

The Prometheus plugin provides an exporter to pass on Ceph performance counters from the collection point in ceph-mgr. The Red Hat Ceph Storage Dashboard receives MMgrReport messages from all MgrClient processes, such as Ceph Monitors and OSDs. A circular buffer of the last number of samples contains the performance counter schema data and the actual counter data. This plugin creates an HTTP endpoint and retrieves the latest sample of every counter when polled. The HTTP path and query parameters are ignored; all extant counters for all reporting entities are returned in a text exposition format.

Additional Resources

5.5.3. Managing the Prometheus environment

To monitor a Ceph storage cluster with Prometheus you can configure and enable the Prometheus exporter so the metadata information about the Ceph storage cluster can be collected.

Prerequisites

  • A running Red Hat Ceph Storage 3.1 cluster
  • Installation of the Red Hat Ceph Storage Dashboard

Procedure

  1. As the root user, open and edit the /etc/prometheus/prometheus.yml file.

    1. Under the global section, set the scrape_interval and evaluation_interval options to 15 seconds.

      Example

      global:
        scrape_interval:     15s
        evaluation_interval: 15s

    2. Under the scrape_configs section, add the honor_labels: true option, and edit the targets, and instance options for each of the ceph-mgr nodes.

      Example

      scrape_configs:
        - job_name: 'node'
          honor_labels: true
          static_configs:
          - targets: [ 'node1.example.com:9100' ]
            labels:
              instance: "node1.example.com"
          - targets: ['node2.example.com:9100']
            labels:
              instance: "node2.example.com"

      Note

      Using the honor_labels option enables Ceph to output properly-labelled data relating to any node in the Ceph storage cluster. This allows Ceph to export the proper instance label without Prometheus overwriting it.

    3. To add a new node, simply add the targets, and instance options in the following format:

      Example

      - targets: [ 'new-node.example.com:9100' ]
        labels:
          instance: "new-node"

      Note

      The instance label has to match what appears in Ceph’s OSD metadata instance field, which is the short host name of the node. This helps to correlate Ceph stats with the node’s stats.

  2. Add Ceph targets to the /etc/prometheus/ceph_targets.yml file in the following format.

    Example

    [
        {
            "targets": [ "cephnode1.example.com:9283" ],
            "labels": {}
        }
    ]

  3. Enable the Prometheus module:

    # ceph mgr module enable prometheus

5.5.4. Working with the Prometheus data and queries

The statistic names are exactly as Ceph names them, with illegal characters translated to underscores, and ceph_ prefixed to all names. All Ceph daemon statistics have a ceph_daemon label that identifies the type and ID of the daemon they come from, for example: osd.123. Some statistics can come from different types of daemons, so when querying you will want to filter on Ceph daemons starting with osd to avoid mixing in the Ceph Monitor and RocksDB stats. The global Ceph storage cluster statistics have labels appropriate to what they report on. For example, metrics relating to pools have a pool_id label. The long running averages that represent the histograms from core Ceph are represented by a pair of sum and count performance metrics.

The following example queries can be used in the Prometheus expression browser:

Show the physical disk utilization of an OSD

(irate(node_disk_io_time_ms[1m]) /10) and on(device,instance) ceph_disk_occupation{ceph_daemon="osd.1"}

Show the physical IOPS of an OSD as seen from the operating system

irate(node_disk_reads_completed[1m]) + irate(node_disk_writes_completed[1m]) and on (device, instance) ceph_disk_occupation{ceph_daemon="osd.1"}

Pool and OSD metadata series

Special data series are output to enable the displaying and the querying on certain metadata fields. Pools have a ceph_pool_metadata field, for example:

ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 1.0

OSDs have a ceph_osd_metadata field, for example:

ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",ceph_daemon="osd.0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 1.0

Correlating drive statistics with node_exporter

The Prometheus output from Ceph is designed to be used in conjunction with the generic node monitoring from the Prometheus node exporter. Correlation of Ceph OSD statistics with the generic node monitoring drive statistics, special data series are output, for example:

ceph_disk_occupation{ceph_daemon="osd.0",device="sdd", exported_instance="node1"}

To get disk statistics by an OSD ID, use either the and operator or the asterisk (*) operator in the Prometheus query. All metadata metrics have the value of 1 so they act neutral with asterisk operator. Using asterisk operator allows to use group_left and group_right grouping modifiers, so that the resulting metric has additional labels from one side of the query. For example:

rate(node_disk_bytes_written[30s]) and on (device,instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Using label_replace

The label_replace function can add a label to, or alter a label of, a metric within a query. To correlate an OSD and its disks write rate, the following query can be used:

label_replace(rate(node_disk_bytes_written[30s]), "exported_instance", "$1", "instance", "(.*):.*") and on (device,exported_instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Additional Resources

  • See Prometheus querying basics for more information on constructing queries.
  • See Prometheus' label_replace documentation for more information.

5.5.5. Using the Prometheus expression browser

Use the builtin Prometheus expression browser to run queries against the collected data.

Prerequisites

  • A running Red Hat Ceph Storage 3.1 cluster
  • Installation of the Red Hat Ceph Storage Dashboard

Procedure

  1. Enter the URL for the Prometh the web browser:

    http://$DASHBOARD_SEVER_NAME:9090/graph

    Replace…​

    • $DASHBOARD_SEVER_NAME with the name of the Red Hat Ceph Storage Dashboard server.
  2. Click on Graph, then type in or paste the query into the query window and press the Execute button.

    1. View the results in the console window.
  3. Click on Graph to view the rendered data.

Additional Resources

5.5.6. Additional Resources

5.6. The Red Hat Ceph Storage Dashboard alerts

This section includes information about alerting in the Red Hat Ceph Storage Dashboard.

5.6.1. Prerequisites

5.6.2. About Alerts

The Red Hat Ceph Storage Dashboard supports alerting mechanism that is provided by the Grafana platform. You can configure the dashboard to send you a notification when a metric that you are interested in reaches certain value. Such metrics are in the Alert Status dashboard.

By default, Alert Status already includes certain metrics, such as Overall Ceph Health, OSDs Down, or Pool Capacity. You can add metrics that you are interested in to this dashboard or change their trigger values.

Here is a list of the pre-defined alerts that are included with Red Hat Ceph Storage Dashboard:

  • Overall Ceph Health
  • Disks Near Full (>85%)
  • OSD Down
  • OSD Host Down
  • PG’s Stuck Inactive
  • OSD Host Less - Free Capacity Check
  • OSD’s With High Response Times
  • Network Errors
  • Pool Capacity High
  • Monitors Down
  • Overall Cluster Capacity Low
  • OSDs With High PG Count

5.6.3. Accessing the Alert Status dashboard

Certain Red Hat Ceph Storage Dashboard alerts are configured by default in the Alert Status dashboard. This section shows two ways to access it.

Procedure

To access the dashboard:

  • In the main At the Glance dashboard, click the Active Alerts panel in the upper-right corner.

Or..

  • Click the dashboard menu from in the upper-left corner next to the Grafana icon. Select Alert Status.

5.6.4. Configuring the Notification Target

A notification channel called cephmetrics is automatically created during installation. All preconfigured alerts reference the cephmetrics channel but before you can receive the alerts, complete the notification channel definition by selecting the desired notification type. The Grafana platform supports a number of different notification types including email, Slack, and PagerDuty.

Procedure
  • To configure the notification channel, follow the instructions in the Alert Notifications section on the Grafana web page.

5.6.5. Changing the Default Alerts and Adding New Ones

This section explains how to change the trigger value on already configured alerts and how to add new alerts to the Alert Status dashboard.

Procedure
  • To change the trigger value on alerts or to add new alerts, follow the Alerting Engine & Rules Guide on the Grafana web pages.

    Important

    To prevent overriding custom alerts, the Alert Status dashboard will not be updated when upgrading the Red Hat Ceph Storage Dashboard packages when you change the trigger values or add new alerts.

Additional Resources