Monitoring Ceph with Nagios
Deploying Nagios Core with Ceph requires:
- A running Ceph cluster.
- A running Nagios Core server.
Instead of Nagios Core, you may use the more feature-rich commercial version, Nagios XI.
Installing Nagios Core
Installing Nagios Core involves downloading the Nagios Core source code, then configuring, making and installing it on the host that will run the Nagios Core instance.
The following sections describe the process for RHEL. For Ubuntu users, see Installing Nagios Core from Source for installation details.
Installing Nagios Pre-requisites
Install the pre-requisites.
# yum install -y httpd php php-cli gcc glibc glibc-common gd gd-devel net-snmp openssl-devel wget unzip
Open port 80 for httpd.
# firewall-cmd --zone=public --add-port=80/tcp
# firewall-cmd --zone=public --add-port=80/tcp --permanent
Creating a Nagios User and Group
Create a user and group for Nagios Core.
# useradd nagios
# passwd nagios
# groupadd nagcmd
# usermod -a -G nagcmd nagios
# usermod -a -G nagcmd apache
Download Nagios Source Code and Plug-ins
Download the latest version of Nagios Core and Plug-ins.
# wget https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.3.1.tar.gz
# wget http://www.nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
# tar zxf nagios-4.3.1.tar.gz
# tar zxf nagios-plugins-2.2.1.tar.gz
# cd nagios-4.3.1
Make and Install Nagios Core
To make and install Nagios Core, first run ./configure.
# ./configure --with-command-group=nagcmd
After running ./configure, compile the Nagios Core source code.
# make all
After making Nagios Core, install it.
# make install
# make install-init
# make install-config
# make install-commandmode
# make install-webconf
Copy the event handlers and change their ownership.
# cp -R contrib/eventhandlers/ /usr/local/nagios/libexec/
# chown -R nagios:nagios /usr/local/nagios/libexec/eventhandlers
Finally, run the pre-flight check.
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Make and Install Nagios Core Plug-ins
Make and install the Nagios Core plug-ins.
# cd ../nagios-plugins-2.2.1
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
# make
# make install
Create a Default Nagios Core User
Create a user for the Nagios Core user interface.
$ sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
NOTE: This command was tested both as the root user (# prompt) and with sudo. Using sudo worked properly.
IMPORTANT: If adding a user other than nagiosadmin, ensure the /usr/local/nagios/etc/cgi.cfg file gets updated with the username too.
Also modify the /usr/local/nagios/etc/objects/contacts.cfg file with the user name, full name and email address as needed.
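As a minimal sketch of the cgi.cfg update described above, the following helper appends an additional user to every authorized_for_* directive. The function name add_cgi_user and the user name jdoe are hypothetical. The sketch assumes GNU sed and that each directive sits on a single line. It is not idempotent, so run it once per user.

```shell
#!/bin/sh
# Hypothetical helper: append USER to every authorized_for_* line of a
# cgi.cfg-style file. Assumes GNU sed (for the -i and -E options).
add_cgi_user() {
    cfg=$1
    user=$2
    # Only lines that start with "authorized_for_<name>=" are changed;
    # comments and all other directives are left untouched.
    sed -i -E "/^authorized_for_[a-z_]+=/ s/\$/,${user}/" "$cfg"
}

# Example (run as root on the Nagios Core server; jdoe is a placeholder):
#   add_cgi_user /usr/local/nagios/etc/cgi.cfg jdoe
```

The contacts.cfg file still needs to be edited by hand, because the full name and email address differ for each user.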
Start Nagios
Add Nagios Core as a service and enable it. Then start the Nagios Core daemon and Apache.
# chkconfig --add nagios
# chkconfig --level 35 nagios on
# systemctl start nagios
# systemctl enable httpd
# systemctl start httpd
Log in to Nagios Core
With Nagios up and running, log in to the web user interface.
http://<ip-address>/nagios
Nagios Core will prompt for a user name and password. Input the login and password of the default Nagios Core user.
Installing Nagios Remote Plug-in Executor (NRPE)
To monitor Ceph Storage cluster hosts, install the Nagios plug-ins, the Ceph plug-ins and the NRPE add-on on each of the Ceph cluster's hosts.
For demonstration purposes, this section adds NRPE to a Ceph monitor node with the hostname mon. Repeat the remaining procedures on all Ceph nodes that Nagios should monitor.
For Ubuntu-based clusters, see Nagios NRPE Documentation.
Install Prerequisites
NRPE requires OpenSSL. Install the following libraries first.
# yum install openssl openssl-devel
Create a Nagios User
Installation requires a nagios user, so create the user first.
# useradd nagios
# passwd nagios
Download, Make and Install the Nagios Plug-ins
Download the latest version of the Nagios plug-ins. Then, make and install them.
# wget http://www.nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
# tar zxf nagios-plugins-2.2.1.tar.gz
# cd nagios-plugins-2.2.1
# ./configure
# make
# make install
Download, Make and Install the Nagios Ceph Plug-ins
Download the latest version of the Ceph plug-ins. See https://github.com/valerytschopp/ceph-nagios-plugins for details.
# cd ~
# git clone --recursive https://github.com/valerytschopp/ceph-nagios-plugins.git
# cd ceph-nagios-plugins
# make dist
# make install
Install xinetd
NRPE uses xinetd for communication. Install it before installing the NRPE module.
# yum install xinetd
Download, Make and Install Nagios NRPE
# wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-3.1.0/nrpe-3.1.0.tar.gz
# tar xvfz nrpe-3.1.0.tar.gz
# cd nrpe-3.1.0
# ./configure
# make all
# make install-groups-users
# make install
# make install-config
# make install-init
Add nrpe 5666/tcp to the /etc/services file.
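The /etc/services entry can be added by hand or with a small script. The following is a sketch only: the function name add_nrpe_service and the trailing comment text are assumptions, and the function appends the entry only when no nrpe 5666/tcp line is present yet.

```shell
#!/bin/sh
# Hypothetical helper: register the NRPE port in a services file.
# The file defaults to /etc/services; pass another path to try it out safely.
add_nrpe_service() {
    services_file=${1:-/etc/services}
    # Append the entry only if no "nrpe 5666/tcp" line exists already.
    if ! grep -Eq '^nrpe[[:space:]]+5666/tcp' "$services_file"; then
        printf 'nrpe\t\t5666/tcp\t\t# Nagios NRPE\n' >> "$services_file"
    fi
}

# Example (run as root on the Ceph host):
#   add_nrpe_service
```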
Start the Services
# service xinetd restart
# systemctl reload xinetd
# systemctl enable nrpe && systemctl start nrpe
Open Port 5666
Open port 5666 to allow communication with NRPE.
# firewall-cmd --zone=public --add-port=5666/tcp
# firewall-cmd --zone=public --add-port=5666/tcp --permanent
Add the Nagios Core Server IP Address
In order for the Nagios Core server to access NRPE on a remote machine, the remote machine's xinetd and NRPE configurations must be updated with the IP address of the Nagios Core server.
Edit the xinetd configuration with the Nagios server's IP address.
# vim /etc/xinetd.d/nrpe
# default: off
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
    disable = yes
    socket_type = stream
    port = 5666
    wait = no
    user = nagios
    group = nagios
    server = /usr/local/nagios/bin/nrpe
    server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
    only_from = 127.0.0.1,<ip-address-of-nagios-core>
    log_on_success =
}
Add the IP address of the Nagios Core server to the only_from setting. Then, restart xinetd.
# service xinetd restart
Edit the NRPE configuration with the Nagios server's IP address.
# vim /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=127.0.0.1,<ip-address-of-nagios-core>
Add the IP address of the Nagios Core server to the allowed_hosts setting. Then, restart nrpe.
# systemctl restart nrpe
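When many Ceph hosts need the same change, the allowed_hosts edit can be scripted. The sketch below is one possible approach, not part of NRPE itself: the function name allow_nagios_server and the address 192.0.2.10 are hypothetical, and it assumes GNU sed and a single allowed_hosts line in nrpe.cfg.

```shell
#!/bin/sh
# Hypothetical helper: add the Nagios Core server address to allowed_hosts.
allow_nagios_server() {
    cfg=$1
    ip=$2
    # Read the current comma-separated list of allowed hosts.
    current=$(sed -n 's/^allowed_hosts=//p' "$cfg")
    # Do nothing if the address is already listed.
    case ",${current}," in
        *",${ip},"*) return 0 ;;
    esac
    # Otherwise append the address to the existing list.
    sed -i -E "/^allowed_hosts=/ s/\$/,${ip}/" "$cfg"
}

# Example (run as root on the Ceph host, then restart nrpe):
#   allow_nagios_server /usr/local/nagios/etc/nrpe.cfg 192.0.2.10
```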
Test the Installation
Ensure that the make and install procedures worked.
# /usr/local/nagios/libexec/check_nrpe -H localhost
It should echo the NRPE version, for example NRPE v3.1.0-rc1, if it is working correctly.
Configure the Nagios Core Server
After configuring NRPE on a Ceph host, configure the Nagios Core Server to recognize and monitor the host.
Install the check_nrpe Plug-in
# wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-3.1.0/nrpe-3.1.0.tar.gz
# tar xvfz nrpe-3.1.0.tar.gz
# cd nrpe-3.1.0
# ./configure
# make check_nrpe
# make install-plugin
Check to Ensure Connectivity
Ensure that the make and install procedures worked and that there is connectivity between the Nagios Core server and the remote host containing NRPE.
# /usr/local/nagios/libexec/check_nrpe -H <IP-address-of-remote-host>
It should echo the NRPE version, for example NRPE v3.1.0-rc1, if it is working correctly.
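The connectivity check can be wrapped in a small function that judges the reply by its text. This is a sketch under assumptions: the function name check_nrpe_version is hypothetical, and it treats any reply beginning with "NRPE v" as success, which matches the version banner described above.

```shell
#!/bin/sh
# Hypothetical helper: confirm that a remote NRPE daemon answers with its
# version banner. The second argument overrides the check_nrpe path.
check_nrpe_version() {
    host=$1
    check_nrpe=${2:-/usr/local/nagios/libexec/check_nrpe}
    # Capture the reply; success is judged from the text, not the exit code.
    out=$("$check_nrpe" -H "$host" 2>&1) || true
    case $out in
        "NRPE v"*)
            printf '%s: %s\n' "$host" "$out"
            return 0 ;;
        *)
            printf '%s: unexpected reply: %s\n' "$host" "$out" >&2
            return 1 ;;
    esac
}

# Example (run on the Nagios Core server; mon is the host from this guide):
#   check_nrpe_version mon
```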
Create a Configuration for the Remote Host
# cd /usr/local/nagios/etc/objects
# cp localhost.cfg mon.cfg
Replace localhost with the hostname of the remote host, and the loopback IP address with the IP address of the remote host. Finally, delete or comment out the Host Group definition.
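The two replacements can be made with sed instead of an editor. The following sketch assumes GNU sed. The function name retarget_host_cfg and the address 192.0.2.20 are hypothetical. It replaces every occurrence of localhost, including those in comments, and the Host Group definition must still be deleted or commented out by hand.

```shell
#!/bin/sh
# Hypothetical helper: point a copy of localhost.cfg at a remote host.
retarget_host_cfg() {
    cfg=$1
    host=$2
    ip=$3
    # Swap the host name and the loopback address throughout the file.
    sed -i -e "s/localhost/${host}/g" -e "s/127\.0\.0\.1/${ip}/g" "$cfg"
}

# Example (run on the Nagios Core server after copying localhost.cfg):
#   retarget_host_cfg /usr/local/nagios/etc/objects/mon.cfg mon 192.0.2.20
```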
Change the file ownership to nagios.
# chown nagios:nagios mon.cfg
Add a cfg_file= reference to the mon.cfg file in /usr/local/nagios/etc/nagios.cfg.
# vim /usr/local/nagios/etc/nagios.cfg
For example:
cfg_file=/usr/local/nagios/etc/objects/mon.cfg
Then, restart the Nagios server.
# systemctl restart nagios
Configure Ceph Plug-ins
There are some open source Ceph plug-ins provided at https://github.com/valerytschopp/ceph-nagios-plugins. They include:
- check_ceph_df: This plug-in outputs messages related to ceph df for the entire cluster or for individual pools. This plug-in only needs to run on Ceph monitor hosts. Multiple instances may be configured to monitor individual pools.
- check_ceph_health: This plug-in outputs the result of ceph health. This plug-in only needs to run on Ceph monitor hosts.
- check_ceph_mon: This plug-in checks a single monitor and returns OK if the monitor is up and running or WARN if it is down or missing. This plug-in only needs to run on Ceph monitor hosts.
- check_ceph_osd: This plug-in checks an OSD host or a single OSD and returns OK if the OSD is up and running or WARN if it is down. This plug-in only needs to run on Ceph OSD hosts.
- check_ceph_rgw: This plug-in checks a single Ceph Object Gateway and returns OK and the buckets and data usage if it is up and running or WARN if it is down or missing. This plug-in only needs to run on Ceph Object Gateway hosts.
- check_ceph_mds: This plug-in checks a single metadata server and returns OK if it is up and running, WARN if it is laggy and Error if it is down or missing. This plug-in only needs to run on Ceph metadata server hosts.
These plug-ins get installed on the appropriate Ceph hosts. The following sections describe how to configure the ceph health plug-in on a monitor host.
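The mapping between host types and plug-ins can be captured in a lookup function, which is handy when scripting the configuration of several hosts. This is an illustrative sketch: the function name plugins_for_role and the role labels mon, osd, rgw and mds are hypothetical, while the plug-in names come from the list above.

```shell
#!/bin/sh
# Hypothetical helper: print the Ceph plug-ins that apply to a host role.
plugins_for_role() {
    case $1 in
        mon) echo "check_ceph_df check_ceph_health check_ceph_mon" ;;
        osd) echo "check_ceph_osd" ;;
        rgw) echo "check_ceph_rgw" ;;
        mds) echo "check_ceph_mds" ;;
        *)   echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Example: list the plug-ins to configure on a monitor host.
#   for plugin in $(plugins_for_role mon); do echo "$plugin"; done
```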
Create Keyring and Key
Log in to the monitor server and create a Ceph key and keyring for Nagios.
# ssh mon
# cd /etc/ceph
# ceph auth get-or-create client.nagios mon 'allow r' > client.nagios.keyring
Each plug-in will require authentication. Repeat this procedure for each host that contains a plug-in.
Test the Ceph Plug-in Installation
Before proceeding with additional configuration, ensure that the plug-ins are working. For example:
# /usr/lib/nagios/plugins/check_ceph_health --id nagios --keyring /etc/ceph/client.nagios.keyring
The check_ceph_health plug-in performs the equivalent of:
# ceph health
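Nagios plug-ins report their result through the exit status as well as through their output: 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. When testing a plug-in by hand, a wrapper such as the following sketch makes that status visible. The function names status_name and run_check are hypothetical.

```shell
#!/bin/sh
# Translate a Nagios plug-in exit status into its conventional name.
status_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "INVALID ($1)" ;;  # outside the range plug-ins should return
    esac
}

# Run a plug-in command line, print the name of its status, and return
# the same exit status to the caller.
run_check() {
    rc=0
    "$@" || rc=$?
    status_name "$rc"
    return "$rc"
}

# Example (run on the monitor host):
#   run_check /usr/lib/nagios/plugins/check_ceph_health --id nagios \
#       --keyring /etc/ceph/client.nagios.keyring
```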
Add a Command for the Ceph Plug-in
Add a command for the check_ceph_health plug-in.
# vim /usr/local/nagios/etc/nrpe.cfg
For example:
command[check_ceph_health]=/usr/lib/nagios/plugins/check_ceph_health --id nagios --keyring /etc/ceph/client.nagios.keyring
Save and restart NRPE.
# systemctl restart nrpe
Repeat this procedure for each Ceph plug-in applicable to the host. See https://github.com/valerytschopp/ceph-nagios-plugins for usage.
Define the check_nrpe Command
Return to the Nagios server and define a check_nrpe command for the NRPE plug-in.
# cd /usr/local/nagios/etc/objects
# vi commands.cfg
define command{
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
Define a Service for the Plug-in
On the Nagios server, edit the configuration file for the host and add a service for the Ceph plug-in. For example:
# vim /usr/local/nagios/etc/objects/mon.cfg
define service {
    use generic-service
    host_name mon
    service_description Ceph Health Check
    check_command check_nrpe!check_ceph_health
}
Note that the check_command setting uses check_nrpe! before the Ceph plug-in name. This tells NRPE to execute the check_ceph_health command on the remote host.
Repeat this procedure for each plug-in applicable to the host.
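Because every service definition follows the same pattern, the repeated blocks can be generated instead of typed. The sketch below prints one definition per call. The function name emit_service is hypothetical, and each NRPE command name passed to it (check_ceph_df in the example) is assumed to have a matching command[...] entry in nrpe.cfg on the remote host.

```shell
#!/bin/sh
# Hypothetical helper: print a Nagios service definition that runs an
# NRPE command on a remote host through check_nrpe.
emit_service() {
    host=$1
    description=$2
    nrpe_command=$3
    cat <<EOF
define service {
    use                 generic-service
    host_name           ${host}
    service_description ${description}
    check_command       check_nrpe!${nrpe_command}
}
EOF
}

# Example: append a second service to the host's configuration file.
#   emit_service mon "Ceph DF Check" check_ceph_df \
#       >> /usr/local/nagios/etc/objects/mon.cfg
```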
Then, restart the Nagios server.
# systemctl restart nagios
Summary
After completing the preceding procedures, return to the Nagios web user interface and click on the "Hosts" link. The host should appear in the list of hosts. Click on the host to see additional details. Click on the View Status Detail hyperlink. It should display the checks Nagios performs on the host. In this example, there should be a Ceph Health Check service with status information on the Ceph cluster.
