18.6. Configuring Nagios Manually

You can configure the Nagios server and node manually to monitor a Red Hat Gluster Storage trusted storage pool.

Note

It is recommended to configure Nagios using Auto-Discovery. For more information on configuring Nagios using Auto-Discovery, see Section 18.3.1, “Configuring Nagios”.
For more information on Nagios configuration files, see Chapter 21, Nagios Configuration Files.

Configuring Nagios Server

  1. In the /etc/nagios/gluster directory, create a directory with the cluster name. All configurations for the cluster are added in this directory.
  2. In the /etc/nagios/gluster/cluster-name directory, create a file named clustername.cfg to specify the host and host group configurations. The service configurations for all cluster-level and volume-level services are also added to this file.

    Note

    The cluster is configured as a host and a host group in Nagios.

    In the clustername.cfg file, add the following definitions:
    1. Define a host group with the cluster name as shown below:
      define hostgroup {
                   hostgroup_name                 cluster-name
                   alias                          cluster-name
      }
    2. Define a host with the cluster name as shown below:
      define host {
                   host_name                      cluster-name
                   alias                          cluster-name
                   use                            gluster-cluster
                   address                        cluster-name
      }
    3. Define the Cluster - Quorum service to monitor cluster quorum status as shown below:
      define service {
                   service_description            Cluster - Quorum
                   use                            gluster-passive-service
                   host_name                      cluster-name
      }
    4. Define the Cluster Utilization service to monitor cluster utilization as shown below:
      define service {
                   service_description            Cluster Utilization
                   use                            gluster-service-with-graph
                   check_command                  check_cluster_vol_usage!warning-threshold!critical-threshold
                   host_name                      cluster-name
      }
    5. Add the following service definitions for each volume in the cluster:
      • Volume Status service to monitor the status of the volume as shown below:
        define service {
                         service_description            Volume Status - volume-name
                         host_name                      cluster-name
                         use                            gluster-service-without-graph
                         _VOL_NAME                      volume-name
                         notes                          Volume type : Volume-Type
                         check_command                  check_vol_status!cluster-name!volume-name
        }
      • Volume Utilization service to monitor the volume utilization as shown below:
        define service {
                         service_description            Volume Utilization - volume-name
                         host_name                      cluster-name
                         use                            gluster-service-with-graph
                         _VOL_NAME                      volume-name
                         notes                          Volume type : Volume-Type
                         check_command                  check_vol_utilization!cluster-name!volume-name!warning-threshold!critical-threshold
        }
        
      • Volume Split-brain service to monitor split brain status as shown below:
        define service {
                         service_description            Volume Split-brain status - volume-name
                         host_name                      cluster-name
                         use                            gluster-service-without-graph
                         _VOL_NAME                      volume-name
                         check_command                  check_vol_heal_status!cluster-name!volume-name
        }
      • Volume Quota service to monitor the volume quota status as shown below:
        define service {
                         service_description            Volume Quota - volume-name
                         host_name                      cluster-name
                         use                            gluster-service-without-graph
                         _VOL_NAME                      volume-name
                         check_command                  check_vol_quota_status!cluster-name!volume-name
                         notes                          Volume type : Volume-Type
        }
        
      • Volume Geo-Replication service to monitor Geo Replication status as shown below:
        define service {
                         service_description            Volume Geo Replication - volume-name
                         host_name                      cluster-name
                         use                            gluster-service-without-graph
                         _VOL_NAME                      volume-name
                         check_command                  check_vol_georep_status!cluster-name!volume-name
        }
        
  3. In the /etc/nagios/gluster/cluster-name directory, create a file named host-name.cfg. The host configuration for the node and the service configurations for all the bricks on the node are added to this file.
    In the host-name.cfg file, add the following definitions:
    1. Define the host for the node as shown below:
       define host {
               use                            gluster-host
               hostgroups                     gluster_hosts,cluster-name
               alias                          host-name
               host_name                      host-name     ; Name given by user to identify the node in Nagios
               _HOST_UUID                     host-uuid     ; Host UUID returned by gluster peer status
               address                        host-address  ; This can be FQDN or IP address of the host
       }
    2. Create the following services for each brick on the node:
      • Add Brick Utilization service as shown below:
        define service {
                         service_description           Brick Utilization - brick-path
                         host_name                     host-name  ; Host name given in host definition
                         use                           brick-service
                         _VOL_NAME                     Volume-Name
                         notes                         Volume : Volume-Name
                         _BRICK_DIR                    brick-path
        }
        
      • Add Brick Status service as shown below:
        define service {
                         service_description           Brick - brick-path
                         host_name                     host-name  ; Host name given in host definition
                         use                           gluster-brick-status-service
                         _VOL_NAME                     Volume-Name
                         notes                         Volume : Volume-Name
                         _BRICK_DIR                    brick-path
        }
  4. Add host configurations and service configurations for all nodes in the cluster as shown in Step 3.
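
When a cluster has many volumes and bricks, the repeated service definitions in Steps 2 and 3 lend themselves to templating. The following is a minimal Python sketch and is not part of Red Hat Gluster Storage: the helper names (render_definition, volume_status_service, brick_status_service) and the sample values cluster1, vol1, Replicate, node1, and /bricks/b1 are hypothetical. It only builds the text of a definition; writing it to the right file is left to the reader.

```python
def render_definition(kind, directives):
    """Render one Nagios object definition from (name, value) pairs."""
    # Pad directive names so the values line up in one column.
    width = max(len(name) for name, _ in directives) + 4
    lines = ["define %s {" % kind]
    for name, value in directives:
        lines.append("    %s%s" % (name.ljust(width), value))
    lines.append("}")
    return "\n".join(lines)


def volume_status_service(cluster, volume, volume_type):
    """Volume Status service, mirroring the definition shown in Step 2."""
    return render_definition("service", [
        ("service_description", "Volume Status - %s" % volume),
        ("host_name", cluster),
        ("use", "gluster-service-without-graph"),
        ("_VOL_NAME", volume),
        ("notes", "Volume type : %s" % volume_type),
        ("check_command", "check_vol_status!%s!%s" % (cluster, volume)),
    ])


def brick_status_service(host, volume, brick_path):
    """Brick Status service, mirroring the definition shown in Step 3."""
    return render_definition("service", [
        ("service_description", "Brick - %s" % brick_path),
        ("host_name", host),
        ("use", "gluster-brick-status-service"),
        ("_VOL_NAME", volume),
        ("notes", "Volume : %s" % volume),
        ("_BRICK_DIR", brick_path),
    ])


if __name__ == "__main__":
    # Sample values only; substitute the real cluster, volume, and brick names.
    print(volume_status_service("cluster1", "vol1", "Replicate"))
    print(brick_status_service("node1", "vol1", "/bricks/b1"))
```

Generating the blocks from one function keeps the directive names identical across volumes and bricks, which avoids the copy-and-paste slips that manual editing invites.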

Configuring Red Hat Gluster Storage Node

  1. In the /etc/nagios directory of each Red Hat Gluster Storage node, edit the nagios_server.conf file and set the configurations as shown below:
    # NAGIOS SERVER
    # The nagios server IP address or FQDN to which the NSCA command
    # needs to be sent
    [NAGIOS-SERVER]
    nagios_server=NagiosServerIPAddress
    
    
    # CLUSTER NAME
    # The host name of the logical cluster configured in Nagios under which
    # the gluster volume services reside
    [NAGIOS-DEFINTIONS]
    cluster_name=cluster_auto
    
    
    # LOCAL HOST NAME
    # Host name given in the nagios server
    [HOST-NAME]
    hostname_in_nagios=NameOfTheHostInNagios
    
    # LOCAL HOST CONFIGURATION
    # Process monitoring sleep interval
    [HOST-CONF]
    proc-mon-sleep-time=TimeInSeconds
    
    
    The nagios_server.conf file is used by the glusterpmd service to get the server name, the host name, and the process monitoring interval.
  2. Start the glusterpmd service using the following command:
    # service glusterpmd start
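
To check that nagios_server.conf contains the four settings shown in Step 1, the file can be read as standard INI. The following is a minimal Python sketch under that assumption. The function read_settings and the sample values (192.0.2.10, node1, 60) are hypothetical, and this is not a description of how glusterpmd itself reads the file.

```python
import configparser

# Sample content with illustrative values; section names are spelled
# exactly as in the listing in Step 1.
SAMPLE = """\
[NAGIOS-SERVER]
nagios_server=192.0.2.10

[NAGIOS-DEFINTIONS]
cluster_name=cluster_auto

[HOST-NAME]
hostname_in_nagios=node1

[HOST-CONF]
proc-mon-sleep-time=60
"""

# Section -> key that must be present in it.
REQUIRED = {
    "NAGIOS-SERVER": "nagios_server",
    "NAGIOS-DEFINTIONS": "cluster_name",
    "HOST-NAME": "hostname_in_nagios",
    "HOST-CONF": "proc-mon-sleep-time",
}


def read_settings(text):
    """Return the required settings as a dict, or raise ValueError."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    settings = {}
    for section, key in REQUIRED.items():
        if not parser.has_option(section, key):
            raise ValueError("missing %s in [%s]" % (key, section))
        settings[key] = parser.get(section, key)
    return settings


if __name__ == "__main__":
    print(read_settings(SAMPLE))
```

To check a real node, pass the contents of /etc/nagios/nagios_server.conf to read_settings instead of SAMPLE.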

Changing Nagios Monitoring Time Interval

By default, the active Red Hat Gluster Storage services are monitored every 10 minutes. You can change the time interval for monitoring by editing the gluster-templates.cfg file.

  1. In the /etc/nagios/gluster/gluster-templates.cfg file, edit the service definition named gluster-service.
  2. Add the normal_check_interval directive and set it to 1 to check all Red Hat Gluster Storage services every minute, as shown below:
    define service {
       name                         gluster-service
       use                          generic-service
       notifications_enabled        1
       notification_period          24x7
       notification_options         w,u,c,r,f,s
       notification_interval        120
       register                     0
       contacts                     +ovirt,snmp
       _GLUSTER_ENTITY              HOST_SERVICE
       normal_check_interval        1
    }
  3. To change the interval for an individual service, add this property to the required service definition as shown below:
    define service {
       name                    gluster-brick-status-service
       use                     gluster-service
       register                0
       event_handler           brick_status_event_handler
       check_command           check_brick_status
       normal_check_interval   1
    }
    The unit of the check interval is set by the global directive interval_length, which defaults to 60 seconds, so a normal_check_interval of 1 means one minute. You can change interval_length in /etc/nagios/nagios.cfg as shown below:
    # INTERVAL LENGTH
    # This is the seconds per unit interval as used in the
    # host/contact/service configuration files.  Setting this to 60 means
    # that each interval is one minute long (60 seconds).  Other settings
    # have not been tested much, so your mileage is likely to vary...
    
    interval_length=TimeInSeconds
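
The relationship between normal_check_interval and interval_length is a simple multiplication. The following minimal Python sketch illustrates the arithmetic; the function name check_period_seconds is hypothetical.

```python
def check_period_seconds(normal_check_interval, interval_length=60):
    """Seconds between checks: interval units times seconds per unit."""
    return normal_check_interval * interval_length


if __name__ == "__main__":
    # Default monitoring of every 10 minutes with the default interval_length.
    print(check_period_seconds(10))   # 600
    # normal_check_interval set to 1, as in the examples above.
    print(check_period_seconds(1))    # 60
```

Because interval_length is global, changing it alters the meaning of every interval in the host, contact, and service configuration files, so adjusting normal_check_interval per service is usually the narrower change.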