Probes can be applied to Monitoring-entitled systems to continually confirm their health and full operability. This section lists the available probes, broken down by command group, such as Apache.
Many probes that monitor internal system aspects (such as the Linux::Disk Usage probe) rather than external aspects (such as the Network Services::SSH probe) require the installation of the Red Hat Network monitoring daemon (rhnmd). This requirement is noted within the individual probe reference.
Each probe has its own reference in this section that identifies required fields (marked with *), default values, and the thresholds that may be set to trigger alerts. Similarly, the beginning of each command group's section contains information applicable to all probes in that group. Section A.1, “Probe Guidelines” covers general guidelines; the remaining sections examine individual probes.
Nearly all of the probes use Transmission Control Protocol (TCP) as their transport protocol. Exceptions to this are noted within the individual probe references.
The following general guidelines outline the meaning of each probe state, and provide guidance in setting thresholds for your probes.
The following list provides a brief description of the meaning of each probe state:
UNKNOWN: The probes that cannot collect the metrics needed to determine probe state. Most (though not all) probes enter this state when exceeding their timeout period. Probes in this state may also be configured incorrectly.
PENDING: The probes whose data has not been received by the Red Hat Satellite. It is normal for new probes to be in this state. However, if all probes move into this state, the monitoring infrastructure may be failing.
OK: The probes that have run successfully without error. This is the desired state for all probes.
WARNING: The probes that have crossed their WARNING thresholds.
CRITICAL: The probes that have crossed their CRITICAL thresholds or reached a critical status by some other means. (Some probes become critical when exceeding their timeout period.)
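The state descriptions above can be read as a simple decision order. The following Python sketch is illustrative only and is not how Red Hat Satellite implements probe evaluation. The names `ProbeState` and `classify`, the `critical_on_timeout` flag, and the assumption that larger values are worse (maximum-style thresholds) are all hypothetical.

```python
from enum import Enum
from typing import Optional


class ProbeState(Enum):
    UNKNOWN = "UNKNOWN"
    PENDING = "PENDING"
    OK = "OK"
    WARNING = "WARNING"
    CRITICAL = "CRITICAL"


def classify(value: Optional[float], warning: float, critical: float,
             data_received: bool = True, timed_out: bool = False,
             critical_on_timeout: bool = False) -> ProbeState:
    """Map one probe result to a state (hypothetical helper).

    Assumes maximum-style thresholds: a value above a threshold
    counts as having crossed it.
    """
    # No data has reached the Satellite yet (normal for new probes).
    if not data_received:
        return ProbeState.PENDING
    # Most probes report UNKNOWN on timeout; some report CRITICAL instead.
    if timed_out:
        return ProbeState.CRITICAL if critical_on_timeout else ProbeState.UNKNOWN
    # The metric could not be collected for some other reason.
    if value is None:
        return ProbeState.UNKNOWN
    if value > critical:
        return ProbeState.CRITICAL
    if value > warning:
        return ProbeState.WARNING
    return ProbeState.OK
```

For example, under these assumptions `classify(15, warning=10, critical=20)` yields `ProbeState.WARNING`, while a timed-out run yields `ProbeState.UNKNOWN` unless the probe is one that turns critical on timeout.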
While adding probes, select meaningful thresholds that, when crossed, notify you and your administrators of problems within your infrastructure. Timeout periods are entered in seconds unless otherwise indicated. Exceptions to these rules are noted within the individual probe references.
Some probes have thresholds based on time. In order for such CRITICAL and WARNING thresholds to work as intended, their values cannot exceed the amount of time allotted to the timeout period. Otherwise, an UNKNOWN status is returned in all instances of extended latency, thereby nullifying the thresholds. For this reason, Red Hat strongly recommends ensuring that timeout periods exceed all timed thresholds.
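The relationship between timeouts and timed thresholds can be checked before a probe is saved. The sketch below is a minimal illustration, assuming all values are in seconds; `check_timed_thresholds` is a hypothetical helper and not part of any Red Hat tool. It follows the stricter recommendation that the timeout exceed every timed threshold, so a threshold equal to the timeout is also flagged.

```python
from typing import List, Optional


def check_timed_thresholds(timeout: float,
                           warning: Optional[float] = None,
                           critical: Optional[float] = None) -> List[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    for name, value in (("WARNING", warning), ("CRITICAL", critical)):
        # A threshold at or beyond the timeout can never fire, because the
        # probe times out first (typically reporting UNKNOWN).
        if value is not None and value >= timeout:
            problems.append(
                f"{name} threshold ({value}s) is not below the timeout ({timeout}s)"
            )
    return problems
```

With a 10-second timeout, a 5-second WARNING threshold passes, but a 12-second CRITICAL threshold is reported, because the probe would time out before that threshold could be crossed.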
Run your probes without notifications for a time to establish baseline performance for each of your systems. Although the default values provided for probes may suit your needs, every organization has a different environment that may require altering thresholds.
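One possible way to turn a baseline into starting thresholds is sketched below. This is an assumption for illustration, not a method prescribed by Red Hat: it takes metric values recorded during the notification-free period and proposes thresholds a fixed number of standard deviations above the mean. The function name `suggest_thresholds` and the multipliers are hypothetical, and the approach only fits metrics where higher values are worse.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def suggest_thresholds(samples: Sequence[float],
                       warn_k: float = 2.0,
                       crit_k: float = 3.0) -> Tuple[float, float]:
    """Propose (warning, critical) thresholds from baseline samples.

    Hypothetical rule: mean plus warn_k or crit_k population
    standard deviations.
    """
    if not samples:
        raise ValueError("at least one baseline sample is required")
    avg = mean(samples)
    spread = pstdev(samples)
    return avg + warn_k * spread, avg + crit_k * spread
```

Treat the result as a starting point to review against your own environment rather than as final values.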