5. Alerts and Notifications: An Introduction

An alert is a configuration setting that lets an administrator know that something has happened to a resource. Conditions and notifications are configured together in an alert definition for a resource.
There are three major components to an alert definition:
  • The information that identifies that specific alert definition (the name, priority, and whether it is active)
  • The conditions that trigger the alert, which depend on the area of the resource being monitored
  • The method and settings to use to send the alert

5.1. Alert Conditions

A condition is any situation, event, or level on a resource that crosses a certain threshold. Basically, a condition sets parameters on what is "normal" behavior or performance for a resource. Once the resource crosses that boundary, JBoss ON issues an alert. The trigger can be a metric value that has changed to an undesirable level, an event, or a recurring metric reading.
Alerts are defined, through alert definitions, for individual resources or for compatible groups of resources. An alert definition specifies the conditions that trigger the alert and the type and settings of any notification that should be sent.
When an alert is registered, the alert identifies the alert definition which was triggered (which identifies the alert condition) and the metric or event value which precipitated the alert.
An alert can be issued every single time a condition is met, or an alert can be issued and then disabled until an administrator acknowledges it. Depending on the condition, it can be useful to prevent multiple alerts and notifications from being sent for a single ongoing set of circumstances.
Resources may have multiple alert definitions or no alert definitions.
Alerts can be disabled and enabled manually by administrators. Disabling alerts avoids unnecessary alerts being recorded when a resource is taken offline or when it is expected to be in a condition that will trigger alerts.
An alert definition answers four questions: what, when, who, and where. The what is the threshold or condition that triggers the alert (such as free memory dropping below a certain point). The when sets the frequency or timing for sending an alert using a defined dampening rule. And the who and where control how administrators are notified of the alert.
A single condition can be enough to issue an alert, or an alert definition can require that an alert is issued only if multiple conditions are met simultaneously. This provides very granular control over when an alert is issued, which makes alerting information more valuable and relevant.
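The difference between "any one condition" and "all conditions at once" can be sketched in a few lines of Python. This is only an illustration of the idea, not JBoss ON code: the class names, the require_all flag, and the reading keys (free_memory_mb, cpu_percent) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Reading = Dict[str, float]  # hypothetical: one set of collected values for a resource


@dataclass
class Condition:
    name: str
    is_met: Callable[[Reading], bool]


@dataclass
class AlertDefinition:
    name: str
    priority: str
    enabled: bool
    conditions: List[Condition]
    require_all: bool = False  # False: any single condition is enough

    def matches(self, reading: Reading) -> bool:
        # A disabled definition, or one without conditions, never fires.
        if not self.enabled or not self.conditions:
            return False
        results = [condition.is_met(reading) for condition in self.conditions]
        return all(results) if self.require_all else any(results)


# Hypothetical thresholds, chosen only for the example.
low_memory = Condition("free memory below 256 MB", lambda r: r["free_memory_mb"] < 256)
high_cpu = Condition("CPU at or above 80%", lambda r: r["cpu_percent"] >= 80)

definition = AlertDefinition(
    name="JVM under pressure",
    priority="high",
    enabled=True,
    conditions=[low_memory, high_cpu],
    require_all=True,
)
```

With require_all set, a reading with low memory but moderate CPU does not match; switching the flag off makes the same reading match on the memory condition alone.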
A condition can be any of five different types, listed in Table 3, “Types of Alert Conditions”. These alert conditions correspond directly to the monitoring metrics available for that type of resource. All of the possible metrics for each resource type are listed in the Resource Monitoring Reference.

Table 3. Types of Alert Conditions

Condition Type | Description
Metric | A specific monitoring area that is checked and the thresholds for that area which trigger a response. Metrics are usually numeric responses of some sort (e.g., percent CPU usage, number of requests, or a cache hit ratio).
Trait | A change in a value for a specific setting. Traits are usually string values.
Availability | A sudden change in whether the resource is available or unavailable.
Operation | A specific action or task that is performed on the resource.
Severity | A certain type of error message, matching a given string, is recorded.

Along with setting the threshold, the condition sets how JBoss ON counts events for it to trigger alerts. A condition that occurs once may not be a problem, while the same condition occurring several times over a short period may be. Dampening prevents an alert from being sent until the condition occurs with enough frequency to indicate a true problem. For example, a condition may be set to alert if the CPU hits 80% usage. In real life, a server may bounce between 78% and 80% CPU over several minutes, it may hit 80% once for only a few seconds, or it may hit 80% and stay there.
The condition dampening setting tells JBoss ON how to interpret that monitoring data.
  • JBoss ON could send an alert every time the condition is encountered. In that case, there would be multiple alerts issued if the CPU percentage bounced around, while only one alert would be sent if the CPU hit 80% briefly or hit 80% and stayed there.
  • JBoss ON could send an alert only if the condition was encountered a certain number of times consecutively or X number of times out of Y number of polls. In this case, only a recurring or sustained problem would trigger an alert. A momentary spike or trough wouldn't be enough to fire a notification.
  • The other option is that a notification is sent only if the problem occurs within a set time period. This can be useful to track the frequency of recurring problems or to track how long a condition persisted.
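The dampening options above can be made concrete with a small Python sketch. It is a minimal illustration under stated assumptions, not the JBoss ON implementation: the class names are hypothetical, and resetting the counter after an alert fires is a choice made for the example rather than documented behavior.

```python
from collections import deque
from typing import Deque


class ConsecutiveDampener:
    """Fire only after the condition is met `required` polls in a row."""

    def __init__(self, required: int) -> None:
        self.required = required
        self.streak = 0

    def observe(self, condition_met: bool) -> bool:
        self.streak = self.streak + 1 if condition_met else 0
        if self.streak >= self.required:
            self.streak = 0  # assumption: start counting again after firing
            return True
        return False


class XOfYDampener:
    """Fire when the condition is met `occurrences` times in the last `polls` polls."""

    def __init__(self, occurrences: int, polls: int) -> None:
        self.occurrences = occurrences
        self.window: Deque[bool] = deque(maxlen=polls)

    def observe(self, condition_met: bool) -> bool:
        self.window.append(condition_met)
        if sum(self.window) >= self.occurrences:
            self.window.clear()  # assumption: reset the window after firing
            return True
        return False


class TimeWindowDampener:
    """Fire when the condition is met `occurrences` times within `period_seconds`."""

    def __init__(self, occurrences: int, period_seconds: float) -> None:
        self.occurrences = occurrences
        self.period = period_seconds
        self.hits: Deque[float] = deque()

    def observe(self, condition_met: bool, now: float) -> bool:
        if condition_met:
            self.hits.append(now)
        # Forget occurrences that are older than the time period.
        while self.hits and now - self.hits[0] > self.period:
            self.hits.popleft()
        if len(self.hits) >= self.occurrences:
            self.hits.clear()  # assumption: reset after firing
            return True
        return False


# CPU readings from successive polls, checked against the 80% threshold.
readings = [78, 80, 79, 81, 82, 83, 70]
met = [cpu >= 80 for cpu in readings]

consecutive = ConsecutiveDampener(required=3)
fired_consecutive = [consecutive.observe(m) for m in met]

x_of_y = XOfYDampener(occurrences=3, polls=5)
fired_x_of_y = [x_of_y.observe(m) for m in met]

# Assume one poll every 30 seconds for the time-based rule.
timed = TimeWindowDampener(occurrences=2, period_seconds=45)
fired_timed = [timed.observe(m, now=30.0 * i) for i, m in enumerate(met)]
```

Without dampening, these readings would raise four alerts (one per poll at or above 80%). Each dampened rule raises a single alert, and only once the high CPU reading has recurred or persisted.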

5.2. Notification Methods

Every alert is recorded and viewable in the JBoss ON GUI. Optionally, an alert can also send an external notification whenever it is issued.
Once an incident occurs, there has to be a way to let a systems administrator know what is going on, so they can respond to an issue. This is done by configuring a notification.
JBoss ON has several different methods of sending a notification:
  • Email
  • SNMP traps
  • Resource operations
  • JBoss ON users and roles
  • Resource scripts (as operations)
  • JBoss ON CLI scripts
Because alerts and notifications are implemented as server-side plug-ins, it is also possible to write custom notification senders. Creating custom plug-ins is described in the JBoss Operations Network Plug-ins Writing Guide.
These alert methods can be configured individually for a specific alert definition.

Note

You can "cluster" alert notifications.
Alert notifications can be broadcast through several different methods at the same time. For example, if a public website goes down, then a company may want notifications to be sent to their head web administrator and their company's external microblog feed at the same time.
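Broadcasting one alert through several methods can be pictured as a loop over a list of senders. The Python sketch below is only a conceptual model, assuming hypothetical sender functions (email_sender, microblog_sender) and hypothetical addresses; real JBoss ON senders are server-side plug-ins, and nothing is actually sent here.

```python
from typing import Callable, Dict, List

Alert = Dict[str, str]
Sender = Callable[[Alert], str]


def email_sender(address: str) -> Sender:
    # Hypothetical sender: returns a description instead of sending mail.
    def send(alert: Alert) -> str:
        return f"email to {address}: {alert['name']} on {alert['resource']}"
    return send


def microblog_sender(account: str) -> Sender:
    # Hypothetical sender standing in for a custom notification plug-in.
    def send(alert: Alert) -> str:
        return f"microblog post as {account}: {alert['name']} on {alert['resource']}"
    return send


def notify_all(alert: Alert, senders: List[Sender]) -> List[str]:
    """Run every configured sender, in listed order, for a single alert."""
    results = []
    for send in senders:
        try:
            results.append(send(alert))
        except Exception as exc:
            # One failing method should not block the remaining ones.
            results.append(f"failed: {exc}")
    return results


alert = {"name": "Website down", "resource": "public-web-01"}
senders = [email_sender("webadmin@example.com"), microblog_sender("example_status")]
results = notify_all(alert, senders)
```

Catching a failure per sender is a design choice for the sketch: it keeps an unreachable mail server from suppressing the other notifications for the same alert.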

5.3. Alert Operations

A parallel response to an alert is to launch an operation. Resource operations (which, like metrics, are defined in the resource type's agent plug-in) are launched, like a notification, in response to a triggered alert. This is part of the alert definition configuration, but it is worth calling out because it is such a useful tool for managing responses to alerts: alert operations allow immediate and automatic responses to alert conditions.
Alert operations can be run on the resource that issued the alert or on any other resource in the inventory, which makes them very versatile. For example, a JBoss server may begin performing badly because its JVM is out of memory. The JVM is the resource which issues the alert, but the response by the agent is to restart the JBoss server.
Regular operations are either initiated immediately or run on defined schedules for a specific configured resource. Alert operations are even more flexible than regular operations for two reasons:
  • Alert operations are fired responsively to address any alert or event.
  • Alert operations can be initiated on any resource in the JBoss ON inventory, not only the resource which sent the alert. That means that an operation can be run for a different application on the same host server or even on an entirely different server.

Note

The operations performed in response to an alert are the same as the operations which can be scheduled to run on a resource. The operations available for an alert depend on the target resource on which the operation will run — not the resource where the alert is set.
There are two types of alert operations:
  • Operations that are the same as regular operations.
  • JavaScript scripts that can be run on any platform as an operation for script resources.

Note

Alert operation senders can be used to run scripts on remote resources. For example, if a resource goes down, a diagnostic script can be run on its parent platform or another resource can be brought online and properly configured to take its place.

Note

A single alert can initiate multiple operations. All alert operations, as with all alert notifications, are run in the order they are listed in the alert definition.
Alert operations can accept tokens to fill in certain values automatically. These have the following form:
<%space.param_name%>
The space gives the JBoss ON configuration area where the value is derived; this will commonly be either alert or resource. The param_name gives the entry value that is being supplied. For example, to point to the URL of the specific fired alert, the token would be <%alert.url%>, while to pull in the resource name, the token would be <%resource.name%>. The possible tokens are listed in Table 4, “Available Alert Operation Tokens”.
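To show how the <%space.param_name%> form resolves, here is a minimal Python sketch of token substitution. It is not the JBoss ON implementation: the fill_tokens function, the sample values, and the choice to leave unknown tokens untouched are all assumptions made for the illustration.

```python
import re
from typing import Dict

# Matches tokens of the form <%space.param_name%>.
TOKEN = re.compile(r"<%\s*(\w+)\.(\w+)\s*%>")


def fill_tokens(text: str, values: Dict[str, Dict[str, str]]) -> str:
    """Replace each token with the value found under values[space][param_name]."""

    def replace(match: "re.Match[str]") -> str:
        space, param_name = match.group(1), match.group(2)
        try:
            return str(values[space][param_name])
        except KeyError:
            # Assumption: an unknown token is left as-is rather than removed.
            return match.group(0)

    return TOKEN.sub(replace, text)


# Hypothetical values for one fired alert.
values = {
    "alert": {"url": "http://server.example.com/alerts/10001"},
    "resource": {"name": "JBoss AS VM"},
}

message = fill_tokens("Restart requested by <%resource.name%>, see <%alert.url%>", values)
```

Here message becomes "Restart requested by JBoss AS VM, see http://server.example.com/alerts/10001".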

5.4. Alert Histories and Acknowledgments

Having a record of alert incidents can help improve performance, incident analysis, and other admin tasks.
Every time an alert is sent, JBoss ON makes a record of it. Each alert notification and the conditions that triggered it are stored in the alert history for the resource.
JBoss ON also enables users to acknowledge alerts. An administrator who takes or verifies an action after an alert can mark that alert as acknowledged to indicate that the issue is closed. The name of the user and the time of the acknowledgment are recorded with the alert details.
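As a rough picture of what an alert history entry carries, the following Python sketch models a fired alert and its acknowledgment. The AlertRecord class and its field names are hypothetical and are not the JBoss ON data model; they only mirror the details described above (the triggering value, the acknowledging user, and the acknowledgment time).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class AlertRecord:
    definition_name: str          # which alert definition was triggered
    triggering_value: str         # metric or event value that precipitated the alert
    fired_at: datetime
    acknowledged_by: Optional[str] = None
    acknowledged_at: Optional[datetime] = None

    def acknowledge(self, user: str, when: Optional[datetime] = None) -> None:
        # Record who closed the issue and when.
        self.acknowledged_by = user
        self.acknowledged_at = when or datetime.now(timezone.utc)


def unacknowledged(history: List[AlertRecord]) -> List[AlertRecord]:
    """Return the alerts that nobody has acknowledged yet."""
    return [record for record in history if record.acknowledged_by is None]


history = [
    AlertRecord("High CPU", "cpu=83%", datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)),
    AlertRecord("Low memory", "free=120MB", datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc)),
]
history[0].acknowledge("jsmith", when=datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc))
still_open = unacknowledged(history)
```

After the first record is acknowledged, only the "Low memory" alert remains in still_open.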
The alert history and acknowledgment history are both valuable for auditing and assessing infrastructure performance.

5.5. Group Alerting and Alert Templates

Most alerts can be defined consistently for multiple resources of the same type. JBoss ON has two ways to accomplish this:
  • Alert templates
  • Alerts on compatible groups
An alert template is a configuration setting for the JBoss ON server. An alert is configured for a specific resource type (even if no resource of that type exists in the inventory yet). Whenever a resource is added, any alert templates in the JBoss ON configuration are automatically applied to that resource. Alert templates can be configured to allow local changes (for example, Resource A may have different baselines or expected behavior, so the alert conditions can be altered). Templates can also be strictly enforced, so that every resource of that type has exactly the same settings.
Alerts can also be configured on compatible groups. As with alert templates, the compatible group's alert definitions trickle down to the group members. When a resource is added to a group, the group's alerts are automatically added to the resource. When the resource is removed from the group, those alerts are automatically deleted from it. Group alerting works for both manual groups and dynamic groups. As with alert templates, group alerts can allow local changes or enforce the group alert settings.
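The way templates and group alerts propagate to resources can be sketched in Python. This is a conceptual model only, with hypothetical class and method names (Inventory, CompatibleGroup, add_member); it ignores local overrides and strict enforcement and simply tracks alert definitions by name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Resource:
    name: str
    resource_type: str
    alert_definitions: Set[str] = field(default_factory=set)


class Inventory:
    """Templates are stored per resource type, even before any resource exists."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[str]] = {}
        self.resources: List[Resource] = []

    def add_template(self, resource_type: str, definition_name: str) -> None:
        self.templates.setdefault(resource_type, []).append(definition_name)

    def add_resource(self, resource: Resource) -> None:
        # Every template for the resource's type is applied on import.
        resource.alert_definitions.update(self.templates.get(resource.resource_type, []))
        self.resources.append(resource)


class CompatibleGroup:
    """All members share one resource type and inherit the group's alerts."""

    def __init__(self, resource_type: str, definitions: List[str]) -> None:
        self.resource_type = resource_type
        self.definitions = definitions
        self.members: List[Resource] = []

    def add_member(self, resource: Resource) -> None:
        if resource.resource_type != self.resource_type:
            raise ValueError("compatible groups hold a single resource type")
        resource.alert_definitions.update(self.definitions)
        self.members.append(resource)

    def remove_member(self, resource: Resource) -> None:
        self.members.remove(resource)
        # Group alerts are deleted from the resource; template alerts remain.
        resource.alert_definitions.difference_update(self.definitions)


inventory = Inventory()
inventory.add_template("JBoss AS Server", "Server down")  # defined before any server exists
server = Resource("app-01", "JBoss AS Server")
inventory.add_resource(server)

group = CompatibleGroup("JBoss AS Server", ["High request time"])
group.add_member(server)
alerts_in_group = set(server.alert_definitions)
group.remove_member(server)
alerts_after_removal = set(server.alert_definitions)
```

While app-01 is in the group it carries both definitions; after removal only the template's "Server down" definition remains.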