Red Hat Training

A Red Hat training course is available for Red Hat JBoss Operations Network

Chapter 25. Defining Alerts

25.1. Planning Alerts

An alert is a configuration setting that lets an administrator know that something has happened to a resource. Conditions and notifications are configured together in an alert definition for a resource.
There are three major components to an alert definition:

25.1.1. An Alerting Strategy in Four Questions

Alerting can be very basic, an immediate notification of something going on with your IT infrastructure. Alerting can also define complex scenarios and detailed responses, allowing administrators to respond automatically and proactively to problems in resources.
Both approaches are equally valid and equally useful, depending on your environment, management processes, and plans. Putting a little effort into planning your alerts — simple and complex — can make your overall management much easier.

25.1.1.1. What's the Condition?

This is probably the single most important thing to determine. What event do you want to know about?
An alert definition can have one condition or multiple conditions, and those conditions can trigger an alert if only one is true or only if all are true. One of the easiest things to do is picture the scenario that needs to be reported and work backward to determine what condition or set of conditions best represents that scenario.
Alerts can be very general ("there was a change") or very specific. General conditions are more informational. Very specific conditions can allow very specific responses — meaning that detailed and useful responses can be automated until an administrator can review and intervene.
There is no limit on the number of alert definitions that can be configured, apart from reasonable performance constraints. It may make sense to have one alert that sends a notification for a simple metric value change, while a series of other alerts look at, say, the change for that metric in relation to a different metric, and then runs a certain JBoss ON CLI script depending on what other factors are involved.

25.1.1.2. What's the Frequency?

A condition can occur and recur, across multiple monitoring scans. How many times do you need to be informed of the same condition?
There are a lot of factors to consider: what kind of response will be taken, what kind of audit trail you need to keep, how common a condition is likely to be, and how long a condition may exist.
Some factors in the alert definition such as dampening rules and disable/recover alert settings throttle how many notifications are sent.
One thing to remember is that an alert notification is every response that the JBoss ON server takes when an alert is triggered. It can be sending an email, posting a message in the UI, running a CLI script, or initiating an operation.
For common, relatively low-priority alerts, probably a single notification is sufficient, so dampening and disable rules can be set to prevent spamming administrators with emails or flooding the recent alerts portlet.
For infrequent issues, critical failures, or high priority resources, the frequency of the alert notification depends on what actions you need to take. Informational alerts may sent for as along as the condition persists, while an alert with an aggressive response like running a CLI script may only be run once and then disabled to prevent from repeatedly running the same script.

25.1.1.3. What's the Response to Take?

Responding to an alert has two phases: the immediate response and then long-term mitigation.
When an alert for a certain condition or event on your resource is fired, what is the very first thing that should happen? Should there be an email to an administrator? Should the JBoss ON server take some action to manage the resource? JBoss ON supports SNMP traps, so should an SNMP notification be sent to an external auditing/monitoring application?
Like conditions, there is no limit to the number of notifications that can be sent when an alert occurs. Multiple responses may make sense in some cases, where JBoss ON should email an administrator and restart a resource can simultaneously.
Operations and CLI script allow administrators to automate responses, and particularly for business-critical systems, this allows JBoss ON to take an immediate action before an administrator may even be aware there is a problem.
Once an immediate action has been taken (either by JBoss ON or by an administrator), then the administrator can acknowledge the alert, essentially dismissing it. That final step can be an important procedure, a way of stating that an administrator has reviewed the alert, identified what caused it, and found a solution.
Having a way of reviewing and signing off on events — not just as a JBoss ON task, but in real life — establishes a reliable procedure for handling IT problems and improving overall maintenance strategies.

25.1.1.4. How Many Resources Does This Affect?

Alert definitions can be set for an individual resource, for a group of compatible resources, or for an entire resource type.
Being able to apply the same definition to multiple resources is invaluable because of how easy it is to manage complex definitions.
Note
Alert templates for a resource type automatically apply to all existing and new resources of that type.
Be cognizant, though, of how different resources of the same type need to be alerted. For example, production servers require a very high level of reporting, monitoring metrics, and, possibly, automated response. QA or development servers may only require general alert information with little scripting or advanced responses. The number, frequency, and type of metrics collected on a production system may vary from those collected on QE or development systems, and that has an impact on what conditions can be used for alert definitions.
Grouping is an asset. QE, development, staging, and production resources can be separated into different groups, with the appropriate levels of monitoring and alerting set on each.

25.1.2. Basic Procedure for Setting Alerts for a Resource

Note
It is not possible to edit an alert condition or an alert notification after they are set. To change the conditions or notifications for an alert definition, delete the condition or notification and create a new one.
  1. Click the Inventory tab in the top menu.
  2. Select the resource type in the Resources menu table on the left, and then browse or search for the resource.
  3. Click the resource name in the list.
  4. Click the Alerts tab for the resource.
  5. In the Definitions subtab, click the New button to create the new alert.
  6. In the General Properties tab, give the basic information about the alert.
    • Name. Gives the name of the specific alert definition. This must be unique for the resource.
    • Description. Contains an optional description of the alert; this can be very useful if you want to trigger different kinds of alert responses at different conditions for the same resource.
    • Priority. Sets the priority or severity that is given to an alert triggered by this definition.
    • Enabled. Sets whether the alert definition is active. Alert definitions can be disabled to prevent unnecessary or spurious alerts if there is, for instance, a network outage or routine maintenance window for the resource.
  7. In the Conditions tab, set the metric or issue that triggers the alert. Click the Add button to bring up the conditions form.
    Note
    There can be more than one condition set to trigger an alert. For example, you may only want to receive a notification for a server if its CPU goes above 80% and its available memory drops below 25MB. The ALL setting for the conditions restricts the alert notification to only when both criteria are met. Alternatively, you may want to know when either one occurs so that you can immediately change the load balancing configuration for the network. In that case, the ANY setting fires off a notification as soon as even one condition threshold is met.
    1. Click the Add a new condition button.
    2. From the initial drop-down menu, select the type of condition. The categories of conditions are described in Section 25.2.1, “Reasons for Firing an Alert”, and the exact conditions available to be set for every resource are listed in the Resource Monitoring Reference.
    3. Set the values for the condition.
  8. In the Notifications tab, click Add to set a notification for the alert.
    1. Select the method to use to send the alert notification in the Sender option.
      The Sender option first sets the specific type of alert method (such as email or SNMP) and then opens the appropriate form to fill in the details for that specific method.
    2. Fill in the required information for the alert sender method. The method may require contact information, SNMP settings, operations, or scripts, depending on what is selected.
  9. In the Recovery tab, set whether to disable an alert until the resource state is recovered. Optionally, select another alert to enable (or recover) when this alert fires.
    A recover alert takes a disabled alert and re-enables it. This is used for two alerts which show changing states, like a pair of alerts to show when availability goes down and then back up.
  10. In the Dampening tab, give the dampening (or frequency) rule on how often to send notifications for the same alert event.
    The frequency for sending alerts depends on the expected behavior of the resource. There has to be a balance between sending too many alerts and sending too few. There are several frequency settings:
    • Consecutive. Sends an alert if the condition occurs a certain number of times in a row for metric calculations. For example, if this is set to three, then the condition must be detected in three consecutive metric collection periods for the alert to be fired. If this is set to one, then it sends an alert every time the condition occurs.
    • Last N evaluations. This sets a number of times that the condition has to occur in a given number of monitoring evaluations cycles before an alert is sent.
    • Time period. The other two similar dampening rules set a recurrence based on the JBoss ON monitoring cycles. This sets the alerting rule based on a specific time period.
  11. Click OK to save the alert definition.

25.1.3. Enabling and Disabling Alert Definitions

When an alert definition is disabled, no alert notifications are triggered for that set of conditions. Disabling definitions is very useful when resources are being taken offline for a know reason (such as upgrades or maintenance) and any alerts triggered during that time would be wrong. Alert definitions can be re-enabled later just as easily.
  1. Click the Inventory tab in the top menu.
  2. Select the resource type in the Resources menu table on the left, and then browse or search for the resource.
  3. Click the Alerts tab.
  4. In the Definitions subtab, select any of the definitions to enable or disable.
  5. Click the Enable or Disable button.
  6. Confirm the action.

25.1.4. Group Alerting and Alert Templates

Most alerts can be defined consistently for multiple resources of the same type. JBoss ON has two ways to accomplish this:
  • Alert templates
  • Alerts on compatible groups
An alert template is a configuration setting for the JBoss ON server. An alert is configured for a specific resource type (even if no resource of that type exists in the inventory yet). Whenever a resource is added, any alert templates in the JBoss ON configuration are automatically applied to that resource. Alert templates can be configured to allow local changes (for example, Resource A may have different baselines or expected behavior, so the alert conditions can be altered). Templates can also be strictly enforced, so that every resource of that type has exactly the same settings.
Note
Alert templates for a resource type automatically apply to all existing and new resources of that type. This can be deselected when the template is edited, so that changes only apply to new resources.
Alerts can be configured on compatible groups. As with alert templates, the compatible group's alert definitions trickle down to the rest of the group members. When a resource is added to a group, the alerts are automatically added to the resource. When the resource is removed from the group, the alert is automatically deleted. Group alerting works for both manual groups and dynamic groups. As with alert templates, group alerts can allow local changes or enforce the group alert settings.
Templates make configuration really easy to apply consistently and often, and JBoss ON allows templates to be set for alerts based on their general resource type.
Group alerts, like alert templates, apply equally to every member of a compatible group. Group alerts offer more control over which resources have the alert definition, however, since resources can be manually added to the group or selected based on a search filter. When a resource joins or leaves a group, its alert definitions are automatically updated.

25.1.4.1. Creating Alert Definition Templates

Alert templates are fully defined alert definitions — from conditions to notification methods — that are created for any of the managed resource types in  JBoss ON. Servers or applications of the same type will probably have the same set of alert conditions that apply, such as free memory or CPU usage. An alert definition template creates an alert based on the general type of resource. So, there can be alert templates for Windows, Linux, and Solaris servers, Tomcat and Apache servers, and services like sshd and cron. Every time a resource of that type is added, then the alert definition is automatically added to the resource with the predefined settings. Any alert assigned to a resource through a template can be edited locally for that resource, so these alert definitions are still flexible and customizable.
Note
Alert templates for a resource type automatically apply to all existing and new resources of that type. This can be deselected when the template is edited, so that changes only apply to new resources.
To create an alert definition template:
  1. In the top navigation, open the Administration menu, and then the Configuration menu.
  2. Select the Alert Templates menu item. This opens a long list of resource types, both for platforms and server types.
  3. Locate the type of resource for which to create the template definition.
  4. Click the New button to create a global alert definition. Set up the alert exactly the same way as setting an alert for a single resource (as in Section 25.1.2, “Basic Procedure for Setting Alerts for a Resource”).
  5. Save the template.
The template definition is then applied to all current and new resources of that type.

25.1.4.2. Configuring Group Alerts

Group alerts can only be set on compatible groups.
  1. In the Inventory tab in the top menu, select the Compatible Groups item in the Groups menu on the left.
  2. In the main window, select the group to add the alert to.
  3. Click the Alerts tab for the group.
  4. In the Definitions subtab, click the New button.
  5. Configure the basic alert definition and notifications, as in Section 25.1.2, “Basic Procedure for Setting Alerts for a Resource”.