Chapter 7. Monitoring brokers for problems

AMQ Broker includes an internal tool called the Critical Analyzer that actively monitors running brokers for problems such as deadlock conditions. In a production environment, such problems can be caused by I/O errors, a defective disk, a memory shortage, or excess CPU usage by other processes.

The Critical Analyzer periodically measures the response time of critical operations, such as queue delivery (that is, adding messages to a queue on the broker) and journal operations. If the response time of a checked operation exceeds a configurable timeout value, the broker is considered unstable. You can configure the Critical Analyzer to respond either by logging a message or by taking action to protect the broker, such as shutting down the broker or stopping the virtual machine (VM) that is running the broker.

7.1. Configuring the Critical Analyzer

The following procedure shows how to configure the Critical Analyzer to monitor the broker for problems.

Procedure

  1. Open the <broker_instance_dir>/etc/broker.xml configuration file.

    The default configuration for the Critical Analyzer is shown below.

    <critical-analyzer>true</critical-analyzer>
    <critical-analyzer-timeout>120000</critical-analyzer-timeout>
    <critical-analyzer-check-period>60000</critical-analyzer-check-period>
    <critical-analyzer-policy>HALT</critical-analyzer-policy>

  2. Specify parameter values, as described below.

    critical-analyzer
        Specifies whether to enable or disable the Critical Analyzer tool. The default value is true, which means that the tool is enabled.

    critical-analyzer-timeout
        Timeout, in milliseconds, for the checks run by the Critical Analyzer. If the time taken by one of the checked operations exceeds this value, the broker is considered unstable.

    critical-analyzer-check-period
        Time period, in milliseconds, between consecutive checks by the Critical Analyzer for each operation.

    critical-analyzer-policy
        If the broker fails a check and is considered unstable, this parameter specifies whether the broker logs a message (LOG), stops the virtual machine (VM) hosting the broker (HALT), or shuts down the broker (SHUTDOWN).

    Based on the policy option that you have configured, if the response time for a critical operation exceeds the configured timeout value, you see output that resembles one of the following:

    critical-analyzer-policy=LOG

    [Artemis Critical Analyzer] 18:11:52,145 WARN [org.apache.activemq.artemis.core.server] AMQ224081: The component org.apache.activemq.artemis.tests.integration.critical.CriticalSimpleTest$2@5af97850 is not responsive

    critical-analyzer-policy=HALT

    [Artemis Critical Analyzer] 18:10:00,831 ERROR [org.apache.activemq.artemis.core.server] AMQ224079: The process for the virtual machine will be killed, as component org.apache.activemq.artemis.tests.integration.critical.CriticalSimpleTest$2@5af97850 is not responsive

    critical-analyzer-policy=SHUTDOWN

    [Artemis Critical Analyzer] 18:07:53,475 ERROR [org.apache.activemq.artemis.core.server] AMQ224080: The server process will now be stopped, as component org.apache.activemq.artemis.tests.integration.critical.CriticalSimpleTest$2@5af97850 is not responsive

    You also see a thread dump on the broker that resembles the following:

    [Artemis Critical Analyzer] 18:10:00,836 WARN  [org.apache.activemq.artemis.core.server] AMQ222199: Thread dump: AMQ119001: Generating thread dump
    *
    ===============================================================================
    AMQ119002: Thread Thread[Thread-1 (ActiveMQ-scheduled-threads),5,main] name = Thread-1 (ActiveMQ-scheduled-threads) id = 19 group = java.lang.ThreadGroup[name=main,maxpri=10]
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
    java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)
    ===============================================================================
    ..... ..........
    ===============================================================================
    AMQ119003: End Thread dump
    *
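
As an illustration of the parameters described in step 2, the following fragment sketches a non-default configuration that only logs a warning when a check fails. The timeout and check-period values (60000 and 30000) are arbitrary values chosen for illustration, not recommendations. The fragment assumes that the elements are placed inside the core element of broker.xml, and it omits all other broker settings.

```xml
<configuration xmlns="urn:activemq">
  <core xmlns="urn:activemq:core">
    <!-- Other broker settings are omitted from this sketch. -->

    <!-- Keep the Critical Analyzer enabled. -->
    <critical-analyzer>true</critical-analyzer>

    <!-- Illustrative value: treat the broker as unstable if a checked
         operation takes longer than 60 seconds. -->
    <critical-analyzer-timeout>60000</critical-analyzer-timeout>

    <!-- Illustrative value: run the checks every 30 seconds. -->
    <critical-analyzer-check-period>30000</critical-analyzer-check-period>

    <!-- Log a message instead of halting the VM or shutting down the broker. -->
    <critical-analyzer-policy>LOG</critical-analyzer-policy>
  </core>
</configuration>
```

With the LOG policy, an unresponsive component produces the AMQ224081 warning shown earlier and the broker keeps running, which can be useful for observing how often checks fail before choosing HALT or SHUTDOWN.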

Revised on 2022-03-30 11:53:23 UTC