Chapter 22. Client Failover

AMQ Broker 7.1 defines two types of client failover, each of which is covered in its own section later in this chapter: automatic client failover and application-level client failover. The broker also provides 100% transparent automatic reattachment of connections to the same broker, for example after a transient network problem. This is similar to failover, except that the client reconnects to the same broker.

If the client has consumers on any non-persistent or temporary queues, those queues are automatically re-created on the slave broker during failover, because the slave broker has no knowledge of non-persistent queues.

22.1. Automatic Client Failover

A client can receive information about all master and slave brokers, so that in the event of a connection failure, it can reconnect to the slave broker. The slave broker then automatically re-creates any sessions and consumers that existed on each connection before failover. This feature saves you from having to hand-code manual reconnection logic in your applications.

When a session is re-created on the slave, it does not have any knowledge of messages already sent or acknowledged. Any in-flight sends or acknowledgements at the time of failover might also be lost. However, even without 100% transparent failover, you can guarantee once-and-only-once delivery, even in the case of failure, by combining duplicate detection with retrying of transactions.

Clients detect connection failure when they have not received packets from the broker within a configurable period of time. See Detecting Dead Connections for more information.

There are several ways to configure clients to receive information about master and slave brokers. One option is to configure clients to connect to a specific broker and then receive information about the other brokers in the cluster. See Configuring a Client to Use Static Discovery for more information. The most common way, however, is to use broker discovery. For details on how to configure broker discovery, see Configuring a Client to Use Dynamic Discovery.

Also, you can configure the client by adding parameters to the query string of the URL used to connect to the broker, as in the example below.

connectionFactory.ConnectionFactory=tcp://localhost:61616?ha=true&reconnectAttempts=3

Procedure

To configure your clients for failover through the use of a query string, ensure the following components of the URL are set properly.

  1. The host:port portion of the URL should point to a master broker that is properly configured with a backup. This host and port is used only for the initial connection. The host:port value has nothing to do with the actual connection failover between a live and a backup server. In the example above, localhost:61616 is used for the host:port.
  2. (Optional) To use more than one broker as a possible initial connection, group the host:port entries as in the following example:

    connectionFactory.ConnectionFactory=(tcp://host1:port,tcp://host2:port)?ha=true&reconnectAttempts=3
  3. Include the name-value pair ha=true as part of the query string to ensure the client receives information about each master and slave broker in the cluster.
  4. Include the name-value pair reconnectAttempts=n, where n is an integer greater than 0. This parameter sets the number of times the client attempts to reconnect to a broker.

Note

Failover occurs only if ha=true and reconnectAttempts is greater than 0. Also, the client must make an initial connection to the master broker in order to receive information about other brokers. If the initial connection fails, the client can only retry to establish it. See Failing Over During the Initial Connection for more information.
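
The URL rules in the procedure above can be captured in a small helper. The following is a minimal sketch, not part of the AMQ client API: the class name FailoverUrl and its build method are hypothetical, and the helper assumes that every broker uses the tcp scheme. It produces only the URL value, which you would then assign to the connectionFactory.ConnectionFactory property.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the AMQ client API.
final class FailoverUrl {

    private FailoverUrl() { }

    // Builds a connection URL value such as
    // (tcp://host1:61616,tcp://host2:61616)?ha=true&reconnectAttempts=3
    static String build(List<String> hostPorts, int reconnectAttempts) {
        if (hostPorts == null || hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one host:port is required");
        }
        if (reconnectAttempts <= 0) {
            // Follows the rule stated in the procedure: a value greater than 0.
            throw new IllegalArgumentException("reconnectAttempts must be greater than 0");
        }
        String brokers = hostPorts.stream()
                .map(hostPort -> "tcp://" + hostPort)
                .collect(Collectors.joining(","));
        // Several initial brokers are grouped in parentheses; a single one is not.
        if (hostPorts.size() > 1) {
            brokers = "(" + brokers + ")";
        }
        return brokers + "?ha=true&reconnectAttempts=" + reconnectAttempts;
    }
}
```

Rejecting a reconnectAttempts value of 0 or less mirrors the note above: with such a value, the URL would not enable failover.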

22.1.1. Failing Over During the Initial Connection

Because the client does not receive information about every broker until after the first connection to the HA cluster, there is a window of time in which the client can connect only to the broker included in the connection URL. Therefore, if a failure happens during this initial connection, the client cannot fail over to other master brokers, but can only try to re-establish the initial connection. Clients can be configured for a set number of reconnection attempts. Once that number of attempts has been made, an exception is thrown.
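
The bounded-retry behavior described above can be pictured with a short sketch. This is an illustration only, not the client's actual implementation: the class InitialConnect and the method connectWithRetries are hypothetical, and the connection attempt is modeled as a plain Supplier that throws a RuntimeException on failure.

```java
import java.util.function.Supplier;

// Hypothetical illustration of a bounded number of initial connection attempts.
final class InitialConnect {

    private InitialConnect() { }

    // Calls connect up to maxAttempts times and returns the first successful result.
    // If every attempt fails, the last failure is rethrown to the caller.
    static <T> T connectWithRetries(Supplier<T> connect, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.get();
            } catch (RuntimeException e) {
                // Remember the failure and try again while attempts remain.
                lastFailure = e;
            }
        }
        throw lastFailure;
    }
}
```

A real client would normally pause between attempts; the pause is omitted here to keep the sketch short.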

Setting the Number of Reconnection Attempts

Procedure

The example below shows how to set the number of reconnection attempts to 3 using the AMQ JMS client. The default value is 0, that is, try only once.

  • Set the number of reconnection attempts by passing a value to ActiveMQConnectionFactory.setInitialConnectAttempts().

    ActiveMQConnectionFactory cf = ActiveMQJMSClient.createConnectionFactory(...);
    cf.setInitialConnectAttempts(3);

Setting a Global Number of Reconnection Attempts

Alternatively, you can apply a global value for the maximum number of reconnection attempts within the broker’s configuration. The maximum is applied to all client connections.

Procedure

  • Edit BROKER_INSTANCE_DIR/etc/broker.xml by adding the initial-connect-attempts configuration element and providing a value for the maximum number of attempts, as in the example below.

    <configuration>
     <core>
      ...
      <initial-connect-attempts>3</initial-connect-attempts>
      ...
     </core>
    </configuration>

    In this example, all clients connecting to the broker are allowed a maximum of three attempts to reconnect. The default is -1, which allows clients unlimited attempts.

22.1.2. Handling Blocking Calls During Failover

When failover occurs and the client is waiting for a response from the broker to continue its execution, the newly created session does not have any knowledge of the call that was in progress. The initial call might otherwise hang forever, waiting for a response that never comes. To prevent this, the broker is designed to unblock any blocking calls that were in progress at the time of failover by making them throw an exception. Client code can catch these exceptions and retry any operations if desired.

When using AMQ JMS clients, if the unblocked method is a call to commit() or prepare(), the transaction is automatically rolled back and the broker throws an exception.

22.1.3. Handling Failover with Transactions

When using AMQ JMS clients, if the session is transactional and messages have already been sent or acknowledged in the current transaction, the broker cannot be sure that those messages or their acknowledgements were not lost during the failover. Consequently, the transaction is marked for rollback only. Any subsequent attempt to commit it throws a javax.jms.TransactionRolledBackException.

Warning

The caveat to this rule is when XA is used. If a two-phase commit is used and prepare() has already been called, rolling back could cause a HeuristicMixedException. Because of this, the commit throws an XAException with the XA_RETRY error code, which informs the transaction manager that it should retry the commit at some later point. If the original commit did not occur, the transaction still exists and can be committed. If the transaction no longer exists, it is assumed to have been committed, although the transaction manager might log a warning. A side effect of this exception is that any non-persistent messages are lost. To avoid such losses, always use persistent messages when using XA. This is not an issue with acknowledgements, because they are flushed to the broker before prepare() is called.

The AMQ JMS client code must catch the exception and perform any necessary client-side rollback. There is no need to roll back the session, however, because it was already rolled back. The client can then retry the transactional operations on the same session.

If failover occurs when a commit call is being executed, the broker unblocks the call to prevent the AMQ JMS client from waiting indefinitely for a response. Consequently, the client cannot determine whether the transaction commit was actually processed on the master broker before failure occurred.

To remedy this, the AMQ JMS client can enable duplicate detection in the transaction and retry the transaction operations after the call is unblocked. If the transaction was successfully committed on the master broker before failover, duplicate detection ensures that any durable messages present in the retried transaction are ignored on the broker side. This prevents messages from being sent more than once.
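
The idea behind duplicate detection can be sketched with an in-memory stand-in for the broker. This is a simplified model under stated assumptions, not the broker's implementation: the class DedupBroker and its methods are hypothetical, and the duplicate ID is passed as a plain argument. In a real client the ID travels in a message property; in Apache ActiveMQ Artemis, the upstream project, the property name is available as the constant Message.HDR_DUPLICATE_DETECTION_ID, but check your client documentation for the exact name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for a broker, used only to illustrate duplicate detection.
final class DedupBroker {

    private final Set<String> seenIds = new HashSet<>();
    private final List<String> queue = new ArrayList<>();

    // Accepts a message unless its duplicate ID was already seen.
    // Returns true if the message was enqueued, false if it was ignored as a duplicate.
    boolean send(String duplicateId, String body) {
        if (!seenIds.add(duplicateId)) {
            return false;
        }
        queue.add(body);
        return true;
    }

    // Returns a snapshot of the messages accepted so far.
    List<String> messages() {
        return new ArrayList<>(queue);
    }
}
```

Because the retried send reuses the same ID, a client that cannot tell whether the first commit succeeded can safely send again: the second copy is dropped if the first one arrived.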

If the session is non-transactional, messages or acknowledgements can be lost in case of failover. If you want to provide once-and-only-once delivery guarantees for non-transacted sessions, enable duplicate detection and catch unblock exceptions.

22.1.4. Getting Notified of Connection Failure

JMS provides a standard mechanism for getting notified asynchronously of connection failure: javax.jms.ExceptionListener.

Any ExceptionListener or SessionFailureListener instance is always called by the broker if a connection failure occurs, whether the connection was successfully failed over, reconnected, or reattached. You can find out whether a reconnect or a reattach has happened by examining the failedOver flag passed to the connectionFailed method of SessionFailureListener. Alternatively, you can inspect the error code of the javax.jms.JMSException, which can be one of the following:

Table 22.1. JMSException Error Codes

Error code    Description

FAILOVER      Failover has occurred and the broker has successfully reattached or reconnected

DISCONNECT    No failover has occurred and the broker is disconnected
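
One way to act on these codes is to map each one to a follow-up action inside the listener. The sketch below is an assumption-laden illustration: the class FailureCodes, the Action enum, and the chosen actions are hypothetical. In a real ExceptionListener, the string would come from JMSException.getErrorCode().

```java
// Hypothetical helper that maps the error codes in Table 22.1 to a follow-up action.
final class FailureCodes {

    // Illustrative actions only; choose whatever fits your application.
    enum Action { CONTINUE, RECONNECT_MANUALLY, UNKNOWN }

    private FailureCodes() { }

    static Action actionFor(String errorCode) {
        if (errorCode == null) {
            return Action.UNKNOWN;
        }
        switch (errorCode) {
            case "FAILOVER":
                // The connection was reattached or reconnected; work can continue.
                return Action.CONTINUE;
            case "DISCONNECT":
                // No failover happened; the application has to reconnect on its own.
                return Action.RECONNECT_MANUALLY;
            default:
                return Action.UNKNOWN;
        }
    }
}
```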

22.2. Application-Level Failover

In some cases you might not want automatic client failover, but prefer to code your own reconnection logic in a failure handler instead. This is known as application-level failover, since the failover is handled at the application level.

To implement application-level failover when using JMS, set an ExceptionListener on the JMS connection. The ExceptionListener is called by the broker when a connection failure is detected. In your ExceptionListener, close your old JMS connections. You might also want to look up new connection factory instances from JNDI and create new connections.
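
The close-then-reconnect pattern described above can be sketched without any JMS dependency. Everything in this example is a hypothetical stand-in: Conn models a JMS connection, and the Supplier models the JNDI lookup of a connection factory followed by the creation of a connection. In a real application, onConnectionFailure would be called from ExceptionListener.onException.

```java
import java.util.function.Supplier;

// Hypothetical sketch of application-level failover.
final class AppLevelFailover {

    // Stand-in for a JMS connection.
    interface Conn {
        void close();
    }

    // Stand-in for "look up a connection factory and create a connection".
    private final Supplier<Conn> connector;
    private Conn current;

    AppLevelFailover(Supplier<Conn> connector) {
        this.connector = connector;
        this.current = connector.get();
    }

    // In a real client, call this from ExceptionListener.onException.
    synchronized void onConnectionFailure() {
        try {
            current.close();
        } catch (RuntimeException e) {
            // The old connection is already broken; a failed close is not fatal here.
        }
        current = connector.get();
    }

    synchronized Conn current() {
        return current;
    }
}
```

The methods are synchronized because a failure callback typically runs on a different thread from the one that uses the connection.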