37.2. Fail-over Modes

HornetQ defines two types of client fail-over:
  • Automatic client fail-over
  • Application-level client fail-over
HornetQ also provides transparent automatic reattachment of connections to the same server (for example, in case of transient network problems). This is similar to fail-over, except that the client reconnects to the same server. For more information, see Chapter 32, Client Reconnection and Session Reattachment.
If the client has consumers on any non-persistent or temporary queues, those queues are automatically recreated on the backup node during fail-over, since the backup node has no knowledge of non-persistent queues.

37.2.1. Automatic Client fail-over

HornetQ clients can be configured with knowledge of live and backup servers, so that if the connection between the client and the live server fails, the client detects this and reconnects to the backup server. The backup server then automatically recreates any sessions and consumers that existed on each connection before fail-over, saving the user from having to hand-code manual reconnection logic.
A HornetQ client detects connection failure when it has not received packets from the server within the time given by client-failure-check-period, as explained in Chapter 15, Detecting Dead Connections. If the client does not receive data in good time, it assumes the connection has failed and attempts fail-over.
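The detection rule above amounts to a timeout on inbound traffic. The following is a minimal sketch of that rule, not HornetQ's implementation: the class and method names are hypothetical, and the 30000 ms period is only an illustrative value for client-failure-check-period.

```java
/** Hypothetical illustration of a client-side failure check; not HornetQ code. */
class FailureCheckSketch {

    /**
     * Returns true when no packet has arrived from the server within the
     * check period, meaning the client should treat the connection as failed.
     */
    static boolean connectionFailed(long lastPacketMillis, long nowMillis, long checkPeriodMillis) {
        return nowMillis - lastPacketMillis > checkPeriodMillis;
    }

    public static void main(String[] args) {
        long checkPeriod = 30_000L; // illustrative value, assumed for this sketch
        System.out.println(connectionFailed(0L, 10_000L, checkPeriod)); // prints false
        System.out.println(connectionFailed(0L, 45_000L, checkPeriod)); // prints true
    }
}
```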
HornetQ clients can be configured with the list of live-backup server pairs in several ways. The most common is probably server discovery, in which the client discovers the list automatically; for full details, refer to Section 36.2, “Server discovery”. Alternatively, clients can specify the live-backup server pairs explicitly, as explained in Section 36.5.2, “Specifying a Static Cluster Server List”.
To enable automatic client fail-over, the client must be configured to allow non-zero reconnection attempts (as explained in Chapter 32, Client Reconnection and Session Reattachment).
Sometimes it is desirable for a client to fail-over onto a backup server even when the live server has been shut down cleanly rather than having crashed or lost its connection. To configure this, set the property FailoverOnServerShutdown to true. If using JMS, set it either on the HornetQConnectionFactory or in the JBOSS_DIST/jboss-as/server/<PROFILE>/deploy/hornetq/hornetq-jms.xml file when defining the connection factory. If using the core API, set the property directly on the ClientSessionFactoryImpl instance after creation. The default value for this property is false, which means that by default HornetQ clients will not fail-over to a backup server if the live server is shut down cleanly.

Note

Cleanly shutting down the server will not trigger fail-over on the client by default. For the client to fail-over when its server is shut down cleanly, set the property FailoverOnServerShutdown to true.
Using Ctrl+C (in a Linux terminal) causes the server to cleanly shut down, so client fail-over is not triggered unless this property is correctly configured.
By default, fail-over will only occur after at least one connection has been made to the live server. In practice, this means that fail-over will not occur if the client fails to make an initial connection to the live server. The client will retry connecting to the live server according to the reconnect-attempts property and fail after this number of attempts.
In some cases, you may want the client to automatically try the backup server if it fails to make an initial connection to the live server. In this case, set the property FailoverOnInitialConnection (failover-on-initial-connection in XML) to true on the ClientSessionFactoryImpl or HornetQConnectionFactory. The default value for this parameter is false.
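The initial-connection rules above can be modelled as a small decision procedure. This is a hedged sketch rather than HornetQ code: the class, enum and method names are hypothetical, the two servers are represented by simple suppliers that report whether a connection attempt succeeds, and the ordering (all live attempts first, then a single backup attempt) is an assumption made for illustration.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical model of the initial-connection rules; not HornetQ code. */
class InitialConnectionSketch {

    enum Result { LIVE, BACKUP, FAILED }

    /**
     * Tries the live server up to `attempts` times. The backup is tried only
     * when every live attempt fails and failoverOnInitialConnection is true.
     */
    static Result connect(BooleanSupplier live, BooleanSupplier backup,
                          int attempts, boolean failoverOnInitialConnection) {
        for (int i = 0; i < attempts; i++) {
            if (live.getAsBoolean()) {
                return Result.LIVE;
            }
        }
        if (failoverOnInitialConnection && backup.getAsBoolean()) {
            return Result.BACKUP;
        }
        return Result.FAILED;
    }

    public static void main(String[] args) {
        BooleanSupplier liveDown = () -> false; // live server never answers
        BooleanSupplier backupUp = () -> true;  // backup server is reachable
        System.out.println(connect(liveDown, backupUp, 3, false)); // prints FAILED
        System.out.println(connect(liveDown, backupUp, 3, true));  // prints BACKUP
    }
}
```

With the flag left at its default of false, the sketch never contacts the backup, which mirrors the default behaviour described above.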

Note

HornetQ does not replicate full server state between live and backup servers. When the new session is automatically recreated on the backup it will not have any knowledge of messages already sent or acknowledged in that session. Any in-flight sends or acknowledgments at the time of fail-over might also be lost.

37.2.1.1. Handling Blocking Calls During fail-over

If the client code is in a blocking call to the server when fail-over occurs, waiting for a response before it can continue, the new session will not have any knowledge of the call that was in progress. The call might otherwise hang forever, waiting for a response that will never come.
To prevent this, HornetQ unblocks any blocking calls that were in progress at the time of fail-over by making them throw a javax.jms.JMSException (if using JMS), or a HornetQException with error code HornetQException.UNBLOCKED (if using the core API). It is up to the client code to catch this exception and retry any operations if desired.
If the method being unblocked is a call to commit() or prepare(), the transaction is automatically rolled back and HornetQ throws a javax.jms.TransactionRolledBackException (if using JMS), or a HornetQException with error code HornetQException.TRANSACTION_ROLLED_BACK (if using the core API).
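The catch-and-retry pattern for unblocked calls might look like the sketch below. It is deliberately library-free so that it runs on its own: UnblockedException is a hypothetical stand-in for the javax.jms.JMSException or the HornetQException with code UNBLOCKED that a real client would catch, and callWithRetry is an illustrative helper, not part of any HornetQ API.

```java
import java.util.concurrent.Callable;

/** Hypothetical retry helper for calls unblocked by fail-over; not HornetQ code. */
class UnblockedRetrySketch {

    /** Stand-in for a JMSException or a HornetQException with code UNBLOCKED. */
    static class UnblockedException extends Exception {
        UnblockedException(String message) {
            super(message);
        }
    }

    /**
     * Runs the operation, retrying up to maxRetries times when it is
     * unblocked by fail-over. Any other exception propagates unchanged.
     */
    static <T> T callWithRetry(Callable<T> operation, int maxRetries) throws Exception {
        int retries = 0;
        while (true) {
            try {
                return operation.call();
            } catch (UnblockedException e) {
                if (retries++ >= maxRetries) {
                    throw e; // give up and let the caller decide
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String reply = callWithRetry(() -> {
            if (calls[0]++ == 0) {
                throw new UnblockedException("unblocked by fail-over");
            }
            return "ok";
        }, 3);
        System.out.println(reply + " after " + calls[0] + " calls"); // prints "ok after 2 calls"
    }
}
```

Blindly retrying a send can deliver a message twice if the original call had reached the server, so in practice a retry like this is paired with duplicate detection.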

37.2.1.2. Handling fail-over With Transactions

If the session is transactional and messages have already been sent or acknowledged in the current transaction, the server cannot be sure that those sends or acknowledgments were not lost during fail-over.
Consequently, the transaction will be marked as rollback-only, and any subsequent attempt to commit it will throw a javax.jms.TransactionRolledBackException (if using JMS), or a HornetQException with error code HornetQException.TRANSACTION_ROLLED_BACK (if using the core API).
It is up to the user to catch the exception and perform any client-side local rollback code as necessary. There is no need to roll back the session manually, because it has already been rolled back. The user can then retry the transactional operations on the same session.
If fail-over occurs when a commit call is being executed, the server, as previously described, will unblock the call to prevent a hang, since no response will come back. In this case it is not easy for the client to determine whether the transaction commit was actually processed on the live server before failure occurred.
To remedy this, the client can enable duplicate detection (Chapter 35, Duplicate Message Detection) in the transaction and retry the transaction operations after the call is unblocked. If the transaction was in fact committed successfully on the live server before fail-over, duplicate detection ensures that any durable messages resent in the retried transaction are ignored on the server, so they are not sent more than once.
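To see why retrying with duplicate detection is safe, consider the following in-memory model. It is a sketch under stated assumptions, not HornetQ code: Server, RolledBackException and sendReliably are hypothetical names, the server is a toy that remembers accepted duplicate IDs, and the failure flag simulates a commit that was processed just before fail-over unblocked the client.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of commit retry with duplicate detection; not HornetQ code. */
class DuplicateDetectionSketch {

    /** Stand-in for TransactionRolledBackException / TRANSACTION_ROLLED_BACK. */
    static class RolledBackException extends Exception { }

    /** Toy server that remembers the duplicate IDs it has already accepted. */
    static class Server {
        final Set<String> seenIds = new HashSet<>();
        final List<String> queue = new ArrayList<>();
        boolean failAfterNextCommit; // simulates fail-over striking mid-commit

        void commit(Map<String, String> messagesById) throws RolledBackException {
            for (Map.Entry<String, String> m : messagesById.entrySet()) {
                if (seenIds.add(m.getKey())) { // add() returns false for a duplicate ID
                    queue.add(m.getValue());
                }
            }
            if (failAfterNextCommit) {
                failAfterNextCommit = false;
                throw new RolledBackException(); // client cannot tell the commit succeeded
            }
        }
    }

    /** Resends the same messages, with the same IDs, until a commit is confirmed. */
    static void sendReliably(Server server, Map<String, String> messagesById, int maxRetries)
            throws RolledBackException {
        for (int attempt = 0; ; attempt++) {
            try {
                server.commit(messagesById);
                return;
            } catch (RolledBackException e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Server server = new Server();
        server.failAfterNextCommit = true;
        Map<String, String> tx = new LinkedHashMap<>();
        tx.put("order-1", "first");
        tx.put("order-2", "second");
        sendReliably(server, tx, 3);
        System.out.println(server.queue); // prints [first, second]
    }
}
```

The key design point is that the retry reuses the original duplicate IDs; generating fresh IDs on each attempt would defeat the check.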

Note

By catching the rollback exceptions and retrying, catching unblocked calls, and enabling duplicate detection, once-and-only-once delivery guarantees can be provided in the case of failure, so that no messages are lost or duplicated.

37.2.1.3. Handling fail-over With Non Transactional Sessions

If the session is non-transactional, messages or acknowledgments can be lost in the event of fail-over.
To provide once-and-only-once delivery guarantees for non-transacted sessions too, enable duplicate detection and catch unblock exceptions as described in Section 37.2.1.1, “Handling Blocking Calls During fail-over”.