18.16. High Availability

18.16.1. High Availability Introduction

HornetQ can continue functioning after the failure of one or more servers. Part of this is achieved through failover support, where client connections migrate from the live server to a backup server if the live server fails. To keep the backup server current, HornetQ offers two strategies: shared store and replication.
There are two types of high-availability topologies:
  • Dedicated Topology: This topology comprises two EAP servers. On the first server, HornetQ is configured as a live server. On the second server, HornetQ is configured as a backup server. The EAP server that has HornetQ configured as a backup server acts only as a container for HornetQ. This server is inactive and cannot host deployments such as EJBs, MDBs, or servlets.
  • Collocated Topology: This topology contains two EAP servers, each of which contains two HornetQ servers (a live server and a backup server). The HornetQ live server on the first EAP server and the HornetQ backup server on the second EAP server form one live-backup pair. The HornetQ live server on the second EAP server and the HornetQ backup server on the first EAP server form the other live-backup pair.
In a collocated topology, as soon as a live HornetQ server (part of a live-backup pair) fails, the backup HornetQ server takes over and becomes active. When the backup HornetQ server shuts down on failback, the destinations and connection factories configured in the backup server are unbound from JNDI (Java Naming and Directory Interface).
JNDI is shared with the other live HornetQ server (part of the other live-backup pair). Therefore, unbinding destinations and connection factories from JNDI also unbinds the destinations and connection factories of that live HornetQ server.

Important

The configuration of a collocated backup server must not define destinations or connection factories.

Note

The following information references standalone-full-ha.xml. The configuration changes can be applied to standalone-full-ha.xml, or any configuration files derived from it.

18.16.2. About HornetQ Shared Stores

When using a shared store, both the live and backup servers share the same, entire data directory, using a shared file system. This includes the paging directory, journal directory, large messages, and the binding journal. When failover occurs and the backup server takes over, it will load the persistent storage from the shared file system. Clients can then connect to it.
This form of high-availability differs from data replication, as it requires a shared file system accessible by both the live and backup nodes. This will usually be a high performance Storage Area Network (SAN) of some kind.
The advantage of shared store high-availability is that no replication occurs between the live and backup nodes. This means it does not suffer any performance penalties due to the overhead of replication during normal operation.
The disadvantage of shared store high-availability is that it requires a shared file system, and when the backup server activates it must load the journal from the shared store. This can take some time, depending on the amount of data in the store.
If the highest performance during normal operation is required, there is access to a fast SAN, and a slightly slower failover rate is acceptable (depending on the amount of data), shared store high-availability is recommended.

Note

HornetQ's data replication mechanism replicates JMS data but does not replicate bindings.

18.16.3. About HornetQ Storage Configurations

HornetQ supports the Red Hat Enterprise Linux implementation of NFSv4 for shared storage, with either the ASYNCIO or the NIO journal type. The Red Hat Enterprise Linux NFS implementation supports both direct I/O (opening files with the O_DIRECT flag set) and kernel-based asynchronous I/O. When configuring NFS for shared storage, a highly available NFS configuration is recommended.

Important

When using the Red Hat Enterprise Linux NFSv4 as a shared storage option, the client cache must be disabled.

18.16.4. About HornetQ Journal Types

Two journal types are available for HornetQ:
  • ASYNCIO
  • NIO
The ASYNCIO journal type, also known as AIO, is a thin native code wrapper around the Linux asynchronous IO library (AIO). Using native functionality can provide better performance than NIO. This journal type is only supported on Red Hat Enterprise Linux and requires that libaio and the Native Components package are installed where JBoss EAP 6 is running. See the Installation Guide for instructions on installing the Native Components package.

Important

Check the server log after JBoss EAP 6 is started, to ensure that the native library successfully loaded, and that the ASYNCIO journal type is being used. If the native library fails to load, HornetQ will revert to the NIO journal type, and this will be stated in the server log.
The NIO journal type uses standard Java NIO to interface with the file system. It provides very good performance and runs on all supported platforms.
To specify the HornetQ journal type, set the parameter <journal-type> in the Messaging subsystem.
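As a minimal sketch, the journal type could be selected as follows inside the hornetq-server element of the Messaging subsystem. The surrounding configuration is omitted, and the value NIO is chosen only for illustration; ASYNCIO is the other value described in this section.

```xml
<hornetq-server>
    <!-- Journal type: ASYNCIO (Linux native AIO) or NIO (standard Java NIO) -->
    <journal-type>NIO</journal-type>
    <!-- remaining messaging configuration omitted -->
</hornetq-server>
```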

18.16.5. Configuring HornetQ for Dedicated Topology with Shared Store

To configure the live and backup servers for shared store in dedicated topology, configure the standalone-X.xml files on each server to have the following:
<shared-store>true</shared-store>
<paging-directory path="${shared.directory}/paging"/>
<bindings-directory path="${shared.directory}/bindings"/>
<journal-directory path="${shared.directory}/journal"/>
<large-messages-directory path="${shared.directory}/large-messages"/>
.
.
.
<cluster-connections>
   <cluster-connection name="my-cluster">
      ...
   </cluster-connection>
</cluster-connections>

Table 18.14. HornetQ Servers Setup Attributes (for both live and backup servers)

| Attribute | Description |
| --- | --- |
| shared-store | Whether this server is using shared store or not. Default is false. |
| paging-directory path | The path to the paging directory. This path is the same for both live and backup servers because they share this directory. |
| bindings-directory path | The path to the binding journal. This path is the same for both live and backup servers because they share this journal. |
| journal-directory path | The path to the journal directory. This path is the same for both live and backup servers because they share this directory. |
| large-messages-directory path | The path to the large messages directory. This path is the same for both live and backup servers because they share this directory. |
| failover-on-shutdown | Whether this server becomes active when the live server or the currently active backup server shuts down. |
The backup server must also be flagged explicitly as a backup.
<backup>true</backup>
One setup attribute applies exclusively to the HornetQ backup server: allow-failback. It specifies whether the backup server automatically shuts down when the original live server comes back up.
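The backup-specific settings described above can be combined as in the following sketch of a dedicated shared-store backup server. It assumes the elements sit directly inside the hornetq-server element; the element order and the chosen values are illustrative only, and the cluster connection and remaining directories are omitted.

```xml
<hornetq-server>
    <!-- Flags this HornetQ server as the backup of the live-backup pair -->
    <backup>true</backup>
    <!-- Shut this backup down automatically when the original live server returns -->
    <allow-failback>true</allow-failback>
    <!-- Also take over when the live server is shut down normally (illustrative choice) -->
    <failover-on-shutdown>true</failover-on-shutdown>
    <!-- Live and backup share one data directory on the shared file system -->
    <shared-store>true</shared-store>
    <paging-directory path="${shared.directory}/paging"/>
    <bindings-directory path="${shared.directory}/bindings"/>
    <journal-directory path="${shared.directory}/journal"/>
    <large-messages-directory path="${shared.directory}/large-messages"/>
    <!-- cluster-connections and other settings omitted -->
</hornetq-server>
```

The live server would carry the same shared-store and directory settings, without the backup flag.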

18.16.6. HornetQ Message Replication

Warning

Only persistent messages are replicated. Any non-persistent messages do not survive failover.
Message replication between a live and a backup server is achieved via network traffic as the live and backup servers do not share the same data stores. All the journals are replicated between the two servers as long as the two servers are within the same cluster and have the same cluster username and password. All persistent data traffic received by the live server gets replicated to the backup server.
When the backup server comes online, it looks for and connects to a live server to attempt synchronization. While it is synchronizing, it is unavailable as a backup server. Synchronization can take a long time depending on the amount of data to be synchronized and the network speed. If the backup server comes online and no live server is available, the backup server will wait until the live server is available in the cluster.
To enable servers to replicate data, a link must be defined between them in the standalone-full-ha.xml file. A backup server will only replicate with a live server with the same group name. The group name must be defined in the backup-group-name parameter in the standalone-full-ha.xml file on each server.
If a live server fails, the correctly configured and fully synchronized backup server takes over its duties. The backup server activates only if the live server has failed and the backup server is able to connect to more than half of the servers in the cluster. If more than half of the other servers in the cluster also fail to respond, this indicates a general network failure, and the backup server waits and retries the connection to the live server.
To return to the original state after failover, start the live server and wait until it is fully synchronized with the backup server. You can then shut down the backup server so that the original live server activates again. This happens automatically if the allow-failback attribute is set to true.

18.16.7. Configuring the HornetQ Servers for Replication

To configure the live and backup servers to be a replicating pair, configure the standalone-full-ha.xml files on each server to have the following settings:
<shared-store>false</shared-store>
<backup-group-name>NameOfLiveBackupPair</backup-group-name>
<check-for-live-server>true</check-for-live-server>
.
.
.
<cluster-connections>
   <cluster-connection name="my-cluster">
      ...
   </cluster-connection>
</cluster-connections>

Warning

Administrators must take care not to mix settings for shared store and replicated configurations. For example, the backup-group-name attribute, which is used for replication, should not be set when shared-store is set to true, which indicates a shared store.

Table 18.15. HornetQ Replicating Setup Attributes

| Attribute | Description |
| --- | --- |
| shared-store | Whether this server is using shared store or not. This value should be set to false for a replicated configuration. Default is false. |
| backup-group-name | The unique name that identifies a live/backup pair that should replicate with each other. |
| check-for-live-server | Whether a replicated live server should check the current cluster to see if there is already a live server with the same node ID. Default is false. |
| failover-on-shutdown | Whether this backup server (if it is a backup server) becomes the live server on a normal server shutdown. Default is false. |
The backup server must also be flagged explicitly as a backup.
<backup>true</backup>

Table 18.16. HornetQ Backup Server Setup Attributes

| Attribute | Description |
| --- | --- |
| allow-failback | Whether this server automatically shuts down if the original live server comes back up. Default is true. |
| max-saved-replicated-journal-size | The maximum number of backup journals to keep after failback occurs. Specifying this attribute is only necessary if allow-failback is true. Default value is 2, which means that after 2 failbacks the backup server must be restarted in order to replicate the journal from the live server and become a backup again. |
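Putting the replication attributes together, a backup server for a replicating pair might look like the following sketch. It assumes the elements sit directly inside the hornetq-server element; the group name reuses the placeholder from the earlier snippet, the values shown are the documented defaults or illustrative choices, and the cluster connection is omitted.

```xml
<hornetq-server>
    <!-- Flags this HornetQ server as a backup -->
    <backup>true</backup>
    <!-- Replication, not shared store -->
    <shared-store>false</shared-store>
    <!-- Must match the value configured on the live server of this pair -->
    <backup-group-name>NameOfLiveBackupPair</backup-group-name>
    <!-- Shut down automatically when the original live server returns -->
    <allow-failback>true</allow-failback>
    <!-- Number of backup journals kept after failback (documented default) -->
    <max-saved-replicated-journal-size>2</max-saved-replicated-journal-size>
    <!-- cluster-connections and other settings omitted -->
</hornetq-server>
```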

18.16.8. About High-availability (HA) Failover

High-availability failover is available with either automatic client failover, or application-level failover, through a live-backup structure. Each live server has a backup server. Only one backup per live server is supported.
The backup server only takes over if the live server crashes and there is a failover. After the live server has been restarted, and if the allow-failback attribute is set to true, it becomes the live server again. When the original live server takes over, the backup server reverts to being backup for the live server.

Important

Clustering should be enabled even if you are not using the clustering capabilities. This is because each node of the HA cluster must have a cluster-connection to all of the other nodes, in order to negotiate roles with the other servers.
The high-availability cluster topology is established by the live and backup servers, which broadcast their connection details using IP multicast. If IP multicast cannot be used, a static configuration of the initial connections is also possible. After the initial connection, the client is informed about the topology. If the current connection becomes stale, the client establishes a new connection to another node.
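Where multicast is unavailable, a static list of initial connections could be sketched as follows. This is an assumption-laden illustration rather than a verified configuration: the connector names netty and remote-node are hypothetical and must refer to connectors defined elsewhere in the messaging subsystem, and the address value is only an example.

```xml
<cluster-connections>
    <cluster-connection name="my-cluster">
        <address>jms</address>
        <!-- Connector this server advertises to the other nodes (hypothetical name) -->
        <connector-ref>netty</connector-ref>
        <!-- Static list used instead of multicast discovery (hypothetical connector name) -->
        <static-connectors>
            <connector-ref>remote-node</connector-ref>
        </static-connectors>
    </cluster-connection>
</cluster-connections>
```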
After a live server has failed and a backup server has taken over, you need to restart the original live server and have clients fail back. To do this, restart the original live server and then stop the new live server, either by killing its process or by waiting for it to crash on its own. You can also cause failover to occur on a normal server shutdown. To enable this, set the failover-on-shutdown property to true in the standalone.xml configuration file:
<failover-on-shutdown>true</failover-on-shutdown>
By default, the failover-on-shutdown property is set to false.
You can also force the new live server to shut down when the old live server comes back up, which allows the original live server to take over automatically. To do this, set the allow-failback property to true in the standalone.xml configuration file:
<allow-failback>true</allow-failback>
In replication HA mode, to force the new live server to shut down when the old live server comes back, set the check-for-live-server property to true in the standalone.xml configuration file:
<check-for-live-server>true</check-for-live-server>

18.16.9. Deployments on HornetQ Backup Servers

In a dedicated HA environment, a JBoss EAP 6 server with HornetQ configured as a backup must not be used to host any deployments which use or connect to the HornetQ backup on that server. This includes deployments such as Enterprise Java Beans (Stateless Session Beans, Message Driven Beans), or servlets.
If a JBoss EAP 6 server has a HornetQ collocated backup configuration (where in the messaging subsystem there is a HornetQ server configured as 'live' and another HornetQ server configured as backup), then the JBoss EAP 6 server can host deployments as long as they are configured to connect to the 'live' HornetQ server.
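A collocated setup of this kind can be pictured as two hornetq-server elements side by side in the messaging subsystem of one EAP server. The skeleton below is a sketch under assumptions: the name backup-server is hypothetical, and connectors, acceptors, directories, and cluster connections (which would need to differ between the two servers) are omitted.

```xml
<!-- Live HornetQ server: deployments on this EAP server connect to this one -->
<hornetq-server>
    <!-- destinations and connection factories are defined here -->
</hornetq-server>

<!-- Collocated backup for the live HornetQ server on the other EAP server -->
<hornetq-server name="backup-server">
    <backup>true</backup>
    <!-- no destinations or connection factories in a collocated backup -->
</hornetq-server>
```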

18.16.10. HornetQ Failover Modes

HornetQ defines two types of client failover:
  • Automatic client failover
  • Application-level client failover
HornetQ also provides transparent automatic reattachment of connections to the same server, for example after transient network problems. This is similar to failover, except that the client reconnects to the same server.
If the client has consumers on any non-persistent or temporary queues, those queues are automatically recreated on the backup node during failover, because the backup node has no information about non-persistent queues.

18.16.11. Automatic Client Failover

HornetQ clients can be configured to receive information about live and backup servers. If the client's connection to the live server fails, the client detects the failure and reconnects to the backup server. The backup server automatically recreates any sessions and consumers that existed on each connection before failover, which saves the user from having to hand-code manual reconnection logic.
HornetQ clients detect connection failure if packets are not received from the server within the time specified in client-failure-check-period. If the client does not receive data in time, it assumes the connection has failed and attempts failover. If the socket is closed by the operating system, which usually happens when the server process is killed rather than the machine itself crashing, the client initiates failover immediately.
HornetQ clients can discover the list of live-backup server groups in different ways. The list can be configured explicitly on the client, or the client can use server discovery to find it automatically. Alternatively, the client can connect explicitly to a specific server and download the current list of servers and backups.
To enable automatic client failover, the client must be configured to allow non-zero reconnection attempts.
By default, failover only occurs after at least one connection has been made to the live server. The client retries connecting to the live server as specified in the reconnect-attempts property and fails after the specified number of attempts.
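A connection factory set up for automatic client failover might look like the sketch below. The factory name, connector name, and JNDI entry are assumptions modeled on a typical standalone-full-ha.xml, and the numeric values are illustrative; only reconnect-attempts and client-failure-check-period are named in this section.

```xml
<connection-factory name="RemoteConnectionFactory">
    <connectors>
        <!-- Assumed connector name -->
        <connector-ref connector-name="netty"/>
    </connectors>
    <entries>
        <!-- Assumed JNDI entry -->
        <entry name="java:jboss/exported/jms/RemoteConnectionFactory"/>
    </entries>
    <!-- Let clients receive the live-backup topology and fail over -->
    <ha>true</ha>
    <!-- Milliseconds without server packets before the connection is treated as failed -->
    <client-failure-check-period>30000</client-failure-check-period>
    <!-- Milliseconds between reconnection attempts -->
    <retry-interval>1000</retry-interval>
    <!-- Must be non-zero for failover; -1 retries indefinitely -->
    <reconnect-attempts>-1</reconnect-attempts>
</connection-factory>
```

A finite positive value for reconnect-attempts makes the client give up after that many attempts, as described above.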

18.16.12. Application-Level Failover

In some cases, you may prefer to handle connection failures manually by implementing reconnection logic in a custom failure handler. This is called application-level failover, because the failover is handled at the level of the user application.
To implement application-level failover with JMS, set an ExceptionListener on the JMS connection. HornetQ calls the ExceptionListener when it detects a connection failure. In your ExceptionListener, close the old JMS connections, look up new connection factory instances from JNDI, and create new connections.
If you are using the core API, the procedure is very similar: set a FailureListener on the core ClientSession instances.