18.16. High Availability
18.16.1. High Availability Introduction
HornetQ supports the ability to continue functioning after failure of one or more of the servers. Part of this is achieved through failover support where client connections migrate from the live server to a backup server in the event of the live server failing. To keep the backup server current, messages are replicated from the live server to the backup server continuously through two strategies: shared store and replication.
There are two types of high-availability topologies:
Dedicated Topology: This topology comprises two EAP servers. In the first server, HornetQ is configured as a live server. In the second server, HornetQ is configured as a backup server. The EAP server that has HornetQ configured as a backup server acts only as a container for HornetQ. This server is inactive and cannot host deployments such as EJBs, MDBs, or servlets.
Collocated Topology: This topology contains two EAP servers. Each EAP server contains two HornetQ servers (a live server and a backup server). The HornetQ live server on the first EAP server and the HornetQ backup server on the second EAP server form a live-backup pair. The HornetQ live server on the second EAP server and the HornetQ backup server on the first EAP server form another live-backup pair.
In a collocated topology, as soon as a live HornetQ server (part of a live-backup pair) fails, the backup HornetQ server takes over and becomes active. When the backup HornetQ server shuts down on failback, the destinations and connection factories configured in the backup server are unbound from JNDI (Java Naming and Directory Interface).
JNDI is shared with the other live HornetQ server (part of the other live-backup pair). Therefore, unbinding destinations and connection factories from JNDI also unbinds them for that live HornetQ server.
The configuration of collocated backup servers cannot contain destinations or connection factories.
The following information references standalone-full-ha.xml. The configuration changes can be applied to standalone-full-ha.xml, or any configuration files derived from it.
18.16.3. About HornetQ Storage Configurations
HornetQ supports shared storage on the Red Hat Enterprise Linux implementation of NFSv4, with either the ASYNCIO or the NIO journal type. The Red Hat Enterprise Linux NFS implementation supports both direct I/O (opening files with the O_DIRECT flag set) and kernel-based asynchronous I/O. When configuring NFS for shared storage, a highly available NFS configuration is recommended.
When using the Red Hat Enterprise Linux NFSv4 as a shared storage option, the client cache must be disabled.
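For illustration, a shared-store pair would typically point both the live and the backup server at the same directories on the NFS mount. The fragment below is a minimal sketch and not taken from this guide. It assumes the directory elements of the EAP 6 messaging subsystem, and /mnt/hornetq-shared is a hypothetical mount point.

```xml
<!-- Sketch only: inside <hornetq-server> on both the live and the backup server. -->
<!-- /mnt/hornetq-shared is a hypothetical NFSv4 mount point, not a documented path. -->
<shared-store>true</shared-store>
<paging-directory path="/mnt/hornetq-shared/paging"/>
<bindings-directory path="/mnt/hornetq-shared/bindings"/>
<journal-directory path="/mnt/hornetq-shared/journal"/>
<large-messages-directory path="/mnt/hornetq-shared/large-messages"/>
```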
18.16.4. About HornetQ Journal Types
Two journal types are available for HornetQ:
The ASYNCIO journal type, also known as AIO, is a thin native code wrapper around the Linux asynchronous IO library (AIO). Using native functionality can provide better performance than NIO. This journal type is only supported on Red Hat Enterprise Linux and requires that libaio and the Native Components package are installed where JBoss EAP 6 is running. See the Installation Guide for instructions on installing the Native Components package.
Check the server log after JBoss EAP 6 is started, to ensure that the native library successfully loaded, and that the ASYNCIO journal type is being used. If the native library fails to load, HornetQ will revert to the NIO journal type, and this will be stated in the server log.
The NIO journal type uses standard Java NIO to interface with the file system. It provides very good performance and runs on all supported platforms.
To specify the HornetQ journal type, set the parameter
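As a hedged illustration, the journal type is normally selected with a single element inside the HornetQ server definition in the messaging subsystem. The element name journal-type and its placement are assumptions here, since the text above does not name the parameter.

```xml
<!-- Sketch only: inside <hornetq-server> in the messaging subsystem. -->
<!-- Assumed element name: journal-type. Values correspond to the two journal types above. -->
<journal-type>NIO</journal-type>
<!-- On Red Hat Enterprise Linux with libaio and the Native Components package installed: -->
<!-- <journal-type>ASYNCIO</journal-type> -->
```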
18.16.6. HornetQ Message Replication
Only persistent messages are replicated. Any non-persistent messages do not survive failover.
Message replication between a live and a backup server is achieved via network traffic as the live and backup servers do not share the same data stores. All the journals are replicated between the two servers as long as the two servers are within the same cluster and have the same cluster username and password. All persistent data traffic received by the live server gets replicated to the backup server.
When the backup server comes online, it looks for and connects to a live server to attempt synchronization. While it is synchronizing, it is unavailable as a backup server. Synchronization can take a long time depending on the amount of data to be synchronized and the network speed. If the backup server comes online and no live server is available, the backup server will wait until the live server is available in the cluster.
To enable servers to replicate data, a link must be defined between them in the standalone-full-ha.xml file. A backup server will only replicate with a live server with the same group name. The group name must be defined in the backup-group-name parameter in the standalone-full-ha.xml file on each server.
If a live server fails, the correctly configured and fully synchronized backup server takes over its duties. The backup server activates only if the live server has failed and the backup server can connect to more than half of the servers in the cluster. If more than half of the other servers in the cluster also fail to respond, this indicates a general network failure, and the backup server waits and retries the connection to the live server.
To return to the original state after failover, start the live server and wait until it is fully synchronized with the backup server. When this has been achieved, you can shut down the backup server so that the original live server activates again. This happens automatically if the allow-failback attribute is set to true.
18.16.7. Configuring the HornetQ Servers for Replication
To configure the live and backup servers to be a replicating pair, configure the standalone-full-ha.xml files on each server to have the following settings:
<shared-store>false</shared-store>
<backup-group-name>NameOfLiveBackupPair</backup-group-name>
<check-for-live-server>true</check-for-live-server>
.
.
.
<cluster-connections>
    <cluster-connection name="my-cluster">
        ...
    </cluster-connection>
</cluster-connections>
Administrators must take care not to mix settings for shared store and replicated configurations. For example, the backup-group-name attribute, which is used for replication, should not be set when shared-store is set to true, which indicates a shared store.
Table 18.15. HornetQ Replicating Setup Attributes
shared-store: Whether this server is using shared store or not. This value should be set to false for a replicated configuration. Default is false.
backup-group-name: The unique name that identifies a live/backup pair that should replicate with each other.
check-for-live-server: Whether a replicated live server should check the current cluster to see if there is already a live server with the same node ID. Default is false.
failover-on-shutdown: Whether this backup server (if it is a backup server) becomes the live server on a normal server shutdown. Default is false.
The backup server must also be flagged explicitly as a backup.
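A minimal sketch of the backup half of a replicating pair is shown below. The backup element name is an assumption based on the EAP 6 messaging subsystem, and the group name is the placeholder used earlier in this section.

```xml
<!-- Sketch only: inside <hornetq-server> on the backup server of a replicating pair. -->
<backup>true</backup> <!-- assumed element name for the explicit backup flag -->
<shared-store>false</shared-store>
<backup-group-name>NameOfLiveBackupPair</backup-group-name>
```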
Table 18.16. HornetQ Backup Server Setup Attributes
Whether this server will automatically shut down if the original live server comes back up. Default is true.
The maximum number of backup journals to keep after failback occurs. Specifying this attribute is only necessary if allow-failback is true. Default value is 2, which means that after 2 failbacks the backup server must be restarted in order to replicate the journal from the live server and become a backup again.
18.16.8. About High-availability (HA) Failover
High-availability failover is available with either automatic client failover, or application-level failover, through a live-backup structure. Each live server has a backup server. Only one backup per live server is supported.
The backup server only takes over if the live server crashes and there is a failover. After the live server has been restarted, and if the allow-failback attribute is set to true, it becomes the live server again. When the original live server takes over, the backup server reverts to being the backup for the live server.
Clustering should be enabled even if you are not using the clustering capabilities. This is because each node of the HA cluster must have a cluster-connection to all of the other nodes, in order to negotiate roles with the other servers.
The high-availability cluster topology is established by the live and backup servers, which send information about their connection details using IP multicast. If IP multicast cannot be used, it is also possible to use a static configuration of the initial connections. After the initial connection, the client is informed about the topology. If the current connection is stale, the client establishes a new connection to another node.
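The two ways of seeding the initial connections can be sketched as alternative cluster-connection definitions. This is an illustration under assumptions: the element layout follows the EAP 6 messaging subsystem as commonly configured, and the connector and discovery-group names (netty, dg-group1, remote-node-connector) are hypothetical and would need matching definitions elsewhere in the subsystem.

```xml
<!-- Sketch only: use one of the two options, not both. -->

<!-- Option 1: discover other nodes over IP multicast. -->
<cluster-connection name="my-cluster">
    <address>jms</address>
    <connector-ref>netty</connector-ref>
    <discovery-group-ref discovery-group-name="dg-group1"/>
</cluster-connection>

<!-- Option 2: static list of initial connectors when multicast is unavailable. -->
<cluster-connection name="my-cluster">
    <address>jms</address>
    <connector-ref>netty</connector-ref>
    <static-connectors>
        <connector-ref>remote-node-connector</connector-ref>
    </static-connectors>
</cluster-connection>
```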
After a live server has failed and a backup server has taken over, you need to restart the live server and have clients fail back. To do this, restart the original live server and stop the new live server. You can do this by killing the process itself or by waiting for the server to crash on its own. You can also cause failover to occur on normal server shutdown. To enable this, set the failover-on-shutdown property to true in the server configuration file. By default, the failover-on-shutdown property is set to false.
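As a hedged illustration, enabling failover on a normal shutdown would look like the fragment below, assuming the property is expressed as an element inside the HornetQ server definition.

```xml
<!-- Sketch only: inside <hornetq-server>. -->
<failover-on-shutdown>true</failover-on-shutdown>
```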
You can also force the new live server to shut down when the old live server comes back up, allowing the original live server to take over automatically, by setting the allow-failback property to true in the server configuration file.
In replication HA mode, to force the new live server to shut down when the old live server comes back, set the check-for-live-server property to true in the server configuration file.
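The failback settings described above can be sketched together as follows. The placement of each element is an assumption: allow-failback is shown as it might appear on the servers of the pair, and check-for-live-server on the original live server of a replicated pair.

```xml
<!-- Sketch only: inside <hornetq-server>. -->

<!-- Lets the original live server take over again automatically. -->
<allow-failback>true</allow-failback>

<!-- Replication mode only: on restart, the original live server checks the cluster -->
<!-- for a server that is already live with the same node ID. -->
<check-for-live-server>true</check-for-live-server>
```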
18.16.9. Deployments on HornetQ Backup Servers
In a dedicated HA environment, a JBoss EAP 6 server with HornetQ configured as a backup must not be used to host any deployments which use or connect to the HornetQ backup on that server. This includes deployments such as Enterprise Java Beans (Stateless Session Beans, Message Driven Beans), or servlets.
If a JBoss EAP 6 server has a HornetQ collocated backup configuration (where in the messaging subsystem there is a HornetQ server configured as 'live' and another HornetQ server configured as backup), then the JBoss EAP 6 server can host deployments as long as they are configured to connect to the 'live' HornetQ server.
18.16.10. HornetQ Failover Modes
HornetQ defines two types of client failover:
- Automatic client failover
- Application-level client failover
HornetQ provides transparent automatic reattachment of connections to the same server, for example, in case of transient network problems. This is similar to failover, except it is reconnecting to the same server.
If the client has consumers on any non-persistent or temporary queues, those queues are automatically recreated on the backup node during failover, because the backup node has no information about non-persistent queues.
18.16.11. Automatic Client Failover
HornetQ clients can be configured to receive information about live and backup servers. If the client's connection to the live server fails, the client detects the failure and reconnects to the backup server. The backup server automatically recreates any sessions and consumers that existed on each connection before failover, which saves the user from having to hand-code manual reconnection logic.
HornetQ clients detect connection failure if packets are not received from the server within the time specified in client-failure-check-period. If the client does not receive data in time, the client assumes the connection has failed and attempts failover. If the operating system closes the socket, which happens when the server process is killed rather than the machine itself crashing, the client initiates failover immediately.
HornetQ clients can be configured in different ways to discover the list of live-backup server groups. The client can be configured explicitly or use server discovery for the client to automatically discover the list. Alternatively, the clients can explicitly connect to a specific server and download the current servers and backups.
To enable automatic client failover, the client must be configured to allow non-zero reconnection attempts.
By default, failover only occurs after at least one connection has been made to the live server. The client retries connecting to the live server as specified in the reconnect-attempts property and fails after the specified number of attempts.
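A connection factory that allows automatic client failover might be configured along these lines. This is a sketch, not a definitive configuration: the factory name, connector name, and JNDI entry are assumptions modelled on a typical standalone-full-ha.xml, and the retry values are illustrative.

```xml
<!-- Sketch only: inside <jms-connection-factories> in the messaging subsystem. -->
<connection-factory name="RemoteConnectionFactory">
    <connectors>
        <connector-ref connector-name="netty"/> <!-- hypothetical connector name -->
    </connectors>
    <entries>
        <entry name="java:jboss/exported/jms/RemoteConnectionFactory"/>
    </entries>
    <!-- Make clients aware of live-backup pairs. -->
    <ha>true</ha>
    <!-- Wait 1000 ms between attempts; illustrative value. -->
    <retry-interval>1000</retry-interval>
    <!-- Must be non-zero for automatic failover; -1 is assumed to mean retry indefinitely. -->
    <reconnect-attempts>-1</reconnect-attempts>
</connection-factory>
```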
18.16.12. Application-Level Failover
In some cases, you may prefer to handle connection failures manually by specifying reconnection logic in a custom failure handler. This is called application-level failover, because the failover is handled at the user application level.
To implement application-level failover with JMS, set an ExceptionListener on the JMS connection. If a connection failure is detected, HornetQ calls the ExceptionListener. In your ExceptionListener, close the old JMS connections, look up new connection factory instances from JNDI, and create new connections.
If you are using the core API, then the procedure is very similar: set a FailureListener on the core ClientSession instances.