17.9. Message Replication

17.9.1. HornetQ Message Replication

HornetQ can continue functioning after one or more of its servers fail. Part of this is achieved through failover support, where client connections migrate from the live server to a backup server if the live server fails. To keep the backup server current, HornetQ uses one of two strategies: shared store, where the live and backup servers use the same data store, or replication, where data is copied continuously from the live server to the backup server over the network. This section covers the replication strategy.

Warning

Only persistent messages are replicated. Non-persistent messages do not survive failover.

Message replication between a live server and a backup server is achieved through network traffic, because the two servers do not share a data store. All journals are replicated between the two servers, provided that both servers are in the same cluster and have the same cluster username and password. All persistent data received by the live server is replicated to the backup server.
When the backup server comes online, it looks for and connects to a live server in order to synchronize with it. While it is synchronizing, it is not available as a backup server. Synchronization can take a long time, depending on the amount of data to be synchronized and the network speed.
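
The following Python sketch models the behavior described above: only persistent messages are journaled and replicated, and a newly started backup is unavailable until its initial synchronization completes. It is a toy, in-memory illustration under stated assumptions. The classes `Message`, `Live`, and `Backup` and their methods are hypothetical and are not part of the HornetQ API, and the network transport, cluster credentials, and group matching are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Message:
    body: str
    persistent: bool


@dataclass
class Backup:
    journal: List[Message] = field(default_factory=list)
    synchronized: bool = False  # not available as a backup until this is True

    def synchronize(self, live_journal: List[Message]) -> None:
        # Initial synchronization: copy the live server's existing journal.
        # In a real deployment this can take a long time for large journals
        # or slow networks; here it is a simple list copy.
        self.journal = list(live_journal)
        self.synchronized = True


@dataclass
class Live:
    journal: List[Message] = field(default_factory=list)
    backup: Optional[Backup] = None

    def attach_backup(self, backup: Backup) -> None:
        # A backup that comes online first synchronizes, then receives live traffic.
        backup.synchronize(self.journal)
        self.backup = backup

    def receive(self, msg: Message) -> None:
        if not msg.persistent:
            # Non-persistent messages are neither journaled nor replicated,
            # so they would be lost on failover (in-memory delivery not modeled).
            return
        self.journal.append(msg)
        if self.backup is not None:
            # Stand-in for sending the journal entry to the backup over the network.
            self.backup.journal.append(msg)
```

In this model, persistent messages received before the backup attaches reach it through synchronization, and those received afterwards reach it through ongoing replication; non-persistent messages never reach it at all.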
How a backup server finds a live server to replicate data from depends on whether the backup-group-name parameter is defined in the hornetq-configuration.xml file. If it is defined, the backup server connects only to a live server that has the same group name. If it is not defined, the backup server tries to connect to any live server.
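
A minimal Python sketch of this selection rule, combined with the requirement (stated earlier in this section) that both servers share the same cluster username and password. `LiveServer`, `eligible_live_servers`, and the field names are hypothetical illustrations, not HornetQ configuration or API names; only backup-group-name comes from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LiveServer:
    name: str
    group_name: Optional[str]  # hypothetical: this live server's backup-group-name, if set
    cluster_user: str
    cluster_password: str


def eligible_live_servers(
    backup_group_name: Optional[str],
    cluster_user: str,
    cluster_password: str,
    candidates: Iterable[LiveServer],
) -> List[LiveServer]:
    """Return the live servers this backup may replicate from (hypothetical helper)."""
    eligible = []
    for live in candidates:
        # Replication requires the same cluster username and password.
        if (live.cluster_user, live.cluster_password) != (cluster_user, cluster_password):
            continue
        # With backup-group-name set, only a live server with the same group name qualifies.
        # Without it, any live server with matching credentials qualifies.
        if backup_group_name is not None and live.group_name != backup_group_name:
            continue
        eligible.append(live)
    return eligible
```

For instance, a backup configured with the group name "group-1" skips live servers in other groups or with no group, whereas a backup with no group name accepts any live server whose cluster credentials match.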
If a live server fails, a correctly configured and synchronized backup server takes over its duties. The backup server concludes that the live server has failed if it cannot connect to the live server but can still connect to more than half of the other servers in its cluster. If more than half of the other servers in the cluster also fail to respond, this indicates a general network failure, and the backup server waits and then retries the connection to the live server.
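
The failure-detection rule can be sketched as a pure decision function. This is an illustration under assumptions, not HornetQ's implementation: the function name, the `Decision` values, and the handling of edge cases are hypothetical. In particular, the text does not say what happens when exactly half of the other servers respond, or when the cluster has no other servers; this sketch applies the "more than half" rule literally and waits in both cases.

```python
from enum import Enum


class Decision(Enum):
    LIVE_OK = "live server reachable; keep replicating"
    FAIL_OVER = "live server presumed failed; take over its duties"
    WAIT_AND_RETRY = "possible network failure; retry the live server later"


def decide(live_reachable: bool, reachable_others: int, total_others: int) -> Decision:
    """Decide what a synchronized backup should do (hypothetical helper).

    reachable_others: how many of the other cluster servers responded.
    total_others: how many other servers were polled; whether this count
    includes the live server is left to the caller, since the text does
    not specify it.
    """
    if live_reachable:
        return Decision.LIVE_OK
    # "More than half" is strict: reachable_others > total_others / 2,
    # written with integers to avoid floating-point division.
    if 2 * reachable_others > total_others:
        return Decision.FAIL_OVER
    # Otherwise the backup cannot distinguish a dead live server from a
    # network failure, so it waits and retries instead of taking over.
    return Decision.WAIT_AND_RETRY
```

For example, with four other servers, a backup that cannot reach the live server but reaches three of them fails over, while reaching only two leads it to wait and retry.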