When disaster strikes, a highly available messaging system can continue to deliver messages successfully.
With planning, power outages and network, hardware, or software failures need not obstruct message delivery. A robust solution combines multiple brokers, each configured to persist messages across multiple machines, with consumers configured to fail over to a functioning broker.
To build fault tolerance into a messaging system, you could run multiple standalone brokers, each on a separate machine, connected over a network. When one machine or broker failed, clients using the failover protocol could automatically reconnect to a broker on another of the networked machines and continue working.
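The reconnect behaviour described above can be illustrated with a small in-memory model. This is a hedged sketch, not a real client library: `FailoverSketch`, `ToyBroker`, `FailoverClient`, and `send` are hypothetical names. The client keeps an ordered list of brokers, stays on the current one until a send fails, and then moves to the next broker and retries, which is the essential idea behind client-side failover. With an actual broker client, the list is typically supplied as a failover URI such as `failover:(tcp://host1:61616,tcp://host2:61616)`, where the host names and ports are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

class FailoverSketch {
    // Hypothetical stand-in for a standalone broker that accepts messages while it is up.
    static class ToyBroker {
        final String name;
        boolean up = true;
        final List<String> received = new ArrayList<>();

        ToyBroker(String name) { this.name = name; }

        void accept(String msg) {
            if (!up) throw new IllegalStateException(name + " is down");
            received.add(msg);
        }
    }

    // Hypothetical failover client: it stays on its current broker until a send
    // fails, then moves to the next broker in the list and retries.
    static class FailoverClient {
        private final List<ToyBroker> brokers;
        private int current = 0;

        FailoverClient(List<ToyBroker> brokers) { this.brokers = brokers; }

        // Returns the name of the broker that accepted the message.
        String send(String msg) {
            for (int attempt = 0; attempt < brokers.size(); attempt++) {
                ToyBroker broker = brokers.get(current);
                try {
                    broker.accept(msg);
                    return broker.name;
                } catch (IllegalStateException e) {
                    current = (current + 1) % brokers.size(); // fail over to the next broker
                }
            }
            throw new IllegalStateException("no broker available");
        }
    }

    public static void main(String[] args) {
        ToyBroker a = new ToyBroker("brokerA");
        ToyBroker b = new ToyBroker("brokerB");
        FailoverClient client = new FailoverClient(List.of(a, b));
        System.out.println(client.send("m1")); // brokerA
        a.up = false;                          // simulate a machine failure
        System.out.println(client.send("m2")); // brokerB
    }
}
```

Real failover transports add reconnection delays and other policies that this sketch deliberately omits.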
To implement fault tolerance, you set up a master/slave topology, which gives slave brokers access to the messages sent to the master broker (see Master/Slave topologies), and configure clients to use the failover protocol (see Failover Protocol in Fault Tolerant Messaging).
| Tip |
|---|
| Standalone brokers in this scenario know nothing about the consumers on any of the other brokers. Consequently, if one of these brokers had no consumers, messages sent to it would pile up without being processed. The Network of Brokers feature solves this problem (see Network of brokers). |
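The problem in the tip can be shown with a toy model. This is a hedged illustration, not the Network of Brokers implementation: `NetworkSketch`, `NetworkedBroker`, `connectTo`, and the one-hop "forward to a peer that has consumers" rule are assumptions made for the example. A broker with no local consumers and no peers can only queue the message; once it is connected to a peer that has a consumer, it can forward the message there.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class NetworkSketch {
    // Hypothetical broker that knows its local consumer count and its network peers.
    static class NetworkedBroker {
        final String name;
        int localConsumers = 0;
        final Deque<String> pending = new ArrayDeque<>();    // messages with nobody to consume them
        final List<String> delivered = new ArrayList<>();     // messages handed to local consumers
        final List<NetworkedBroker> peers = new ArrayList<>();

        NetworkedBroker(String name) { this.name = name; }

        // Assumed bidirectional link between two brokers.
        void connectTo(NetworkedBroker other) {
            peers.add(other);
            other.peers.add(this);
        }

        void send(String msg) {
            if (localConsumers > 0) {           // a local consumer exists: deliver here
                delivered.add(msg);
                return;
            }
            for (NetworkedBroker peer : peers) { // otherwise forward to a peer with consumers
                if (peer.localConsumers > 0) {
                    peer.delivered.add(msg);
                    return;
                }
            }
            pending.add(msg);                    // no demand anywhere reachable: it piles up
        }
    }

    public static void main(String[] args) {
        NetworkedBroker b1 = new NetworkedBroker("b1");
        NetworkedBroker b2 = new NetworkedBroker("b2");
        b2.localConsumers = 1;
        b1.send("m1");
        System.out.println(b1.pending.size()); // 1 (standalone: stuck on b1)
        b1.connectTo(b2);
        b1.send("m2");
        System.out.println(b2.delivered);      // [m2]
    }
}
```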
A master/slave topology defines an initial master broker and one or more initial slave brokers. When the master broker fails, one of the slave brokers starts up and takes over, becoming the new master broker. Clients using the failover protocol can reconnect to the new master broker and resume processing as normal.
Slave brokers have access to all messages sent to the master broker. Whether all messages are replicated from the master broker to the slaves depends on whether or not the topology includes a shared message store.
Shared-nothing master/slave networks: The master and the slave each have their own independent message store. The master replicates all message commands (messages, acknowledgements, subscriptions, transactions, and so on) to the slave over a special connector, `masterConnectorURI`, before acting on them. Shared-nothing clusters are very resilient because they have no single point of failure. However, they also have a number of drawbacks:
- Persistent messaging incurs additional latency, because producers must wait for each message to be replicated to the slave and stored in the slave's persistent store.
- Brokers do not automatically synchronize with each other.
- Only one slave per master broker is allowed.
- Reintroducing a failed master into the cluster requires shutting down the entire cluster and manually synchronizing the databases.
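The "replicate first, then act" ordering of a shared-nothing pair can be sketched as follows. This is a hedged, in-memory illustration: `SharedNothingSketch`, `Store`, `Master`, `Slave`, `handle`, `replicate`, and the command strings are hypothetical, and the network link that `masterConnectorURI` configures is reduced to a direct method call. The point is that the master only applies a command after the slave has stored it, so both stores hold the same history and every command costs one extra round trip.

```java
import java.util.ArrayList;
import java.util.List;

class SharedNothingSketch {
    // Each broker owns an independent store; nothing is shared between them.
    static class Store {
        final List<String> commands = new ArrayList<>();
    }

    static class Slave {
        final Store store = new Store();

        // Persist the replicated command in the slave's own store and acknowledge it.
        boolean replicate(String command) {
            store.commands.add(command);
            return true;
        }
    }

    static class Master {
        final Store store = new Store();
        final Slave slave;
        int replicationRoundTrips = 0;

        Master(Slave slave) { this.slave = slave; }

        // Replicate first, act second: the caller is not answered until the slave
        // has stored the command, which is the source of the extra latency.
        void handle(String command) {
            replicationRoundTrips++;
            if (!slave.replicate(command)) {
                throw new IllegalStateException("slave did not acknowledge " + command);
            }
            store.commands.add(command);
        }
    }

    public static void main(String[] args) {
        Slave slave = new Slave();
        Master master = new Master(slave);
        master.handle("SEND m1");
        master.handle("ACK m1");
        System.out.println(slave.store.commands.equals(master.store.commands)); // true
        System.out.println(master.replicationRoundTrips);                       // 2
    }
}
```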
Shared database or file system master/slave networks: Multiple brokers share the same relational database or file system, but only one broker can be active at any given time. The broker that obtains the lock on the database or file system becomes the master. Brokers that poll for the lock, or that attempt to connect to the message store after the master is established, automatically become slaves. Because all brokers use the same store, resynchronization is not an issue, and failback occurs automatically (see Figure 7.2 and Built-in Broker Fault Tolerance for an example use case).
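Lock-based master election over a shared store can be modelled in a few lines. This is a hedged sketch rather than how any particular broker acquires its lock: `SharedStoreSketch`, `SharedStore`, `Broker`, `tryBecomeMaster`, and `fail` are hypothetical, the database or file-system lock is replaced by an `AtomicReference` holding the master's name, and the sketch assumes the lock is released when the master fails. Because the messages live in the shared store, the new master sees them immediately and nothing has to be resynchronized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class SharedStoreSketch {
    // Stands in for the shared database or file system; lockHolder names the current master.
    static class SharedStore {
        final AtomicReference<String> lockHolder = new AtomicReference<>();
        final List<String> messages = new ArrayList<>();
    }

    static class Broker {
        final String name;
        final SharedStore store;
        boolean alive = true;

        Broker(String name, SharedStore store) {
            this.name = name;
            this.store = store;
        }

        // One polling attempt: only the broker that wins the lock becomes master.
        boolean tryBecomeMaster() {
            return alive && store.lockHolder.compareAndSet(null, name);
        }

        boolean isMaster() {
            return alive && name.equals(store.lockHolder.get());
        }

        // Simulated crash; assumes the lock is released when the master dies.
        void fail() {
            alive = false;
            store.lockHolder.compareAndSet(name, null);
        }
    }

    public static void main(String[] args) {
        SharedStore store = new SharedStore();
        Broker a = new Broker("brokerA", store);
        Broker b = new Broker("brokerB", store);
        a.tryBecomeMaster();       // brokerA wins the lock and becomes master
        b.tryBecomeMaster();       // brokerB fails to get the lock and stays a slave
        store.messages.add("m1");  // the master writes to the shared store
        a.fail();                  // master crashes and the lock is freed
        b.tryBecomeMaster();       // brokerB's next poll succeeds
        System.out.println(b.isMaster() + " " + store.messages); // true [m1]
    }
}
```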
For details, see Fault Tolerant Messaging.