
Shared File System Master/Slave

Overview

If you have an existing high availability (HA) infrastructure based on a shared file system (SAN file system), it is relatively easy to set up a fault-tolerant cluster of brokers using this technology. Brokers automatically configure themselves to operate in master mode or slave mode, depending on whether or not they manage to grab an exclusive lock on the underlying data directory. Replication of data is managed by the underlying SAN file system.

Storage area networks

A storage area network (SAN) is a storage system that enables you to attach remote storage devices (such as disk drives or tape drives) to a computer, making them appear as if they were local storage devices. A distinctive feature of the SAN architecture is that it combines data centres, physically separated by large distances, into a single storage network. With the addition of suitable software or hardware, a SAN system can be designed to provide data replication across multiple data centres, making it ideal for disaster recovery.

Because of the exceptional demands for high speed and reliability in a storage network (where gigabit bandwidths are required), SANs were originally built using dedicated fibre-optic or twisted-pair copper cables, and specialized protocols, such as Fibre Channel (FC), were developed for this purpose. Alternative network protocols, such as FCoE and iSCSI, have also been developed to enable SANs to be built over high-speed Ethernet networks.

For more details about SANs, see the storage area network Wikipedia article.

SAN filesystems

The SAN itself defines only block-level access to storage devices. Another layer of software, the SAN file system or shared disk file system, is needed to provide the file-level access, implementing the file and directory abstractions. Because the SAN file system can be shared between multiple computers, it must also be capable of regulating concurrent access to files. File locking is, therefore, an important feature of a SAN file system.

Exclusive file lock on a shared file system

A SAN file system must implement an efficient and reliable system of file locking to ensure that different computers cannot write to the same file at the same time. The shared file system master/slave failover pattern depends on a reliable file locking mechanism in order to function correctly.

Warning

OCFS2 is incompatible with this failover pattern, because it does not support mutex file locking from Java.

Warning

NFSv3 is incompatible with this failover pattern. In the event of an abnormal termination of a master broker that is an NFSv3 client, the NFSv3 server does not time out the lock held by the client. This renders the Fuse Message Broker data directory inaccessible, because the slave broker cannot acquire the lock and therefore cannot start up. In this case, the only way to unblock the failover cluster is to reboot all broker instances.

On the other hand, NFSv4 is compatible with this failover pattern, because its design includes timeouts for locks. When an NFSv4 client holding a lock terminates abnormally, the lock is automatically released after 30 seconds, allowing another NFSv4 client to grab the lock.

Creating master and slave brokers

In the shared file system master/slave pattern, there is nothing special to distinguish a master broker from the slave brokers. There can be any number of brokers in a failover cluster. Membership of a particular failover cluster is defined by the fact that all of the brokers in the cluster use the same persistence layer and store their data in the same shared directory. The brokers in the cluster therefore compete to grab the exclusive lock on the data file. The first broker to grab the exclusive lock is the master and all of the other brokers in the cluster are the slaves. The master and the slaves now behave as follows:

  • The master retains the exclusive lock on the data file, preventing the other brokers from accessing the data. The master starts up its transport connectors and network connectors, enabling other messaging clients and message brokers to connect to it.

  • The slaves keep attempting to grab the lock on the data file, but they do not succeed as long as the master is running. The slaves do not start up any transport connectors or network connectors and are thus inaccessible to messaging clients and brokers. This compete-and-retry behavior is sketched in the example following this list.
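
The following minimal Java sketch illustrates the compete-for-lock pattern described above, using java.nio file locks. It is an illustration of the general technique, not the broker's actual implementation; the lock file name and the polling interval are assumptions chosen for the example:

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class LockCompetitor {
    public static void main(String[] args) throws Exception {
        // Hypothetical lock file inside the shared data directory.
        File lockFile = new File("/sharedFileSystem/sharedBrokerData/lock");
        FileChannel channel = new RandomAccessFile(lockFile, "rw").getChannel();
        FileLock lock = null;
        while (lock == null) {
            // tryLock() returns null while another process holds the lock.
            lock = channel.tryLock();
            if (lock == null) {
                System.out.println("Slave: lock held by master, retrying...");
                Thread.sleep(10000); // illustrative polling interval
            }
        }
        // Only the lock holder (the master) starts its connectors.
        System.out.println("Master: exclusive lock acquired.");
    }
}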

Sample broker configuration

The only condition that the brokers in a cluster must satisfy is that they all use the same persistence layer and that this persistence layer stores its data in a directory on a shared file system. For example, assuming that /sharedFileSystem/sharedBrokerData is a directory in a shared file system, you could configure a KahaDB persistence layer as follows:

Example 14. Shared file cluster configuration

<persistenceAdapter>
  <kahaDB directory="/sharedFileSystem/sharedBrokerData"/>
</persistenceAdapter>

The AMQ persistence layer is likewise suitable for this failover scenario:

Example 15. Alternate shared file cluster configuration

<persistenceAdapter>
  <amqPersistenceAdapter directory="/sharedFileSystem/sharedBrokerData"/>
</persistenceAdapter>
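
For context, the persistence adapter element is embedded in a broker configuration. The following sketch shows how one member of the cluster might be configured; the broker name, broker1, and the transport URI are illustrative assumptions. Each broker in the cluster can use a different name and transport URI, but all of them must point at the same shared directory:

<broker xmlns="http://activemq.apache.org/schema/core" brokerName="broker1">
  <!-- All brokers in the cluster share this data directory. -->
  <persistenceAdapter>
    <kahaDB directory="/sharedFileSystem/sharedBrokerData"/>
  </persistenceAdapter>
  <!-- Started only after this broker acquires the exclusive lock. -->
  <transportConnectors>
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
  </transportConnectors>
</broker>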

Sample client configuration

Clients of the failover cluster must be configured with a failover URL that lists the URLs for all of the brokers in the cluster. For example, assuming that there are three brokers in the cluster, deployed on the hosts broker1, broker2, and broker3, all listening on IP port 61616, you could use the following failover URL for the clients:

failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)

In this case, it does not matter in which order the clients attempt to connect to the brokers, because the identity of the master broker is determined by chance: that is, by whichever broker is the first to grab the exclusive lock on the data file.
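
In a JMS client, the failover URL is simply passed to the connection factory. The following minimal sketch assumes the standard Apache ActiveMQ client library (on which Fuse Message Broker is based); the queue name is illustrative:

import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FailoverClient {
    public static void main(String[] args) throws Exception {
        // The failover: transport reconnects automatically to whichever
        // broker in the cluster currently holds the exclusive lock.
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
            "failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(
            session.createQueue("TEST.QUEUE")); // illustrative destination
        TextMessage message = session.createTextMessage("hello");
        producer.send(message);
        connection.close();
    }
}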

Initial state

Figure 25 shows the initial state of a shared file system master/slave cluster. When all of the brokers in the cluster are started, one of them grabs the exclusive lock on the broker data file, thus becoming the master. All of the other brokers in the cluster remain slaves and pause while waiting for the exclusive lock to be freed up. Only the master starts its transport connectors, so all of the clients connect to it.

Figure 25. Shared File System Initial State


After failure of the master

Figure 26 shows the state of the cluster after the original master has shut down or failed. As soon as the master gives up the lock (or, if the master crashes, after a suitable timeout), another broker in the cluster grabs the lock on the broker data file and is promoted to master (broker2 in the figure).

Figure 26. Shared File System after Master Failure


After the clients lose their connection to the original master, they automatically try all of the other brokers listed in the failover URL. This enables them to find and connect to the new master.

After restarting the failed master

You can restart the failed master at any time and it will rejoin the cluster. Initially, however, it will have the status of a slave broker, because one of the other brokers already owns the exclusive lock on the broker data file, as shown in Figure 27.

Figure 27. Shared File System after Master Restart

