Chapter 9. High Availability

9.1. Clustering (High Availability)

9.1.1. Changes to Clustering in MRG 3

MRG 3 replaces the cluster module with the new ha module. This module provides active-passive clustering functionality for high availability.
The cluster module in MRG 2 was active-active: clients could connect to any broker in the cluster. The new ha module is active-passive: exactly one broker acts as the primary, while the other brokers act as backups. Only the primary accepts client connections. If a client attempts to connect to a backup broker, the connection is aborted and the client fails over until it connects to the primary.
The new ha module also supports a virtual IP address. Clients can be configured with a single IP address that is automatically routed to the primary broker. This is the recommended configuration.
The fail-over exchange is provided for backwards compatibility. New implementations should use a virtual IP address instead.
Improvement to multi-threaded performance

In MRG 2, a clustered broker would only utilize a single CPU thread. Some users worked around this by running multiple clustered brokers on a single machine to utilize multiple cores.

In MRG 3, a clustered broker now utilizes multiple threads and can take advantage of multi-core CPUs.

9.1.2. Active-Passive Messaging Clusters

The High Availability (HA) module provides active-passive, hot-standby messaging clusters for fault-tolerant message delivery.
In an active-passive cluster only one broker, known as the primary, is active and serving clients at a time. The other brokers are standing by as backups. Changes on the primary are replicated to all the backups so they are always up-to-date or "hot". Backup brokers reject client connection attempts, to enforce the requirement that clients only connect to the primary.
If the primary fails, one of the backups is promoted to take over as the new primary. Clients fail over to the new primary automatically. If there are multiple backups, the other backups also fail over to become backups of the new primary.
This approach relies on an external cluster resource manager, rgmanager, to detect failures, choose the new primary and handle network partitions.

9.1.3. Avoiding Message Loss

To avoid message loss, the primary broker acknowledges a message received from a client only when the message has been replicated to and acknowledged by all of the backup brokers, or has been consumed from the primary queue.
This ensures that all acknowledged messages are safe: they have either been consumed or backed up to all backup brokers. Messages that are consumed before they are replicated do not need to be replicated. This reduces the work load when replicating a queue with active consumers.
Clients keep unacknowledged messages in the client replay buffer until they are acknowledged by the primary. If the primary fails, clients fail over to the new primary and re-send all their unacknowledged messages.
If the primary crashes, all the acknowledged messages will be available on the backup that takes over as the new primary. The unacknowledged messages will be re-sent by the clients. Thus no messages are lost.
Note that this means it is possible for messages to be duplicated: in the event of a failure, a message may be both replicated to the backup that becomes the new primary and re-sent by the client. The application must take steps to identify and eliminate duplicates.
When a new primary is promoted after a fail-over it is initially in "recovering" mode. In this mode, it delays acknowledgment of messages on behalf of all the backups that were connected to the previous primary. This protects those messages against a failure of the new primary until the backups have a chance to connect and catch up.
Not all messages need to be replicated to the backup brokers. If a message is consumed and acknowledged by a regular client before it has been replicated to a backup, then it does not need to be replicated.

9.1.4. HA Broker States

Joining
Initial status of a new broker that has not yet connected to the primary.
Catch-up
A backup broker that is connected to the primary and catching up on queues and messages.
Ready
A backup broker that is fully caught-up and ready to take over as primary.
Recovering
The newly-promoted primary, waiting for backups to connect and catch up.
Active
The active primary broker with all backups connected and caught-up.
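You can check which of these states a broker is in with the qpid-ha tool. A minimal sketch, assuming brokers on the illustrative hosts node1.example.com and node2.example.com used later in this chapter:
qpid-ha status -b node1.example.com    # prints the broker's state, e.g. "active"
qpid-ha status -b node2.example.com    # e.g. "ready"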

9.1.5. Limitations in HA in MRG 3

There are some limitations to HA support in MRG 3:
  • HA replication is limited to 65434 queues.
  • The qpidd-primary service cannot be manually relocated to a node where the qpid broker is not in the ready state (that is, where the broker is stopped, or in the catch-up or joining state).
  • Failback with cluster ordered failover-domains ('ordered=1' in cluster.conf) can cause an infinite failover loop under certain conditions. To avoid this, use cluster ordered failover-domains with nofailback=1 specified in cluster.conf.
  • Local transactional changes are replicated atomically. If the primary crashes during a local transaction, no data is lost. Distributed transactions are not yet supported by HA Cluster.
  • Configuration changes (creating or deleting queues, exchanges and bindings) are replicated asynchronously. Management tools used to make changes will consider the change complete when it is complete on the primary; however, it may not yet be replicated to all the backups.
  • Federation links to the primary will not fail over correctly. Federated links from the primary will be lost in fail-over; they will not be re-connected to the new primary. It is possible to work around this by replacing the qpidd-primary start-up script with a script that re-creates federation links when the primary is promoted.

9.1.6. Broker HA Options

Options for the qpid-ha Broker Utility

ha-cluster yes|no
Set to "yes" to have the broker join a cluster.
ha-queue-replication yes|no
Enable replication of specific queues without joining a cluster.
ha-brokers-url URL
The URL used by cluster brokers to connect to each other. The URL must contain a comma separated list of the broker addresses, rather than a virtual IP address.
The full format of the URL is given by this grammar:
url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
addr = tcp_addr / rdma_addr / ssl_addr / ...
tcp_addr = ["tcp:"] host [":" port]
rdma_addr = "rdma:" host [":" port]
ssl_addr = "ssl:" host [":" port]
ha-public-url URL
This option is only needed for backwards compatibility if you have been using the amq.failover exchange. This exchange is now obsolete; it is recommended to use a virtual IP address instead.
If set, this URL is advertised by the amq.failover exchange and overrides the broker option known-hosts-url.
ha-replicate VALUE
Specifies whether queues and exchanges are replicated by default. VALUE is one of: none, configuration, all.
ha-username USER, ha-password PASS, ha-mechanism MECHANISM
Authentication settings used by HA brokers to connect to each other. If you are using authorization then this user must have all permissions.
ha-backup-timeout SECONDS
Maximum time that a recovering primary will wait for an expected backup to connect and become ready.
Values specified as SECONDS can be a fraction of a second, e.g. "0.1" for a tenth of a second. They can also have an explicit unit, e.g. 10s (seconds), 10ms (milliseconds), 10us (microseconds), 10ns (nanoseconds).
link-maintenance-interval SECONDS
HA uses federation links to connect from backup to primary. Backup brokers check the link to the primary on this interval and re-connect if need be. Default 2 seconds. Can be set lower for faster failover (e.g. 0.1 seconds). Setting too low will result in excessive link-checking on the backups.
link-heartbeat-interval SECONDS
The number of seconds to wait for a federation link heartbeat, also used as the timeout for broker status checks.
By default this is 120 seconds. Provide a lower value (for example, 10 seconds) to enable faster fail-over detection in an HA scenario. If the value is set too low, a slow broker may be considered failed and be killed.
If no heartbeat is received for twice this interval, the primary will consider that backup dead (for example, if the backup is hung or partitioned).
It may take up to this interval for rgmanager to detect a hung or partitioned broker. The primary may take up to twice this interval to detect a hung or partitioned backup. Clients sending messages may also be delayed during this time.
To configure an HA cluster you must set at least ha-cluster and ha-brokers-url.
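For example, a minimal qpidd.conf for a three-node cluster might contain the following (host names are illustrative; see Section 9.1.12, “Configure HA Cluster” for a complete example):
ha-cluster = yes
ha-brokers-url = node1.example.com,node2.example.com,node3.example.com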

9.1.7. Firewall Configuration for Clustering

The following ports are used on a clustered system, and must be opened on the firewall:

Table 9.1. Ports Used by Clustered Systems

Port    Protocol    Component
5404    UDP         cman
5405    UDP         cman
5405    TCP         luci
8084    TCP         luci
11111   TCP         ricci
14567   TCP         gnbd
16851   TCP         modclusterd
21064   TCP         dlm
50006   TCP         ccsd
50007   UDP         ccsd
50008   TCP         ccsd
50009   TCP         ccsd
The following iptables commands, when run with root privileges, will configure the system to allow communication on these ports.
iptables -I INPUT -p udp -m udp --dport 5404  -j ACCEPT
iptables -I INPUT -p udp -m udp --dport 5405  -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 5405  -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 8084  -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 11111  -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 14567  -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 16851  -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 21064  -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 50006  -j ACCEPT
iptables -I INPUT -p udp -m udp --dport 50007  -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 50008  -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 50009  -j ACCEPT
service iptables save
service iptables restart

9.1.8. ACL Requirements for Clustering

Clustering requires federation links between brokers. When the broker has auth=yes, all federation links are disallowed by default. The following ACL rule is required to allow the federation used by HA Clustering:
acl allow <ha-username> all all
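For example, assuming ha-username=qpid_ha (a hypothetical user name), the rule would be written with the @QPID realm suffix, which appears in ACL entries even though ha-username itself must omit it (see Section 9.1.19, “Security”):
acl allow qpid_ha@QPID all all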

9.1.9. Cluster Resource Manager (rgmanager)

Broker fail-over is managed by the resource manager rgmanager.
The resource manager is responsible for starting the qpidd broker on each node in the cluster. The resource manager then promotes one of the brokers to be the primary. The other brokers connect to the primary as backups, using the URL provided in the ha-brokers-url configuration option.
Once connected, the backup brokers synchronize their state with the primary. When a backup is synchronized, or "hot", it is ready to take over if the primary fails. Backup brokers continually receive updates from the primary in order to stay synchronized.
If the primary fails, backup brokers go into fail-over mode. The resource manager detects the failure and promotes one of the backups to be the new primary. The other backups connect to the new primary and synchronize their state with it.
The resource manager also protects the cluster from split-brain conditions resulting from a network partition. A network partition divides a cluster into two sub-groups which cannot see each other. A quorum voting algorithm disables nodes in the inquorate sub-group.

9.1.10. Install HA Cluster Components

Procedure 9.1. Qpidd HA Component Installation Steps

  1. Open a terminal and switch to the superuser account.
  2. Run yum install qpid-cpp-server-ha to install all required components.

Procedure 9.2. Red Hat Linux HA Cluster Components Installation Steps

  1. Subscribe the system to the "RHEL Server High Availability" channel.
  2. Open a terminal and switch to the superuser account.
  3. Run yum install -y rgmanager ccs to install all required components.
  4. Disable NetworkManager before starting HA Clustering. HA Clustering will not work correctly with NetworkManager started or enabled.
    # chkconfig NetworkManager off
  5. Activate rgmanager, cman and ricci services.
    # chkconfig rgmanager on
    # chkconfig cman on
    # chkconfig ricci on
  6. Deactivate the qpidd service.
    # chkconfig qpidd off
    The qpidd service must be off in chkconfig because rgmanager will start and stop qpidd. If the normal system init process also attempts to start and stop qpidd it can cause rgmanager to lose track of qpidd processes.
    If qpidd is not turned off, clustat shows a qpidd service to be stopped when in fact there is a qpidd process running. In this situation, the qpidd log shows errors similar to this:
    critical Unexpected error: Daemon startup failed: Cannot lock /var/lib/qpidd/lock: Resource temporarily unavailable

9.1.11. Virtual IP Addresses

Qpid HA Clustering supports virtual IP addresses. A virtual IP address is an IP address that is relocated to the primary node in the cluster whenever a promotion occurs. The resource manager associates this address with the primary node in the cluster, and relocates it to the new primary when there is a failure. This simplifies configuration as you can publish a single IP address rather than a list.
The virtual IP address must be correctly configured on each node, so that the cluster manager's only task is to start and stop the address. If the virtual IP address is on the same network as the node's physical adapter, the cluster manager can bind the address to that adapter when the node is promoted to primary, and no further configuration is required. If the virtual IP address is on a different network than the physical adapter, then a second physical adapter or a virtual network interface must be configured for the virtual IP.
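As an illustrative sketch only, a virtual network interface on the virtual IP's network could be defined on each node with a file such as /etc/sysconfig/network-scripts/ifcfg-eth0:0; the device name and node address here are hypothetical:
# Hypothetical virtual interface on the 20.0.20.0/24 network used by
# the example virtual IP 20.0.20.200 later in this chapter.
DEVICE=eth0:0
BOOTPROTO=static
IPADDR=20.0.20.1
NETMASK=255.255.255.0
ONBOOT=yes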

9.1.12. Configure HA Cluster

MRG Messaging brokers can be clustered using cman and rgmanager to create an active-passive, hot-standby qpidd HA cluster. For further information on the underlying clustering technologies cman and rgmanager, refer to the Red Hat Enterprise Linux Cluster Administration Guide.
HA Clustering uses the /etc/cluster/cluster.conf file to configure cman and rgmanager.

Note

Broker management is required for HA to operate. It is enabled by default. The option mgmt-enable must not be set to "no".

Note

Incorrect security settings are a common cause of problems when getting started; see Section 9.1.19, “Security”.
The tool ccs provides a high-level user-friendly mechanism to configure the cluster.conf file, and is the recommended method for configuring a cluster. Refer to the Red Hat Enterprise Linux Cluster Administration Guide for more information on using the ccs tool.
The following steps use ccs to create an example cluster of 3 nodes named node1, node2 and node3. Run the following as the root user:
  1. Start the ricci service:
    service ricci start
  2. If you have not previously set the ricci password, set it now:
    passwd ricci
  3. Create a new cluster:
    ccs -h localhost --createcluster qpid-test
  4. Add three nodes:
    ccs -h localhost --addnode node1.example.com
    ccs -h localhost --addnode node2.example.com
    ccs -h localhost --addnode node3.example.com
  5. Add a failoverdomain for each:
    ccs -h localhost --addfailoverdomain node1-domain restricted
    ccs -h localhost --addfailoverdomain node2-domain restricted
    ccs -h localhost --addfailoverdomain node3-domain restricted
  6. Add a failoverdomainnode for each:
    ccs -h localhost --addfailoverdomainnode node1-domain node1.example.com
    ccs -h localhost --addfailoverdomainnode node2-domain node2.example.com
    ccs -h localhost --addfailoverdomainnode node3-domain node3.example.com
  7. Add the scripts:
    ccs -h localhost --addresource script name=qpidd file=/etc/init.d/qpidd
    ccs -h localhost --addresource script name=qpidd-primary file=/etc/init.d/qpidd-primary
  8. Add the Virtual IP Address:
    ccs -h localhost --addresource ip address=20.0.20.200 monitor_link=1
  9. Add the qpidd service for each node. It should be restarted if it fails:
    ccs -h localhost --addservice node1-qpidd-service domain=node1-domain recovery=restart
    ccs -h localhost --addsubservice node1-qpidd-service script ref=qpidd
    ccs -h localhost --addservice node2-qpidd-service domain=node2-domain recovery=restart
    ccs -h localhost --addsubservice node2-qpidd-service script ref=qpidd
    ccs -h localhost --addservice node3-qpidd-service domain=node3-domain recovery=restart
    ccs -h localhost --addsubservice node3-qpidd-service script ref=qpidd
  10. Add the primary qpidd service. It only runs on a single node at a time, and can run on any node:
    ccs --host localhost --addservice qpidd-primary-service recovery=relocate autostart=1 exclusive=0
    ccs -h localhost --addsubservice qpidd-primary-service script ref=qpidd-primary
    ccs -h localhost --addsubservice qpidd-primary-service ip ref=20.0.20.200
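After these steps, the configuration must be propagated to all cluster nodes. With ccs this is typically done as follows (a sketch, assuming the ricci service is running on every node):
ccs -h localhost --sync --activate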
Here is a commented version of the /etc/cluster/cluster.conf file produced by the previous steps:
<?xml version="1.0"?>
<!--
This is an example of a cluster.conf file to run qpidd HA under rgmanager.
This example configures a 3 node cluster, with nodes named node1, node2 and node3.

NOTE: fencing is not shown, you must configure fencing appropriately for your cluster.
-->

<cluster name="qpid-test" config_version="18">
    <!-- The cluster has 3 nodes. Each has a unique nodeid and one vote
         for quorum. -->
    <clusternodes>
        <clusternode name="node1.example.com" nodeid="1"/>
        <clusternode name="node2.example.com" nodeid="2"/>
        <clusternode name="node3.example.com" nodeid="3"/>
    </clusternodes>
    
    <!-- Resource Manager configuration. -->
    <rm>
        <!--
	    There is a failoverdomain for each node containing just that node.
	    This specifies that the qpidd service should always run on each node.
         -->

        <failoverdomains>
            <failoverdomain name="node1-domain" restricted="1">
                <failoverdomainnode name="node1.example.com"/>
            </failoverdomain>
            <failoverdomain name="node2-domain" restricted="1">
                <failoverdomainnode name="node2.example.com"/>
            </failoverdomain>
            <failoverdomain name="node3-domain" restricted="1">
                <failoverdomainnode name="node3.example.com"/>
            </failoverdomain>
        </failoverdomains>

        <resources>
            <!-- This script starts a qpidd broker acting as a backup. -->
            <script file="/etc/init.d/qpidd" name="qpidd"/>

            <!-- This script promotes the qpidd broker on this node to primary. -->
            <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>

            <!-- This is a virtual IP address on a separate network for client traffic. -->
            <ip address="20.0.20.200" monitor_link="1"/>
        </resources>

        <!-- There is a qpidd service on each node, 
             it should be restarted if it fails. -->

        <service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
            <script ref="qpidd"/>
        </service>
        <service name="node2-qpidd-service" domain="node2-domain" recovery="restart">
            <script ref="qpidd"/>
        </service>
        <service name="node3-qpidd-service" domain="node3-domain"  recovery="restart">
            <script ref="qpidd"/>
        </service>

        <!-- There should always be a single qpidd-primary service, 
              it can run on any node. -->

        <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">

            <script ref="qpidd-primary"/>
            <!-- The primary has the IP addresses for brokers and clients to connect. -->
            <ip ref="20.0.20.200"/>     
        </service>
    </rm>
</cluster>
There is a failoverdomain for each node containing just that one node. This specifies that the qpidd service always runs on all nodes.
The resources section defines the qpidd script used to start the qpidd service. It also defines the qpidd-primary script which does not actually start a new service, rather it promotes the existing qpidd broker to primary status. The qpidd-primary script is installed by the qpid-cpp-server-ha package.
The resources section defines the virtual IP address for client-to-broker communication.
To take advantage of the virtual IP address, qpidd.conf should contain these lines:
ha-cluster = yes
ha-public-url = 20.0.20.200
ha-brokers-url = 20.0.10.1, 20.0.10.2, 20.0.10.3
This configuration specifies the actual network addresses of the three nodes (ha-brokers-url), and the virtual IP address for the cluster, which clients should connect to: 20.0.20.200.
The service section defines 3 qpidd services, one for each node. Each service is in a restricted fail-over domain containing just that node, and has the restart recovery policy. This means that the rgmanager will run qpidd on each node, restarting if it fails.
There is a single qpidd-primary-service using the qpidd-primary script. It is not restricted to a domain and has the relocate recovery policy. This means rgmanager will start qpidd-primary on one of the nodes when the cluster starts and will relocate it to another node if the original node fails. Running the qpidd-primary script does not start a new broker process, it promotes the existing broker to become the primary.

9.1.13. Shutting Down qpidd on a HA Node

Both the per-node qpidd service and the re-locatable qpidd-primary service are implemented by the same qpidd daemon.
As a result, stopping the qpidd service will not stop a qpidd daemon that is acting as primary, and stopping the qpidd-primary service will not stop a qpidd process that is acting as backup.
To shut down a node that is acting as primary, you must shut down the qpidd service and relocate the primary:
clusvcadm -d somenode-qpidd-service
clusvcadm -r qpidd-primary-service
Doing this will shut down the qpidd daemon on that node. It will also prevent the primary service from relocating back to that node, because the qpidd service is no longer running there.
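For example, to take node1 out of service while it is acting as primary (using the service names defined in Section 9.1.12, “Configure HA Cluster”):
clusvcadm -d node1-qpidd-service
clusvcadm -r qpidd-primary-service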

9.1.14. Start and Stop HA Cluster

Note that starting the cluster enables cluster service startup on reboot. Stopping the cluster disables cluster service startup on reboot.
Start and Stop the HA Cluster on a node

To start the HA Cluster on a node:

ccs [-h host] --start
To stop the HA Cluster on a node:
ccs [-h host] --stop
Start and Stop the HA Cluster on all nodes

To start the HA Cluster on all configured nodes:

ccs [-h host] --startall
Note that this also enables cluster service startup on reboot.
To stop the HA Cluster on all configured nodes:
ccs [-h host] --stopall

9.1.15. Configure Clustering to use a non-privileged (non-root) user

When qpidd is run as a non-root user, the configuration files need to be stored in a location that is readable and writable by that user. This can be done by modifying the start-up script for qpidd in the following manner:
# diff -u /etc/rc.d/init.d/qpidd.orig /etc/rc.d/init.d/qpidd
--- /etc/rc.d/init.d/qpidd.orig 2014-01-15 19:06:19.000000000 +0100
+++ /etc/rc.d/init.d/qpidd      2014-02-07 16:02:47.136001472 +0100
@@ -38,6 +38,9 @@
 prog=qpidd
 lockfile=/var/lock/subsys/$prog
 pidfile=/var/run/qpidd.pid
+
+CFG_DIR=/var/lib/qpidd
+QPIDD_OPTIONS="--config ${CFG_DIR}/qpidd.conf --client-config ${CFG_DIR}/qpidc.conf"
 
 # Source configuration
 if [ -f /etc/sysconfig/$prog ] ; then
When the patch above is applied to /etc/rc.d/init.d/qpidd, the configuration files for the broker are read from the /var/lib/qpidd directory, rather than from /etc/qpid as they are by default.
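The configuration files must also be readable by the user that runs the broker. A sketch, assuming the broker runs as the default qpidd user:
chown qpidd:qpidd /var/lib/qpidd/qpidd.conf /var/lib/qpidd/qpidc.conf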

9.1.16. Broker Administration Tools and HA

Normally, clients are not allowed to connect to a backup broker. However, management tools are allowed to connect to backup brokers. If you use these tools you must not add or remove messages from replicated queues, nor create or delete replicated queues or exchanges, as this will disrupt the replication process and may cause message loss.
qpid-ha allows you to view and change HA configuration settings.
The tools qpid-config, qpid-route and qpid-stat will connect to a backup if you pass the flag --ha-admin on the command line.
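For example, to list the queues on a backup broker (a sketch; node2.example.com stands in for a backup's address):
qpid-stat -q --ha-admin node2.example.com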

9.1.17. Controlling replication of queues and exchanges

By default, queues and exchanges are not replicated automatically. You can change the default behavior by setting the ha-replicate configuration option. It has one of the following values:
all
Replicate everything automatically: queues, exchanges, bindings and messages.
configuration
Replicate the existence of queues, exchanges and bindings but don't replicate messages.
none
Don't replicate anything; this is the default.
You can override the default for a particular queue or exchange by passing the argument qpid.replicate when creating the queue or exchange. It takes the same values as ha-replicate.
Bindings are automatically replicated if the queue and exchange being bound both have replication all or configuration, they are not replicated otherwise.
You can create replicated queues and exchanges with the qpid-config management tool like this:
qpid-config add queue myqueue --replicate all
To create replicated queues and exchanges via the client API, add a node entry to the address like this:
"myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
There are some built-in exchanges created automatically by the broker; these exchanges are never replicated. The built-in exchanges are the default (nameless) exchange, the AMQP standard exchanges (amq.direct, amq.topic, amq.fanout and amq.match) and the management exchanges (qpid.management, qmf.default.direct and qmf.default.topic).
Note that if you bind a replicated queue to one of these exchanges, the binding will not be replicated, so the queue will not have the binding after a fail-over.

9.1.18. Client Connection and Fail-over

Clients can only connect to the primary broker. Backup brokers reject any connection attempt by a client. Clients rejected by a backup broker will automatically fail over until they connect to the primary. If ha-public-url contains multiple addresses, the client will try them all in rotation. If it is a virtual IP address, the client will retry on the same address until reconnected.
Clients are configured with the URL for the cluster (details below for each type of client). There are two possibilities:
  1. The URL contains a single virtual IP address that is assigned to the primary broker by the resource manager. This is the recommended configuration.
  2. The URL contains multiple addresses, one for each broker in the cluster.
In the first case the resource manager assigns the Virtual IP address to the primary broker, so clients only need to retry on a single address. In the second case, clients will repeatedly retry each address in the URL until they successfully connect to the primary.
When the primary broker fails, clients retry all known cluster addresses until they connect to the new primary. The client re-sends any messages that were previously sent but not acknowledged by the broker at the time of the failure. Similarly messages that have been sent by the broker, but not acknowledged by the client, are re-queued.
TCP can be slow to detect connection failures. A client can configure a connection to use a heartbeat to detect connection failure, and can specify a time interval for the heartbeat. If heartbeats are in use, failures will be detected no later than twice the heartbeat interval. The following sections explain how to enable heartbeat in each client.
Note: If you are using a Virtual IP address in your cluster then fail-over occurs transparently on the server, and you treat the cluster as a single broker using the Virtual IP address. The following sections explain how to configure clients with multiple addresses, for use when the cluster does not use a Virtual IP.
Suppose your cluster has 3 nodes: node1, node2 and node3 all using the default AMQP port, and you are not using a virtual IP address. To connect a client you need to specify the address(es) and set the reconnect property to true. The following sub-sections show how to connect each type of client.
C++ clients

With the C++ client, you specify multiple cluster addresses in a single URL. You also need to specify the connection option reconnect to be true. For example:

qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
Heartbeats are disabled by default. You can enable them by specifying a heartbeat interval (in seconds) for the connection via the heartbeat option. For example:
qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");
Python clients

With the Python client, you specify reconnect=True and a list of host:port addresses as reconnect_urls when calling Connection.establish or Connection.open:

connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
Heartbeats are disabled by default. You can enable them by specifying a heartbeat interval (in seconds) for the connection via the 'heartbeat' option. For example:
connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10)
Java JMS Clients

In Java JMS clients, client fail-over is handled automatically if it is enabled in the connection. You can configure a connection to use fail-over using the failover property:

connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&failover='failover_exchange'
This property can take three values:

Fail-over Modes

failover_exchange
If the connection fails, fail over to any other broker in the cluster.
roundrobin
If the connection fails, fail over to one of the brokers specified in the brokerlist.
singlebroker
Fail-over is not supported; the connection is to a single broker only.
In a Connection URL, heartbeat is set using the idle_timeout property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat time out to 3 seconds:
connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672',idle_timeout=3

9.1.19. Security

This section outlines the HA-specific aspects of security configuration. Please refer to the security documentation for more details on enabling authentication and setting up Access Control Lists.

Note

Unless you disable authentication with auth=no in your configuration, you must set the options below and you must have an ACL file with at least the entry described below.
Backups will be unable to connect to the primary if the security configuration is incorrect. See also Section 9.3.1, “Troubleshooting Cluster configuration”
When authentication is enabled you must set the credentials used by HA brokers with following options:

Table 9.2. HA Security Options

ha-username USER
User name for HA brokers. Note this must not include the @QPID suffix.
ha-password PASS
Password for HA brokers.
ha-mechanism MECHANISM
Mechanism for HA brokers. Any mechanism you enable for broker-to-broker communication can also be used by a client, so do not use ANONYMOUS in a secure environment.
This identity is used to authorize federation links from backup to primary. It is also used to authorize actions on the backup to replicate primary state, for example creating queues and exchanges.
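For example, the corresponding qpidd.conf entries on every broker might look like this (the user name, password and choice of the PLAIN mechanism are illustrative):
ha-username = qpid_ha
ha-password = secret
ha-mechanism = PLAIN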
When authorization is enabled you must have an Access Control List with the following rule to allow HA replication to function. Supposing ha-username=USER:
acl allow USER@QPID all all

9.1.20. HA Clustering and Persistence

If you use a persistent store for your messages then each broker in a cluster will have its own store. If the entire cluster fails and is restarted, the first broker that becomes primary will recover from its store. All the other brokers will clear their stores and get an update from the primary to ensure consistency.

9.1.21. Queue Replication and HA

As well as support for an active-passive cluster, the HA module supports individual queue replication, even if the brokers are not in a clustered environment. The original queue is used as normal, however the replica queue is updated automatically as messages are added to or removed from the original queue.
To create a replica queue, the HA module must be loaded on both the original and replica brokers, which is done automatically by default.
For standalone brokers, the ha-queue-replication=yes configuration option must be specified. This option is not required for brokers that are part of a cluster, because clustering enables queue replication automatically.

Important

The replica queue must only be modified through automatic updates from the original queue.
Manually adding or removing messages on the replica queue will make replication inconsistent, and may cause message loss.
The HA module does not enforce restricted access to the replica queue (as it does in the case of a cluster). The application must ensure the replica is not used until it has been disconnected from the original.

Example 9.1. Replicate a Queue Between Nodes

Suppose that myqueue is a queue on node1.
To create a replica of myqueue on node2, run the following command:
qpid-config --broker=node2 add queue --start-replica node1 myqueue
If myqueue already exists on the replica broker, run the following command to start replication from the original queue:
qpid-ha replicate -b node2 node1 myqueue