Chapter 2. Cluster Management with CMAN

Cluster management manages cluster quorum and cluster membership. CMAN (an abbreviation for cluster manager) performs cluster management in the High Availability Add-On for Red Hat Enterprise Linux. CMAN is a distributed cluster manager and runs in each cluster node; cluster management is distributed across all nodes in the cluster.
CMAN keeps track of membership by monitoring messages from other cluster nodes. When cluster membership changes, the cluster manager notifies the other infrastructure components, which then take appropriate action. If a cluster node does not transmit a message within a prescribed amount of time, the cluster manager removes the node from the cluster and communicates to other cluster infrastructure components that the node is not a member. Other cluster infrastructure components determine what actions to take upon notification that a node is no longer a cluster member. For example, fencing would fence the node that is no longer a member.
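The timeout-based membership rule described above can be sketched in a few lines of Python. This is only an illustration, not CMAN's implementation: the class name MembershipTracker, its methods, the node names, and the 10-second timeout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipTracker:
    """Toy membership tracker: a node stays a member while its messages keep arriving."""
    timeout: float                                   # allowed silence in seconds (hypothetical value)
    last_seen: dict = field(default_factory=dict)    # node name -> time of last message

    def record_message(self, node: str, now: float) -> None:
        # Any message from a node counts as proof that it is alive.
        self.last_seen[node] = now

    def expire(self, now: float) -> list:
        """Remove nodes that have been silent longer than the timeout and return them."""
        removed = sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)
        for node in removed:
            del self.last_seen[node]
        return removed


tracker = MembershipTracker(timeout=10.0)
tracker.record_message("node-a", now=0.0)
tracker.record_message("node-b", now=0.0)
tracker.record_message("node-a", now=8.0)   # node-a keeps talking, node-b goes silent
removed = tracker.expire(now=12.0)          # node-b has now been silent for 12 seconds
print(removed)                              # ['node-b']
```

In a real cluster, the list returned by expire() is what other components, such as fencing, would be notified about.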
CMAN keeps track of cluster quorum by monitoring the count of cluster nodes. If more than half the nodes are active, the cluster has quorum. If half the nodes (or fewer) are active, the cluster does not have quorum, and all cluster activity is stopped. Cluster quorum prevents the occurrence of a "split-brain" condition — a condition where two instances of the same cluster are running. A split-brain condition would allow each cluster instance to access cluster resources without knowledge of the other cluster instance, resulting in corrupted cluster integrity.

2.1. Cluster Quorum

Quorum is a voting algorithm used by CMAN.
A cluster can only function correctly if there is general agreement between the members regarding their status. We say a cluster has quorum if a majority of nodes are alive, communicating, and agree on the active cluster members. For example, in a thirteen-node cluster, quorum is only reached if seven or more nodes are communicating. If the seventh node dies, the cluster loses quorum and can no longer function.
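The thirteen-node arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes one vote per node and a plain majority rule.

```python
def quorum_threshold(total_nodes: int) -> int:
    """Smallest number of nodes that is strictly more than half of the cluster."""
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, active_nodes: int) -> bool:
    """True if the active nodes form a majority of the configured nodes."""
    return active_nodes >= quorum_threshold(total_nodes)


print(quorum_threshold(13))   # 7
print(has_quorum(13, 7))      # True
print(has_quorum(13, 6))      # False: the seventh node has died
```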
A cluster must maintain quorum to prevent split-brain issues. If quorum were not enforced, a communication error on that same thirteen-node cluster could cause a situation where six nodes are operating on the shared storage while another six nodes are also operating on it, independently. Because of the communication error, the two partial clusters would overwrite areas of the disk and corrupt the file system. With quorum rules enforced, only one of the partial clusters can use the shared storage, thus protecting data integrity.
Quorum does not prevent the cluster from becoming partitioned, but it does decide which partition is dominant and allowed to function in the cluster. Should the cluster split, quorum prevents more than one cluster group from doing anything.
Quorum is determined by communication of messages among cluster nodes via Ethernet. Optionally, quorum can be determined by a combination of communicating messages via Ethernet and through a quorum disk. For quorum via Ethernet, quorum consists of a simple majority (more than half of the nodes). When configuring a quorum disk, quorum consists of user-specified conditions.

Note

By default, each node has one quorum vote. Optionally, you can configure each node to have more than one vote.
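When nodes carry different numbers of votes, the majority is counted in votes rather than in nodes. The following Python sketch assumes the threshold is "more than half of all configured votes", which generalizes the simple-majority rule; the function name, node names, and vote values are hypothetical and do not describe CMAN's exact accounting.

```python
def is_quorate(node_votes: dict, active: set) -> bool:
    """True if the active nodes hold more than half of all configured votes."""
    expected_votes = sum(node_votes.values())
    present_votes = sum(v for node, v in node_votes.items() if node in active)
    # Assumed rule: strictly more than half of the expected votes.
    return present_votes >= expected_votes // 2 + 1


votes = {"node-a": 1, "node-b": 1, "node-c": 3}   # node-c is given extra weight
print(is_quorate(votes, {"node-c"}))              # True: 3 of 5 votes
print(is_quorate(votes, {"node-a", "node-b"}))    # False: 2 of 5 votes
```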

2.1.1. Quorum Disks

A quorum disk or partition is a section of a disk that's set up for use with components of the cluster project. It has a couple of purposes. Again, I'll explain with an example.
Suppose you have nodes A and B, and node A fails to get several of the cluster manager's "heartbeat" packets from node B. Node A doesn't know why it hasn't received the packets, but there are several possibilities: node B has failed, the network switch or hub has failed, node A's network adapter has failed, or node B was simply too busy to send the packets. The last case can happen if your cluster is extremely large, your systems are extremely busy, or your network is flaky.
Node A doesn't know which is the case, and it doesn't know whether the problem lies within itself or with node B. This is especially problematic in a two-node cluster because both nodes, out of touch with one another, can try to fence the other.
So before fencing a node, it would be nice to have another way to check if the other node is really alive, even though we can't seem to contact it. A quorum disk gives you the ability to do just that. Before fencing a node that's out of touch, the cluster software can check whether the node is still alive based on whether it has written data to the quorum partition.
In the case of two-node systems, the quorum disk also acts as a tie-breaker. If a node has access to the quorum disk and the network, that counts as two votes.
A node that has lost contact with the network or the quorum disk has lost a vote, and therefore may safely be fenced.
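The two-node vote counting described above can be modeled as a toy Python sketch. It follows the wording of the text literally (one vote for network access, one for quorum disk access); the function names are hypothetical, and the real quorum disk daemon is configurable and more involved.

```python
def votes_held(sees_network: bool, sees_quorum_disk: bool) -> int:
    """One vote for network access plus one vote for quorum disk access."""
    return int(sees_network) + int(sees_quorum_disk)


def may_be_fenced(sees_network: bool, sees_quorum_disk: bool) -> bool:
    """A node that has lost either vote is a candidate for fencing."""
    return votes_held(sees_network, sees_quorum_disk) < 2


print(votes_held(True, True))       # 2: healthy node
print(may_be_fenced(True, True))    # False
print(may_be_fenced(False, True))   # True: lost the network
print(may_be_fenced(True, False))   # True: lost the quorum disk
```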
Further information about configuring quorum disk parameters is provided in the chapters on Conga and ccs administration in the Cluster Administration manual.

2.1.2. Tie-breakers

Tie-breakers are additional heuristics that allow a cluster partition to decide whether or not it is quorate in the event of an even split, prior to fencing. A typical tie-breaker construct is an IP tie-breaker, sometimes called a ping node.
With such a tie-breaker, nodes not only monitor each other, but also an upstream router that is on the same path as cluster communications. If the two nodes lose contact with each other, the one that wins is the one that can still ping the upstream router. Of course, there are cases, such as a switch-loop, where it is possible for two nodes to see the upstream router, but not each other, causing what is called a split brain. That is why, even when using tie-breakers, it is important to ensure that fencing is configured correctly.
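A minimal Python sketch of the IP tie-breaker decision shows both the normal case and the switch-loop failure. The Node class and function names are hypothetical; reachability is passed in as plain booleans rather than measured with real pings.

```python
from dataclasses import dataclass


@dataclass
class Node:
    sees_peer: bool         # still receives cluster messages from the other node
    can_ping_router: bool   # can still reach the upstream router (the "ping node")


def keeps_running(node: Node) -> bool:
    """A node continues if it still sees its peer, or if it wins the tie-break."""
    return node.sees_peer or node.can_ping_router


def split_brain(a: Node, b: Node) -> bool:
    """Both nodes keep running while unable to see each other."""
    return (not a.sees_peer and not b.sees_peer
            and keeps_running(a) and keeps_running(b))


# Normal case: only node A can still reach the router, so only A wins.
a, b = Node(sees_peer=False, can_ping_router=True), Node(sees_peer=False, can_ping_router=False)
print(keeps_running(a), keeps_running(b), split_brain(a, b))   # True False False

# Switch-loop case: both reach the router but not each other.
a, b = Node(False, True), Node(False, True)
print(split_brain(a, b))                                       # True
```

The second case is why the tie-breaker alone is not enough and fencing must still be configured correctly.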
Other types of tie-breakers use a shared partition, often called a quorum disk, to provide additional details. clumanager 1.2.x (Red Hat Cluster Suite 3) had a disk tie-breaker that allowed operation if the network went down, as long as both nodes were still communicating over the shared partition.
More complex tie-breaker schemes exist, such as QDisk (part of linux-cluster). QDisk allows arbitrary heuristics to be specified. These allow each node to determine its own fitness for participation in the cluster. It is often used as a simple IP tie-breaker, however. See the qdisk(5) manual page for more information.
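The idea of scored heuristics can be sketched in Python as follows. This is not QDisk code: heuristics are modeled as plain Python callables instead of external programs, the function name node_is_fit is hypothetical, and the default threshold (half of the total score, rounded up) is an assumption made for this sketch. Consult qdisk(5) for the actual parameters and defaults.

```python
from typing import Callable, List, Optional, Tuple

# A heuristic is a check that returns True on success, paired with its score.
Heuristic = Tuple[Callable[[], bool], int]


def node_is_fit(heuristics: List[Heuristic], min_score: Optional[int] = None) -> bool:
    """True if the scores of the passing heuristics reach the minimum score."""
    total = sum(score for _, score in heuristics)
    if min_score is None:
        # Assumed default for this sketch: half of the total score, rounded up.
        min_score = (total + 1) // 2
    earned = sum(score for check, score in heuristics if check())
    return earned >= min_score


router_reachable = lambda: True     # stand-in for a successful ping of the router
storage_reachable = lambda: False   # stand-in for a failed storage check

print(node_is_fit([(router_reachable, 2), (storage_reachable, 1)]))   # True: 2 of 3
print(node_is_fit([(router_reachable, 1), (storage_reachable, 2)]))   # False: 1 of 3
```

With a single heuristic that pings the upstream router, this reduces to the simple IP tie-breaker.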
CMAN has no internal tie-breakers for various reasons. However, tie-breakers can be implemented using the API. This API allows quorum device registration and updating. For an example, look at the QDisk source code.
You might need a tie-breaker if you:
  • Have a two-node configuration with the fence devices on a different network path than the path used for cluster communication
  • Have a two-node configuration where fencing is at the fabric level, especially for SCSI reservations
However, if you have a correct network and fencing configuration in your cluster, a tie-breaker only adds complexity, except in pathological cases.