High Availability Add-On Overview
Overview of the components of the High Availability Add-On
Chapter 1. High Availability Add-On Overview
1.1. Cluster Basics
- High availability
- Load balancing
- High performance
1.2. High Availability Add-On Introduction
- Cluster infrastructure — Provides fundamental functions for nodes to work together as a cluster: configuration file management, membership management, lock management, and fencing.
- High availability Service Management — Provides failover of services from one cluster node to another in case a node becomes inoperative.
- Cluster administration tools — Configuration and management tools for setting up, configuring, and managing the High Availability Add-On. The tools are for use with the cluster infrastructure components, the high availability and service management components, and storage.
- Red Hat GFS2 (Global File System 2) — Part of the Resilient Storage Add-On, this provides a cluster file system for use with the High Availability Add-On. GFS2 allows multiple nodes to share storage at a block level as if the storage were connected locally to each cluster node. The GFS2 cluster file system requires a cluster infrastructure.
- Cluster Logical Volume Manager (CLVM) — Part of the Resilient Storage Add-On, this provides volume management of cluster storage. CLVM support also requires cluster infrastructure.
- Load Balancer Add-On — Routing software that provides high availability load balancing and failover in layer 4 (TCP) and layer 7 (HTTP, HTTPS) services. The Load Balancer Add-On runs in a cluster of redundant virtual routers that uses load algorithms to distribute client requests to real servers, collectively acting as a virtual server. It is not necessary to use the Load Balancer Add-On in conjunction with Pacemaker.
1.3. Pacemaker Overview
- Cluster management
- Lock management
- Cluster configuration management
1.4. Pacemaker Architecture Components
- Cluster Information Base (CIB)
- The Pacemaker information daemon, which uses XML internally to distribute and synchronize current configuration and status information from the Designated Coordinator (DC) — a node assigned by Pacemaker to store and distribute cluster state and actions by means of the CIB — to all other cluster nodes.
- Cluster Resource Management Daemon (CRMd)
- Pacemaker cluster resource actions are routed through this daemon. Resources managed by CRMd can be queried by client systems, moved, instantiated, and changed when needed. Each cluster node also includes a local resource manager daemon (LRMd) that acts as an interface between CRMd and resources. LRMd passes commands from CRMd to agents, such as starting and stopping, and relays status information.
- Shoot the Other Node in the Head (STONITH)
- Often deployed in conjunction with a power switch, STONITH acts as a cluster resource in Pacemaker that processes fence requests, forcefully powering down nodes and removing them from the cluster to ensure data integrity. STONITH is configured in CIB and can be monitored as a normal cluster resource.
corosync is the component, and a daemon of the same name, that serves the core membership and member-communication needs for high availability clusters. It is required for the High Availability Add-On to function. In addition to those membership and messaging functions, corosync also:
- Manages quorum rules and determination.
- Provides messaging capabilities for applications that coordinate or operate across multiple members of the cluster and thus must communicate stateful or other information between instances.
1.5. Pacemaker Configuration and Management Tools
pcs can control all aspects of Pacemaker and the Corosync heartbeat daemon. A command-line based program, pcs can perform the following cluster management tasks:
- Create and configure a Pacemaker/Corosync cluster
- Modify configuration of the cluster while it is running
- Remotely configure both Pacemaker and Corosync, as well as start, stop, and display status information of the cluster
- The pcsd Web UI is a graphical user interface to create and configure Pacemaker/Corosync clusters, with the same features and abilities as the command-line based pcs utility.
Chapter 2. Cluster Operation
2.1. Quorum Overview
The votequorum service allows administrators to configure a cluster with a number of votes assigned to each system in the cluster and ensures that cluster operations are allowed to proceed only when a majority of the votes are present.
votequorum can be configured with a tiebreaker policy, under which quorum continues with the remaining cluster nodes that are still in contact with the available cluster node that has the lowest node ID.
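The majority rule and the lowest-node-ID tiebreaker described above can be sketched in a few lines of Python. This is a simplified model for illustration, not the votequorum implementation: the function name `is_quorate`, its parameters, and the four-node cluster are hypothetical, and the tiebreaker is assumed to apply only to an exact 50/50 split.

```python
from typing import Dict, Set


def is_quorate(votes: Dict[int, int], partition: Set[int], tiebreaker: bool = False) -> bool:
    """Return True if the nodes in `partition` may continue cluster operations.

    votes      maps node ID -> number of votes assigned to that node
    partition  is the set of node IDs that can still see each other
    tiebreaker enables the lowest-node-ID rule for an exact 50/50 split
    """
    total = sum(votes.values())
    present = sum(votes[node] for node in partition)

    # Normal rule: a strict majority of all configured votes is present.
    if 2 * present > total:
        return True

    # Tiebreaker rule (simplified assumption): on an exact tie, the side that
    # still contains the lowest node ID keeps quorum.
    if tiebreaker and 2 * present == total:
        return min(votes) in partition

    return False


# Hypothetical four-node cluster with one vote per node.
votes = {1: 1, 2: 1, 3: 1, 4: 1}

print(is_quorate(votes, {1, 2, 3}))                # True: 3 of 4 votes
print(is_quorate(votes, {3, 4}))                   # False: tie, no tiebreaker
print(is_quorate(votes, {1, 2}, tiebreaker=True))  # True: tie, holds node 1
print(is_quorate(votes, {3, 4}, tiebreaker=True))  # False: tie, lacks node 1
```

With the tiebreaker enabled, exactly one side of an even split keeps quorum, so the two halves of the cluster cannot both continue operating.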
2.2. Fencing Overview
- Uninterruptible Power Supply (UPS) — a device containing a battery that can be used to fence devices in the event of a power failure
- Power Distribution Unit (PDU) — a device with multiple power outlets used in data centers for clean power distribution as well as fencing and power isolation services
- Blade power control devices — dedicated systems installed in a data center and configured to fence cluster nodes in the event of failure
- Lights-out devices — network-connected devices that manage cluster node availability and that administrators can use, locally or remotely, to perform fencing, power on/off, and other services
Chapter 3. Red Hat High Availability Add-On Resources
3.1. Red Hat High Availability Add-On Resource Overview
3.2. Red Hat High Availability Add-On Resource Classes
- LSB — The Linux Standards Base agent abstracts the compliant services supported by the LSB, namely those services in /etc/init.d and the associated return codes for successful and failed service states (started, stopped, running status).
- OCF — The Open Cluster Framework is a superset of the LSB (Linux Standards Base) that sets standards for the creation and execution of server initialization scripts, input parameters for the scripts using environment variables, and more.
- systemd — The newest system services manager for Linux based systems, systemd uses sets of unit files rather than initialization scripts, as LSB and OCF do. These units can be manually created by administrators or can even be created and managed by services themselves. Pacemaker manages these units in much the same way that it manages OCF or LSB init scripts.
- Upstart — Much like systemd, Upstart is an alternative system initialization manager for Linux. Upstart uses jobs, as opposed to units in systemd or init scripts.
- STONITH — A resource agent exclusively for fencing services and fence agents using STONITH.
- Nagios — Agents that abstract plug-ins for the Nagios system and infrastructure monitoring tool.
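The LSB entry above mentions return codes for service states. As a minimal sketch, the mapping below lists the exit codes that the LSB specification defines for an init script's `status` action. The helper names `describe_status` and `is_running` are hypothetical, and the sketch does not show how Pacemaker itself interprets these codes.

```python
# Exit codes defined by the LSB specification for the "status" action
# of an init script. Codes outside this table are reserved ranges.
LSB_STATUS = {
    0: "running",
    1: "dead, but PID file exists",
    2: "dead, but lock file exists",
    3: "not running",
    4: "status unknown",
}


def describe_status(exit_code: int) -> str:
    """Translate an LSB status exit code into a human-readable state."""
    return LSB_STATUS.get(exit_code, "reserved or non-standard code")


def is_running(exit_code: int) -> bool:
    """Only exit code 0 means the service is up."""
    return exit_code == 0


print(describe_status(0))  # running
print(describe_status(3))  # not running
print(is_running(3))       # False
```

A script that returns 0 for a stopped service would be reported as running under this mapping, which is why LSB compliance matters for services managed through this resource class.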
3.3. Monitoring Resources
3.4. Resource Constraints
- location constraints — A location constraint determines which nodes a resource can run on.
- order constraints — An order constraint determines the order in which the resources run.
- colocation constraints — A colocation constraint determines where resources will be placed relative to other resources.
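To make the three constraint types concrete, the sketch below models them in Python. It is a deliberately simplified illustration: real Pacemaker constraints are score-based and colocation is directional, whereas this sketch treats location as an allow-list and colocation as a mandatory, symmetric relation. All resource names, node names, and function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical cluster nodes and constraints, for illustration only.
nodes = {"node1", "node2", "node3"}

# Location: nodes each resource may run on (resources not listed may run anywhere).
location = {"webserver": {"node1", "node2"}, "database": {"node2", "node3"}}

# Order: (first, then) pairs; `first` must start before `then`.
order = [("virtual_ip", "webserver"), ("database", "webserver")]

# Colocation: pairs of resources that must run on the same node.
colocation = [("virtual_ip", "webserver"), ("webserver", "database")]


def start_sequence(order):
    """Return one start sequence that satisfies every order constraint."""
    graph = {}
    for first, then in order:
        graph.setdefault(then, set()).add(first)  # `then` depends on `first`
        graph.setdefault(first, set())
    return list(TopologicalSorter(graph).static_order())


def candidate_nodes(resource, nodes, location, colocation):
    """Return nodes where `resource` and everything colocated with it may run."""
    # Gather the resource plus everything transitively colocated with it.
    members, pending = {resource}, [resource]
    while pending:
        current = pending.pop()
        for pair in colocation:
            if current in pair:
                for other in pair:
                    if other not in members:
                        members.add(other)
                        pending.append(other)
    # A node qualifies only if every colocated resource is allowed on it.
    allowed = set(nodes)
    for member in members:
        allowed &= location.get(member, nodes)
    return allowed


print(start_sequence(order))  # webserver comes last
print(candidate_nodes("webserver", nodes, location, colocation))  # {'node2'}
```

Here the location constraints of `webserver` and `database` overlap only on `node2`, so colocating them leaves a single candidate node.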
3.5. Resource Groups
You create a resource group with the pcs resource command, specifying the resources to include in the group. If the group does not exist, this command creates the group. If the group exists, this command adds additional resources to the group. The resources will start in the order you specify them with this command, and will stop in the reverse order of their starting order.
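The group semantics described above (create on first use, append on later use, start in order, stop in reverse) can be modeled with a short Python sketch. This is a toy model, not how pcs or Pacemaker store groups; the class `ResourceGroup`, the function `group_add`, and the resource names are all hypothetical.

```python
from typing import Dict, List


class ResourceGroup:
    """Toy model of a resource group: ordered start, reverse-ordered stop."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.resources: List[str] = []

    def add(self, *resources: str) -> None:
        """Append resources, keeping the order in which they were specified."""
        for resource in resources:
            if resource not in self.resources:
                self.resources.append(resource)

    def start_order(self) -> List[str]:
        return list(self.resources)

    def stop_order(self) -> List[str]:
        return list(reversed(self.resources))


groups: Dict[str, ResourceGroup] = {}


def group_add(name: str, *resources: str) -> ResourceGroup:
    """Create the group if it does not exist, then add the resources to it."""
    if name not in groups:
        groups[name] = ResourceGroup(name)
    groups[name].add(*resources)
    return groups[name]


# The first call creates the group; the second call extends it.
group_add("webgroup", "shared_fs", "virtual_ip")
web = group_add("webgroup", "webserver")

print(web.start_order())  # ['shared_fs', 'virtual_ip', 'webserver']
print(web.stop_order())   # ['webserver', 'virtual_ip', 'shared_fs']
```

Stopping in reverse order means a resource is always stopped before the resources it was started after, such as a web server before the file system it serves from.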
Appendix A. Upgrading From Red Hat Enterprise Linux 6 High Availability Add-On
A.1. Overview of Differences Between Releases
- Configuration Files — Previously, cluster configuration was found in the /etc/cluster/cluster.conf file, while cluster configuration in release 7 is in /etc/corosync/corosync.conf for membership and quorum configuration and /var/lib/pacemaker/cib/cib.xml for cluster node and resource configuration.
- Executable Files — Previously, clusters were configured with ccs at the command line and luci for graphical configuration. In Red Hat Enterprise Linux 7 High Availability Add-On, configuration is done by means of pcs at the command line and the pcsd Web UI at the desktop.
- Starting the Service — Previously, all services, including those in the High Availability Add-On, were managed using the service command to start services and the chkconfig command to configure services to start upon system boot. This had to be configured separately for each cluster service (such as rgmanager and ricci). For example:
service rgmanager start
chkconfig rgmanager on
For Red Hat Enterprise Linux 7 High Availability Add-On, the systemctl command controls both manual startup and automated boot-time startup, and all cluster services are grouped in pcsd.service. For example:
systemctl start pcsd.service
systemctl enable pcsd.service
pcs cluster start --all
- User Access — Previously, the root user or a user with proper permissions could access the luci configuration interface. All access required the ricci password for the node. In Red Hat Enterprise Linux 7 High Availability Add-On, the pcsd Web UI requires that you authenticate as user hacluster, which is the common system user. The root user can set the password for hacluster.
- Creating Clusters, Nodes and Resources — Previously, nodes were created with ccs at the command line or with the luci graphical interface. Creating a cluster and adding nodes were separate steps. For example, to create a cluster and add a node by means of the command line, perform the following:
ccs -h node1.example.com --createcluster examplecluster
ccs -h node1.example.com --addnode node2.example.com
In Red Hat Enterprise Linux 7 High Availability Add-On, clusters, nodes, and resources are added by means of pcs at the command line or the pcsd Web UI. For example, to create a cluster by means of the command line, perform the following:
pcs cluster setup examplecluster node1 node2 ...
- Cluster removal — Previously, administrators removed a cluster by deleting nodes manually from the luci interface or deleting the cluster.conf file from each node. In Red Hat Enterprise Linux 7 High Availability Add-On, administrators can remove a cluster by issuing the pcs cluster destroy command.
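The configuration file locations listed under Configuration Files above suggest a quick way to tell which layout a node carries, for example when auditing systems before an upgrade. The following is a minimal sketch that assumes only the paths named in that item. The function name `detect_layout`, its `root` parameter, and the returned labels are hypothetical, and preferring the release 7 label when both sets of files exist is an arbitrary choice.

```python
from pathlib import Path

# Paths taken from the Configuration Files item, relative to the filesystem root.
RELEASE6_FILES = ["etc/cluster/cluster.conf"]
RELEASE7_FILES = [
    "etc/corosync/corosync.conf",     # membership and quorum configuration
    "var/lib/pacemaker/cib/cib.xml",  # cluster node and resource configuration
]


def detect_layout(root: str = "/") -> str:
    """Report which cluster configuration layout exists under `root`."""
    base = Path(root)
    has_release7 = any((base / path).is_file() for path in RELEASE7_FILES)
    has_release6 = any((base / path).is_file() for path in RELEASE6_FILES)

    # Assumption: if files from both layouts are present, report release 7.
    if has_release7:
        return "release 7"
    if has_release6:
        return "release 6"
    return "unconfigured"


# Inspect the local system; the result depends on the machine it runs on.
print(detect_layout())
```

Passing a different `root` makes it possible to inspect a mounted image or a backup tree instead of the running system.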
Appendix B. Revision History
|Revision 7.1-1||Wed Aug 7 2019|
|Revision 6.1-2||Thu Oct 4 2018|
|Revision 5.1-2||Wed Mar 14 2018|
|Revision 5.1-1||Wed Dec 13 2017|
|Revision 4.1-3||Tue Aug 1 2017|
|Revision 4.1-1||Wed May 10 2017|
|Revision 3.1-2||Mon Oct 17 2016|
|Revision 3.1-1||Wed Aug 17 2016|
|Revision 2.1-5||Mon Nov 9 2015|
|Revision 2.1-1||Tue Aug 18 2015|
|Revision 1.1-3||Tue Feb 17 2015|
|Revision 1.1-1||Thu Dec 04 2014|
|Revision 0.1-9||Tue Jun 03 2014|
|Revision 0.1-4||Wed Nov 27 2013|
|Revision 0.1-1||Wed Jan 16 2013|