Chapter 17. Configuring a high-availability cluster using system roles

With the ha_cluster system role, you can configure and manage a high-availability cluster that uses the Pacemaker high availability cluster resource manager.

Note

The High Availability Cluster (HA Cluster) role is available as a Technology Preview.

The HA system role does not currently support constraints. Running the role after constraints are configured manually will remove the constraints, as well as any configuration not supported by the role.

The HA system role does not currently support SBD.

17.1. ha_cluster system role variables

In an ha_cluster system role playbook, you define the variables for a high availability cluster according to the requirements of your cluster deployment.

The variables you can set for an ha_cluster system role are as follows.

ha_cluster_enable_repos
A boolean flag that enables the repositories containing the packages that are needed by the ha_cluster system role. When this is set to yes, the default value of this variable, you must have active subscription coverage for RHEL and the RHEL High Availability Add-On on the systems that you will use as your cluster members or the system role will fail.
ha_cluster_cluster_present

A boolean flag which, if set to yes, determines that the HA cluster will be configured on the hosts according to the variables passed to the role. Any cluster configuration not specified in the role and not supported by the role will be lost.

If ha_cluster_cluster_present is set to no, all HA cluster configuration will be removed from the target hosts.

The default value of this variable is yes.

The following example playbook removes all cluster configuration on node1 and node2:

- hosts: node1 node2
  vars:
    ha_cluster_cluster_present: no

  roles:
    - rhel-system-roles.ha_cluster
ha_cluster_start_on_boot
A boolean flag that determines whether cluster services will be configured to start on boot. The default value of this variable is yes.
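
As a brief illustration of the ha_cluster_enable_repos and ha_cluster_start_on_boot flags, the following sketch assumes that the required repositories are already enabled on the nodes by some other means and that you prefer to start cluster services manually after a reboot. The host names are the placeholders used elsewhere in this chapter.

```yaml
- hosts: node1 node2
  vars:
    # Assumption: the HA repositories are already enabled on both nodes.
    ha_cluster_enable_repos: no
    # Cluster services will not be configured to start on boot.
    ha_cluster_start_on_boot: no
    ha_cluster_hacluster_password: password

  roles:
    - rhel-system-roles.ha_cluster
```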
ha_cluster_fence_agent_packages
List of fence agent packages to install. The default value of this variable is fence-agents-all, fence-virt.
ha_cluster_extra_packages

List of additional packages to be installed. The default value of this variable is no packages.

This variable can be used to install additional packages not installed automatically by the role, for example custom resource agents.

It is possible to specify fence agents as members of this list. However, ha_cluster_fence_agent_packages is the recommended role variable to use for specifying fence agents, so that its default value is overridden.
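
The following sketch shows how the two package variables might be combined. The fence agent package is the one used later in this chapter. The extra package name is hypothetical and stands in for whatever custom resource agent package your deployment needs.

```yaml
- hosts: node1 node2
  vars:
    ha_cluster_hacluster_password: password
    # Overrides the default list (fence-agents-all, fence-virt).
    ha_cluster_fence_agent_packages:
      - fence-agents-apc-snmp
    # Hypothetical package name, used only for illustration.
    ha_cluster_extra_packages:
      - my-custom-resource-agents

  roles:
    - rhel-system-roles.ha_cluster
```
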

ha_cluster_hacluster_password
A string value that specifies the password of the hacluster user. The hacluster user has full access to a cluster. It is recommended that you vault encrypt the password, as described in Encrypting content with Ansible Vault. There is no default password value, and this variable must be specified.
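
One possible way to keep the password out of the playbook is sketched below. It assumes a separate variables file, here called vault.yml, that has been encrypted with Ansible Vault and that defines a variable named vault_hacluster_password. Both names are hypothetical.

```yaml
- hosts: node1 node2
  vars_files:
    # Hypothetical vault-encrypted file that defines vault_hacluster_password.
    - vault.yml
  vars:
    ha_cluster_hacluster_password: "{{ vault_hacluster_password }}"

  roles:
    - rhel-system-roles.ha_cluster
```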
ha_cluster_corosync_key_src

The path to the Corosync authkey file, which is the authentication and encryption key for Corosync communication. It is highly recommended that you have a unique authkey value for each cluster. The key should be 256 bytes of random data.

If you specify a key for this variable, it is recommended that you vault encrypt the key, as described in Encrypting content with Ansible Vault.

If no key is specified, a key already present on the nodes will be used. If nodes do not have the same key, a key from one node will be distributed to other nodes so that all nodes have the same key. If no node has a key, a new key will be generated and distributed to the nodes.

If this variable is set, ha_cluster_regenerate_keys is ignored for this key.

The default value of this variable is null.

ha_cluster_pacemaker_key_src

The path to the Pacemaker authkey file, which is the authentication and encryption key for Pacemaker communication. It is highly recommended that you have a unique authkey value for each cluster. The key should be 256 bytes of random data.

If you specify a key for this variable, it is recommended that you vault encrypt the key, as described in Encrypting content with Ansible Vault.

If no key is specified, a key already present on the nodes will be used. If nodes do not have the same key, a key from one node will be distributed to other nodes so that all nodes have the same key. If no node has a key, a new key will be generated and distributed to the nodes.

If this variable is set, ha_cluster_regenerate_keys is ignored for this key.

The default value of this variable is null.

ha_cluster_fence_virt_key_src

The path to the fence-virt or fence-xvm pre-shared key file, which is the location of the authentication key for the fence-virt or fence-xvm fence agent.

If you specify a key for this variable, it is recommended that you vault encrypt the key, as described in Encrypting content with Ansible Vault.

If no key is specified, a key already present on the nodes will be used. If nodes do not have the same key, a key from one node will be distributed to other nodes so that all nodes have the same key. If no node has a key, a new key will be generated and distributed to the nodes. If the ha_cluster system role generates a new key in this fashion, you should copy the key to your nodes' hypervisor to ensure that fencing works.

If this variable is set, ha_cluster_regenerate_keys is ignored for this key.

The default value of this variable is null.

ha_cluster_pcsd_public_key_src, ha_cluster_pcsd_private_key_src

The path to the pcsd TLS certificate and private key. If this is not specified, a certificate-key pair already present on the nodes will be used. If a certificate-key pair is not present, a random new one will be generated.

If you specify a private key value for this variable, it is recommended that you vault encrypt the key, as described in Encrypting content with Ansible Vault.

If these variables are set, ha_cluster_regenerate_keys is ignored for this certificate-key pair.

The default value of these variables is null.
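
The key and certificate source variables can be set together, as in the following sketch. All of the file paths are hypothetical, and the sketch assumes that they refer to files on the system from which you run the playbook.

```yaml
- hosts: node1 node2
  vars:
    ha_cluster_hacluster_password: password
    # All paths below are hypothetical placeholders.
    ha_cluster_corosync_key_src: keys/corosync-authkey
    ha_cluster_pacemaker_key_src: keys/pacemaker-authkey
    ha_cluster_fence_virt_key_src: keys/fence_xvm.key
    ha_cluster_pcsd_public_key_src: keys/pcsd.crt
    ha_cluster_pcsd_private_key_src: keys/pcsd.key

  roles:
    - rhel-system-roles.ha_cluster
```
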

ha_cluster_regenerate_keys

A boolean flag which, when set to yes, determines that pre-shared keys and TLS certificates will be regenerated. For more information on when keys and certificates will be regenerated, see the descriptions of the ha_cluster_corosync_key_src, ha_cluster_pacemaker_key_src, ha_cluster_fence_virt_key_src, ha_cluster_pcsd_public_key_src, and ha_cluster_pcsd_private_key_src variables.

The default value of this variable is no.
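
As a minimal sketch, the following playbook requests regeneration of the pre-shared keys and TLS certificates. It assumes that none of the key source variables is set, because the flag is ignored for any key or certificate-key pair whose source variable is specified.

```yaml
- hosts: node1 node2
  vars:
    ha_cluster_hacluster_password: password
    # No *_key_src variable is set, so the flag applies to every key.
    ha_cluster_regenerate_keys: yes

  roles:
    - rhel-system-roles.ha_cluster
```
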

ha_cluster_pcs_permission_list

Configures permissions to manage a cluster using pcsd. The items you configure with this variable are as follows:

  • type - user or group
  • name - user or group name
  • allow_list - Allowed actions for the specified user or group:

    • read - View cluster status and settings
    • write - Modify cluster settings except permissions and ACLs
    • grant - Modify cluster permissions and ACLs
    • full - Unrestricted access to a cluster including adding and removing nodes and access to keys and certificates

The structure of the ha_cluster_pcs_permission_list variable and its default values are as follows:

ha_cluster_pcs_permission_list:
  - type: group
    name: hacluster
    allow_list:
      - grant
      - read
      - write
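
The following sketch adds a read-only entry alongside the default one. The group name cluster-viewers is hypothetical. The default entry is repeated on the assumption that setting this variable replaces the default list rather than extending it.

```yaml
ha_cluster_pcs_permission_list:
  # Repeats the default entry for the hacluster group.
  - type: group
    name: hacluster
    allow_list:
      - grant
      - read
      - write
  # Hypothetical group that may only view cluster status and settings.
  - type: group
    name: cluster-viewers
    allow_list:
      - read
```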
ha_cluster_cluster_name
The name of the cluster. This is a string value with a default of my-cluster.
ha_cluster_cluster_properties

List of sets of cluster properties for Pacemaker cluster-wide configuration. Only one set of cluster properties is supported.

The structure of a set of cluster properties is as follows:

ha_cluster_cluster_properties:
  - attrs:
      - name: property1_name
        value: property1_value
      - name: property2_name
        value: property2_value

By default, no properties are set.

The following example playbook configures a cluster consisting of node1 and node2 and sets the stonith-enabled and no-quorum-policy cluster properties.

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_cluster_properties:
      - attrs:
          - name: stonith-enabled
            value: 'true'
          - name: no-quorum-policy
            value: stop

  roles:
    - rhel-system-roles.ha_cluster
ha_cluster_resource_primitives

This variable defines Pacemaker resources configured by the system role, including stonith resources. The items you can configure for each resource are as follows:

  • id (mandatory) - ID of a resource.
  • agent (mandatory) - Name of a resource or stonith agent, for example ocf:pacemaker:Dummy or stonith:fence_xvm. It is mandatory to specify stonith: for stonith agents. For resource agents, it is possible to use a short name, such as Dummy, instead of ocf:pacemaker:Dummy. However, if several agents with the same short name are installed, the role will fail as it will be unable to decide which agent should be used. Therefore, it is recommended that you use full names when specifying a resource agent.
  • instance_attrs (optional) - List of sets of the resource’s instance attributes. Currently, only one set is supported. The exact names and values of attributes, as well as whether they are mandatory or not, depend on the resource or stonith agent.
  • meta_attrs (optional) - List of sets of the resource’s meta attributes. Currently, only one set is supported.
  • operations (optional) - List of the resource’s operations.

    • action (mandatory) - Operation action as defined by Pacemaker and the resource or stonith agent.
    • attrs (mandatory) - Operation options. At least one option must be specified.

The structure of the resource definition that you configure with the ha_cluster system role is as follows.

ha_cluster_resource_primitives:
  - id: resource-id
    agent: resource-agent
    instance_attrs:
      - attrs:
          - name: attribute1_name
            value: attribute1_value
          - name: attribute2_name
            value: attribute2_value
    meta_attrs:
      - attrs:
          - name: meta_attribute1_name
            value: meta_attribute1_value
          - name: meta_attribute2_name
            value: meta_attribute2_value
    operations:
      - action: operation1-action
        attrs:
          - name: operation1_attribute1_name
            value: operation1_attribute1_value
          - name: operation1_attribute2_name
            value: operation1_attribute2_value
      - action: operation2-action
        attrs:
          - name: operation2_attribute1_name
            value: operation2_attribute1_value
          - name: operation2_attribute2_name
            value: operation2_attribute2_value

By default, no resources are defined.

For an example ha_cluster system role playbook that includes resource configuration, see Configuring a high availability cluster with fencing and resources.
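
To make the generic structure more concrete, the following sketch defines a single IP address resource with one monitor operation. The address and netmask are the example values used later in this chapter, and the monitor interval is an arbitrary value chosen for illustration.

```yaml
ha_cluster_resource_primitives:
  - id: VirtualIP
    # Full agent name rather than the short name IPaddr2.
    agent: 'ocf:heartbeat:IPaddr2'
    instance_attrs:
      - attrs:
          - name: ip
            value: 198.51.100.3
          - name: cidr_netmask
            value: '24'
    operations:
      - action: monitor
        attrs:
          # Example interval chosen for illustration.
          - name: interval
            value: '30s'
```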

ha_cluster_resource_groups

This variable defines Pacemaker resource groups configured by the system role. The items you can configure for each resource group are as follows:

  • id (mandatory) - ID of a group.
  • resource_ids (mandatory) - List of the group’s resources. Each resource is referenced by its ID and the resources must be defined in the ha_cluster_resource_primitives variable. At least one resource must be listed.
  • meta_attrs (optional) - List of sets of the group’s meta attributes. Currently, only one set is supported.

The structure of the resource group definition that you configure with the ha_cluster system role is as follows.

ha_cluster_resource_groups:
  - id: group-id
    resource_ids:
      - resource1-id
      - resource2-id
    meta_attrs:
      - attrs:
          - name: group_meta_attribute1_name
            value: group_meta_attribute1_value
          - name: group_meta_attribute2_name
            value: group_meta_attribute2_value

By default, no resource groups are defined.

For an example ha_cluster system role playbook that includes resource group configuration, see Configuring a high availability cluster with fencing and resources.

ha_cluster_resource_clones

This variable defines Pacemaker resource clones configured by the system role. The items you can configure for a resource clone are as follows:

  • resource_id (mandatory) - Resource to be cloned. The resource must be defined in the ha_cluster_resource_primitives variable or the ha_cluster_resource_groups variable.
  • promotable (optional) - Indicates whether the resource clone to be created is a promotable clone, indicated as yes or no.
  • id (optional) - Custom ID of the clone. If no ID is specified, it will be generated. A warning will be displayed if this option is not supported by the cluster.
  • meta_attrs (optional) - List of sets of the clone’s meta attributes. Currently, only one set is supported.

The structure of the resource clone definition that you configure with the ha_cluster system role is as follows.

ha_cluster_resource_clones:
  - resource_id: resource-to-be-cloned
    promotable: yes
    id: custom-clone-id
    meta_attrs:
      - attrs:
          - name: clone_meta_attribute1_name
            value: clone_meta_attribute1_value
          - name: clone_meta_attribute2_name
            value: clone_meta_attribute2_value

By default, no resource clones are defined.

For an example ha_cluster system role playbook that includes resource clone configuration, see Configuring a high availability cluster with fencing and resources.
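
The three resource variables refer to one another by ID. The following minimal sketch, which uses hypothetical resource IDs, shows that chain: two primitives, a group that lists them, and a non-promotable clone of that group.

```yaml
ha_cluster_resource_primitives:
  - id: dummy-a
    agent: 'ocf:pacemaker:Dummy'
  - id: dummy-b
    agent: 'ocf:pacemaker:Dummy'
ha_cluster_resource_groups:
  # Every entry in resource_ids must match a primitive ID defined above.
  - id: dummy-group
    resource_ids:
      - dummy-a
      - dummy-b
ha_cluster_resource_clones:
  # resource_id may name a primitive or, as here, a group.
  - resource_id: dummy-group
    promotable: no
```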

17.2. Specifying an inventory for the ha_cluster system role

When configuring an HA cluster using the ha_cluster system role playbook, you configure the names and addresses of the nodes for the cluster in an inventory.

For each node in an inventory, you can optionally specify the following items:

  • node_name - the name of a node in a cluster.
  • pcs_address - an address used by pcs to communicate with the node. It can be a name, FQDN or an IP address and it can include a port number.
  • corosync_addresses - list of addresses used by Corosync. All nodes which form a particular cluster must have the same number of addresses and the order of the addresses matters.

The following example shows an inventory with targets node1 and node2. The target names node1 and node2 must be either fully qualified domain names or names by which the nodes can otherwise be reached, for example names that are resolvable through the /etc/hosts file.

all:
  hosts:
    node1:
      ha_cluster:
        node_name: node-A
        pcs_address: node1-address
        corosync_addresses:
          - 192.168.1.11
          - 192.168.2.11
    node2:
      ha_cluster:
        node_name: node-B
        pcs_address: node2-address:2224
        corosync_addresses:
          - 192.168.1.12
          - 192.168.2.12
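
Because each of the per-node items is optional, an inventory can also list only the targets, as in the following sketch. It assumes that the role then derives the node names and addresses from the target names themselves. Verify that fallback behavior against the role documentation before relying on it.

```yaml
all:
  hosts:
    # No ha_cluster items are given for either target.
    node1:
    node2:
```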

17.3. Configuring a high availability cluster running no resources

The following procedure uses the ha_cluster system role to create a high availability cluster with no fencing configured and which runs no resources.

Prerequisites

  • You have Red Hat Ansible Engine installed on the node from which you want to run the playbook.

    Note

    You do not have to have Ansible Engine installed on the cluster member nodes.

  • You have the rhel-system-roles package installed on the system from which you want to run the playbook.

    For details about RHEL System Roles and how to apply them, see Getting started with RHEL System Roles.

  • The systems running RHEL that you will use as your cluster members must have active subscription coverage for RHEL and the RHEL High Availability Add-On.

Note

The ha_cluster system role replaces any existing cluster configuration on the specified nodes. Any settings not specified in the role will be lost.

Procedure

  1. Create an inventory file specifying the nodes in the cluster, as described in Specifying an inventory for the ha_cluster system role.
  2. Create a playbook file, for example new-cluster.yml.

    The following example playbook file configures a cluster with no fencing configured and which runs no resources. When creating your playbook file for production, it is recommended that you vault encrypt the password, as described in Encrypting content with Ansible Vault.

    - hosts: node1 node2
      vars:
        ha_cluster_cluster_name: my-new-cluster
        ha_cluster_hacluster_password: password
    
      roles:
        - rhel-system-roles.ha_cluster
  3. Save the file.
  4. Run the playbook:

    $ ansible-playbook -i inventory new-cluster.yml

17.4. Configuring a high availability cluster with fencing and resources

The following procedure uses the ha_cluster system role to create a high availability cluster that includes a fencing device, cluster resources, resource groups, and a cloned resource.

Prerequisites

  • You have Red Hat Ansible Engine installed on the node from which you want to run the playbook.

    Note

    You do not have to have Ansible Engine installed on the cluster member nodes.

  • You have the rhel-system-roles package installed on the system from which you want to run the playbook.

    For details about RHEL System Roles and how to apply them, see Getting started with RHEL System Roles.

  • The systems running RHEL that you will use as your cluster members must have active subscription coverage for RHEL and the RHEL High Availability Add-On.

Note

The ha_cluster system role replaces any existing cluster configuration on the specified nodes. Any settings not specified in the role will be lost.

Procedure

  1. Create an inventory file specifying the nodes in the cluster, as described in Specifying an inventory for the ha_cluster system role.
  2. Create a playbook file, for example new-cluster.yml:

    The following example playbook file configures a cluster that includes fencing, several resources, and a resource group. It also includes a resource clone for the resource group. When creating your playbook file for production, it is recommended that you vault encrypt the password, as described in Encrypting content with Ansible Vault.

    - hosts: node1 node2
      vars:
        ha_cluster_cluster_name: my-new-cluster
        ha_cluster_hacluster_password: password
        ha_cluster_resource_primitives:
          - id: xvm-fencing
            agent: 'stonith:fence_xvm'
            instance_attrs:
              - attrs:
                  - name: pcmk_host_list
                    value: node1 node2
          - id: simple-resource
            agent: 'ocf:pacemaker:Dummy'
          - id: resource-with-options
            agent: 'ocf:pacemaker:Dummy'
            instance_attrs:
              - attrs:
                  - name: fake
                    value: fake-value
                  - name: passwd
                    value: passwd-value
            meta_attrs:
              - attrs:
                  - name: target-role
                    value: Started
                  - name: is-managed
                    value: 'true'
            operations:
              - action: start
                attrs:
                  - name: timeout
                    value: '30s'
              - action: monitor
                attrs:
                  - name: timeout
                    value: '5'
                  - name: interval
                    value: '1min'
          - id: dummy-1
            agent: 'ocf:pacemaker:Dummy'
          - id: dummy-2
            agent: 'ocf:pacemaker:Dummy'
          - id: dummy-3
            agent: 'ocf:pacemaker:Dummy'
          - id: simple-clone
            agent: 'ocf:pacemaker:Dummy'
          - id: clone-with-options
            agent: 'ocf:pacemaker:Dummy'
        ha_cluster_resource_groups:
          - id: simple-group
            resource_ids:
              - dummy-1
              - dummy-2
            meta_attrs:
              - attrs:
                  - name: target-role
                    value: Started
                  - name: is-managed
                    value: 'true'
          - id: cloned-group
            resource_ids:
              - dummy-3
        ha_cluster_resource_clones:
          - resource_id: simple-clone
          - resource_id: clone-with-options
            promotable: yes
            id: custom-clone-id
            meta_attrs:
              - attrs:
                  - name: clone-max
                    value: '2'
                  - name: clone-node-max
                    value: '1'
          - resource_id: cloned-group
            promotable: yes
    
      roles:
        - rhel-system-roles.ha_cluster
  3. Save the file.
  4. Run the playbook:

    $ ansible-playbook -i inventory new-cluster.yml

17.5. Configuring an Apache HTTP server in a high availability cluster with the ha_cluster system role

This procedure configures an active/passive Apache HTTP server in a two-node Red Hat Enterprise Linux High Availability Add-On cluster using the ha_cluster system role.

Prerequisites

  • You have Red Hat Ansible Engine installed on the node from which you want to run the playbook.

    Note

    You do not have to have Ansible Engine installed on the cluster member nodes.

  • You have the rhel-system-roles package installed on the system from which you want to run the playbook.

    For details about RHEL System Roles and how to apply them, see Getting started with RHEL System Roles.

  • The systems running RHEL that you will use as your cluster members must have active subscription coverage for RHEL and the RHEL High Availability Add-On.
  • Your system includes a public virtual IP address, required for Apache.
  • Your system includes shared storage for the nodes in the cluster, using iSCSI, Fibre Channel, or other shared network block device.
  • You have configured an LVM logical volume with an ext4 file system, as described in Configuring an LVM volume with an ext4 file system in a Pacemaker cluster.
  • You have configured an Apache HTTP server, as described in Configuring an Apache HTTP Server.
  • Your system includes an APC power switch that will be used to fence the cluster nodes.

Note

The ha_cluster system role replaces any existing cluster configuration on the specified nodes. Any settings not specified in the role will be lost.

Procedure

  1. Create an inventory file specifying the nodes in the cluster, as described in Specifying an inventory for the ha_cluster system role.
  2. Create a playbook file, for example http-cluster.yml:

    The following example playbook file configures a previously created Apache HTTP server in an active/passive two-node HA cluster.

    This example uses an APC power switch with a host name of zapc.example.com. If the cluster does not use any other fence agents, you can optionally list only the fence agents your cluster requires when defining the ha_cluster_fence_agent_packages variable, as in this example.

    When creating your playbook file for production, it is recommended that you vault encrypt the password, as described in Encrypting content with Ansible Vault.

    - hosts: z1.example.com z2.example.com
      roles:
        - rhel-system-roles.ha_cluster
      vars:
        ha_cluster_hacluster_password: password
        ha_cluster_cluster_name: my_cluster
        ha_cluster_fence_agent_packages:
          - fence-agents-apc-snmp
        ha_cluster_resource_primitives:
          - id: myapc
            agent: stonith:fence_apc_snmp
            instance_attrs:
              - attrs:
                  - name: ipaddr
                    value: zapc.example.com
                  - name: pcmk_host_map
                    value: z1.example.com:1;z2.example.com:2
                  - name: login
                    value: apc
                  - name: passwd
                    value: apc
          - id: my_lvm
            agent: ocf:heartbeat:LVM-activate
            instance_attrs:
              - attrs:
                  - name: vgname
                    value: my_vg
                  - name: vg_access_mode
                    value: system_id
          - id: my_fs
            agent: Filesystem
            instance_attrs:
              - attrs:
                  - name: device
                    value: /dev/my_vg/my_lv
                  - name: directory
                    value: /var/www
                  - name: fstype
                    value: ext4
          - id: VirtualIP
            agent: IPaddr2
            instance_attrs:
              - attrs:
                  - name: ip
                    value: 198.51.100.3
                  - name: cidr_netmask
                    value: 24
          - id: Website
            agent: apache
            instance_attrs:
              - attrs:
                  - name: configfile
                    value: /etc/httpd/conf/httpd.conf
                  - name: statusurl
                    value: http://127.0.0.1/server-status
        ha_cluster_resource_groups:
          - id: apachegroup
            resource_ids:
              - my_lvm
              - my_fs
              - VirtualIP
              - Website
  3. Save the file.
  4. Run the playbook:

    $ ansible-playbook -i inventory http-cluster.yml

Verification steps

  1. From one of the nodes in the cluster, check the status of the cluster. Note that all four resources are running on the same node, z1.example.com.

    If you find that the resources you configured are not running, you can run the pcs resource debug-start resource command to test the resource configuration.

    [root@z1 ~]# pcs status
    Cluster name: my_cluster
    Last updated: Wed Jul 31 16:38:51 2013
    Last change: Wed Jul 31 16:42:14 2013 via crm_attribute on z1.example.com
    Stack: corosync
    Current DC: z2.example.com (2) - partition with quorum
    Version: 1.1.10-5.el7-9abe687
    2 Nodes configured
    6 Resources configured
    
    Online: [ z1.example.com z2.example.com ]
    
    Full list of resources:
     myapc  (stonith:fence_apc_snmp):       Started z1.example.com
     Resource Group: apachegroup
         my_lvm     (ocf::heartbeat:LVM):   Started z1.example.com
         my_fs      (ocf::heartbeat:Filesystem):    Started z1.example.com
         VirtualIP  (ocf::heartbeat:IPaddr2):       Started z1.example.com
         Website    (ocf::heartbeat:apache):        Started z1.example.com
  2. Once the cluster is up and running, you can point a browser to the IP address you defined as the IPaddr2 resource to view the sample display, consisting of the simple word "Hello".

    Hello
  3. To test whether the resource group running on z1.example.com fails over to node z2.example.com, put node z1.example.com in standby mode, after which the node will no longer be able to host resources.

    [root@z1 ~]# pcs node standby z1.example.com
  4. After putting node z1 in standby mode, check the cluster status from one of the nodes in the cluster. Note that the resources should now all be running on z2.

    [root@z1 ~]# pcs status
    Cluster name: my_cluster
    Last updated: Wed Jul 31 17:16:17 2013
    Last change: Wed Jul 31 17:18:34 2013 via crm_attribute on z1.example.com
    Stack: corosync
    Current DC: z2.example.com (2) - partition with quorum
    Version: 1.1.10-5.el7-9abe687
    2 Nodes configured
    6 Resources configured
    
    Node z1.example.com (1): standby
    Online: [ z2.example.com ]
    
    Full list of resources:
    
     myapc  (stonith:fence_apc_snmp):       Started z1.example.com
     Resource Group: apachegroup
         my_lvm     (ocf::heartbeat:LVM):   Started z2.example.com
         my_fs      (ocf::heartbeat:Filesystem):    Started z2.example.com
         VirtualIP  (ocf::heartbeat:IPaddr2):       Started z2.example.com
         Website    (ocf::heartbeat:apache):        Started z2.example.com

    The web site at the defined IP address should still display, without interruption.

  5. To remove z1 from standby mode, enter the following command.

    [root@z1 ~]# pcs node unstandby z1.example.com

    Note

    Removing a node from standby mode does not in itself cause the resources to fail back over to that node. This will depend on the resource-stickiness value for the resources. For information on the resource-stickiness meta attribute, see Configuring a resource to prefer its current node.
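
If you want to influence fail-back behavior from the playbook, one possible approach is to set the resource-stickiness meta attribute on the resource group, as in the following sketch. The value 100 is an arbitrary example rather than a recommendation, and the group definition otherwise repeats the one from the playbook in this procedure.

```yaml
ha_cluster_resource_groups:
  - id: apachegroup
    resource_ids:
      - my_lvm
      - my_fs
      - VirtualIP
      - Website
    meta_attrs:
      - attrs:
          # The value 100 is an arbitrary example, not a recommendation.
          - name: resource-stickiness
            value: '100'
```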

17.6. Additional resources