Red Hat Training

A Red Hat training course is available for RHEL 8

Chapter 18. Managing cluster nodes

There are a variety of pcs commands you can use to manage cluster nodes, including commands to start and stop cluster services and to add and remove cluster nodes.

18.1. Stopping cluster services

The following command stops cluster services on the specified node or nodes. As with the pcs cluster start, the --all option stops cluster services on all nodes and if you do not specify any nodes, cluster services are stopped on the local node only.

pcs cluster stop [--all | node] [...]

You can force a stop of cluster services on the local node with the following command, which performs a kill -9 command.

pcs cluster kill

18.2. Enabling and disabling cluster services

Use the following command to enables the cluster services, which configures the cluster services to run on startup on the specified node or nodes. Enabling allows nodes to automatically rejoin the cluster after they have been fenced, minimizing the time the cluster is at less than full strength. If the cluster services are not enabled, an administrator can manually investigate what went wrong before starting the cluster services manually, so that, for example, a node with hardware issues in not allowed back into the cluster when it is likely to fail again.

  • If you specify the --all option, the command enables cluster services on all nodes.
  • If you do not specify any nodes, cluster services are enabled on the local node only.
pcs cluster enable [--all | node] [...]

Use the following command to configure the cluster services not to run on startup on the specified node or nodes.

  • If you specify the --all option, the command disables cluster services on all nodes.
  • If you do not specify any nodes, cluster services are disabled on the local node only.
pcs cluster disable [--all | node] [...]

18.3. Adding cluster nodes

Use the following procedure to add a new node to an existing cluster. This procedure adds standard clusters nodes running corosync. For information on integrating non-corosync nodes into a cluster, see Integrating non-corosync nodes into a cluster: the pacemaker_remote service.

Note

It is highly recommended that you add nodes to existing clusters only during a production maintenance window. This allows you to perform appropriate resource and deployment testing for the new node and its fencing configuration.

In this example, the existing cluster nodes are clusternode-01.example.com, clusternode-02.example.com, and clusternode-03.example.com. The new node is newnode.example.com.

Procedure

On the new node to add to the cluster, perform the following tasks.

  1. Install the cluster packages. If the cluster uses SBD, the Booth ticket manager, or a quorum device, you must manually install the respective packages (sbd, booth-site, corosync-qdevice) on the new node as well.

    [root@newnode ~]# yum install -y pcs fence-agents-all

    In addition to the cluster packages, you will also need to install and configure all of the services that you are running in the cluster, which you have installed on the existing cluster nodes. For example, if you are running an Apache HTTP server in a Red Hat high availability cluster, you will need to install the server on the node you are adding, as well as the wget tool that checks the status of the server.

  2. If you are running the firewalld daemon, execute the following commands to enable the ports that are required by the Red Hat High Availability Add-On.

    # firewall-cmd --permanent --add-service=high-availability
    # firewall-cmd --add-service=high-availability
  3. Set a password for the user ID hacluster. It is recommended that you use the same password for each node in the cluster.

    [root@newnode ~]# passwd hacluster
    Changing password for user hacluster.
    New password:
    Retype new password:
    passwd: all authentication tokens updated successfully.
  4. Execute the following commands to start the pcsd service and to enable pcsd at system start.

    # systemctl start pcsd.service
    # systemctl enable pcsd.service

On a node in the existing cluster, perform the following tasks.

  1. Authenticate user hacluster on the new cluster node.

    [root@clusternode-01 ~]# pcs host auth newnode.example.com
    Username: hacluster
    Password:
    newnode.example.com: Authorized
  2. Add the new node to the existing cluster. This command also syncs the cluster configuration file corosync.conf to all nodes in the cluster, including the new node you are adding.

    [root@clusternode-01 ~]# pcs cluster node add newnode.example.com

On the new node to add to the cluster, perform the following tasks.

  1. Start and enable cluster services on the new node.

    [root@newnode ~]# pcs cluster start
    Starting Cluster...
    [root@newnode ~]# pcs cluster enable
  2. Ensure that you configure and test a fencing device for the new cluster node.

18.4. Removing cluster nodes

The following command shuts down the specified node and removes it from the cluster configuration file, corosync.conf, on all of the other nodes in the cluster.

pcs cluster node remove node

18.5. Adding a node to a cluster with multiple links

When adding a node to a cluster with multiple links, you must specify addresses for all links.

The following example adds the node rh80-node3 to a cluster, specifying IP address 192.168.122.203 for the first link and 192.168.123.203 as the second link.

# pcs cluster node add rh80-node3 addr=192.168.122.203 addr=192.168.123.203

18.7. Configuring a large cluster with many resources

If the cluster you are deploying consists of a large number of nodes and many resources, you may need to modify the default values of the following parameters for your cluster.

The cluster-ipc-limit cluster property

The cluster-ipc-limit cluster property is the maximum IPC message backlog before one cluster daemon will disconnect another. When a large number of resources are cleaned up or otherwise modified simultaneously in a large cluster, a large number of CIB updates arrive at once. This could cause slower clients to be evicted if the Pacemaker service does not have time to process all of the configuration updates before the CIB event queue threshold is reached.

The recommended value of cluster-ipc-limit for use in large clusters is the number of resources in the cluster multiplied by the number of nodes. This value can be raised if you see "Evicting client" messages for cluster daemon PIDs in the logs.

You can increase the value of cluster-ipc-limit from its default value of 500 with the pcs property set command. For example, for a ten-node cluster with 200 resources you can set the value of cluster-ipc-limit to 2000 with the following command.

# pcs property set cluster-ipc-limit=2000
The PCMK_ipc_buffer Pacemaker parameter

On very large deployments, internal Pacemaker messages may exceed the size of the message buffer. When this occurs, you will see a message in the system logs of the following format:

Compressed message exceeds X% of configured IPC limit (X bytes); consider setting PCMK_ipc_buffer to X or higher

When you see this message, you can increase the value of PCMK_ipc_buffer in the /etc/sysconfig/pacemaker configuration file on each node. For example, to increase the value of PCMK_ipc_buffer from its default value to 13396332 bytes, change the uncommented PCMK_ipc_buffer field in the /etc/sysconfig/pacemaker file on each node in the cluster as follows.

PCMK_ipc_buffer=13396332

To apply this change, run the following comand.

# systemctl restart pacemaker