Chapter 13. Replacing hosts

13.1. Replacing the primary hyperconverged host using ansible

Follow this section to replace the hyperconverged host that you used to perform all deployment operations.

Important

When self-signed encryption is enabled, replacing a node is a disruptive process that requires virtual machines and the Hosted Engine to be shut down.

  1. (Optional) If encryption using a Certificate Authority is enabled, follow the steps under Expanding Volumes in the Network Encryption chapter of the Red Hat Gluster Storage 3.4 Administration Guide.
  2. Move the server to be replaced into Maintenance mode.

    1. In the Administration Portal, click Compute → Hosts and select the host to replace.
    2. Click Management → Maintenance and click OK to move the host to Maintenance mode.
  3. Install the replacement host

    Follow the instructions in Deploying Red Hat Hyperconverged Infrastructure for Virtualization to install the physical machine and configure storage on the new host.

  4. Configure the replacement host

    Follow the instructions in Section 13.3, “Preparing a replacement hyperconverged host using ansible”.

  5. (Optional) If encryption with self-signed certificates is enabled:

    1. Generate the private key and self-signed certificate on the replacement host. See the Red Hat Gluster Storage Administration Guide for details: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/chap-network_encryption#chap-Network_Encryption-Prereqs.
    2. On a healthy host, create a copy of the /etc/ssl/glusterfs.ca file.

      # cp /etc/ssl/glusterfs.ca /etc/ssl/glusterfs.ca.bk
    3. Append the new host’s certificate to the content of the original /etc/ssl/glusterfs.ca file.
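
      For example, assuming the new host’s certificate has been copied to the healthy host as /etc/ssl/newhost.pem (an illustrative path), you can append it with:

      # cat /etc/ssl/newhost.pem >> /etc/ssl/glusterfs.ca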
    4. Distribute the /etc/ssl/glusterfs.ca file to all hosts in the cluster, including the new host.
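
      For example, you could copy the file to each host with scp; <host> is a placeholder for each cluster member:

      # scp /etc/ssl/glusterfs.ca <host>:/etc/ssl/glusterfs.ca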
    5. Run the following command on the replacement host to enable management encryption:

      # touch /var/lib/glusterd/secure-access
    6. Include the new host in the value of the auth.ssl-allow volume option by running the following command for each volume.

      # gluster volume set <volname> auth.ssl-allow "<old_host1>,<old_host2>,<new_host>"
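
      If you are unsure of the existing host list, you can check the current value first:

      # gluster volume get <volname> auth.ssl-allow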
    7. Restart the glusterd service on all hosts.

      # systemctl restart glusterd
    8. Follow the steps in Section 4.1, “Configuring TLS/SSL using self-signed certificates” to remount all gluster processes.
  6. Add the replacement host to the cluster.

    Run the following command from any host already in the cluster.

    # gluster peer probe <new_host>
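
    To confirm that the probe succeeded and the new host shows as connected:

    # gluster peer status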
  7. Move the Hosted Engine into Maintenance mode:

    # hosted-engine --set-maintenance --mode=global
  8. Stop the ovirt-engine service.

    # systemctl stop ovirt-engine
  9. Update the database.

    # hosted-engine --set-shared-config storage <new_host_IP>:/engine
  10. Start the ovirt-engine service.

    # systemctl start ovirt-engine
  11. Stop all virtual machines except the Hosted Engine.
  12. Move all storage domains except the Hosted Engine domain into Maintenance mode.
  13. Stop the Hosted Engine virtual machine.

    Run the following command on the existing server that hosts the Hosted Engine.

    # hosted-engine --vm-shutdown
  14. Stop high availability services on all hosts.

    # systemctl stop ovirt-ha-agent
    # systemctl stop ovirt-ha-broker
  15. Disconnect Hosted Engine storage from the hyperconverged host.

    Run the following command on the existing server that hosts the Hosted Engine.

    # hosted-engine --disconnect-storage
  16. Update the Hosted Engine configuration file.

    Edit the storage parameter in the /etc/ovirt-hosted-engine/hosted-engine.conf file to use the replacement host.

    storage=<new_server_IP>:/engine
    Note

    To configure the Hosted Engine for new hosts, use the command:

    # hosted-engine --set-shared-config storage <new_server_IP>:/engine
  17. Restart high availability services on all hosts.

    # systemctl restart ovirt-ha-agent
    # systemctl restart ovirt-ha-broker
  18. Reboot the existing and replacement hosts.

    Wait until all hosts are available before continuing.

  19. Take the Hosted Engine out of Maintenance mode.

    # hosted-engine --set-maintenance --mode=none
  20. Verify that the replacement host is used.

    On all hyperconverged hosts, verify that the engine volume is mounted from the replacement host by checking the IP address in the output of the mount command.
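
    For example, the following should report the engine volume mounted from the replacement host’s IP address:

    # mount | grep engine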

  21. Activate storage domains.

    Verify that storage domains mount using the IP address of the replacement host.

  22. Using the RHV Management UI, add the replacement host.

    Specify that the replacement host be used to host the Hosted Engine.

  23. Move the replacement host into Maintenance mode.

    # hosted-engine --set-maintenance --mode=global
  24. Reboot the replacement host.

    Wait until the host is back online before continuing.

  25. Activate the replacement host from the RHV Management UI.

    Ensure that all volumes are mounted using the IP address of the replacement host.

  26. Replace engine volume brick.

    Replace the brick on the old host that belongs to the engine volume with a new brick on the replacement host.

    1. Click Storage → Volumes and select the volume.
    2. Click the Bricks subtab.
    3. Select the brick to replace, and then click Replace brick.
    4. Select the host that hosts the brick being replaced.
    5. In the Replace brick window, provide the path to the new brick.
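
    Alternatively, the brick can be replaced from the command line. The following is a sketch, assuming the default brick paths used in this guide:

    # gluster volume replace-brick engine <old_host>:/gluster_bricks/engine/engine <new_host>:/gluster_bricks/engine/engine commit force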
  27. Remove the old host.

    1. Click Compute → Hosts and select the old host.
    2. Click Management → Maintenance to move the host to Maintenance mode.
    3. Click Remove. The Remove Host(s) confirmation dialog appears.
    4. If there are still volume bricks on this host, or the host is non-responsive, check the Force Remove checkbox.
    5. Click OK.
    6. Detach the old host from the cluster.

      # gluster peer detach <old_host_IP> force
  28. On the replacement host, run the following command to remove metadata from the previous host.

    # hosted-engine --clean-metadata --host-id=<old_host_id> --force-clean

13.2. Replacing other hyperconverged hosts using ansible

There are two options for replacing a hyperconverged host that is not the first host:

  1. Replace the host with a new host that has a different fully-qualified domain name by following the instructions in Section 13.2.1, “Replacing a hyperconverged host to use a different FQDN”.
  2. Replace the host with a new host that has the same fully-qualified domain name by following the instructions in Section 13.2.2, “Replacing a hyperconverged host to use the same FQDN”.

Follow the instructions in whichever section is appropriate for your deployment.

13.2.1. Replacing a hyperconverged host to use a different FQDN

Important

When self-signed encryption is enabled, replacing a node is a disruptive process that requires virtual machines and the Hosted Engine to be shut down.

  1. Install the replacement host

    Follow the instructions in Deploying Red Hat Hyperconverged Infrastructure for Virtualization to install the physical machine.

  2. Stop any existing geo-replication sessions

    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
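
    You can confirm that the session has stopped by checking its status:

    # gluster volume geo-replication <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> status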

    For further information, see the Red Hat Gluster Storage Administration Guide: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/sect-starting_geo-replication#Stopping_a_Geo-replication_Session.

  3. Move the host to be replaced into Maintenance mode

    Perform the following steps in the Administration Portal:

    1. Click Compute → Hosts and select the hyperconverged host in the results list.
    2. Click Management → Maintenance and click OK to move the host to Maintenance mode.
  4. Prepare the replacement host

    1. Configure key-based SSH authentication without a password

      Configure key-based SSH authentication without a password from a physical machine still in the cluster to the replacement host. For details, see https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_virtualization/1.6/html/deploying_red_hat_hyperconverged_infrastructure_for_virtualization/task-configure-key-based-ssh-auth.

    2. Prepare the replacement host

      Follow the instructions in Section 13.3, “Preparing a replacement hyperconverged host using ansible”.

  5. Create replacement brick directories

    Ensure the new directories are owned by the vdsm user and the kvm group.

    # mkdir /gluster_bricks/engine/engine
    # chown vdsm:kvm /gluster_bricks/engine/engine
    # mkdir /gluster_bricks/data/data
    # chown vdsm:kvm /gluster_bricks/data/data
    # mkdir /gluster_bricks/vmstore/vmstore
    # chown vdsm:kvm /gluster_bricks/vmstore/vmstore
  6. (Optional) If encryption is enabled

    1. Generate the private key and self-signed certificate on the new server using the steps in the Red Hat Gluster Storage Administration Guide: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/chap-network_encryption#chap-Network_Encryption-Prereqs.

      If encryption using a Certificate Authority is enabled, follow the steps under Expanding Volumes in the Network Encryption chapter of the Red Hat Gluster Storage 3.4 Administration Guide.

    2. Add the new host’s certificate to existing certificates.

      1. On a healthy host, make a backup copy of the /etc/ssl/glusterfs.ca file.
      2. Add the new host’s certificate to the /etc/ssl/glusterfs.ca file on the healthy host.
      3. Distribute the updated /etc/ssl/glusterfs.ca file to all other hosts, including the new host.
    3. Enable management encryption

      Run the following command on the new host to enable management encryption:

      # touch /var/lib/glusterd/secure-access
    4. Include the new host in the value of the auth.ssl-allow volume option by running the following command for each volume.

      # gluster volume set <volname> auth.ssl-allow "<old_host1>,<old_host2>,<new_host>"
    5. Restart the glusterd service on all hosts

      # systemctl restart glusterd
    6. If encryption uses self-signed certificates, follow the steps in Section 4.1, “Configuring TLS/SSL using self-signed certificates” to remount all gluster processes.
  7. Add the new host to the existing cluster

    1. Run the following command from one of the healthy hosts:

      # gluster peer probe <new_host>
    2. Add the new host to the existing cluster

      1. Click Compute → Hosts and then click New to open the New Host dialog.
      2. Provide a Name, Address, and Password for the new host.
      3. Uncheck the Automatically configure host firewall checkbox, as firewall rules are already configured by gdeploy.
      4. In the Hosted Engine tab of the New Host dialog, set the value of Choose hosted engine deployment action to Deploy.
      5. Click OK.
      6. When the host is available, click the name of the new host.
      7. Click the Network Interfaces subtab and then click Setup Host Networks. The Setup Host Networks dialog appears.
      8. Drag and drop the network you created for gluster to the IP associated with this host, and click OK.

        See the Red Hat Virtualization 4.3 Self-Hosted Engine Guide for further details: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/html/self-hosted_engine_guide/chap-installing_additional_hosts_to_a_self-hosted_environment.

  8. Configure and mount shared storage on the new host

    # cp /etc/fstab /etc/fstab.bk
    # echo "<new_host>:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults 0 0" >> /etc/fstab
    # mount /gluster_shared_storage
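
    You can verify the mount with, for example:

    # df -h /var/run/gluster/shared_storage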
  9. Replace the old brick with the brick on the new host

    1. In the Administration Portal, click Storage → Volumes and select the volume.
    2. Click the Bricks subtab.
    3. Select the brick that you want to replace and click Replace Brick. The Replace Brick dialog appears.
    4. Specify the Host and the Brick Directory of the new brick.
    5. Verify that brick heal completes successfully.
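
      For example, heal is complete when the following reports no unhealed entries for the volume:

      # gluster volume heal <volname> info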
  10. Click Compute → Hosts.
  11. Select the old host and click Remove.

    Use gluster peer status to verify that the old host is no longer part of the cluster. If the old host is still present in the status output, run the following command to forcibly remove it:

    # gluster peer detach <old_host> force
  12. Clean old host metadata.

    # hosted-engine --clean-metadata --host-id=<old_host_id> --force-clean
  13. Set up new SSH keys for geo-replication of new brick.

    # gluster system:: execute gsec_create
  14. Recreate geo-replication session and distribute new SSH keys.

    # gluster volume geo-replication <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> create push-pem force
  15. Start the geo-replication session.

    # gluster volume geo-replication <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> start

13.2.2. Replacing a hyperconverged host to use the same FQDN

Important

When self-signed encryption is enabled, replacing a node is a disruptive process that requires virtual machines and the Hosted Engine to be shut down.

  1. (Optional) If encryption using a Certificate Authority is enabled, follow the steps under Expanding Volumes in the Network Encryption chapter of the Red Hat Gluster Storage 3.4 Administration Guide.
  2. Move the host to be replaced into Maintenance mode

    1. In the Administration Portal, click Compute → Hosts and select the hyperconverged host.
    2. Click Management → Maintenance.
    3. Click OK to move the host to Maintenance mode.
  3. Install the replacement host

    Follow the instructions in Deploying Red Hat Hyperconverged Infrastructure for Virtualization to install the physical machine and configure storage on the new host.

  4. Configure the replacement host

    Follow the instructions in Section 13.3, “Preparing a replacement hyperconverged host using ansible”.

  5. (Optional) If encryption with self-signed certificates is enabled

    1. Generate the private key and self-signed certificate on the replacement host. See the Red Hat Gluster Storage Administration Guide for details: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/chap-network_encryption#chap-Network_Encryption-Prereqs.
    2. On a healthy host, make a backup copy of the /etc/ssl/glusterfs.ca file:

      # cp /etc/ssl/glusterfs.ca /etc/ssl/glusterfs.ca.bk
    3. Append the new host’s certificate to the content of the /etc/ssl/glusterfs.ca file.
    4. Distribute the /etc/ssl/glusterfs.ca file to all hosts in the cluster, including the new host.
    5. Run the following command on the replacement host to enable management encryption:

      # touch /var/lib/glusterd/secure-access
  6. Replace the host machine

    Follow the instructions in the Red Hat Gluster Storage Administration Guide to replace the host: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/sect-replacing_hosts#Replacing_a_Host_Machine_with_the_Same_Hostname.

  7. Restart the glusterd service on all hosts

    # systemctl restart glusterd
  8. Verify that all hosts reconnect

    # gluster peer status
  9. (Optional) If encryption uses self-signed certificates, follow the steps in Section 4.1, “Configuring TLS/SSL using self-signed certificates” to remount all gluster processes.
  10. Verify that all hosts reconnect and that brick heal completes successfully

    # gluster peer status
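
    To check heal status, the following should report no unhealed entries for each volume:

    # gluster volume heal <volname> info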
  11. Refresh fingerprint

    1. In the Administration Portal, click Compute → Hosts and select the new host.
    2. Click Edit.
    3. Click Advanced Parameters on the General tab.
    4. Click fetch to fetch the fingerprint from the host.
    5. Click OK.
  12. Click Installation → Reinstall and provide the root password when prompted.
  13. On the Hosted Engine tab set the value of Choose hosted engine deployment action to Deploy.
  14. Attach the gluster network to the host

    1. Click Compute → Hosts and click the name of the host.
    2. Click the Network Interfaces subtab and then click Setup Host Networks.
    3. Drag and drop the newly created network to the correct interface.
    4. Ensure that the Verify connectivity between Host and Engine checkbox is checked.
    5. Ensure that the Save network configuration checkbox is checked.
    6. Click OK to save.
  15. Verify the health of the network

    Click the Network Interfaces tab and check the state of the host’s network. If the network interface enters an "Out of sync" state or does not have an IP Address, click Management → Refresh Capabilities.

13.3. Preparing a replacement hyperconverged host using ansible

Follow this process to replace a hyperconverged host in the cluster.

Prerequisites

  • Ensure that the host you intend to replace is not associated with the FQDN that you want to use for the new host.
  • Ensure that the new host is associated with the FQDN you want it to use.
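
  For example, you can confirm that the new FQDN resolves correctly; the host name shown matches the example inventory below:

    # getent hosts newhost.example.com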

Procedure

  1. Create node_prep_inventory.yml inventory file

    Create an inventory file called node_prep_inventory.yml, based on the following example.

    Replace newhost.example.com with the FQDN that you want to use for the new host, and replace the device details with values appropriate for your host.

    Example node_prep_inventory.yml file

    hc_nodes:
      hosts:
        # New host
        newhost.example.com:
    
          # Dedupe & Compression config
          # If logicalsize >= 1000G then slabsize=32G else slabsize=2G
          #gluster_infra_vdo:
          #   - { name: 'vdo_sdc', device: '/dev/sdc', logicalsize: '3000G', emulate512: 'on', slabsize: '32G',
          #      blockmapcachesize:  '128M', readcachesize: '20M', readcache: 'enabled', writepolicy: 'auto' }
    
          # With Dedupe & Compression
          #gluster_infra_volume_groups:
          #  - vgname: gluster_vg_sdc
          #    pvname: /dev/mapper/vdo_sdc
    
          # Without Dedupe & Compression
          gluster_infra_volume_groups:
            - vgname: gluster_vg_sdc
              pvname: /dev/sdc
    
          gluster_infra_mount_devices:
            - path: /gluster_bricks/engine
              lvname: gluster_lv_engine
              vgname: gluster_vg_sdc
            - path: /gluster_bricks/data
              lvname: gluster_lv_data
              vgname: gluster_vg_sdc
            - path: /gluster_bricks/vmstore
              lvname: gluster_lv_vmstore
              vgname: gluster_vg_sdc
    
          gluster_infra_thinpools:
            - {vgname: 'gluster_vg_sdc', thinpoolname: 'thinpool_gluster_vg_sdc', thinpoolsize: '500G', poolmetadatasize: '4G'}
    
          # This is optional
          gluster_infra_cache_vars:
            - vgname: gluster_vg_sdc
              cachedisk: /dev/sde
              cachelvname: cachelv_thinpool_vg_sdc
              cachethinpoolname: thinpool_gluster_vg_sdc # name of the existing thinpool to which the cache is attached
              cachelvsize: '10G'
              cachemetalvsize: '2G'
              cachemetalvname: cache_thinpool_vg_sdc
              cachemode: writethrough
    
          gluster_infra_thick_lvs:
            - vgname: gluster_vg_sdc
              lvname: gluster_lv_engine
              size: 100G
    
          gluster_infra_lv_logicalvols:
            - vgname: gluster_vg_sdc
              thinpool: thinpool_gluster_vg_sdc
              lvname: gluster_lv_data
              lvsize: 500G
            - vgname: gluster_vg_sdc
              thinpool: thinpool_gluster_vg_sdc
              lvname: gluster_lv_vmstore
              lvsize: 500G
    
      # Common configurations
      vars:
        # Firewall setup
        gluster_infra_fw_ports:
           - 2049/tcp
           - 54321/tcp
           - 5900/tcp
           - 5900-6923/tcp
           - 5666/tcp
           - 16514/tcp
        gluster_infra_fw_permanent: true
        gluster_infra_fw_state: enabled
        gluster_infra_fw_zone: public
        gluster_infra_fw_services:
           - glusterfs
        gluster_infra_disktype: RAID6
        gluster_infra_diskcount: 12
        gluster_infra_stripe_unit_size: 128

  2. Create node_prep.yml playbook

    Create a node_prep.yml playbook file based on the following example.

    Example node_prep.yml playbook

    ---
    
    # Prepare Node for replace
    - name: Setup backend
      hosts: hc_nodes
      remote_user: root
      gather_facts: no
      any_errors_fatal: true
    
      roles:
         - gluster.infra
         - gluster.features

  3. Run node_prep.yml playbook

    # ansible-playbook -i node_prep_inventory.yml node_prep.yml
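
    After the playbook completes, you can confirm on the new host that the bricks were created and mounted. The volume group and mount point names below come from the example inventory above:

    # lvs gluster_vg_sdc
    # df -h | grep gluster_bricks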