5.2. In-Service Software Upgrade from Red Hat Gluster Storage 3.4 to Red Hat Gluster Storage 3.5

Important

Upgrade all Red Hat Gluster Storage servers before updating clients.
In-service software upgrade refers to the ability to progressively update a Red Hat Gluster Storage Server cluster with a new version of the software without taking the volumes hosted on the cluster offline. In most cases normal I/O operations on the volume continue even when the cluster is being updated.
I/O that uses CTDB may pause for the duration of an upgrade or update. This affects clients using Gluster NFS or Samba.

Note

After an in-service upgrade of NFS-Ganesha, the new configuration file is saved as "ganesha.conf.rpmnew" in the /etc/ganesha folder. The old configuration file is not overwritten during the in-service upgrade process. After the upgrade, you must manually copy any new configuration changes from "ganesha.conf.rpmnew" to the existing ganesha.conf file in the /etc/ganesha folder.

5.2.1. Pre-upgrade Tasks

Ensure you perform the following steps based on the set-up before proceeding with the in-service software upgrade process.

5.2.1.1. Upgrade Requirements for Red Hat Gluster Storage 3.5

The upgrade requirements to upgrade to Red Hat Gluster Storage 3.5 from the preceding update are as follows:
  • In-service software upgrade is supported for pure and distributed versions of arbiter, erasure-coded (disperse), and three-way replicated volume types. It is not supported for a pure distributed volume.
  • If you want to use snapshots for your existing environment, each brick must be an independent thin provisioned logical volume (LV). If you do not plan to use snapshots, thickly provisioned volumes remain supported.
  • A Logical Volume that contains a brick must not be used for any other purpose.
  • Linear LVM and thin LV are supported with Red Hat Gluster Storage 3.4 and later. For more information, see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/logical_volume_manager_administration/index#LVM_components
  • When server-side quorum is enabled, ensure that bringing one node down does not violate server-side quorum. Add dummy peers to ensure the server-side quorum is not violated until the completion of rolling upgrade using the following command:
    # gluster peer probe dummynode

    Note

    If you have a geo-replication session, then to add a node, follow the steps mentioned in the section Starting Geo-replication for a New Brick or New Node in the Red Hat Gluster Storage Administration Guide.
    For example, when the server-side quorum percentage is set to the default value (>50%), for a plain replicate volume with two nodes and one brick on each machine, a dummy node that does not contain any bricks must be added to the trusted storage pool to provide high availability of the volume using the command mentioned above.
    In a three node cluster, if the server-side quorum percentage is set to 77%, bringing down one node would violate the server-side quorum. In this scenario, you have to add two dummy nodes to meet server-side quorum.
  • Stop any geo-replication sessions running between the master and slave.
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
  • Ensure that there are no pending self-heals before proceeding with in-service software upgrade using the following command:
    # gluster volume heal volname info
  • Ensure the Red Hat Gluster Storage server is registered to the required channels.
    On Red Hat Enterprise Linux 6:
    rhel-6-server-rpms
    rhel-scalefs-for-rhel-6-server-rpms
    rhs-3-for-rhel-6-server-rpms
    On Red Hat Enterprise Linux 7:
    rhel-7-server-rpms
    rh-gluster-3-for-rhel-7-server-rpms
    On Red Hat Enterprise Linux 8:
    rhel-8-for-x86_64-baseos-rpms
    rhel-8-for-x86_64-appstream-rpms
    rh-gluster-3-for-rhel-8-x86_64-rpms
    To subscribe to the channels, run the following command:
    # subscription-manager repos --enable=repo-name
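
The dummy-node examples in the server-side quorum requirement above come down to a small piece of arithmetic. The following is a minimal sketch, not part of the gluster CLI: the function name dummy_nodes_needed is hypothetical, and it assumes that an explicit quorum ratio is met when the percentage of active peers is at least the configured value, and that the default setting requires strictly more than half of the peers to be active.

```shell
# Hypothetical helper, not part of the gluster CLI.
# Usage: dummy_nodes_needed NODES RATIO
#   NODES: number of real peers in the trusted storage pool
#   RATIO: quorum percentage as an integer, or "default" for >50%
# Prints the smallest number of dummy peers that keeps server-side
# quorum while one real node is down.
dummy_nodes_needed() {
    nodes=$1
    ratio=$2
    extra=0
    # A ratio of 100 or more can never be met with a node down.
    if [ "$ratio" != "default" ] && [ "$ratio" -ge 100 ]; then
        echo "quorum ratio must be below 100" >&2
        return 1
    fi
    while :; do
        total=$((nodes + extra))
        active=$((total - 1))
        if [ "$ratio" = "default" ]; then
            # Assumption: the default needs strictly more than half.
            if [ $((active * 2)) -gt "$total" ]; then
                break
            fi
        else
            # Assumption: an explicit ratio is met at >= RATIO percent.
            if [ $((active * 100)) -ge $((ratio * total)) ]; then
                break
            fi
        fi
        extra=$((extra + 1))
    done
    echo "$extra"
}
```

Under these assumptions, `dummy_nodes_needed 3 77` prints 2 and `dummy_nodes_needed 2 default` prints 1, which matches the two scenarios described above. Each dummy node would then be added with its own gluster peer probe command.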

5.2.1.2. Restrictions for In-Service Software Upgrade

The following lists some of the restrictions for in-service software upgrade:
  • In-service upgrade for NFS-Ganesha clusters is supported only from Red Hat Gluster Storage 3.4 and later. If you are upgrading from Red Hat Gluster Storage 3.1 and you use NFS-Ganesha, first perform an offline upgrade to Red Hat Gluster Storage 3.4, and then use the in-service upgrade method to upgrade to Red Hat Gluster Storage 3.5.
  • Erasure coded (dispersed) volumes can be upgraded while in-service only if the disperse.optimistic-change-log, disperse.eager-lock and disperse.other-eager-lock options are set to off. Wait for at least two minutes after disabling these options before attempting to upgrade to ensure that these configuration changes take effect for I/O operations.
  • After the upgrade, these options can be restored to their pre-upgrade values on erasure coded volumes, but ensure that the disperse.optimistic-change-log and disperse.other-eager-lock options are set to on.
  • Ensure that the system workload is low before performing the in-service software upgrade, so that the self-heal process does not have to heal too many entries during the upgrade. Healing also takes longer under high system workload.
  • Do not perform any volume operations on the Red Hat Gluster Storage server.
  • Do not change hardware configurations.
  • Do not run mixed versions of Red Hat Gluster Storage for an extended period of time. For example, do not have a mixed environment of Red Hat Gluster Storage 3.3, Red Hat Gluster Storage 3.4, and Red Hat Gluster Storage 3.5 for a prolonged time.
  • Do not combine different upgrade methods.
  • Using in-service software upgrade to migrate to thin provisioned volumes is not recommended. Use the offline upgrade scenario instead. For more information, see Section 5.1, “Offline Upgrade to Red Hat Gluster Storage 3.5”.

5.2.1.3. Configuring repo for Upgrading using ISO

To configure the repo to upgrade using ISO, execute the following steps:

Note

Upgrading Red Hat Gluster Storage using ISO can be performed only from the immediately preceding release. This means that upgrading to Red Hat Gluster Storage 3.5 using ISO can only be done from Red Hat Gluster Storage 3.4. For a complete list of supported Red Hat Gluster Storage releases, see Section 1.5, “Red Hat Gluster Storage Software Components and Versions”.
  1. Mount the ISO image file under any directory using the following command:
    # mount -o loop <ISO image file> <mount-point>
    For example:
    # mount -o loop rhgs-3.5-rhel-7-x86_64-dvd-1.iso /mnt
  2. Set the repo options in a file in the following location:
     /etc/yum.repos.d/<file_name.repo>
  3. Add the following information to the repo file:
    [local]
    name=local
    baseurl=file:///mnt
    enabled=1
    gpgcheck=0
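
Steps 2 and 3 can also be scripted. The following is a minimal sketch that writes the same repo definition shown above; the function name write_iso_repo and the file name used in the usage line are hypothetical, and the mount point is passed in rather than fixed to /mnt.

```shell
# Hypothetical helper: write a yum repo definition for a mounted ISO.
# Usage: write_iso_repo REPO_FILE MOUNT_POINT
# The generated content mirrors the [local] stanza shown above.
write_iso_repo() {
    repo_file=$1
    mount_point=$2
    cat > "$repo_file" <<EOF
[local]
name=local
baseurl=file://$mount_point
enabled=1
gpgcheck=0
EOF
}
```

For example, `write_iso_repo /etc/yum.repos.d/rhgs-iso.repo /mnt` would create a repo file (the name rhgs-iso.repo is only an illustration) that points at an ISO mounted on /mnt.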

5.2.1.4. Preparing and Monitoring the Upgrade Activity

Before proceeding with the in-service software upgrade, prepare and monitor the following processes:
  • Check the peer and volume status to ensure that all peers are connected and there are no active volume tasks.
    # gluster peer status
    # gluster volume status
  • Check the rebalance status using the following command:
    # gluster volume rebalance r2 status
    Node   Rebalanced-files  size      scanned   failures    skipped   status   run time in secs
    ---------   -----------    ---------   --------   ---------  ------  --------  --------------
    10.70.43.198         0       0Bytes       99       0           0    completed     1.00
    10.70.43.148         49      196Bytes    100       0           0    completed     3.00
  • If you need to upgrade an erasure coded (dispersed) volume, set the disperse.optimistic-change-log, disperse.eager-lock and disperse.other-eager-lock options to off. Wait for at least two minutes after disabling these options before attempting to upgrade to ensure that these configuration changes take effect for I/O operations.
    # gluster volume set volname disperse.optimistic-change-log off
    # gluster volume set volname disperse.eager-lock off
    # gluster volume set volname disperse.other-eager-lock off
  • Ensure that there are no pending self-heals by using the following command:
    # gluster volume heal volname info
    The following example shows no pending self-heals.
    # gluster volume heal drvol info
    Gathering list of entries to be healed on volume drvol has been successful
    
    Brick 10.70.37.51:/rhs/brick1/dir1
    Number of entries: 0
    
    Brick 10.70.37.78:/rhs/brick1/dir1
    Number of entries: 0
    
    Brick 10.70.37.51:/rhs/brick2/dir2
    Number of entries: 0
    
    Brick 10.70.37.78:/rhs/brick2/dir2
    Number of entries: 0
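
When a volume has many bricks, reading every "Number of entries" line by eye is error-prone. The following is a minimal sketch of a filter that totals those lines from the heal info output shown above. The function name pending_heal_entries is hypothetical, and treating any non-numeric count as "unknown" is a defensive assumption rather than documented behavior.

```shell
# Hypothetical helper: sum the "Number of entries:" lines from
# `gluster volume heal VOLNAME info` output read on standard input.
# Prints the total, or "unknown" (exit status 1) if any count is
# not a plain number.
pending_heal_entries() {
    awk '
        /Number of entries:/ {
            if ($4 !~ /^[0-9]+$/) {
                bad = 1
            } else {
                total += $4
            }
        }
        END {
            if (bad) {
                print "unknown"
                exit 1
            }
            print total + 0
        }
    '
}
```

A possible use is `gluster volume heal volname info | pending_heal_entries`, proceeding with the upgrade only when the printed total is 0.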

5.2.2. Service Impact of In-Service Upgrade

In-service software upgrade impacts the following services. Ensure that you take the required precautionary measures.

Gluster NFS (Deprecated)

Warning

Gluster-NFS is considered deprecated as of Red Hat Gluster Storage 3.5. Red Hat no longer recommends the use of Gluster-NFS, and does not support its use in new deployments and existing deployments that upgrade to Red Hat Gluster Storage 3.5.3.
When you use Gluster NFS (Deprecated) to mount a volume, any new or outstanding file operations on that file system will hang uninterruptedly during in-service software upgrade until the server is upgraded.

Samba / CTDB

Ongoing I/O on Samba shares fails because the Samba shares are temporarily unavailable during the in-service software upgrade. It is therefore recommended to stop the Samba service before upgrading. Stopping CTDB also stops the SMB service:

# service ctdb stop

Distribute Volume

In-service software upgrade is not supported for distributed volumes. If you have a distributed volume in the cluster, stop that volume for the duration of the upgrade.

# gluster volume stop <VOLNAME>
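
To find the volumes that this restriction applies to, the volume type can be read from the gluster volume info output. The following is a minimal sketch. The function name pure_distribute_volumes is hypothetical, and the field labels "Volume Name" and "Type" with the value "Distribute" are assumptions about the CLI output format, which is not shown in this section.

```shell
# Hypothetical helper: print the names of volumes whose type is
# "Distribute", reading `gluster volume info` output from standard
# input. Assumes "Volume Name: NAME" and "Type: TYPE" lines.
pure_distribute_volumes() {
    awk -F': *' '
        # Trim surrounding whitespace so indented output also matches.
        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }
        $1 == "Volume Name" { name = $2 }
        $1 == "Type" && $2 == "Distribute" { print name }
    '
}
```

A possible use is `gluster volume info | pure_distribute_volumes`, stopping each volume that is printed before the upgrade begins.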

Virtual Machine Store

Virtual machine images are likely to be modified constantly. A virtual machine image listed in the output of the volume heal command does not necessarily mean that self-heal of that image is incomplete. It could mean that the image is being modified constantly.

Hence, if you are using a gluster volume for storing virtual machine images, then it is recommended to power-off all virtual machine instances before in-service software upgrade.

Important

It is recommended to turn off all virtual machine instances to minimize the changes that occur during an in-service upgrade. If you choose to keep the VM instances running, the time taken to heal increases, and at times the number of pending heals may remain constant and take a long time to reach 0.

5.2.3. In-Service Software Upgrade

Note

Only offline mode is supported for Samba.
The following steps have to be performed on each node of the replica pair:
  1. Back up the following configuration directories and files to a location that is not on the operating system partition.
    • /var/lib/glusterd
    • /etc/samba
    • /etc/ctdb
    • /etc/glusterfs
    • /var/lib/samba
    • /var/lib/ctdb
    • /var/run/gluster/shared_storage/nfs-ganesha

    Note

    With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/.
    If you use NFS-Ganesha, back up the following files from all nodes:
    • /run/gluster/shared_storage/nfs-ganesha/exports/export.*.conf
    • /etc/ganesha/ganesha.conf
    • /etc/ganesha/ganesha-ha.conf
  2. Important

    If you are updating from RHEL 8.3 to RHEL 8.4, then follow these additional steps.
    1. On one node in the cluster, edit /etc/corosync/corosync.conf. Add a line "token: 3000" to the totem stanza, for example:
      totem {
          version: 2
          secauth: off
          cluster_name: rhel8-cluster
          transport: knet
          token: 3000
      }
    2. Run `pcs cluster sync`. Optionally, verify that /etc/corosync/corosync.conf on all nodes has the new token: 3000 line.
    3. Run `pcs cluster reload corosync`.
    4. Run `corosync-cmapctl | grep totem.token` and confirm that the output includes "runtime.config.totem.token (u32) = 3000".
  3. If the node is part of an NFS-Ganesha cluster, place the node in standby mode.
    # pcs node standby
  4. Ensure that there are no pending self-heal operations.
    # gluster volume heal volname info
  5. If this node is part of an NFS-Ganesha cluster:
    1. Disable the PCS cluster and verify that it has stopped.
      # pcs cluster disable
      # pcs status
    2. Stop the nfs-ganesha service.
      # systemctl stop nfs-ganesha
  6. Stop all gluster services on the node and verify that they have stopped.
    # systemctl stop glusterd
    # pkill glusterfs
    # pkill glusterfsd
    # pgrep gluster
  7. Verify that your system is not using the legacy Red Hat Classic update software.
    # migrate-rhs-classic-to-rhsm --status
    If your system uses this legacy software, migrate to Red Hat Subscription Manager and verify that your status has changed when migration is complete.
    # migrate-rhs-classic-to-rhsm --rhn-to-rhsm
    # migrate-rhs-classic-to-rhsm --status
  8. Update the server using the following command:
    # yum update

    Important

    Run the following command to install nfs-ganesha-selinux on Red Hat Enterprise Linux 7:
    # yum install nfs-ganesha-selinux
    Run the following command to install nfs-ganesha-selinux on Red Hat Enterprise Linux 8:
    # dnf install glusterfs-ganesha
  9. If the volumes are thick provisioned, and you plan to use snapshots, perform the following steps to migrate to thin provisioned volumes:

    Note

    Migrating from thick provisioned volumes to thin provisioned volumes during in-service software upgrade can take a significant amount of time, depending on the amount of data in the bricks. If you do not plan to use snapshots, you can skip this step. However, if you plan to use snapshots on your existing environment, the offline upgrade method is recommended. For more information regarding offline upgrade, see Section 5.1, “Offline Upgrade to Red Hat Gluster Storage 3.5”.
    Contact a Red Hat Support representative before migrating from thick provisioned volumes to thin provisioned volumes using in-service software upgrade.
    1. Unmount all the bricks associated with the volume by executing the following command:
      # umount mount_point
    2. Remove the logical volume associated with the brick by executing the following command:
      # lvremove logical_volume_name
      For example:
      # lvremove /dev/RHS_vg/brick1
    3. Remove the volume group by executing the following command:
      # vgremove -ff volume_group_name
      For example:
      # vgremove -ff RHS_vg
    4. Remove the physical volume by executing the following command:
      # pvremove -ff physical_volume
    5. If the physical volume (PV) does not already exist, create the PV for a RAID 6 volume by executing the following command. Otherwise, proceed to the next step:
      # pvcreate --dataalignment 2560K /dev/vdb
    6. Create a single volume group from the PV by executing the following command:
      # vgcreate volume_group_name disk
      For example:
      # vgcreate RHS_vg /dev/vdb
    7. Create a thinpool using the following command:
      # lvcreate -L size --poolmetadatasize md_size --chunksize chunk_size -T pool_device
      For example:
      # lvcreate -L 2T --poolmetadatasize 16G --chunksize 256 -T /dev/RHS_vg/thin_pool
    8. Create a thin volume from the pool by executing the following command:
      # lvcreate -V size -T pool_device -n thinvol_name
      For example:
      # lvcreate -V 1.5T -T /dev/RHS_vg/thin_pool -n thin_vol
    9. Create a file system on the new volume by executing the following command:
      # mkfs.xfs -i size=512 thin_vol
      For example:
      # mkfs.xfs -i size=512 /dev/RHS_vg/thin_vol
      The back-end is now converted to a thin provisioned volume.
    10. Mount the thin provisioned volume to the brick directory and set up the extended attributes on the bricks. For example:
      # setfattr -n trusted.glusterfs.volume-id \
          -v 0x$(grep volume-id /var/lib/glusterd/vols/volname/info \
          | cut -d= -f2 | sed 's/-//g') $brick
  10. Disable glusterd.
    # systemctl disable glusterd
    This prevents glusterd from starting at boot time, so that you can ensure the node is healthy before it rejoins the cluster.
  11. Reboot the server.
    # shutdown -r now "Shutting down for upgrade to Red Hat Gluster Storage 3.5"
  12. Important

    Perform this step only for each thick provisioned volume that has been migrated to thin provisioned volume in the previous step.
    Change the Automatic File Replication extended attributes from another node, so that the heal process is executed from a brick in the replica subvolume to the thin provisioned brick.
    1. Create a FUSE mount point to edit the extended attributes.
      # mount -t glusterfs HOSTNAME_or_IPADDRESS:/VOLNAME /MOUNTDIR
    2. Create a new directory on the mount point, and ensure that a directory with such a name is not already present.
      # mkdir /MOUNTDIR/name-of-nonexistent-dir
    3. Delete the directory and set the extended attributes.
      # rmdir /MOUNTDIR/name-of-nonexistent-dir
      # setfattr -n trusted.non-existent-key -v abc /MOUNTDIR
      # setfattr -x trusted.non-existent-key /MOUNTDIR
    4. Ensure that the extended attributes of the brick in the replica subvolume are not set to zero.
      # getfattr -d -m. -e hex brick_path
      In the following example, the extended attribute trusted.afr.repl3-client-1 for /dev/RHS_vg/brick2 is not set to zero:
      # getfattr -d -m. -e hex /dev/RHS_vg/brick2
      getfattr: Removing leading '/' from absolute path names
      # file: /dev/RHS_vg/brick2
      trusted.afr.dirty=0x000000000000000000000000
      trusted.afr.repl3-client-1=0x000000000000000400000002
      trusted.gfid=0x00000000000000000000000000000001
      trusted.glusterfs.dht=0x000000010000000000000000ffffffff
      trusted.glusterfs.volume-id=0x924c2e2640d044a687e2c370d58abec9
  13. Start the glusterd service.
    # systemctl start glusterd
  14. Verify that you have upgraded to the latest version of Red Hat Gluster Storage.
    # gluster --version
  15. Ensure that all bricks are online.
    # gluster volume status
    For example:
    # gluster volume status
    Status of volume: r2
    
    Gluster process                                         Port    Online  Pid
    ------------------------------------------------------------------------------
    Brick 10.70.43.198:/brick/r2_0                          49152   Y       32259
    Brick 10.70.42.237:/brick/r2_1                          49152   Y       25266
    Brick 10.70.43.148:/brick/r2_2                          49154   Y       2857
    Brick 10.70.43.198:/brick/r2_3                          49153   Y       32270
    NFS Server on localhost                                 2049    Y       25280
    Self-heal Daemon on localhost                           N/A     Y       25284
    NFS Server on 10.70.43.148                              2049    Y       2871
    Self-heal Daemon on 10.70.43.148                        N/A     Y       2875
    NFS Server on 10.70.43.198                              2049    Y       32284
    Self-heal Daemon on 10.70.43.198                        N/A     Y       32288
    
    Task Status of Volume r2
    ------------------------------------------------------------------------------
    There are no active volume tasks
  16. Start self-heal on the volume.
    # gluster volume heal volname
  17. Ensure that self-heal on the volume is complete.
    # gluster volume heal volname info
    The following example shows a completed self-heal operation.
    # gluster volume heal drvol info
    Gathering list of entries to be healed on volume drvol has been successful
    
    Brick 10.70.37.51:/rhs/brick1/dir1
    Number of entries: 0
    
    Brick 10.70.37.78:/rhs/brick1/dir1
    Number of entries: 0
    
    Brick 10.70.37.51:/rhs/brick2/dir2
    Number of entries: 0
    
    Brick 10.70.37.78:/rhs/brick2/dir2
    Number of entries: 0
  18. Verify that shared storage is mounted.
    # mount | grep /run/gluster/shared_storage
  19. If this node is part of an NFS-Ganesha cluster:
    1. If the system is managed by SELinux, set the ganesha_use_fusefs Boolean to on.
      # setsebool -P ganesha_use_fusefs on
    2. Start the NFS-Ganesha service.
      # systemctl start nfs-ganesha
    3. Enable and start the cluster.
      # pcs cluster enable
      # pcs cluster start
    4. Release the node from standby mode.
      # pcs node unstandby
    5. Verify that the pcs cluster is running, and that the volume is being exported correctly after upgrade.
      # pcs status
      # showmount -e
      NFS-Ganesha enters a short grace period after these steps are performed. I/O operations halt during this grace period. Wait until you see NFS Server Now NOT IN GRACE in the ganesha.log file before continuing.
  20. Optionally, enable the glusterd service to start at boot time.
    # systemctl enable glusterd
  21. Repeat the above steps on the other node of the replica pair. In the case of a distributed-replicate setup, repeat the above steps on all the replica pairs.
  22. Important

    If you are updating from RHEL 8.3 to RHEL 8.4, then follow these additional steps.
    1. Restore the totem-token timeout to the original value, for example, by deleting the token: 3000 line from /etc/corosync/corosync.conf.
    2. Run `pcs cluster sync`. Optionally, verify that /etc/corosync/corosync.conf on all nodes reflects the restored value and no longer has the token: 3000 line.
    3. Run `pcs cluster reload corosync`.
    4. Run `corosync-cmapctl | grep totem.token` to verify the changes.
  23. When all nodes have been upgraded, run the following command to update the op-version of the cluster. This helps to prevent any compatibility issues within the cluster.
    # gluster volume set all cluster.op-version 70200

    Note

    70200 is the cluster.op-version value for Red Hat Gluster Storage 3.5. After updating the cluster.op-version, enable granular-entry-heal for the volume using the following command:
    # gluster volume heal $VOLNAME granular-entry-heal enable
    The feature is enabled by default after an upgrade to Red Hat Gluster Storage 3.5, but it takes effect only after the op-version is updated. See Section 1.5, “Red Hat Gluster Storage Software Components and Versions” for the correct cluster.op-version value for other versions.

    Note

    If you want to enable snapshots, see Managing Snapshots in the Red Hat Gluster Storage 3.5 Administration Guide.

    Important

    If the Red Hat Gluster Storage nodes being updated have network encryption using TLS/SSL enabled, remount the Red Hat Gluster Storage volume on the client side. For more information, see Configuring Network Encryption in Red Hat Gluster Storage.
  24. If the client-side quorum was disabled before the upgrade, re-enable it by executing the following command:
    # gluster volume set volname cluster.quorum-type auto
  25. If a dummy node was created earlier, then detach it by executing the following command:
    # gluster peer detach <dummy_node name>
  26. If the geo-replication session between master and slave was disabled before upgrade, then configure the meta volume and restart the session:
    1. Run the following command:
      # gluster volume set all cluster.enable-shared-storage enable
    2. If you use a non-root user to perform geo-replication, run this command on the primary slave to set permissions for the non-root group.
      # gluster-mountbroker setup <mount_root> <group>
    3. Run the following command:
      # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config use_meta_volume true
    4. Run the following command:
      # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start
  27. If you disabled the disperse.optimistic-change-log, disperse.eager-lock and disperse.other-eager-lock options in order to upgrade an erasure-coded (dispersed) volume, re-enable these settings.
    # gluster volume set volname disperse.optimistic-change-log on
    # gluster volume set volname disperse.eager-lock on
    # gluster volume set volname disperse.other-eager-lock on
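
The brick check in step 15 can be automated before the next node is upgraded. The following is a minimal sketch of a filter over gluster volume status output in the format shown in that step. The function name offline_bricks is hypothetical, and the sketch assumes that each brick entry fits on a single line, as in the example output, with the Online column second from the end.

```shell
# Hypothetical helper: print brick lines whose Online column is not
# "Y", reading `gluster volume status` output from standard input.
# Exit status is 0 only when every brick line reports "Y".
# Assumes each brick entry occupies a single line.
offline_bricks() {
    awk '
        $1 == "Brick" && $(NF - 1) != "Y" { print; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}
```

A possible use is `gluster volume status | offline_bricks`, continuing to the self-heal steps only when the command prints nothing and exits with status 0.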

5.2.4. Special Consideration for In-Service Software Upgrade

The following sections describe the in-service software upgrade steps for a CTDB setup.

Note

Only offline mode is supported.

5.2.4.1. Upgrading the Native Client

All clients must use the same version of glusterfs-fuse. Red Hat strongly recommends that you upgrade servers before upgrading clients. If you are upgrading from Red Hat Gluster Storage 3.1 Update 2 or earlier, you must upgrade servers and clients simultaneously. To upgrade the native client, follow the steps below.
Before updating the Native Client, subscribe the clients to the channels mentioned in Section 6.2.1, “Installing Native Client”.

Warning

If you want to access a volume being provided by a server using Red Hat Gluster Storage 3.1.3 or higher, your client must also be using Red Hat Gluster Storage 3.1.3 or higher. Accessing these volumes from earlier client versions can result in data becoming unavailable and problems with directory operations. This requirement exists because Red Hat Gluster Storage 3.1.3 changed how the Distributed Hash Table works in order to improve directory consistency and remove the effects.
  1. Unmount gluster volumes

    Unmount any gluster volumes prior to upgrading the native client.
    # umount /mnt/glusterfs
  2. Upgrade the client

    Run the yum update command to upgrade the native client:
    # yum update glusterfs glusterfs-fuse
  3. Remount gluster volumes