IMPORTANT update for upgrading Red Hat Gluster Storage from 3.0.x to 3.1.x


Environment

  • Red Hat Gluster Storage 3.1.x
  • Red Hat Enterprise Linux 6.x

Issue

  • How to upgrade Red Hat Gluster Storage from 3.0.x to 3.1.x?
  • Why are geo-replication and self-heal not working properly after upgrading Red Hat Gluster Storage from 3.0.x to 3.1.x?
  • Why is the brick volfile missing all the options for the latest Red Hat Gluster Storage version after the upgrade?
  • Why are 0-dict: dict|match|action is NULL [Invalid argument] logs flooding after a Red Hat Gluster Storage upgrade?
  • Why are the below errors seen during the upgrade of Red Hat Gluster Storage?

    0-glusterd: geo-replication module not working as desired
    gsyncd version checking is failed
    

Resolution

  • An issue has been identified with direct upgrades from Red Hat Gluster Storage 3.0.x to Red Hat Gluster Storage 3.1.x. This issue is being tracked in Bugzilla #1353470.

Updating RHGS 3.0.x to 3.1.x using the in-service method

Warning: An SMB and CTDB in-service upgrade is not supported. Follow the offline upgrade method if SMB and CTDB are in use.

  1. Stop any geo-replication sessions before you begin. Note that slave nodes must be updated before master nodes when geo-replication is in use.

    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
    
  2. Stop the gluster services on the storage server using the following commands.

    # service glusterd stop (RHEL 6)   or   # systemctl stop glusterd (RHEL 7)
    # pkill glusterfs
    # pkill glusterfsd
    
  3. Update the server using the following command.

    # yum update
    

    Wait for the update to complete.

    a. Once the command-line prompt has reappeared, run the following command to check your log file for the relevant messages.

    # grep -irns "geo-replication module not working as desired" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | wc -l
    

    b. If the output is 0, continue with the rest of the procedure from Step 4.

    If the output is greater than zero, follow these steps.

    i. Check whether glusterd is running.

    # ps aux | grep glusterd
    

    If glusterd is running, stop glusterd.

    ii. Run the following command.

    # glusterd --xlator-option *.upgrade=on -N
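The log check and conditional regeneration in steps 3a and 3b can be scripted. The following is a minimal sketch, not Red Hat tooling: the sample log line is fabricated for demonstration, and a real check should target /var/log/glusterfs/etc-glusterfs-glusterd.vol.log as shown above.

```shell
#!/bin/sh
# Sketch: decide whether glusterd must be re-run with the upgrade
# xlator option, based on the failure signature named in this article.
needs_regeneration() {
    # grep -c prints the number of matching lines in the given log
    count=$(grep -c "geo-replication module not working as desired" "$1")
    [ "$count" -gt 0 ]
}

# Demonstration against a throwaway log file (fabricated sample line):
tmplog=$(mktemp)
echo "E 0-glusterd: geo-replication module not working as desired" > "$tmplog"
if needs_regeneration "$tmplog"; then
    echo "regeneration needed: stop glusterd, then run:"
    echo "  glusterd --xlator-option '*.upgrade=on' -N"
else
    echo "log clean, continue with Step 4"
fi
rm -f "$tmplog"
```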
    
  4. Reboot the server if a kernel update was included as part of the update process in the previous step.

  5. If no kernel update was included, start glusterd.

     # service glusterd start (RHEL 6)   or   # systemctl start glusterd (RHEL 7)
    
  6. To verify that you have upgraded to the desired version, run the following command.

     # gluster --version
    
  7. Ensure that all the bricks are online.

    # gluster volume status
    
  8. Start self-heal on the volumes.

    # gluster volume heal volname
    
  9. Verify the self-heal is completed on the replica.

    # gluster volume heal volname info
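The verification in Step 9 can be scripted by summing the per-brick counters in the heal info output; a total of zero means the replica has healed. A minimal sketch, assuming the stock "Number of entries:" lines printed by `gluster volume heal volname info`; the brick names in the demonstration are fabricated.

```shell
# Sketch: sum the "Number of entries:" counters from
# `gluster volume heal volname info`; 0 means self-heal is complete.
pending_entries() {
    awk -F': *' '/^Number of entries/ {s += $2} END {print s + 0}'
}

# Demonstration with captured sample output (fabricated brick names):
printf '%s\n' \
    'Brick server1:/rhgs/brick1' \
    'Number of entries: 0' \
    'Brick server2:/rhgs/brick1' \
    'Number of entries: 0' | pending_entries   # prints 0
```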
    
  10. Repeat Step 2 through Step 9 on the other node of the replica pair.

  11. When all nodes have been upgraded, run the following command to update the op-version of the cluster. This helps to prevent any compatibility issues within the cluster.

    # gluster volume set all cluster.op-version 30712
    

    The op-version 30712 is for RHGS 3.1.3. If the cluster was updated to a different version, refer to this knowledge base article and set the appropriate op-version for the cluster.
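After raising cluster.op-version, each node's operating version can be cross-checked from glusterd's state file (/var/lib/glusterd/glusterd.info on stock installs). A minimal sketch; the demonstration runs against a throwaway copy of the file rather than the live one.

```shell
# Sketch: read the operating-version recorded in glusterd's state file
# and compare it to the value set with
# `gluster volume set all cluster.op-version`.
op_version() {
    awk -F= '/^operating-version=/ {print $2}' "$1"
}

# Demonstration against a throwaway copy of the file:
tmpinfo=$(mktemp)
printf 'operating-version=30712\n' > "$tmpinfo"
if [ "$(op_version "$tmpinfo")" = "30712" ]; then
    echo "op-version matches RHGS 3.1.3"
fi
rm -f "$tmpinfo"
```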

  12. If geo-replication was in use, start the geo-replication session.

    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start [force]
    

Updating RHGS 3.0.x to 3.1.x using the offline method

  1. Make a complete backup using a reliable backup solution.

  2. When it is certain that the backup works, stop the volumes.

    # gluster volume stop volname
    
  3. Stop the gluster services on the storage server using the following commands.

    # service glusterd stop (RHEL 6)   or   # systemctl stop glusterd (RHEL 7)
    # pkill glusterfs
    # pkill glusterfsd
    
  4. Update the server using the following command.

     # yum update
    

    Wait for the update to complete.

    a. Once the command-line prompt has reappeared, run the following command to check your log file for the relevant messages.

    # grep -irns "geo-replication module not working as desired" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | wc -l
    

    b. If the output is 0, continue with the rest of the procedure from Step 5.

    If the output is greater than zero, follow these steps.

    i. Check whether glusterd is running.

    # ps aux | grep glusterd
    

    If glusterd is running, stop glusterd.

    ii. Run the following command.

    # glusterd --xlator-option *.upgrade=on -N
    
  5. Start glusterd service.

     # service glusterd start (RHEL 6)   or   # systemctl start glusterd (RHEL 7)
    
  6. When all nodes have been upgraded, run the following command to update the op-version of the cluster. This helps to prevent any compatibility issues within the cluster.

    # gluster volume set all cluster.op-version 30712
    

    The op-version 30712 is for RHGS 3.1.3. If the cluster was updated to a different version, refer to this knowledge base article and set the appropriate op-version for the cluster.

  7. Start the volumes with the following command:

    # gluster volume start volname
    

Already upgraded to RHGS 3.1.x and the volume files do not contain the options mentioned in Diagnostic Steps

Note: If the above-mentioned in-service or offline method was not followed for the upgrade, some basic functionality might be broken. Follow the procedure below to resolve this.

  1. Start and stop volume profiling to regenerate the volfile.

    # gluster volume profile <volname> start
    # gluster volume profile <volname> stop
    

    Verify that the volfile contains the below options.

    volume volname-index
       type features/index
       option xattrop-pending-watchlist trusted.afr.volname-
       option xattrop-dirty-watchlist trusted.afr.dirty
       option index-base /<brickpath>/.glusterfs/indices
       subvolumes volname-barrier
    end-volume
    
    Note: Vol files are present at /var/lib/glusterd/vols/<VOLNAME>
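Whether the regeneration worked can be checked by grepping the brick volfile for the index options shown above. A minimal sketch; the demonstration uses a throwaway volfile fragment with a fabricated volume name instead of a real file under /var/lib/glusterd/vols/<VOLNAME>.

```shell
# Sketch: confirm a brick volfile carries the features/index watchlist
# options that a correctly regenerated volfile contains.
has_index_options() {
    grep -q 'option xattrop-pending-watchlist' "$1" &&
    grep -q 'option xattrop-dirty-watchlist' "$1"
}

# Demonstration against a throwaway volfile fragment (fabricated name):
tmpvol=$(mktemp)
cat > "$tmpvol" <<'EOF'
volume testvol-index
    type features/index
    option xattrop-pending-watchlist trusted.afr.testvol-
    option xattrop-dirty-watchlist trusted.afr.dirty
end-volume
EOF
if has_index_options "$tmpvol"; then
    echo "volfile has the expected index options"
fi
rm -f "$tmpvol"
```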
    
  2. Stop the gluster volumes

    # gluster volume stop <volname>
    
  3. Navigate to the brick directories and execute the attached generate-index-files.sh script. This can be run in parallel on all the brick directories and on all the nodes. Once the script has been initiated on all the brick directories, proceed to Step 4; there is no need to wait for the script to complete, as it can fix the indices in the background. This resolves any existing issues with the self-heal.

    # ./generate-index-files.sh <path-to-brick> <volname> replicate
    
  4. Start the volumes

    # gluster volume start <volname>
    
  5. Make sure the generate-index-files.sh script has completed and that no files are pending heal.

    # gluster volume heal <volname> info

    Verify this output against the content of the directory $brick/.glusterfs/indices/xattrop/.
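This comparison can be sketched by counting the entries in the xattrop index directory: heal-pending entries appear there as gfid-named link files, while the xattrop-<uuid> base file itself is not a pending entry and is excluded. The demonstration uses a throwaway directory with fabricated file names.

```shell
# Sketch: count pending-heal entries under
# <brick>/.glusterfs/indices/xattrop/, excluding the xattrop base file.
xattrop_pending() {
    ls "$1" | grep -cv '^xattrop'
}

# Demonstration against a throwaway directory (fabricated names):
tmpdir=$(mktemp -d)
touch "$tmpdir/xattrop-0a1b2c3d" \
      "$tmpdir/5e6f7a8b-1111-2222-3333-444444444444"
xattrop_pending "$tmpdir"   # one gfid entry still pending -> prints 1
rm -rf "$tmpdir"
```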
    
  6. If split-brain exists, refer to this knowledge base article and resolve it manually.

Root Cause

  • During the upgrade process, yum update is expected to run glusterd --xlator-option *.upgrade=on -N, but this fails while executing gsyncd --version via the runner interface. Because glusterd is not started with the mentioned option, the new volfile containing the options of the upgraded version is not generated. This eventually breaks some functionality of the upgraded version of Red Hat Gluster Storage.

Diagnostic Steps

  • After the upgrade, check whether the brick volfile has the below options.

    volume volname-index
     type features/index
     option xattrop-pending-watchlist trusted.afr.volname-
     option xattrop-dirty-watchlist trusted.afr.dirty
     option index-base /<brickpath>/.glusterfs/indices
     subvolumes volname-barrier
    end-volume
    
  • Check /var/log/glusterfs/etc-glusterfs-glusterd.vol.log for any of the following messages.

    1. geo-replication module not working as desired

    2. gsyncd version checking is failed

  • Check the brick logs (/var/log/glusterfs/bricks/<brick-path>.log) for the message below while any read/write is happening.

    0-dict: dict|match|action is NULL [Invalid argument]
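The three diagnostic signatures can be checked in one pass. A minimal sketch; the log line written here is fabricated, and in practice the scan should target the glusterd and brick logs named above. Fixed-string matching (grep -F) is used because one signature contains '|' characters.

```shell
# Sketch: scan a log file for the three failure signatures from the
# Diagnostic Steps, using fixed-string matching to avoid regex '|'.
scan_log() {
    log="$1"
    for sig in \
        'geo-replication module not working as desired' \
        'gsyncd version checking is failed' \
        'dict|match|action is NULL'
    do
        grep -qF "$sig" "$log" && echo "found: $sig"
    done
}

# Demonstration against a throwaway log (fabricated sample line):
tmplog=$(mktemp)
echo '0-dict: dict|match|action is NULL [Invalid argument]' > "$tmplog"
scan_log "$tmplog"   # prints: found: dict|match|action is NULL
rm -f "$tmplog"
```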
    

Attachments

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
