Chapter 4. Upgrading a Red Hat Ceph Storage Cluster

This chapter describes how to upgrade to a new major or minor version of Red Hat Ceph Storage.

Previously, Red Hat did not provide the ceph-ansible package for Ubuntu. In Red Hat Ceph Storage version 3 and later, you can use the Ansible automation application to upgrade a Ceph cluster from an Ubuntu node.

Use the Ansible rolling_update.yml playbook located in the /usr/share/ceph-ansible/infrastructure-playbooks/ directory from the administration node to upgrade between two major or minor versions of Red Hat Ceph Storage, or to apply asynchronous updates.

Ansible upgrades the Ceph nodes in the following order:

  • Monitor nodes
  • MGR nodes
  • OSD nodes
  • MDS nodes
  • Ceph Object Gateway nodes
  • All other Ceph client nodes
Note

Red Hat Ceph Storage 3 introduces several changes in the Ansible configuration files located in the /usr/share/ceph-ansible/group_vars/ directory; certain parameters were renamed or removed. Therefore, when upgrading to version 3, make backup copies of the all.yml and osds.yml files before creating new copies from the all.yml.sample and osds.yml.sample files. For more details about the changes, see Appendix H, Changes in Ansible Variables Between Version 2 and 3.

Note

Red Hat Ceph Storage 3.1 and later introduce new Ansible playbooks that optimize storage for performance when using the Ceph Object Gateway with high-speed NVMe-based SSDs (and SATA SSDs). The playbooks do this by placing journals and bucket indexes together on SSDs, which can increase performance compared to having all journals on one device. These playbooks are designed to be used when installing Ceph. Existing OSDs continue to work and need no extra steps during an upgrade. There is no way to upgrade a Ceph cluster while simultaneously reconfiguring OSDs to optimize storage in this way. Using different devices for journals or bucket indexes requires reprovisioning OSDs. For more information, see Using NVMe with LVM optimally in Ceph Object Gateway for Production.

Important

The rolling_update.yml playbook includes the serial variable, which adjusts the number of nodes to be updated simultaneously. Red Hat strongly recommends using the default value (1), which ensures that Ansible upgrades cluster nodes one by one.
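Before running the playbook, you can quickly confirm that no play was changed from the recommended value. The sketch below assumes that each play in rolling_update.yml sets the value on its own plain line such as `serial: 1`; the helper name check_serial is hypothetical and not part of ceph-ansible. It prints any serial line whose value is not 1 and returns a non-zero status if it finds one.

```shell
# Hypothetical helper: report any "serial:" setting in a playbook that is not 1.
# Assumes serial is written as a plain "serial: <value>" line in the YAML.
check_serial() {
  local playbook="$1"
  [ -r "$playbook" ] || { echo "cannot read $playbook" >&2; return 2; }
  # List every serial: line with its line number, then drop the ones set to 1.
  if grep -nE '^[[:space:]]*serial:' "$playbook" \
       | grep -vE 'serial:[[:space:]]*1[[:space:]]*$'; then
    echo "WARNING: non-default serial value(s) found in $playbook" >&2
    return 1
  fi
  return 0
}
```

For example, run `check_serial /usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml` on the administration node. Values written as templated expressions are also reported, so treat the output as a list of lines to review rather than a definitive verdict.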

Important

When using the rolling_update.yml playbook to upgrade to any Red Hat Ceph Storage 3.x version, users who use the Ceph File System (CephFS) must manually update the Metadata Server (MDS) cluster. This is due to a known issue.

Comment out the MDS hosts in /etc/ansible/hosts before upgrading the entire cluster with the ceph-ansible rolling_update.yml playbook, and then upgrade the MDS nodes manually. In the /etc/ansible/hosts file:

 #[mdss]
 #host-abc

For more details about this known issue, including how to update the MDS cluster, refer to the Red Hat Ceph Storage 3.0 Release Notes.
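Editing the inventory by hand is enough for most clusters. As a sketch only, assuming an INI-style inventory in which each group starts with a `[name]` header line, the hypothetical comment_out_mdss helper below prints a copy of the inventory with the [mdss] header and all of its host lines commented out, leaving the other groups untouched.

```shell
# Hypothetical helper: print an inventory with the [mdss] group commented out.
# Assumes an INI-style Ansible inventory where each group begins with "[name]".
comment_out_mdss() {
  awk '
    # A group header starts a new section; track whether it is [mdss].
    /^\[/ { in_mdss = ($0 == "[mdss]") }
    # Inside [mdss], prefix every non-empty line with "#".
    in_mdss && NF > 0 { print "#" $0; next }
    # Print all other lines unchanged.
    { print }
  ' "$1"
}
```

For example, to keep the original as a backup and write the modified copy in its place:

    [root@admin ~]# cp /etc/ansible/hosts /etc/ansible/hosts.bak
    [root@admin ~]# comment_out_mdss /etc/ansible/hosts.bak > /etc/ansible/hosts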

Important

When upgrading a Red Hat Ceph Storage cluster from a previous version to 3.2, the Ceph Ansible configuration defaults the OSD object store type to BlueStore. To keep using FileStore as the OSD object store, explicitly set the osd_objectstore option to filestore in the Ceph Ansible configuration. This ensures that newly deployed and replaced OSDs use FileStore.
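Before upgrading to 3.2, it can help to confirm whether group_vars/all.yml already pins the object store. The hypothetical show_objectstore helper below is a sketch that assumes the option is written as a plain `osd_objectstore: <value>` line (optionally quoted). It prints the value from the last uncommented occurrence, or reports that the option is unset, in which case the BlueStore default applies.

```shell
# Hypothetical helper: show the osd_objectstore value set in a group_vars file.
# Assumes a plain "osd_objectstore: <value>" line; commented-out lines are ignored.
show_objectstore() {
  local value
  # Extract the value (dropping optional quotes); keep the last occurrence.
  value=$(sed -nE "s/^osd_objectstore:[[:space:]]*[\"']?([A-Za-z]+).*/\\1/p" "$1" | tail -n 1)
  if [ -n "$value" ]; then
    echo "$value"
  else
    echo "unset (BlueStore default applies)"
  fi
}
```

For example, run `show_objectstore /usr/share/ceph-ansible/group_vars/all.yml` from the Ansible administration node.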

Important

If you use a multisite Ceph Object Gateway configuration, you do not have to manually update the all.yml file to specify the multisite configuration when using the rolling_update.yml playbook to upgrade to any Red Hat Ceph Storage 3.x version.

Prerequisites

  • If the Ceph nodes are not connected to the Red Hat Content Delivery Network (CDN) and you used an ISO image to install Red Hat Ceph Storage, update the local repository with the latest version of Red Hat Ceph Storage. See Section 2.4, “Enabling the Red Hat Ceph Storage Repositories” for details.
  • If upgrading from Red Hat Ceph Storage 2.x to 3.x, on the Ansible administration node and the RBD mirroring node, enable the Red Hat Ceph Storage 3 Tools repository:

    [user@admin ~]$ sudo bash -c 'umask 0077; echo deb https://customername:customerpasswd@rhcs.download.redhat.com/3-updates/Tools $(lsb_release -sc) main | tee /etc/apt/sources.list.d/Tools.list'
    [user@admin ~]$ sudo bash -c 'wget -O - https://www.redhat.com/security/fd431d51.txt | apt-key add -'
    [user@admin ~]$ sudo apt-get update
  • If upgrading from Red Hat Ceph Storage 2.x to 3.x, or from Red Hat Ceph Storage 3.x to the latest version, ensure that the latest version of the ceph-ansible package is installed on the Ansible administration node:

    [user@admin ~]$ sudo apt-get install ceph-ansible
  • In the rolling_update.yml playbook, change the health_osd_check_retries and health_osd_check_delay values to 50 and 30 respectively.

    health_osd_check_retries: 50
    health_osd_check_delay: 30

    With these values, for each OSD node, Ansible checks the storage cluster health every 30 seconds and waits up to 25 minutes (50 retries × 30 seconds) before continuing the upgrade process.

    Note

    Adjust the health_osd_check_retries option value up or down based on the used storage capacity of the storage cluster. For example, if you are using 218 TB out of 436 TB, which is 50% of the storage capacity, set the health_osd_check_retries option to 50.

  • If the cluster you want to upgrade contains Ceph Block Device images that use the exclusive-lock feature, ensure that all Ceph Block Device users have permissions to blacklist clients:

    ceph auth caps client.<ID> mon 'allow r, allow command "osd blacklist"' osd '<existing-OSD-user-capabilities>'
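The health_osd_check_retries and health_osd_check_delay values in the prerequisites above come down to simple arithmetic. The sketch below uses hypothetical helper names. osd_wait_minutes computes the maximum wait per OSD node (retries × delay). suggest_retries encodes one reading of the capacity guideline, in which the retries value equals the used capacity as a whole-number percentage, rounded up. Treat the second function as a starting point, not as a Red Hat formula.

```shell
# Hypothetical helper: maximum wait per OSD node, in whole minutes (rounded down),
# given health_osd_check_retries and health_osd_check_delay (in seconds).
osd_wait_minutes() {
  local retries="$1" delay="$2"
  echo $(( retries * delay / 60 ))
}

# Hypothetical helper: suggest health_osd_check_retries from used and total
# capacity (same unit, whole numbers), as the used percentage rounded up.
suggest_retries() {
  local used="$1" total="$2"
  echo $(( (used * 100 + total - 1) / total ))
}

osd_wait_minutes 50 30   # prints 25
suggest_retries 218 436  # prints 50
```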

4.1. Upgrading the Storage Cluster

Procedure

Use the following commands from the Ansible administration node.

  1. As the root user, navigate to the /usr/share/ceph-ansible/ directory:

    [root@admin ~]# cd /usr/share/ceph-ansible/
  2. Skip this step when upgrading from Red Hat Ceph Storage version 3.x to the latest version. Back up the group_vars/all.yml, group_vars/osds.yml, and group_vars/clients.yml files:

    [root@admin ceph-ansible]# cp group_vars/all.yml group_vars/all_old.yml
    [root@admin ceph-ansible]# cp group_vars/osds.yml group_vars/osds_old.yml
    [root@admin ceph-ansible]# cp group_vars/clients.yml group_vars/clients_old.yml
  3. Skip this step when upgrading from Red Hat Ceph Storage version 3.x to the latest version. When upgrading from Red Hat Ceph Storage 2.x to 3.x, create new group_vars/all.yml, group_vars/osds.yml, and group_vars/clients.yml files from the corresponding group_vars/all.yml.sample, group_vars/osds.yml.sample, and group_vars/clients.yml.sample files, then open and edit them as needed. For details, see Appendix H, Changes in Ansible Variables Between Version 2 and 3 and Section 3.2, “Installing a Red Hat Ceph Storage Cluster”.

    [root@admin ceph-ansible]# cp group_vars/all.yml.sample group_vars/all.yml
    [root@admin ceph-ansible]# cp group_vars/osds.yml.sample group_vars/osds.yml
    [root@admin ceph-ansible]# cp group_vars/clients.yml.sample group_vars/clients.yml
  4. Skip this step when upgrading from Red Hat Ceph Storage version 3.x to the latest version. When upgrading from Red Hat Ceph Storage 2.x to 3.x, open the group_vars/clients.yml file, and uncomment the following lines:

    keys:
      - { name: client.test, caps: { mon: "allow r", osd: "allow class-read object_prefix rbd_children, allow rwx pool=test" },  mode: "{{ ceph_keyring_permissions }}" }
    1. Replace client.test with the real client name, and add the client key to the client definition line, for example:

      key: "ADD-KEYRING-HERE=="

      The complete line then looks similar to this:

      - { name: client.test, key: "AQAin8tUMICVFBAALRHNrV0Z4MXupRw4v9JQ6Q==", caps: { mon: "allow r", osd: "allow class-read object_prefix rbd_children, allow rwx pool=test" },  mode: "{{ ceph_keyring_permissions }}" }
      Note

      To get the client key, run the ceph auth get-or-create command to view the key for the named client.

  5. In the group_vars/all.yml file, uncomment the upgrade_ceph_packages option and set it to True.

    upgrade_ceph_packages: True
  6. Add the fetch_directory parameter to the group_vars/all.yml file.

    fetch_directory: <full_directory_path>

    Replace:

    • <full_directory_path> with a writable location, such as the Ansible user’s home directory.
  7. If the cluster you want to upgrade contains any Ceph Object Gateway nodes, add the radosgw_interface parameter to the group_vars/all.yml file.

    radosgw_interface: <interface>

    Replace:

    • <interface> with the interface that the Ceph Object Gateway nodes listen on.
  8. Starting with Red Hat Ceph Storage 3.2, the default OSD object store is BlueStore. To keep using FileStore, the traditional OSD object store, explicitly set the osd_objectstore option to filestore in the group_vars/all.yml file.

    osd_objectstore: filestore
    Note

    With the osd_objectstore option set to filestore, replacement OSDs use FileStore instead of BlueStore.

  9. In the Ansible inventory file located at /etc/ansible/hosts, add the Ceph Manager (ceph-mgr) nodes under the [mgrs] section. Colocate the Ceph Manager daemon with Monitor nodes. Skip this step when upgrading from version 3.x to the latest version.

    [mgrs]
    <monitor-host-name>
    <monitor-host-name>
    <monitor-host-name>
  10. Copy rolling_update.yml from the infrastructure-playbooks directory to the current directory.

    [root@admin ceph-ansible]# cp infrastructure-playbooks/rolling_update.yml .
  11. Create the /var/log/ansible/ directory and assign the appropriate permissions for the ansible user:

    [root@admin ceph-ansible]# mkdir /var/log/ansible
    [root@admin ceph-ansible]# chown ansible:ansible /var/log/ansible
    [root@admin ceph-ansible]# chmod 755 /var/log/ansible
    1. Edit the /usr/share/ceph-ansible/ansible.cfg file, updating the log_path value as follows:

      log_path = /var/log/ansible/ansible.log
  12. As the Ansible user, run the playbook:

    [user@admin ceph-ansible]$ ansible-playbook rolling_update.yml

    To run the playbook only against a particular group of nodes in the Ansible inventory file, use the --limit option. For details, see Section 3.8, “Understanding the limit option”.

  13. As the root user on the RBD mirroring daemon node, upgrade rbd-mirror manually:

    # apt-get upgrade rbd-mirror

    Restart the daemon:

    # systemctl restart ceph-rbd-mirror@<client-id>
  14. Verify that the cluster health is OK.

    1. Log in to a monitor node as the root user and run the ceph status command:

      [root@monitor ~]# ceph -s
  15. If working in an OpenStack environment, update all the cephx users to use the RBD profile for pools. Run the following commands as the root user:

    • Glance users

      ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=<glance-pool-name>'

      Example

      [root@monitor ~]# ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=images'

    • Cinder users

      ceph auth caps client.cinder mon 'profile rbd' osd 'profile rbd pool=<cinder-volume-pool-name>, profile rbd pool=<nova-pool-name>, profile rbd-read-only pool=<glance-pool-name>'

      Example

      [root@monitor ~]# ceph auth caps client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'

    • OpenStack general users

      ceph auth caps client.openstack mon 'profile rbd' osd 'profile rbd-read-only pool=<cinder-volume-pool-name>, profile rbd pool=<nova-pool-name>, profile rbd-read-only pool=<glance-pool-name>'

      Example

      [root@monitor ~]# ceph auth caps client.openstack mon 'profile rbd' osd 'profile rbd-read-only pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'

      Important

      Make these caps updates before performing any live client migrations. This allows clients to use the new libraries running in memory, which causes the old caps settings to drop from the cache and the new RBD profile settings to apply.
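The three ceph auth caps commands for OpenStack users differ only in pool names. As a sketch, the hypothetical print_openstack_caps helper below prints all three commands from the pool names you pass in, so that you can review them before running them on a monitor node. It does not execute anything. The argument order (Glance pool, Cinder volume pool, Nova pool) is an assumption of this sketch.

```shell
# Hypothetical helper: print (not run) the OpenStack cephx caps updates.
# Arguments: Glance pool, Cinder volume pool, Nova pool.
print_openstack_caps() {
  local glance="$1" cinder="$2" nova="$3"
  printf '%s\n' \
    "ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=${glance}'" \
    "ceph auth caps client.cinder mon 'profile rbd' osd 'profile rbd pool=${cinder}, profile rbd pool=${nova}, profile rbd-read-only pool=${glance}'" \
    "ceph auth caps client.openstack mon 'profile rbd' osd 'profile rbd-read-only pool=${cinder}, profile rbd pool=${nova}, profile rbd-read-only pool=${glance}'"
}

# Pool names taken from the examples in this section.
print_openstack_caps images volumes vms
```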