Chapter 4. Upgrading a Red Hat Ceph Storage Cluster

This section describes how to upgrade to a new major or minor version of Red Hat Ceph Storage.

Important

If you have a large Ceph Object Gateway storage cluster with millions of objects in buckets, contact Red Hat support before upgrading.

For more details, see the Red Hat Ceph Storage 3.0 Release Notes, under the Slow OSD startup after upgrading to Red Hat Ceph Storage 3.0 heading.

Use the Ansible rolling_update.yml playbook, located in the /usr/share/ceph-ansible/infrastructure-playbooks/ directory on the administration node, to upgrade between major or minor versions of Red Hat Ceph Storage, or to apply asynchronous updates.

Ansible upgrades the Ceph nodes in the following order:

  • Monitor nodes
  • MGR nodes
  • OSD nodes
  • MDS nodes
  • Ceph Object Gateway nodes
  • All other Ceph client nodes
Note

Red Hat Ceph Storage 3 introduces several changes in the Ansible configuration files located in the /usr/share/ceph-ansible/group_vars/ directory; certain parameters were renamed or removed. Therefore, when upgrading to version 3, make backup copies of the all.yml and osds.yml files before creating new copies of them from the all.yml.sample and osds.yml.sample files. For more details about the changes, see Appendix H, Changes in Ansible Variables Between Version 2 and 3.

Note

Red Hat Ceph Storage 3.1 introduces new Ansible playbooks that optimize storage for performance when using the Ceph Object Gateway with high-speed, NVMe-based SSDs (and SATA SSDs). The playbooks place journals and bucket indexes together on SSDs, which can increase performance compared to having all journals on one device. These playbooks are designed to be used when installing Ceph; existing OSDs continue to work and need no extra steps during an upgrade. There is no way to upgrade a Ceph cluster while simultaneously reconfiguring OSDs to optimize storage in this way. To use different devices for journals or bucket indexes, reprovision the OSDs. For more information, see Using NVMe with LVM optimally in Ceph Object Gateway for Production.

Important

The rolling_update.yml playbook includes the serial variable, which adjusts the number of nodes to be updated simultaneously. Red Hat strongly recommends using the default value (1), which ensures that Ansible upgrades the cluster nodes one by one.
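
For reference, the upgrade plays in rolling_update.yml carry this setting; a representative fragment, which can vary between ceph-ansible versions, might look like the following:

 - hosts: mons
   serial: 1   # upgrade one node at a time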

Important

When using the rolling_update.yml playbook to upgrade to Red Hat Ceph Storage 3.0, or from version 3.0 to a later zStream release of 3.0, users of the Ceph File System (CephFS) must update the Metadata Server (MDS) cluster manually. This is due to a known issue.

Comment out the MDS hosts in /etc/ansible/hosts before upgrading the entire cluster with the ceph-ansible rolling_update.yml playbook, and then upgrade the MDS cluster manually. In the /etc/ansible/hosts file:

 #[mdss]
 #host-abc

For more details about this known issue, including how to update the MDS cluster, see the Red Hat Ceph Storage 3.0 Release Notes.
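
As a rough sketch, assuming systemd-managed daemons and the example host host-abc shown above, the manual MDS update on each MDS node amounts to updating the Ceph MDS package and restarting the daemon; the Release Notes remain the authoritative procedure:

 # yum update ceph-mds
 # systemctl restart ceph-mds@host-abc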

Prerequisites

  • On all nodes in the cluster, enable the rhel-7-server-extras-rpms repository.

    # subscription-manager repos --enable=rhel-7-server-extras-rpms
  • If the Ceph nodes are not connected to the Red Hat Content Delivery Network (CDN) and you used an ISO image to install Red Hat Ceph Storage, update the local repository with the latest version of Red Hat Ceph Storage. See Section 2.5, “Enabling the Red Hat Ceph Storage Repositories” for details.
  • If upgrading from Red Hat Ceph Storage 2.x to 3.x, on the Ansible administration node and the RBD mirroring node, enable the Red Hat Ceph Storage 3 Tools repository and Ansible repository:

    [root@admin ~]# subscription-manager repos --enable=rhel-7-server-rhceph-3-tools-rpms --enable=rhel-7-server-ansible-2.4-rpms
  • If upgrading from Red Hat Ceph Storage 3.0 to 3.1 and using the Red Hat Ceph Storage Dashboard, purge the old cephmetrics installation from the cluster before upgrading it. This avoids a known issue where the dashboard does not display data after the upgrade.

    1. If the cephmetrics-ansible package is not already updated, update it:

      [root@admin ~]# yum update cephmetrics-ansible
    2. Change to the /usr/share/cephmetrics-ansible/ directory.

      [root@admin ~]# cd /usr/share/cephmetrics-ansible
    3. Purge the existing cephmetrics installation.

      [root@admin cephmetrics-ansible]# ansible-playbook -v purge.yml
    4. Install the updated Red Hat Ceph Storage Dashboard:

      [root@admin cephmetrics-ansible]# ansible-playbook -v playbook.yml
  • On the Ansible administration node, ensure the latest version of the ansible and ceph-ansible packages are installed.

    [root@admin ~]# yum update ansible ceph-ansible
  • In the rolling_update.yml playbook, change the health_osd_check_retries and health_osd_check_delay values to 40 and 30, respectively.

    health_osd_check_retries: 40
    health_osd_check_delay: 30

    With these values, Ansible checks the storage cluster health every 30 seconds and makes up to 40 checks, that is, it waits up to 20 minutes for each OSD node before continuing the upgrade process.

  • If the cluster you want to upgrade contains Ceph Block Device images that use the exclusive-lock feature, ensure that all Ceph Block Device users have permissions to blacklist clients:

    ceph auth caps client.<ID> mon 'allow r, allow command "osd blacklist"' osd '<existing-OSD-user-capabilities>'
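
    For example, for a hypothetical block device user named client.rbd-user whose existing OSD capability is allow rwx pool=rbd:

    # ceph auth caps client.rbd-user mon 'allow r, allow command "osd blacklist"' osd 'allow rwx pool=rbd'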

Procedure

Use the following commands from the Ansible administration node.

  1. Navigate to the /usr/share/ceph-ansible/ directory:

    [user@admin ~]$ cd /usr/share/ceph-ansible/
  2. Back up the group_vars/all.yml and group_vars/osds.yml files. Skip this step when upgrading from version 3.x to the latest version.

    [root@admin ceph-ansible]# cp group_vars/all.yml group_vars/all_old.yml
    [root@admin ceph-ansible]# cp group_vars/osds.yml group_vars/osds_old.yml
  3. Create new copies of the group_vars/all.yml.sample and group_vars/osds.yml.sample files named group_vars/all.yml and group_vars/osds.yml respectively, and edit them according to your deployment. Skip this step when upgrading from version 3.x to the latest version. For details, see Appendix H, Changes in Ansible Variables Between Version 2 and 3 and Section 3.2, “Installing a Red Hat Ceph Storage Cluster”.

    [root@admin ceph-ansible]# cp group_vars/all.yml.sample group_vars/all.yml
    [root@admin ceph-ansible]# cp group_vars/osds.yml.sample group_vars/osds.yml
  4. In the group_vars/all.yml file, uncomment the upgrade_ceph_packages option and set it to True.

    upgrade_ceph_packages: True
  5. In the group_vars/all.yml file, set ceph_rhcs_version to 3.

    ceph_rhcs_version: 3
    Note

    Setting the ceph_rhcs_version option to 3 pulls in the latest version of Red Hat Ceph Storage 3.

  6. Add the fetch_directory parameter to the group_vars/all.yml file.

    fetch_directory: <full_directory_path>

    Replace:

    • <full_directory_path> with a writable location, such as the Ansible user’s home directory.
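
    For example, assuming a hypothetical Ansible user named ansible and a directory in that user’s home:

     fetch_directory: /home/ansible/ceph-ansible-keys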
  7. If the cluster you want to upgrade contains any Ceph Object Gateway nodes, add the radosgw_interface parameter to the group_vars/all.yml file.

    radosgw_interface: <interface>

    Replace:

    • <interface> with the interface that the Ceph Object Gateway nodes listen on.
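
    For example, assuming the gateway nodes listen on a hypothetical interface named eth0:

     radosgw_interface: eth0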
  8. In the Ansible inventory file located at /etc/ansible/hosts, add the Ceph Manager (ceph-mgr) nodes under the [mgrs] section. Colocate the Ceph Manager daemon with Monitor nodes. Skip this step when upgrading from version 3.x to the latest version.

    [mgrs]
    <monitor-host-name>
    <monitor-host-name>
    <monitor-host-name>
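
    For example, assuming three hypothetical Monitor hosts named monitor01, monitor02, and monitor03:

     [mgrs]
     monitor01
     monitor02
     monitor03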
  9. Copy rolling_update.yml from the infrastructure-playbooks directory to the current directory.

    [root@admin ceph-ansible]# cp infrastructure-playbooks/rolling_update.yml .
  10. Create the /var/log/ansible/ directory and assign the appropriate permissions for the ansible user:

    [root@admin ceph-ansible]# mkdir /var/log/ansible
    [root@admin ceph-ansible]# chown ansible:ansible /var/log/ansible
    [root@admin ceph-ansible]# chmod 755 /var/log/ansible
    1. Edit the /usr/share/ceph-ansible/ansible.cfg file, updating the log_path value as follows:

      log_path = /var/log/ansible/ansible.log
  11. Run the playbook:

    [user@admin ceph-ansible]$ ansible-playbook rolling_update.yml

    To use the playbook only for a particular group of nodes in the Ansible inventory file, use the --limit option. For details, see Section 3.7, “Understanding the limit option”.
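
    For example, to limit the run to the hosts in the [osds] inventory group (use the group names defined in your own inventory file):

    [user@admin ceph-ansible]$ ansible-playbook rolling_update.yml --limit osds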

  12. From the RBD mirroring daemon node, upgrade rbd-mirror manually:

    # yum upgrade rbd-mirror

    Restart the daemon:

    # systemctl restart ceph-rbd-mirror@<client-id>
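
    For example, assuming the daemon runs under a hypothetical client ID of admin:

    # systemctl restart ceph-rbd-mirror@admin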
  13. Verify that the cluster health is OK.

    [root@monitor ~]# ceph -s
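
    Output similar to the following, where the health field reports HEALTH_OK, indicates a healthy cluster; the values shown are illustrative:

      cluster:
        id:     <cluster-fsid>
        health: HEALTH_OK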