Chapter 3. Upgrade a Red Hat Ceph Storage cluster using cephadm

As a storage administrator, you can use the cephadm Orchestrator to upgrade Red Hat Ceph Storage 5.0 and later.

The automated upgrade process follows Ceph best practices. For example:

  • The upgrade order starts with Ceph Managers, Ceph Monitors, then other daemons.
  • Each daemon is restarted only after Ceph indicates that the cluster will remain available.

The storage cluster health status is likely to switch to HEALTH_WARNING during the upgrade. When the upgrade is complete, the health status should switch back to HEALTH_OK.

Upgrading directly from Red Hat Ceph Storage 5 to Red Hat Ceph Storage 7 is supported.

Warning

Upgrading to Red Hat Ceph Storage 5.2 on Ceph Object Gateway storage clusters (single-site or multi-site) is supported but you must set the ceph config set mgr mgr/cephadm/no_five_one_rgw true --force option prior to upgrading your storage cluster.

For more information, see the knowledge base article Support Restrictions for upgrades for RADOS Gateway (RGW) on Red Hat Red Hat Ceph Storage 5.2 and Known issues section in the Red Hat Ceph Storage 5.2 Release Notes.

Note

ceph-ansible is currently not supported with Red Hat Ceph Storage 5. This means that once you have migrated your storage cluster to Red Hat Ceph Storage 5, you must use cephadm and cephadm-ansible to perform subsequent updates.

Important

Red Hat Enterprise Linux 9 and later does not support the cephadm-ansible playbook.

Note

You do not get a message once the upgrade is successful. Run ceph versions and ceph orch ps commands to verify the new image ID and the version of the storage cluster.

3.1. Upgrading the Red Hat Ceph Storage cluster

You can use ceph orch upgrade command for upgrading a Red Hat Ceph Storage 5.0 cluster.

Prerequisites

  • A running Red Hat Ceph Storage cluster 5.
  • Red Hat Enterprise Linux 8.4 EUS or later.
  • Root-level access to all the nodes.
  • Ansible user with sudo and passwordless ssh access to all nodes in the storage cluster.
  • At least two Ceph Manager nodes in the storage cluster: one active and one standby.
Note

Red Hat Ceph Storage 5 also includes a health check function that returns a DAEMON_OLD_VERSION warning if it detects that any of the daemons in the storage cluster are running multiple versions of RHCS. The warning is triggered when the daemons continue to run multiple versions of Red Hat Ceph Storage beyond the time value set in the mon_warn_older_version_delay option. By default, the mon_warn_older_version_delay option is set to 1 week. This setting allows most upgrades to proceed without falsely seeing the warning. If the upgrade process is paused for an extended time period, you can mute the health warning:

ceph health mute DAEMON_OLD_VERSION --sticky

After the upgrade has finished, unmute the health warning:

ceph health unmute DAEMON_OLD_VERSION

Procedure

  1. Update the cephadm and cephadm-ansible package:

    Example

    [root@admin ~]# dnf update cephadm
    [root@admin ~]# dnf update cephadm-ansible

  2. Navigate to the /usr/share/cephadm-ansible/ directory:

    Example

    [root@admin ~]# cd /usr/share/cephadm-ansible

  3. If the buckets are created or have the num_shards = 0, manually reshard the buckets, before planning an upgrade to Red Hat Ceph Storage 5.3:

    Syntax

    radosgw-admin bucket reshard --num-shards 11 --bucket BUCKET_NAME

    Example

    [ceph: root@host01 /]# radosgw-admin bucket reshard --num-shards 11 --bucket mybucket

  4. Run the preflight playbook with the upgrade_ceph_packages parameter set to true on the bootstrapped host in the storage cluster:

    Syntax

    ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=rhcs upgrade_ceph_packages=true"

    Example

    [ceph-admin@admin cephdm-ansible]$ ansible-playbook -i /etc/ansible/hosts cephadm-preflight.yml --extra-vars "ceph_origin=rhcs upgrade_ceph_packages=true"

    This package upgrades cephadm on all the nodes.

  5. Log into the cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  6. Ensure all the hosts are online and that the storage cluster is healthy:

    Example

    [ceph: root@host01 /]# ceph -s

  7. Set the OSD noout, noscrub, and nodeep-scrub flags to prevent OSDs from getting marked out during upgrade and to avoid unnecessary load on the cluster:

    Example

    [ceph: root@host01 /]# ceph osd set noout
    [ceph: root@host01 /]# ceph osd set noscrub
    [ceph: root@host01 /]# ceph osd set nodeep-scrub

  8. Check service versions and the available target containers:

    Syntax

    ceph orch upgrade check IMAGE_NAME

    Example

    [ceph: root@host01 /]# ceph orch upgrade check registry.redhat.io/rhceph/rhceph-5-rhel8:latest

    Note

    The image name is applicable for both Red Hat Enterprise Linux 8 and Red Hat Enterprise Linux 9.

  9. Upgrade the storage cluster:

    Syntax

    ceph orch upgrade start IMAGE_NAME

    Example

    [ceph: root@host01 /]# ceph orch upgrade start registry.redhat.io/rhceph/rhceph-5-rhel8:latest

    Note

    To perform a staggered upgrade, see Performing a staggered upgrade.

    While the upgrade is underway, a progress bar appears in the ceph status output.

    Example

    [ceph: root@host01 /]# ceph status
    [...]
    progress:
        Upgrade to 16.2.0-146.el8cp (1s)
          [............................]

  10. Verify the new IMAGE_ID and VERSION of the Ceph cluster:

    Example

    [ceph: root@host01 /]# ceph versions
    [ceph: root@host01 /]# ceph orch ps

    Note

    If you are not using the cephadm-ansible playbooks, after upgrading your Ceph cluster, you must upgrade the ceph-common package and client libraries on your client nodes.

    Example

    [root@client01 ~] dnf update ceph-common

    Verify you have the latest version:

    Example

    [root@client01 ~] ceph --version

  11. When the upgrade is complete, unset the noout, noscrub, and nodeep-scrub flags:

    Example

    [ceph: root@host01 /]# ceph osd unset noout
    [ceph: root@host01 /]# ceph osd unset noscrub
    [ceph: root@host01 /]# ceph osd unset nodeep-scrub

3.2. Upgrading the Red Hat Ceph Storage cluster in a disconnected environment

You can upgrade the storage cluster in a disconnected environment by using the --image tag.

You can use ceph orch upgrade command for upgrading a Red Hat Ceph Storage 5 cluster.

Important

Red Hat Enterprise Linux 9 and later does not support the cephadm-ansible playbook.

Prerequisites

  • A running Red Hat Ceph Storage cluster 5.
  • Red Hat Enterprise Linux 8.4 EUS or later.
  • Root-level access to all the nodes.
  • Ansible user with sudo and passwordless ssh access to all nodes in the storage cluster.
  • At least two Ceph Manager nodes in the storage cluster: one active and one standby.
  • Register the nodes to CDN and attach subscriptions.
  • Check for the customer container images in a disconnected environment and change the configuration, if required. See the Changing configurations of custom container images for disconnected installations section in the Red Hat Ceph Storage Installation Guide for more details.

By default, the monitoring stack components are deployed based on the primary Ceph image. For disconnected environment of the storage cluster, you have to use the latest available monitoring stack component images.

Table 3.1. Custom image details for monitoring stack

Monitoring stack componentRed Hat Ceph Storage versionImage details

Prometheus

Red Hat Ceph Storage 5.0 and 5.1

registry.redhat.io/openshift4/ose-prometheus:v4.6

 

Red Hat Ceph Storage 5.2 onwards

registry.redhat.io/openshift4/ose-prometheus:v4.10

Grafana

All Red Hat Ceph Storage 5 versions

registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:latest

Node-exporter

Red Hat Ceph Storage 5.0

registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5

 

Red Hat Ceph Storage 5.0z1 and 5.1

registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.6

 

Red Hat Ceph Storage 5.2 onwards

registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.10

AlertManager

Red Hat Ceph Storage 5.0

registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.5

 

Red Hat Ceph Storage 5.0z1 and 5.1

registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.6

 

Red Hat Ceph Storage 5.2 onwards

registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.10

HAProxy

Red Hat Ceph Storage 5.1 onwards

registry.redhat.io/rhceph/rhceph-haproxy-rhel8:latest

Keepalived

Red Hat Ceph Storage 5.1 onwards

registry.redhat.io/rhceph/keepalived-rhel8:latest

SNMP Gateway

Red Hat Ceph Storage 5.0 onwards

registry.redhat.io/rhceph/snmp-notifier-rhel8:latest

Procedure

  1. Update the cephadm and cephadm-ansible package.

    Example

    [root@admin ~]# dnf update cephadm
    [root@admin ~]# dnf update cephadm-ansible

  2. Run the preflight playbook with the upgrade_ceph_packages parameter set to true and the ceph_origin parameter set to custom on the bootstrapped host in the storage cluster:

    Syntax

    ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=custom upgrade_ceph_packages=true"

    Example

    [ceph-admin@admin ~]$ ansible-playbook -i /etc/ansible/hosts cephadm-preflight.yml --extra-vars "ceph_origin=custom upgrade_ceph_packages=true"

    This package upgrades cephadm on all the nodes.

  3. Log into the cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  4. Ensure all the hosts are online and that the storage cluster is healthy:

    Example

    [ceph: root@host01 /]# ceph -s

  5. Set the OSD noout, noscrub, and nodeep-scrub flags to prevent OSDs from getting marked out during upgrade and to avoid unnecessary load on the cluster:

    Example

    [ceph: root@host01 /]# ceph osd set noout
    [ceph: root@host01 /]# ceph osd set noscrub
    [ceph: root@host01 /]# ceph osd set nodeep-scrub

  6. Check service versions and the available target containers:

    Syntax

    ceph orch upgrade check IMAGE_NAME

    Example

    [ceph: root@host01 /]# ceph orch upgrade check LOCAL_NODE_FQDN:5000/rhceph/rhceph-5-rhel8

  7. Upgrade the storage cluster:

    Syntax

    ceph orch upgrade start IMAGE_NAME

    Example

    [ceph: root@host01 /]# ceph orch upgrade start LOCAL_NODE_FQDN:5000/rhceph/rhceph-5-rhel8

    While the upgrade is underway, a progress bar appears in the ceph status output.

    Example

    [ceph: root@host01 /]# ceph status
    [...]
    progress:
        Upgrade to 16.2.0-115.el8cp (1s)
          [............................]

  8. Verify the new IMAGE_ID and VERSION of the Ceph cluster:

    Example

    [ceph: root@host01 /]# ceph version
    [ceph: root@host01 /]# ceph versions
    [ceph: root@host01 /]# ceph orch ps

  9. When the upgrade is complete, unset the noout, noscrub, and nodeep-scrub flags:

    Example

    [ceph: root@host01 /]# ceph osd unset noout
    [ceph: root@host01 /]# ceph osd unset noscrub
    [ceph: root@host01 /]# ceph osd unset nodeep-scrub

Additional Resources

3.3. Staggered upgrade

As a storage administrator, you can upgrade Red Hat Ceph Storage components in phases rather than all at once. Starting with Red Hat Ceph Storage 5.2, the ceph orch upgrade command enables you to specify options to limit which daemons are upgraded by a single upgrade command.

Note

If you want to upgrade from a version that does not support staggered upgrades, you must first manually upgrade the Ceph Manager (ceph-mgr) daemons. For more information on performing a staggered upgrade from previous releases, see Performing a staggered upgrade from previous releases.

3.3.1. Staggered upgrade options

Starting with Red Hat Ceph Storage 5.2, the ceph orch upgrade command supports several options to upgrade cluster components in phases. The staggered upgrade options include:

  • --daemon_types: The --daemon_types option takes a comma-separated list of daemon types and will only upgrade daemons of those types. Valid daemon types for this option include mgr, mon, crash, osd, mds, rgw, rbd-mirror, cephfs-mirror, iscsi, and nfs.
  • --services: The --services option is mutually exclusive with --daemon-types, only takes services of one type at a time, and will only upgrade daemons belonging to those services. For example, you cannot provide an OSD and RGW service simultaneously.
  • --hosts: You can combine the --hosts option with --daemon_types, --services, or use it on its own. The --hosts option parameter follows the same format as the command line options for orchestrator CLI placement specification.
  • --limit: The --limit option takes an integer greater than zero and provides a numerical limit on the number of daemons cephadm will upgrade. You can combine the --limit option with --daemon_types, --services, or --hosts. For example, if you specify to upgrade daemons of type osd on host01 with a limit set to 3, cephadm will upgrade up to three OSD daemons on host01.

3.3.2. Performing a staggered upgrade

As a storage administrator, you can use the ceph orch upgrade options to limit which daemons are upgraded by a single upgrade command.

Cephadm strictly enforces an order for the upgrade of daemons that is still present in staggered upgrade scenarios. The current upgrade order is:

  • Ceph Manager nodes
  • Ceph Monitor nodes
  • Ceph-crash daemons
  • Ceph OSD nodes
  • Ceph Metadata Server (MDS) nodes
  • Ceph Object Gateway (RGW) nodes
  • Ceph RBD-mirror node
  • CephFS-mirror node
  • Ceph iSCSI gateway node
  • Ceph NFS nodes
Note

If you specify parameters that upgrade daemons out of order, the upgrade command blocks and notes which daemons you need to upgrade before you proceed.

Example

[ceph: root@host01 /]# ceph orch upgrade start --image  registry.redhat.io/rhceph/rhceph-5-rhel8:latest --hosts host02

Error EINVAL: Cannot start upgrade. Daemons with types earlier in upgrade order than daemons on given host need upgrading.
Please first upgrade mon.ceph-host01
NOTE: Enforced upgrade order is: mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> iscsi -> nfs

Prerequisites

  • A cluster running Red Hat Ceph Storage 5.2 or later.
  • Root-level access to all the nodes.
  • At least two Ceph Manager nodes in the storage cluster: one active and one standby.

Procedure

  1. Log into the cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. Ensure all the hosts are online and that the storage cluster is healthy:

    Example

    [ceph: root@host01 /]# ceph -s

  3. Set the OSD noout, noscrub, and nodeep-scrub flags to prevent OSDs from getting marked out during upgrade and to avoid unnecessary load on the cluster:

    Example

    [ceph: root@host01 /]# ceph osd set noout
    [ceph: root@host01 /]# ceph osd set noscrub
    [ceph: root@host01 /]# ceph osd set nodeep-scrub

  4. Check service versions and the available target containers:

    Syntax

    ceph orch upgrade check IMAGE_NAME

    Example

    [ceph: root@host01 /]# ceph orch upgrade check registry.redhat.io/rhceph/rhceph-5-rhel8:latest

  5. Upgrade the storage cluster:

    1. To upgrade specific daemon types on specific hosts:

      Syntax

      ceph orch upgrade start --image IMAGE_NAME --daemon-types DAEMON_TYPE1,DAEMON_TYPE2 --hosts HOST1,HOST2

      Example

      [ceph: root@host01 /]# ceph orch upgrade start --image registry.redhat.io/rhceph/rhceph-5-rhel8:latest --daemon-types mgr,mon --hosts host02,host03

    2. To specify specific services and limit the number of daemons to upgrade:

      Syntax

      ceph orch upgrade start --image IMAGE_NAME --services SERVICE1,SERVICE2 --limit LIMIT_NUMBER

      Example

      [ceph: root@host01 /]# ceph orch upgrade start --image registry.redhat.io/rhceph/rhceph-5-rhel8:latest --services rgw.example1,rgw1.example2 --limit 2

      Note

      In staggered upgrade scenarios, if using a limiting parameter, the monitoring stack daemons, including Prometheus and node-exporter, are refreshed after the upgrade of the Ceph Manager daemons. As a result of the limiting parameter, Ceph Manager upgrades take longer to complete. The versions of monitoring stack daemons might not change between Ceph releases, in which case, they are only redeployed.

      Note

      Upgrade commands with limiting parameters validates the options before beginning the upgrade, which can require pulling the new container image. As a result, the upgrade start command might take a while to return when you provide limiting parameters.

  6. To see which daemons you still need to upgrade, run the ceph orch upgrade check or ceph versions command:

    Example

    [ceph: root@host01 /]# ceph orch upgrade check --image registry.redhat.io/rhceph/rhceph-5-rhel8:latest

  7. To complete the staggered upgrade, verify the upgrade of all remaining services:

    Syntax

    ceph orch upgrade start --image IMAGE_NAME

    Example

    [ceph: root@host01 /]# ceph orch upgrade start --image registry.redhat.io/rhceph/rhceph-5-rhel8:latest

  8. When the upgrade is complete, unset the noout, noscrub, and nodeep-scrub flags:

    Example

    [ceph: root@host01 /]# ceph osd unset noout
    [ceph: root@host01 /]# ceph osd unset noscrub
    [ceph: root@host01 /]# ceph osd unset nodeep-scrub

Verification

  • Verify the new IMAGE_ID and VERSION of the Ceph cluster:

    Example

    [ceph: root@host01 /]# ceph versions
    [ceph: root@host01 /]# ceph orch ps

Additional Resources

3.3.3. Performing a staggered upgrade from previous releases

Starting with Red Hat Ceph Storage 5.2, you can perform a staggered upgrade on your storage cluster by providing the necessary arguments. If you want to upgrade from a version that does not support staggered upgrades, you must first manually upgrade the Ceph Manager (ceph-mgr) daemons. Once you have upgraded the Ceph Manager daemons, you can pass the limiting parameters to complete the staggered upgrade.

Important

Verify you have at least two running Ceph Manager daemons before attempting this procedure.

Prerequisites

  • A cluster running Red Hat Ceph Storage 5.0 or later.
  • At least two Ceph Manager nodes in the storage cluster: one active and one standby.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. Determine which Ceph Manager is active and which are standby:

    Example

    [ceph: root@host01 /]# ceph -s
      cluster:
        id:     266ee7a8-2a05-11eb-b846-5254002d4916
        health: HEALTH_OK
    
    
      services:
        mon: 2 daemons, quorum host01,host02 (age 92s)
        mgr: host01.ndtpjh(active, since 16h), standbys: host02.pzgrhz

  3. Manually upgrade each standby Ceph Manager daemon:

    Syntax

    ceph orch daemon redeploy mgr.ceph-HOST.MANAGER_ID --image IMAGE_ID

    Example

    [ceph: root@host01 /]# ceph orch daemon redeploy mgr.ceph-host02.pzgrhz --image registry.redhat.io/rhceph/rhceph-5-rhel8:latest

  4. Fail over to the upgraded standby Ceph Manager:

    Example

    [ceph: root@host01 /]# ceph mgr fail

  5. Check that the standby Ceph Manager is now active:

    Example

    [ceph: root@host01 /]# ceph -s
      cluster:
        id:     266ee7a8-2a05-11eb-b846-5254002d4916
        health: HEALTH_OK
    
    
      services:
        mon: 2 daemons, quorum host01,host02 (age 1h)
        mgr: host02.pzgrhz(active, since 25s), standbys: host01.ndtpjh

  6. Verify that the active Ceph Manager is upgraded to the new version:

    Syntax

    ceph tell mgr.ceph-HOST.MANAGER_ID version

    Example

    [ceph: root@host01 /]# ceph tell mgr.host02.pzgrhz version
    {
        "version": "16.2.8-12.el8cp",
        "release": "pacific",
        "release_type": "stable"
    }

  7. Repeat steps 2 - 6 to upgrade the remaining Ceph Managers to the new version.
  8. Check that all Ceph Managers are upgraded to the new version:

    Example

    [ceph: root@host01 /]# ceph mgr versions
    {
        "ceph version 16.2.8-12.el8cp (600e227816517e2da53d85f2fab3cd40a7483372) pacific (stable)": 2
    }

  9. Once you upgrade all your Ceph Managers, you can specify the limiting parameters and complete the remainder of the staggered upgrade.

Additional Resources

3.4. Monitoring and managing upgrade of the storage cluster

After running the ceph orch upgrade start command to upgrade the Red Hat Ceph Storage cluster, you can check the status, pause, resume, or stop the upgrade process. The health of the cluster changes to HEALTH_WARNING during an upgrade. If the host of the cluster is offline, the upgrade is paused.

Note

You have to upgrade one daemon type after the other. If a daemon cannot be upgraded, the upgrade is paused.

Prerequisites

  • A running Red Hat Ceph Storage cluster 5.
  • Root-level access to all the nodes.
  • At least two Ceph Manager nodes in the storage cluster: one active and one standby.
  • Upgrade for the storage cluster initiated.

Procedure

  1. Determine whether an upgrade is in process and the version to which the cluster is upgrading:

    Example

    [ceph: root@node0 /]# ceph orch upgrade status

    Note

    You do not get a message once the upgrade is successful. Run ceph versions and ceph orch ps commands to verify the new image ID and the version of the storage cluster.

  2. Optional: Pause the upgrade process:

    Example

    [ceph: root@node0 /]# ceph orch upgrade pause

  3. Optional: Resume a paused upgrade process:

    Example

    [ceph: root@node0 /]# ceph orch upgrade resume

  4. Optional: Stop the upgrade process:

    Example

    [ceph: root@node0 /]# ceph orch upgrade stop

3.5. Troubleshooting upgrade error messages

The following table shows some cephadm upgrade error messages. If the cephadm upgrade fails for any reason, an error message appears in the storage cluster health status.

Error MessageDescription

UPGRADE_NO_STANDBY_MGR

Ceph requires both active and standby manager daemons to proceed, but there is currently no standby.

UPGRADE_FAILED_PULL

Ceph was unable to pull the container image for the target version. This can happen if you specify a version or container image that does not exist (e.g., 1.2.3), or if the container registry is not reachable from one or more hosts in the cluster.