Chapter 6. Management of OSDs using the Ceph Orchestrator

As a storage administrator, you can use the Ceph Orchestrator to manage the OSDs of a Red Hat Ceph Storage cluster.

6.1. Ceph OSDs

When a Red Hat Ceph Storage cluster is up and running, you can add OSDs to the storage cluster at runtime.

A Ceph OSD generally consists of one ceph-osd daemon for one storage drive and its associated journal within a node. If a node has multiple storage drives, then map one ceph-osd daemon for each drive.

Red Hat recommends checking the capacity of a cluster regularly to see if it is reaching the upper end of its storage capacity. As a storage cluster reaches its near full ratio, add one or more OSDs to expand the storage cluster’s capacity.

When you want to reduce the size of a Red Hat Ceph Storage cluster or replace the hardware, you can also remove an OSD at runtime. If the node has multiple storage drives, you might also need to remove the ceph-osd daemon for that drive. Before you remove an OSD, check the capacity of the storage cluster and ensure that it is not at its near full ratio.

Important

Do not let a storage cluster reach the full ratio before adding an OSD. OSD failures that occur after the storage cluster reaches the near full ratio can cause the storage cluster to exceed the full ratio. Ceph blocks write access to protect the data until you resolve the storage capacity issues. Do not remove OSDs without considering the impact on the full ratio first.
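One rough way to reason about this before removing an OSD is to project the utilization of the OSDs that remain. The following sketch is illustrative only: it assumes that data redistributes evenly across the remaining OSDs and that the near full and full ratios are 0.85 and 0.95, which are common Ceph defaults. The function names and sample numbers are hypothetical, so check the ratios configured on your own storage cluster.

```python
# Illustrative sketch: project cluster utilization after removing OSDs.
# Assumptions (not from this guide): data rebalances evenly, and the
# near full and full ratios are 0.85 and 0.95. Verify them on your cluster.

NEARFULL_RATIO = 0.85
FULL_RATIO = 0.95


def projected_utilization(used_bytes, osd_sizes_bytes, osds_to_remove):
    """Return used capacity divided by total capacity after removal."""
    remaining = [
        size for index, size in enumerate(osd_sizes_bytes)
        if index not in osds_to_remove
    ]
    if not remaining:
        raise ValueError("cannot remove every OSD")
    return used_bytes / sum(remaining)


def classify(utilization):
    """Map a utilization fraction to a coarse capacity state."""
    if utilization >= FULL_RATIO:
        return "full"
    if utilization >= NEARFULL_RATIO:
        return "nearfull"
    return "ok"


TIB = 1024 ** 4
osd_sizes = [4 * TIB] * 10   # ten hypothetical 4 TiB OSDs
used = 30 * TIB              # 75% of the raw capacity is in use

before = projected_utilization(used, osd_sizes, osds_to_remove=set())
after = projected_utilization(used, osd_sizes, osds_to_remove={0, 1})

print(f"before: {before:.2%} ({classify(before)})")
print(f"after:  {after:.2%} ({classify(after)})")
```

In this hypothetical case, removing two of ten OSDs moves the storage cluster from a comfortable state into the near full range, which is the situation the Important note warns about.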

6.2. Ceph OSD node configuration

Configure Ceph OSDs and their supporting hardware to match the storage strategy for the pools that will use the OSDs. Ceph prefers uniform hardware across pools for a consistent performance profile. For best performance, consider a CRUSH hierarchy with drives of the same type or size.

If you add drives of dissimilar size, adjust their weights accordingly. When you add the OSD to the CRUSH map, consider the weight for the new OSD. Hard drive capacity grows approximately 40% per year, so newer OSD nodes might have larger hard drives than older nodes in the storage cluster, that is, they might have a greater weight.
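As a minimal sketch of capacity-proportional weighting, the following example derives a relative weight from the drive size. It assumes the common convention that a weight of 1.0 represents about 1 TiB of capacity; the drive names and sizes are hypothetical.

```python
# Illustrative sketch: derive relative CRUSH weights from drive capacity.
# Assumption: a weight of 1.0 represents about 1 TiB of capacity.

TIB = 1024 ** 4


def crush_weight(capacity_bytes):
    """Return a capacity-proportional weight, rounded to five decimals."""
    return round(capacity_bytes / TIB, 5)


# Hypothetical drives: an older 4 TiB drive and a newer 8 TiB drive.
drives = {"old-4T": 4 * TIB, "new-8T": 8 * TIB}
weights = {name: crush_weight(size) for name, size in drives.items()}
print(weights)
```

With weights proportional to capacity, the larger drive receives about twice as much data as the smaller one.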

Before doing a new installation, review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide.

6.3. Automatically tuning OSD memory

The OSD daemons adjust their memory consumption based on the osd_memory_target configuration option. The option osd_memory_target sets OSD memory based upon the available RAM in the system.

If Red Hat Ceph Storage is deployed on dedicated nodes that do not share memory with other services, cephadm automatically adjusts the per-OSD consumption based on the total amount of RAM and the number of deployed OSDs.

Important

By default, the osd_memory_target_autotune parameter is set to true in Red Hat Ceph Storage 6.0.

Syntax

ceph config set osd osd_memory_target_autotune true

After the storage cluster is upgraded to Red Hat Ceph Storage 6.0, Red Hat recommends setting the osd_memory_target_autotune parameter to true for cluster maintenance, such as adding or replacing OSDs, so that OSD memory is autotuned according to the system memory.

Cephadm starts with the fraction mgr/cephadm/autotune_memory_target_ratio of the total RAM in the system, which defaults to 0.7, subtracts any memory consumed by non-autotuned daemons, such as non-OSD daemons and OSDs for which osd_memory_target_autotune is false, and then divides the remainder by the number of remaining OSDs.

The osd_memory_target parameter is calculated as follows:

Syntax

osd_memory_target = (TOTAL_RAM_OF_THE_OSD * (1048576) * (autotune_memory_target_ratio) - (SPACE_ALLOCATED_FOR_OTHER_DAEMONS)) / NUMBER_OF_OSDS_IN_THE_OSD_NODE

SPACE_ALLOCATED_FOR_OTHER_DAEMONS may optionally include the following daemon space allocations:

  • Alertmanager: 1 GB
  • Grafana: 1 GB
  • Ceph Manager: 4 GB
  • Ceph Monitor: 2 GB
  • Node-exporter: 1 GB
  • Prometheus: 1 GB

For example, if a node has 24 OSDs and 251 GB of RAM, and no memory is allocated for other daemons, then with the default ratio of 0.7 the osd_memory_target is 7860684936.
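The arithmetic can be sketched as follows. This is an illustration, not the cephadm implementation: it follows the description above (take the ratio of total RAM, subtract the memory of non-autotuned daemons, then divide by the number of autotuned OSDs) and assumes that the 251 GB in the example means 251 GiB, which reproduces the value in the text. The function and parameter names are hypothetical.

```python
# Illustrative sketch of the autotune arithmetic described above.
# Assumptions: RAM is given in GiB, and reserved_bytes is the memory set
# aside for non-autotuned daemons on the node.

GIB = 1024 ** 3


def osd_memory_target(total_ram_gib, num_osds, ratio=0.7, reserved_bytes=0):
    """Return the per-OSD memory target in bytes."""
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    budget = int(total_ram_gib * GIB * ratio) - reserved_bytes
    return budget // num_osds


# The example from the text: 24 OSDs, 251 GB RAM, default ratio, nothing reserved.
default_target = osd_memory_target(251, 24)
print(default_target)

# The same node with a Ceph Manager (4 GB) and a Ceph Monitor (2 GB) colocated.
with_daemons = osd_memory_target(251, 24, reserved_bytes=6 * GIB)
print(with_daemons)

# The same node with a hyperconverged-style ratio of 0.2.
hci_target = osd_memory_target(251, 24, ratio=0.2)
print(hci_target)
```

The first call reproduces the value 7860684936 from the example. The other two calls show how colocated daemons or a lower ratio reduce the per-OSD target.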

The final targets are stored in the configuration database as osd_memory_target options. You can view the limits and the current memory consumed by each daemon in the ceph orch ps output under the MEM LIMIT column.

Note

In Red Hat Ceph Storage 6.0, the default setting of osd_memory_target_autotune true is unsuitable for hyperconverged infrastructures where compute and Ceph storage services are colocated. In a hyperconverged infrastructure, the autotune_memory_target_ratio can be set to 0.2 to reduce the memory consumption of Ceph.

Example

[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2

You can manually set a specific memory target for an OSD in the storage cluster.

Example

[ceph: root@host01 /]# ceph config set osd.123 osd_memory_target 7860684936

You can manually set a specific memory target for an OSD host in the storage cluster.

Syntax

ceph config set osd/host:HOSTNAME osd_memory_target TARGET_BYTES

Example

[ceph: root@host01 /]# ceph config set osd/host:host01 osd_memory_target 1000000000

Note

Enabling osd_memory_target_autotune overwrites existing manual OSD memory target settings. To prevent daemon memory from being tuned even when the osd_memory_target_autotune option or other similar options are enabled, set the _no_autotune_memory label on the host.

Syntax

ceph orch host label add HOSTNAME _no_autotune_memory

You can exclude an OSD from memory autotuning by disabling the autotune option and setting a specific memory target.

Example

[ceph: root@host01 /]# ceph config set osd.123 osd_memory_target_autotune false
[ceph: root@host01 /]# ceph config set osd.123 osd_memory_target 16G

6.4. Listing devices for Ceph OSD deployment

You can check the list of available devices before deploying OSDs using the Ceph Orchestrator. The ceph orch device ls command prints the list of devices that Cephadm can discover. A storage device is considered available if all of the following conditions are met:

  • The device must have no partitions.
  • The device must not have any LVM state.
  • The device must not be mounted.
  • The device must not contain a file system.
  • The device must not contain a Ceph BlueStore OSD.
  • The device must be larger than 5 GB.

Note

Ceph will not provision an OSD on a device that is not available.
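The availability conditions above can be expressed as a simple predicate. The following sketch is illustrative only: the Device class and its field names are hypothetical and do not correspond to a Cephadm data structure, and the 5 GB threshold is interpreted as decimal gigabytes.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical summary of one block device; field names are illustrative."""
    path: str
    size_bytes: int
    has_partitions: bool = False
    has_lvm_state: bool = False
    is_mounted: bool = False
    has_filesystem: bool = False
    has_bluestore_osd: bool = False


# "Larger than 5 GB"; reading GB as decimal gigabytes is an assumption.
MIN_SIZE_BYTES = 5 * 10 ** 9


def rejection_reasons(device):
    """Return the conditions that the device fails; empty means available."""
    checks = [
        (device.has_partitions, "has partitions"),
        (device.has_lvm_state, "has LVM state"),
        (device.is_mounted, "is mounted"),
        (device.has_filesystem, "contains a file system"),
        (device.has_bluestore_osd, "contains a Ceph BlueStore OSD"),
        (device.size_bytes <= MIN_SIZE_BYTES, "is not larger than 5 GB"),
    ]
    return [reason for failed, reason in checks if failed]


clean = Device(path="/dev/sdb", size_bytes=100 * 10 ** 9)
in_use = Device(path="/dev/sdc", size_bytes=4 * 10 ** 9, has_lvm_state=True)

print(clean.path, rejection_reasons(clean))
print(in_use.path, rejection_reasons(in_use))
```

A device is available only when the list of reasons is empty, which mirrors the requirement that all of the conditions must be met.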

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Hosts are added to the cluster.
  • All manager and monitor daemons are deployed.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. List the available devices to deploy OSDs:

    Syntax

    ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]

    Example

    [ceph: root@host01 /]# ceph orch device ls --wide --refresh

    Using the --wide option provides all details relating to the device, including any reasons that the device might not be eligible for use as an OSD. This option does not support NVMe devices.

  3. Optional: To enable Health, Ident, and Fault fields in the output of ceph orch device ls, run the following commands:

    Note

    These fields are supported by the libstoragemgmt library, which currently supports SCSI, SAS, and SATA devices.

    1. As root user outside the Cephadm shell, check your hardware’s compatibility with libstoragemgmt library to avoid unplanned interruption to services:

      Example

      [root@host01 ~]# cephadm shell lsmcli ldl

      In the output, you see the Health Status as Good with the respective SCSI VPD 0x83 ID.

      Note

      If you do not get this information, then enabling the fields might cause erratic behavior of devices.

    2. Log back into the Cephadm shell and enable libstoragemgmt support:

      Example

      [root@host01 ~]# cephadm shell
      [ceph: root@host01 /]# ceph config set mgr mgr/cephadm/device_enhanced_scan true

      After this is enabled, ceph orch device ls reports the Health field as Good.

Verification

  • List the devices:

    Example

    [ceph: root@host01 /]# ceph orch device ls

6.5. Zapping devices for Ceph OSD deployment

You need to check the list of available devices before deploying OSDs. If there is no space available on the devices, you can clear the data on the devices by zapping them.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Hosts are added to the cluster.
  • All manager and monitor daemons are deployed.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. List the available devices to deploy OSDs:

    Syntax

    ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]

    Example

    [ceph: root@host01 /]# ceph orch device ls --wide --refresh

  3. Clear the data of a device:

    Syntax

    ceph orch device zap HOSTNAME FILE_PATH --force

    Example

    [ceph: root@host01 /]# ceph orch device zap host02 /dev/sdb --force

Verification

  • Verify the space is available on the device:

    Example

    [ceph: root@host01 /]# ceph orch device ls

    You will see that the field under Available is Yes.

6.6. Deploying Ceph OSDs on all available devices

You can deploy OSDs on all the available devices. Cephadm allows the Ceph Orchestrator to discover and deploy the OSDs on any available and unused storage device.

To deploy OSDs on all available devices, run the command without the unmanaged parameter, and then re-run the command with the parameter to prevent the creation of future OSDs.

Note

The deployment of OSDs with --all-available-devices is generally used for smaller clusters. For larger clusters, use the OSD specification file.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Hosts are added to the cluster.
  • All manager and monitor daemons are deployed.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. List the available devices to deploy OSDs:

    Syntax

    ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]

    Example

    [ceph: root@host01 /]# ceph orch device ls --wide --refresh

  3. Deploy OSDs on all available devices:

    Example

    [ceph: root@host01 /]# ceph orch apply osd --all-available-devices

    The effect of ceph orch apply is persistent, which means that the Orchestrator automatically finds the device, adds it to the cluster, and creates new OSDs. This occurs under the following conditions:

    • New disks or drives are added to the system.
    • Existing disks or drives are zapped.
    • An OSD is removed and the devices are zapped.

      You can disable automatic creation of OSDs on all the available devices by using the --unmanaged parameter.

      Example

      [ceph: root@host01 /]# ceph orch apply osd --all-available-devices --unmanaged=true

      Setting the --unmanaged parameter to true disables the creation of OSDs. In addition, applying a new OSD service makes no change.

      Note

      The command ceph orch daemon add creates new OSDs, but does not add an OSD service.

Verification

  • List the service:

    Example

    [ceph: root@host01 /]# ceph orch ls

  • View the details of the node and devices:

    Example

    [ceph: root@host01 /]# ceph osd tree

6.7. Deploying Ceph OSDs on specific devices and hosts

You can deploy all the Ceph OSDs on specific devices and hosts using the Ceph Orchestrator.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Hosts are added to the cluster.
  • All manager and monitor daemons are deployed.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. List the available devices to deploy OSDs:

    Syntax

    ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]

    Example

    [ceph: root@host01 /]# ceph orch device ls --wide --refresh

  3. Deploy OSDs on specific devices and hosts:

    Syntax

    ceph orch daemon add osd HOSTNAME:DEVICE_PATH

    Example

    [ceph: root@host01 /]# ceph orch daemon add osd host02:/dev/sdb

    To deploy OSDs on a raw physical device, without an LVM layer, use the --method raw option.

    Syntax

    ceph orch daemon add osd --method raw HOSTNAME:DEVICE_PATH

    Example

    [ceph: root@host01 /]# ceph orch daemon add osd --method raw host02:/dev/sdb

    Note

    If you have separate DB or WAL devices, the ratio of block to DB or WAL devices MUST be 1:1.

Verification

  • List the service:

    Example

    [ceph: root@host01 /]# ceph orch ls osd

  • View the details of the node and devices:

    Example

    [ceph: root@host01 /]# ceph osd tree

  • List the hosts, daemons, and processes:

    Syntax

    ceph orch ps --service_name=SERVICE_NAME

    Example

    [ceph: root@host01 /]# ceph orch ps --service_name=osd

6.8. Advanced service specifications and filters for deploying OSDs

A service specification of type OSD is a way to describe a cluster layout using the properties of disks. It gives the user an abstract way to tell Ceph which disks should turn into an OSD with the required configuration without knowing the specifics of device names and paths. For each device and each host, define a YAML file or a JSON file.

General settings for OSD specifications

  • service_type: 'osd': This is mandatory to create OSDs.
  • service_id: Use the service name or identification you prefer. A set of OSDs is created using the specification file. This name is used to manage all the OSDs together and represent an Orchestrator service.
  • placement: This is used to define the hosts on which the OSDs need to be deployed.

    You can use one of the following options:

    • host_pattern: '*' - A host name pattern used to select hosts.
    • label: 'osd_host' - A label used on the hosts where OSDs need to be deployed.
    • hosts: 'host01', 'host02' - An explicit list of host names where OSDs need to be deployed.
  • selection of devices: The devices where OSDs are created. This allows you to place the components of an OSD on different devices. You can create only BlueStore OSDs, which have three components:

    • OSD data: contains all the OSD data
    • WAL: BlueStore internal journal or write-ahead log
    • DB: BlueStore internal metadata
  • data_devices: Define the devices to deploy OSD. In this case, OSDs are created in a collocated schema. You can use filters to select devices and folders.
  • wal_devices: Define the devices used for WAL OSDs. You can use filters to select devices and folders.
  • db_devices: Define the devices for DB OSDs. You can use the filters to select devices and folders.
  • encrypted: An optional parameter to encrypt information on the OSD, which can be set to either True or False.
  • unmanaged: An optional parameter, set to False by default. You can set it to True if you do not want the Orchestrator to manage the OSD service.
  • block_wal_size: User-defined value, in bytes.
  • block_db_size: User-defined value, in bytes.
  • osds_per_device: User-defined value for deploying more than one OSD per device.
  • method: An optional parameter to specify if an OSD is created with an LVM layer or not. Set to raw if you want to create OSDs on raw physical devices that do not include an LVM layer. If you have separate DB or WAL devices, the ratio of block to DB or WAL devices MUST be 1:1.
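Because a specification can also be written as a JSON file, it can be generated programmatically. The following sketch builds a specification from the settings listed above and checks the 1:1 rule for raw mode when devices are listed by path. The helper names, host name, and device paths are hypothetical, and the check covers only explicit path lists, not filters.

```python
import json


def build_osd_spec(service_id, hosts, data_paths, db_paths=None, method=None):
    """Build an OSD service specification as a plain dictionary."""
    spec = {
        "service_type": "osd",
        "service_id": service_id,
        "placement": {"hosts": list(hosts)},
        "data_devices": {"paths": list(data_paths)},
    }
    if db_paths:
        spec["db_devices"] = {"paths": list(db_paths)}
    if method:
        spec["method"] = method
    return spec


def check_raw_ratio(spec):
    """Raise ValueError if a raw-mode spec breaks the 1:1 block-to-DB rule."""
    if spec.get("method") != "raw" or "db_devices" not in spec:
        return
    data = spec["data_devices"].get("paths", [])
    db = spec["db_devices"].get("paths", [])
    if len(data) != len(db):
        raise ValueError(
            f"raw mode needs one DB device per data device, "
            f"got {len(data)} data and {len(db)} DB devices"
        )


# Hypothetical host and device paths.
spec = build_osd_spec(
    "osd_raw_example",
    hosts=["host02"],
    data_paths=["/dev/sdb", "/dev/sdc"],
    db_paths=["/dev/nvme0n1", "/dev/nvme1n1"],
    method="raw",
)
check_raw_ratio(spec)
print(json.dumps(spec, indent=2))
```

Validating the file before applying it catches a mismatched device count early, before the Orchestrator attempts the deployment.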

Filters for specifying devices

Filters are used in conjunction with the data_devices, wal_devices and db_devices parameters.

| Name of the filter | Description | Syntax | Example |
|---|---|---|---|
| Model | Target specific disks. You can get details of the model by running the lsblk -o NAME,FSTYPE,LABEL,MOUNTPOINT,SIZE,MODEL command or the smartctl -i /DEVICE_PATH command. | Model: DISK_MODEL_NAME | Model: MC-55-44-XZ |
| Vendor | Target specific disks. | Vendor: DISK_VENDOR_NAME | Vendor: Vendor Cs |
| Size Specification | Includes disks of an exact size. | size: EXACT | size: '10G' |
| Size Specification | Includes disks whose size is within the range. | size: LOW:HIGH | size: '10G:40G' |
| Size Specification | Includes disks less than or equal to the given size. | size: :HIGH | size: ':10G' |
| Size Specification | Includes disks equal to or greater than the given size. | size: LOW: | size: '40G:' |
| Rotational | Rotational attribute of the disk. 1 matches all disks that are rotational and 0 matches all the disks that are non-rotational. If rotational=0, then the OSD is configured with SSD or NVMe. If rotational=1, then the OSD is configured with HDD. | rotational: 0 or 1 | rotational: 0 |
| All | Considers all the available disks. | all: true | all: true |
| Limiter | When you have specified valid filters but want to limit the number of matching disks, you can use the limit directive. It should be used only as a last resort. | limit: NUMBER | limit: 2 |
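The four forms of the size filter can be read as a closed or half-open range. The following sketch shows one way to evaluate them; it is not the Cephadm implementation. It assumes decimal units (1G is 10^9 bytes) and supports only the M, G, and T suffixes; the function names are hypothetical.

```python
# Illustrative sketch of the EXACT, LOW:HIGH, :HIGH, and LOW: size filters.
# Assumption: decimal units, so 1G is 10**9 bytes.

UNITS = {"M": 10 ** 6, "G": 10 ** 9, "T": 10 ** 12}


def parse_size(text):
    """Convert a string such as '10G' to a number of bytes."""
    text = text.strip().upper()
    return int(float(text[:-1]) * UNITS[text[-1]])


def size_matches(size_filter, device_bytes):
    """Return True if device_bytes satisfies the size filter."""
    if ":" not in size_filter:
        return device_bytes == parse_size(size_filter)
    low, high = size_filter.split(":", 1)
    if low and device_bytes < parse_size(low):
        return False
    if high and device_bytes > parse_size(high):
        return False
    return True


disk = 20 * 10 ** 9   # a hypothetical 20 GB disk
results = {
    "10G": size_matches("10G", disk),
    "10G:40G": size_matches("10G:40G", disk),
    ":10G": size_matches(":10G", disk),
    "40G:": size_matches("40G:", disk),
}
for size_filter, matched in results.items():
    print(f"size: '{size_filter}' -> {matched}")
```

For the 20 GB disk in this example, only the range filter '10G:40G' matches.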

Note

To create an OSD with non-collocated components in the same host, you have to specify the different types of devices used and the devices should be on the same host.

Note

The devices used for deploying OSDs must be supported by libstoragemgmt.

6.9. Deploying Ceph OSDs using advanced service specifications

The service specification of type OSD is a way to describe a cluster layout using the properties of disks. It gives the user an abstract way to tell Ceph which disks should turn into an OSD with the required configuration without knowing the specifics of device names and paths.

You can deploy the OSD for each device and each host by defining a YAML file or a JSON file.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Hosts are added to the cluster.
  • All manager and monitor daemons are deployed.

Procedure

  1. On the monitor node, create the osd_spec.yaml file:

    Example

    [root@host01 ~]# touch osd_spec.yaml

  2. Edit the osd_spec.yaml file to include the following details:

    Syntax

    service_type: osd
    service_id: SERVICE_ID
    placement:
      host_pattern: '*' # optional
    data_devices: # optional
      model: DISK_MODEL_NAME # optional
      paths:
      - /DEVICE_PATH
    osds_per_device: NUMBER_OF_DEVICES # optional
    db_devices: # optional
      size: # optional
      all: true # optional
      paths:
       - /DEVICE_PATH
    encrypted: true

    1. Simple scenarios: In these cases, all the nodes have the same set-up.

      Example

      service_type: osd
      service_id: osd_spec_default
      placement:
        host_pattern: '*'
      data_devices:
        all: true
        paths:
        - /dev/sdb
      encrypted: true

      Example

      service_type: osd
      service_id: osd_spec_default
      placement:
        host_pattern: '*'
      data_devices:
        size: '80G'
      db_devices:
        size: '40G:'
        paths:
         - /dev/sdc

    2. Simple scenario: In this case, all the nodes have the same setup with OSD devices created in raw mode, without an LVM layer.

      Example

      service_type: osd
      service_id: all-available-devices
      encrypted: "true"
      method: raw
      placement:
        host_pattern: "*"
      data_devices:
        all: "true"

    3. Advanced scenario: This creates the desired layout by using all HDDs as data_devices with two SSDs assigned as dedicated DB or WAL devices. The remaining SSDs are data_devices that have the NVMe devices of a specific vendor assigned as dedicated DB or WAL devices.

      Example

      service_type: osd
      service_id: osd_spec_hdd
      placement:
        host_pattern: '*'
      data_devices:
        rotational: 1
      db_devices:
        model: Model-name
        limit: 2
      ---
      service_type: osd
      service_id: osd_spec_ssd
      placement:
        host_pattern: '*'
      data_devices:
        model: Model-name
      db_devices:
        vendor: Vendor-name

    4. Advanced scenario with non-uniform nodes: This applies different OSD specs to different hosts depending on the host_pattern key.

      Example

      service_type: osd
      service_id: osd_spec_node_one_to_five
      placement:
        host_pattern: 'node[1-5]'
      data_devices:
        rotational: 1
      db_devices:
        rotational: 0
      ---
      service_type: osd
      service_id: osd_spec_six_to_ten
      placement:
        host_pattern: 'node[6-10]'
      data_devices:
        model: Model-name
      db_devices:
        model: Model-name

    5. Advanced scenario with dedicated WAL and DB devices:

      Example

      service_type: osd
      service_id: osd_using_paths
      placement:
        hosts:
          - host01
          - host02
      data_devices:
        paths:
          - /dev/sdb
      db_devices:
        paths:
          - /dev/sdc
      wal_devices:
        paths:
          - /dev/sdd

    6. Advanced scenario with multiple OSDs per device:

      Example

      service_type: osd
      service_id: multiple_osds
      placement:
        hosts:
          - host01
          - host02
      osds_per_device: 4
      data_devices:
        paths:
          - /dev/sdb

    7. For pre-created volumes, edit the osd_spec.yaml file to include the following details:

      Syntax

      service_type: osd
      service_id: SERVICE_ID
      placement:
        hosts:
          - HOSTNAME
      data_devices: # optional
        model: DISK_MODEL_NAME # optional
        paths:
        - /DEVICE_PATH
      db_devices: # optional
        size: # optional
        all: true # optional
        paths:
         - /DEVICE_PATH

      Example

      service_type: osd
      service_id: osd_spec
      placement:
        hosts:
          - machine1
      data_devices:
        paths:
          - /dev/vg_hdd/lv_hdd
      db_devices:
        paths:
          - /dev/vg_nvme/lv_nvme

    8. For OSDs by ID, edit the osd_spec.yaml file to include the following details:

      Note

      This configuration is applicable for Red Hat Ceph Storage 5.3z1 and later releases. For earlier releases, use pre-created LVM.

      Syntax

      service_type: osd
      service_id: OSD_BY_ID_HOSTNAME
      placement:
        hosts:
          - HOSTNAME
      data_devices: # optional
        model: DISK_MODEL_NAME # optional
        paths:
        - /DEVICE_PATH
      db_devices: # optional
        size: # optional
        all: true # optional
        paths:
         - /DEVICE_PATH

      Example

      service_type: osd
      service_id: osd_by_id_host01
      placement:
        hosts:
          - host01
      data_devices:
        paths:
          - /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-5
      db_devices:
        paths:
          - /dev/disk/by-id/nvme-nvme.1b36-31323334-51454d55204e564d65204374726c-00000001

    9. For OSDs by path, edit the osd_spec.yaml file to include the following details:

      Note

      This configuration is applicable for Red Hat Ceph Storage 5.3z1 and later releases. For earlier releases, use pre-created LVM.

      Syntax

      service_type: osd
      service_id: OSD_BY_PATH_HOSTNAME
      placement:
        hosts:
          - HOSTNAME
      data_devices: # optional
        model: DISK_MODEL_NAME # optional
        paths:
        - /DEVICE_PATH
      db_devices: # optional
        size: # optional
        all: true # optional
        paths:
         - /DEVICE_PATH

      Example

      service_type: osd
      service_id: osd_by_path_host01
      placement:
        hosts:
          - host01
      data_devices:
        paths:
          - /dev/disk/by-path/pci-0000:0d:00.0-scsi-0:0:0:4
      db_devices:
        paths:
          - /dev/disk/by-path/pci-0000:00:02.0-nvme-1

  3. Mount the YAML file under a directory in the container:

    Example

    [root@host01 ~]# cephadm shell --mount osd_spec.yaml:/var/lib/ceph/osd/osd_spec.yaml

  4. Navigate to the directory:

    Example

    [ceph: root@host01 /]# cd /var/lib/ceph/osd/

  5. Before deploying OSDs, do a dry run:

    Note

    This step gives a preview of the deployment, without deploying the daemons.

    Example

    [ceph: root@host01 osd]# ceph orch apply -i osd_spec.yaml --dry-run

  6. Deploy OSDs using service specification:

    Syntax

    ceph orch apply -i FILE_NAME.yml

    Example

    [ceph: root@host01 osd]# ceph orch apply -i osd_spec.yaml

Verification

  • List the service:

    Example

    [ceph: root@host01 /]# ceph orch ls osd

  • View the details of the node and devices:

    Example

    [ceph: root@host01 /]# ceph osd tree

6.10. Removing the OSD daemons using the Ceph Orchestrator

You can remove an OSD from a cluster by using Cephadm.

Removing an OSD from a cluster involves two steps:

  1. Evacuating all placement groups (PGs) from the OSD.
  2. Removing the PG-free OSD from the cluster.

The --zap option removes the volume groups, logical volumes, and the LVM metadata.

Note

After removing OSDs, if the drives the OSDs were deployed on once again become available, cephadm might automatically try to deploy more OSDs on these drives if they match an existing drivegroup specification. If you deployed the OSDs you are removing with a spec and do not want any new OSDs deployed on the drives after removal, modify the drivegroup specification before removal. While deploying OSDs, if you have used the --all-available-devices option, set unmanaged: true to stop it from picking up new drives at all. For other deployments, modify the specification. See Deploying Ceph OSDs using advanced service specifications for more details.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Hosts are added to the cluster.
  • Ceph Monitor, Ceph Manager and Ceph OSD daemons are deployed on the storage cluster.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. Check the device and the node from which the OSD has to be removed:

    Example

    [ceph: root@host01 /]# ceph osd tree

  3. Remove the OSD:

    Syntax

    ceph orch osd rm OSD_ID [--replace] [--force] --zap

    Example

    [ceph: root@host01 /]# ceph orch osd rm 0 --zap

    Note

    If you remove the OSD from the storage cluster without an option, such as --replace, the device is removed from the storage cluster completely. If you want to use the same device for deploying OSDs, you have to first zap the device before adding it to the storage cluster.

  4. Optional: To remove multiple OSDs from a specific node, run the following command:

    Syntax

    ceph orch osd rm OSD_ID OSD_ID --zap

    Example

    [ceph: root@host01 /]# ceph orch osd rm 2 5 --zap

  5. Check the status of the OSD removal:

    Example

    [ceph: root@host01 /]# ceph orch osd rm status
    OSD  HOST   STATE                    PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
    9    host01 done, waiting for purge    0  False    False  True  2023-06-06 17:50:50.525690
    10   host03 done, waiting for purge    0  False    False  True  2023-06-06 17:49:38.731533
    11   host02 done, waiting for purge    0  False    False  True  2023-06-06 17:48:36.641105

    When no PGs are left on the OSD, it is decommissioned and removed from the cluster.
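If you want to check the drain progress from a script, the status table can be parsed as text. The following sketch assumes that the output has exactly the column layout shown in the example above; the regular expression and function names are hypothetical, and the layout might differ between releases.

```python
import re

# One data row: OSD, HOST, STATE (may contain spaces), PGS, three booleans,
# and the drain start time. The layout is assumed from the example output.
ROW = re.compile(
    r"^\s*(?P<osd>\d+)\s+(?P<host>\S+)\s+(?P<state>.+?)\s+(?P<pgs>\d+)"
    r"\s+(?P<replace>True|False)\s+(?P<force>True|False)"
    r"\s+(?P<zap>True|False)\s+(?P<started>.+?)\s*$"
)

SAMPLE = """\
OSD  HOST   STATE                    PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
9    host01 done, waiting for purge    0  False    False  True  2023-06-06 17:50:50.525690
10   host03 done, waiting for purge    0  False    False  True  2023-06-06 17:49:38.731533
11   host02 done, waiting for purge    0  False    False  True  2023-06-06 17:48:36.641105
"""


def parse_rm_status(text):
    """Return one dictionary per OSD row; the header line is skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            row["osd"] = int(row["osd"])
            row["pgs"] = int(row["pgs"])
            rows.append(row)
    return rows


rows = parse_rm_status(SAMPLE)
drained = all(row["pgs"] == 0 for row in rows)
print([row["osd"] for row in rows], "all drained:", drained)
```

An OSD with a PGS value of 0 has no placement groups left and is ready to be removed from the cluster.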

Verification

  • Verify the details of the devices and the nodes from which the Ceph OSDs are removed:

    Example

    [ceph: root@host01 /]# ceph osd tree

6.11. Replacing the OSDs using the Ceph Orchestrator

When disks fail, you can replace the physical storage device and reuse the same OSD ID to avoid having to reconfigure the CRUSH map.

You can replace the OSDs from the cluster using the --replace option.

Note

If you want to replace a single OSD, see Deploying Ceph OSDs on specific devices and hosts. If you want to deploy OSDs on all available devices, see Deploying Ceph OSDs on all available devices.

The --replace option preserves the OSD ID when you use the ceph orch osd rm command. The OSD is not permanently removed from the CRUSH hierarchy, but is assigned the destroyed flag. This flag is used to determine which OSD IDs can be reused in the next OSD deployment.

Similar to the rm command, replacing an OSD from a cluster involves two steps:

  • Evacuating all placement groups (PGs) from the OSD.
  • Removing the PG-free OSD from the cluster.

If you use OSD specification for deployment, the OSD ID of the disk being replaced is automatically assigned to the newly added disk as soon as it is inserted.

Note

After removing OSDs, if the drives the OSDs were deployed on once again become available, cephadm might automatically try to deploy more OSDs on these drives if they match an existing drivegroup specification. If you deployed the OSDs you are removing with a spec and do not want any new OSDs deployed on the drives after removal, modify the drivegroup specification before removal. While deploying OSDs, if you have used --all-available-devices option, set unmanaged: true to stop it from picking up new drives at all. For other deployments, modify the specification. See the Deploying Ceph OSDs using advanced service specifications for more details.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Hosts are added to the cluster.
  • Monitor, Manager, and OSD daemons are deployed on the storage cluster.
  • A new OSD that replaces the removed OSD must be created on the same host from which the OSD was removed.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. Dump and save a mapping of your OSD configurations for future reference:

    Example

    [ceph: root@node /]# ceph osd metadata -f plain | grep device_paths
    "device_paths": "sde=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:1,sdi=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:1",
    "device_paths": "sde=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:1,sdf=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:1",
    "device_paths": "sdd=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:2,sdg=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:2",
    "device_paths": "sdd=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:2,sdh=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:2",
    "device_paths": "sdd=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:2,sdk=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:2",
    "device_paths": "sdc=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:3,sdl=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:3",
    "device_paths": "sdc=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:3,sdj=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:3",
    "device_paths": "sdc=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:3,sdm=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:3",
    [.. output omitted ..]

  3. Check the device and the node from which the OSD has to be replaced:

    Example

    [ceph: root@host01 /]# ceph osd tree

  4. Remove the OSD from the cephadm managed cluster:

    Important

    If the storage cluster has health_warn or other errors associated with it, check and try to fix any errors before replacing the OSD to avoid data loss.

    Syntax

    ceph orch osd rm OSD_ID --replace [--force]

    The --force option can be used when there are ongoing operations on the storage cluster.

    Example

    [ceph: root@host01 /]# ceph orch osd rm 0 --replace

  5. Recreate the new OSD by applying the following OSD specification:

    Example

    service_type: osd
    service_id: osd
    placement:
      hosts:
      - myhost
    data_devices:
      paths:
      - /path/to/the/device

  6. Check the status of the OSD replacement:

    Example

    [ceph: root@host01 /]# ceph orch osd rm status

  7. Pause the Orchestrator so that it does not apply any existing OSD specification:

    Example

    [ceph: root@node /]# ceph orch pause
    [ceph: root@node /]# ceph orch status
    Backend: cephadm
    Available: Yes
    Paused: Yes

  8. Zap the OSD devices that have been removed:

    Example

    [ceph: root@node /]# ceph orch device zap node.example.com /dev/sdi --force
    zap successful for /dev/sdi on node.example.com
    
    [ceph: root@node /]# ceph orch device zap node.example.com /dev/sdf --force
    zap successful for /dev/sdf on node.example.com

  9. Resume the orchestrator from pause mode:

    Example

    [ceph: root@node /]# ceph orch resume

  10. Check the status of the OSD replacement:

    Example

    [ceph: root@node /]# ceph osd tree
    ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
    -1         0.77112  root default
    -3         0.77112      host node
     0    hdd  0.09639          osd.0      up   1.00000  1.00000
     1    hdd  0.09639          osd.1      up   1.00000  1.00000
     2    hdd  0.09639          osd.2      up   1.00000  1.00000
     3    hdd  0.09639          osd.3      up   1.00000  1.00000
     4    hdd  0.09639          osd.4      up   1.00000  1.00000
     5    hdd  0.09639          osd.5      up   1.00000  1.00000
     6    hdd  0.09639          osd.6      up   1.00000  1.00000
     7    hdd  0.09639          osd.7      up   1.00000  1.00000
     [.. output omitted ..]

Verification

  • Verify the details of the devices and the nodes from which the Ceph OSDs are replaced:

    Example

    [ceph: root@host01 /]# ceph osd tree

    You can see an OSD with the same ID as the one you replaced running on the same host.

  • Verify that the db_device for the newly deployed OSDs is the replaced db_device:

    Example

    [ceph: root@host01 /]# ceph osd metadata 0 | grep bluefs_db_devices
    "bluefs_db_devices": "nvme0n1",
    
    [ceph: root@host01 /]# ceph osd metadata 1 | grep bluefs_db_devices
    "bluefs_db_devices": "nvme0n1",
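
Steps 7 to 9 of the procedure (pause, zap, resume) can be wrapped in a small shell helper so that the orchestrator is resumed even when a zap fails. The following is a minimal sketch, not a supported tool: the function name zap_while_paused and the CEPH variable (an override for the ceph binary, useful for dry runs) are hypothetical, while the ceph orch subcommands are the ones shown in the steps above.

```shell
# Hypothetical helper: pause the orchestrator, zap the given devices on one
# host, and resume the orchestrator afterwards.
# CEPH is an assumed override for the ceph binary (defaults to "ceph").
zap_while_paused() {
    local host=$1
    shift
    local ceph=${CEPH:-ceph}
    local rc=0
    local dev

    # Pause first so that cephadm does not re-apply an existing OSD
    # specification to the freshly zapped devices.
    "$ceph" orch pause || return 1

    for dev in "$@"; do
        "$ceph" orch device zap "$host" "$dev" --force || rc=1
    done

    # Always resume, then report whether every zap succeeded.
    "$ceph" orch resume || rc=1
    return "$rc"
}
```

For example, zap_while_paused node.example.com /dev/sdi /dev/sdf runs the same commands as the example in steps 7 to 9. Setting CEPH=echo prints the commands instead of running them.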

6.12. Replacing the OSDs with pre-created LVM

After purging the OSD with the ceph-volume lvm zap command, if the directory is not present, you can replace the OSD by using an OSD service specification file that references the pre-created LVM.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • A failed OSD.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. Remove the OSD:

    Syntax

    ceph orch osd rm OSD_ID [--replace]

    Example

    [ceph: root@host01 /]# ceph orch osd rm 8 --replace
    Scheduled OSD(s) for removal

  3. Verify the OSD is destroyed:

    Example

    [ceph: root@host01 /]# ceph osd tree
    
    ID   CLASS  WEIGHT   TYPE NAME        STATUS     REWEIGHT  PRI-AFF
     -1         0.32297  root default
     -9         0.05177      host host10
      3    hdd  0.01520          osd.3           up   1.00000  1.00000
     13    hdd  0.02489          osd.13          up   1.00000  1.00000
     17    hdd  0.01169          osd.17          up   1.00000  1.00000
    -13         0.05177      host host11
      2    hdd  0.01520          osd.2           up   1.00000  1.00000
     15    hdd  0.02489          osd.15          up   1.00000  1.00000
     19    hdd  0.01169          osd.19          up   1.00000  1.00000
     -7         0.05835      host host12
     20    hdd  0.01459          osd.20          up   1.00000  1.00000
     21    hdd  0.01459          osd.21          up   1.00000  1.00000
     22    hdd  0.01459          osd.22          up   1.00000  1.00000
     23    hdd  0.01459          osd.23          up   1.00000  1.00000
     -5         0.03827      host host04
      1    hdd  0.01169          osd.1           up   1.00000  1.00000
      6    hdd  0.01129          osd.6           up   1.00000  1.00000
      7    hdd  0.00749          osd.7           up   1.00000  1.00000
      9    hdd  0.00780          osd.9           up   1.00000  1.00000
     -3         0.03816      host host05
      0    hdd  0.01169          osd.0           up   1.00000  1.00000
      8    hdd  0.01129          osd.8    destroyed         0  1.00000
     12    hdd  0.00749          osd.12          up   1.00000  1.00000
     16    hdd  0.00769          osd.16          up   1.00000  1.00000
    -15         0.04237      host host06
      5    hdd  0.01239          osd.5           up   1.00000  1.00000
     10    hdd  0.01540          osd.10          up   1.00000  1.00000
     11    hdd  0.01459          osd.11          up   1.00000  1.00000
    -11         0.04227      host host07
      4    hdd  0.01239          osd.4           up   1.00000  1.00000
     14    hdd  0.01529          osd.14          up   1.00000  1.00000
     18    hdd  0.01459          osd.18          up   1.00000  1.00000

  4. Zap and remove the OSD using the ceph-volume command:

    Syntax

    ceph-volume lvm zap --osd-id OSD_ID

    Example

    [ceph: root@host01 /]# ceph-volume lvm zap --osd-id 8
    
    Zapping: /dev/vg1/data-lv2
    Closing encrypted path /dev/mapper/l4D6ql-Prji-IzH4-dfhF-xzuf-5ETl-jNRcXC
    Running command: /usr/sbin/cryptsetup remove /dev/mapper/l4D6ql-Prji-IzH4-dfhF-xzuf-5ETl-jNRcXC
    Running command: /usr/bin/dd if=/dev/zero of=/dev/vg1/data-lv2 bs=1M count=10 conv=fsync
     stderr: 10+0 records in
    10+0 records out
     stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.034742 s, 302 MB/s
    Zapping successful for OSD: 8

  5. Check the OSD topology:

    Example

    [ceph: root@host01 /]# ceph-volume lvm list

  6. Recreate the OSD with a specification file corresponding to that specific OSD topology:

    Example

    [ceph: root@host01 /]# cat osd.yml
    service_type: osd
    service_id: osd_service
    placement:
      hosts:
      - host03
    data_devices:
      paths:
      - /dev/vg1/data-lv2
    db_devices:
      paths:
      - /dev/vg1/db-lv1

  7. Apply the updated specification file:

    Example

    [ceph: root@host01 /]# ceph orch apply -i osd.yml
    Scheduled osd.osd_service update...

  8. Verify the OSD is back:

    Example

    [ceph: root@host01 /]# ceph -s
    [ceph: root@host01 /]# ceph osd tree
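
The check in step 3 can be scripted instead of read by eye. The following is a minimal sketch that reads ceph osd tree output on standard input and prints the IDs of OSDs whose status is destroyed. The function name destroyed_osd_ids is hypothetical, and the sketch assumes the column layout shown in step 3, where the status follows the osd.N name.

```shell
# Hypothetical helper: print the numeric ID of every OSD that
# "ceph osd tree" reports as destroyed. Reads the tree on stdin.
destroyed_osd_ids() {
    awk '{
        # Find the "osd.N" name field and check the field after it,
        # so an empty CLASS column does not shift the match.
        for (i = 1; i < NF; i++)
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "destroyed")
                print substr($i, 5)
    }'
}
```

A typical call would be ceph osd tree | destroyed_osd_ids, which prints 8 for the example output in step 3.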

6.13. Replacing the OSDs in a non-colocated scenario

When an OSD fails in a non-colocated scenario, you can replace the WAL/DB devices. The procedure is the same for DB and WAL devices. Edit the paths under db_devices for DB devices and the paths under wal_devices for WAL devices.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Daemons are non-colocated.
  • A failed OSD.

Procedure

  1. Identify the devices in the cluster:

    Example

    [root@host01 ~]# lsblk
    
    NAME                                                                                                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sda                                                                                                     8:0    0   20G  0 disk
    ├─sda1                                                                                                  8:1    0    1G  0 part /boot
    └─sda2                                                                                                  8:2    0   19G  0 part
      ├─rhel-root                                                                                         253:0    0   17G  0 lvm  /
      └─rhel-swap                                                                                         253:1    0    2G  0 lvm  [SWAP]
    sdb                                                                                                     8:16   0   10G  0 disk
    └─ceph--5726d3e9--4fdb--4eda--b56a--3e0df88d663f-osd--block--3ceb89ec--87ef--46b4--99c6--2a56bac09ff0 253:2    0   10G  0 lvm
    sdc                                                                                                     8:32   0   10G  0 disk
    └─ceph--d7c9ab50--f5c0--4be0--a8fd--e0313115f65c-osd--block--37c370df--1263--487f--a476--08e28bdbcd3c 253:4    0   10G  0 lvm
    sdd                                                                                                     8:48   0   10G  0 disk
    ├─ceph--1774f992--44f9--4e78--be7b--b403057cf5c3-osd--db--31b20150--4cbc--4c2c--9c8f--6f624f3bfd89    253:7    0  2.5G  0 lvm
    └─ceph--1774f992--44f9--4e78--be7b--b403057cf5c3-osd--db--1bee5101--dbab--4155--a02c--e5a747d38a56    253:9    0  2.5G  0 lvm
    sde                                                                                                     8:64   0   10G  0 disk
    sdf                                                                                                     8:80   0   10G  0 disk
    └─ceph--412ee99b--4303--4199--930a--0d976e1599a2-osd--block--3a99af02--7c73--4236--9879--1fad1fe6203d 253:6    0   10G  0 lvm
    sdg                                                                                                     8:96   0   10G  0 disk
    └─ceph--316ca066--aeb6--46e1--8c57--f12f279467b4-osd--block--58475365--51e7--42f2--9681--e0c921947ae6 253:8    0   10G  0 lvm
    sdh                                                                                                     8:112  0   10G  0 disk
    ├─ceph--d7064874--66cb--4a77--a7c2--8aa0b0125c3c-osd--db--0dfe6eca--ba58--438a--9510--d96e6814d853    253:3    0    5G  0 lvm
    └─ceph--d7064874--66cb--4a77--a7c2--8aa0b0125c3c-osd--db--26b70c30--8817--45de--8843--4c0932ad2429    253:5    0    5G  0 lvm
    sr0

  2. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  3. Identify the OSDs and their DB device:

    Example

    [ceph: root@host01 /]# ceph-volume lvm list /dev/sdh
    
    
    ====== osd.2 =======
    
      [db]          /dev/ceph-d7064874-66cb-4a77-a7c2-8aa0b0125c3c/osd-db-0dfe6eca-ba58-438a-9510-d96e6814d853
    
          block device              /dev/ceph-5726d3e9-4fdb-4eda-b56a-3e0df88d663f/osd-block-3ceb89ec-87ef-46b4-99c6-2a56bac09ff0
          block uuid                GkWLoo-f0jd-Apj2-Zmwj-ce0h-OY6J-UuW8aD
          cephx lockbox secret
          cluster fsid              fa0bd9dc-e4c4-11ed-8db4-001a4a00046e
          cluster name              ceph
          crush device class
          db device                 /dev/ceph-d7064874-66cb-4a77-a7c2-8aa0b0125c3c/osd-db-0dfe6eca-ba58-438a-9510-d96e6814d853
          db uuid                   6gSPoc-L39h-afN3-rDl6-kozT-AX9S-XR20xM
          encrypted                 0
          osd fsid                  3ceb89ec-87ef-46b4-99c6-2a56bac09ff0
          osd id                    2
          osdspec affinity          non-colocated
          type                      db
          vdo                       0
          devices                   /dev/sdh
    
    ====== osd.5 =======
    
      [db]          /dev/ceph-d7064874-66cb-4a77-a7c2-8aa0b0125c3c/osd-db-26b70c30-8817-45de-8843-4c0932ad2429
    
          block device              /dev/ceph-d7c9ab50-f5c0-4be0-a8fd-e0313115f65c/osd-block-37c370df-1263-487f-a476-08e28bdbcd3c
          block uuid                Eay3I7-fcz5-AWvp-kRcI-mJaH-n03V-Zr0wmJ
          cephx lockbox secret
          cluster fsid              fa0bd9dc-e4c4-11ed-8db4-001a4a00046e
          cluster name              ceph
          crush device class
          db device                 /dev/ceph-d7064874-66cb-4a77-a7c2-8aa0b0125c3c/osd-db-26b70c30-8817-45de-8843-4c0932ad2429
          db uuid                   mwSohP-u72r-DHcT-BPka-piwA-lSwx-w24N0M
          encrypted                 0
          osd fsid                  37c370df-1263-487f-a476-08e28bdbcd3c
          osd id                    5
          osdspec affinity          non-colocated
          type                      db
          vdo                       0
          devices                   /dev/sdh

  4. In the osds.yml file, set the unmanaged parameter to true. Otherwise, cephadm redeploys the OSDs:

    Example

    [ceph: root@host01 /]# cat osds.yml
    service_type: osd
    service_id: non-colocated
    unmanaged: true
    placement:
      host_pattern: 'ceph*'
    data_devices:
      paths:
       - /dev/sdb
       - /dev/sdc
       - /dev/sdf
       - /dev/sdg
    db_devices:
      paths:
       - /dev/sdd
       - /dev/sdh

  5. Apply the updated specification file:

    Example

    [ceph: root@host01 /]# ceph orch apply -i osds.yml
    
    Scheduled osd.non-colocated update...

  6. Check the status:

    Example

    [ceph: root@host01 /]# ceph orch ls
    
    NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
    alertmanager   ?:9093,9094      1/1  9m ago     4d   count:1
    crash                           3/4  4d ago     4d   *
    grafana        ?:3000           1/1  9m ago     4d   count:1
    mgr                             1/2  4d ago     4d   count:2
    mon                             3/5  4d ago     4d   count:5
    node-exporter  ?:9100           3/4  4d ago     4d   *
    osd.non-colocated                 8  4d ago     5s   <unmanaged>
    prometheus     ?:9095           1/1  9m ago     4d   count:1

  7. Remove the OSDs. Use the --zap option to remove the backend services and the --replace option to retain the OSD IDs:

    Example

    [ceph: root@host01 /]# ceph orch osd rm 2 5 --zap --replace
    Scheduled OSD(s) for removal

  8. Check the status:

    Example

    [ceph: root@host01 /]# ceph osd df tree | egrep -i "ID|host02|osd.2|osd.5"
    
    ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP  META     AVAIL   %USE   VAR   PGS  STATUS     TYPE NAME
    -5         0.04877         -   55 GiB   15 GiB  4.1 MiB   0 B   60 MiB  40 GiB  27.27  1.17    -                 host02
     2    hdd  0.01219   1.00000   15 GiB  5.0 GiB  996 KiB   0 B   15 MiB  10 GiB  33.33  1.43    0  destroyed          osd.2
     5    hdd  0.01219   1.00000   15 GiB  5.0 GiB  1.0 MiB   0 B   15 MiB  10 GiB  33.33  1.43    0  destroyed          osd.5

  9. Edit the osds.yml specification file to change the unmanaged parameter to false, and update the path to the DB device if it changed after the device was physically replaced:

    Example

    [ceph: root@host01 /]# cat osds.yml
    service_type: osd
    service_id: non-colocated
    unmanaged: false
    placement:
      host_pattern: 'ceph01*'
    data_devices:
      paths:
       - /dev/sdb
       - /dev/sdc
       - /dev/sdf
       - /dev/sdg
    db_devices:
      paths:
       - /dev/sdd
       - /dev/sde

    In the above example, /dev/sdh is replaced with /dev/sde.

    Important

    If you use the same host specification file to replace the faulty DB device on a single OSD node, modify the host_pattern option to specify only that OSD node. Otherwise, the deployment fails because the new DB device cannot be found on the other hosts.

  10. Reapply the specification file with the --dry-run option to confirm that the OSDs will be deployed with the new DB device:

    Example

    [ceph: root@host01 /]# ceph orch apply -i osds.yml --dry-run
    WARNING! Dry-Runs are snapshots of a certain point in time and are bound
    to the current inventory setup. If any of these conditions change, the
    preview will be invalid. Please make sure to have a minimal
    timeframe between planning and applying the specs.
    ####################
    SERVICESPEC PREVIEWS
    ####################
    +---------+------+--------+-------------+
    |SERVICE  |NAME  |ADD_TO  |REMOVE_FROM  |
    +---------+------+--------+-------------+
    +---------+------+--------+-------------+
    ################
    OSDSPEC PREVIEWS
    ################
    +---------+---------------+--------+----------+----------+-----+
    |SERVICE  |NAME           |HOST    |DATA      |DB        |WAL  |
    +---------+---------------+--------+----------+----------+-----+
    |osd      |non-colocated  |host02  |/dev/sdb  |/dev/sde  |-    |
    |osd      |non-colocated  |host02  |/dev/sdc  |/dev/sde  |-    |
    +---------+---------------+--------+----------+----------+-----+

  11. Apply the specification file:

    Example

    [ceph: root@host01 /]# ceph orch apply -i osds.yml
    Scheduled osd.non-colocated update...

  12. Check that the OSDs are redeployed:

    Example

    [ceph: root@host01 /]# ceph osd df tree | egrep -i "ID|host02|osd.2|osd.5"
    
    ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP  META     AVAIL   %USE   VAR   PGS  STATUS  TYPE NAME
    -5         0.04877         -   55 GiB   15 GiB  4.5 MiB   0 B   60 MiB  40 GiB  27.27  1.17    -              host host02
     2    hdd  0.01219   1.00000   15 GiB  5.0 GiB  1.1 MiB   0 B   15 MiB  10 GiB  33.33  1.43    0      up          osd.2
     5    hdd  0.01219   1.00000   15 GiB  5.0 GiB  1.1 MiB   0 B   15 MiB  10 GiB  33.33  1.43    0      up          osd.5

Verification

  1. From the OSD host where the OSDs are redeployed, verify that they are on the new DB device:

    Example

    [ceph: root@host01 /]# ceph-volume lvm list /dev/sde
    
    ====== osd.2 =======
    
      [db]          /dev/ceph-15ce813a-8a4c-46d9-ad99-7e0845baf15e/osd-db-1998a02e-5e67-42a9-b057-e02c22bbf461
    
          block device              /dev/ceph-a4afcb78-c804-4daf-b78f-3c7ad1ed0379/osd-block-564b3d2f-0f85-4289-899a-9f98a2641979
          block uuid                ITPVPa-CCQ5-BbFa-FZCn-FeYt-c5N4-ssdU41
          cephx lockbox secret
          cluster fsid              fa0bd9dc-e4c4-11ed-8db4-001a4a00046e
          cluster name              ceph
          crush device class
          db device                 /dev/ceph-15ce813a-8a4c-46d9-ad99-7e0845baf15e/osd-db-1998a02e-5e67-42a9-b057-e02c22bbf461
          db uuid                   HF1bYb-fTK7-0dcB-CHzW-xvNn-dCym-KKdU5e
          encrypted                 0
          osd fsid                  564b3d2f-0f85-4289-899a-9f98a2641979
          osd id                    2
          osdspec affinity          non-colocated
          type                      db
          vdo                       0
          devices                   /dev/sde
    
    ====== osd.5 =======
    
      [db]          /dev/ceph-15ce813a-8a4c-46d9-ad99-7e0845baf15e/osd-db-6c154191-846d-4e63-8c57-fc4b99e182bd
    
          block device              /dev/ceph-b37c8310-77f9-4163-964b-f17b4c29c537/osd-block-b42a4f1f-8e19-4416-a874-6ff5d305d97f
          block uuid                0LuPoz-ao7S-UL2t-BDIs-C9pl-ct8J-xh5ep4
          cephx lockbox secret
          cluster fsid              fa0bd9dc-e4c4-11ed-8db4-001a4a00046e
          cluster name              ceph
          crush device class
          db device                 /dev/ceph-15ce813a-8a4c-46d9-ad99-7e0845baf15e/osd-db-6c154191-846d-4e63-8c57-fc4b99e182bd
          db uuid                   SvmXms-iWkj-MTG7-VnJj-r5Mo-Moiw-MsbqVD
          encrypted                 0
          osd fsid                  b42a4f1f-8e19-4416-a874-6ff5d305d97f
          osd id                    5
          osdspec affinity          non-colocated
          type                      db
          vdo                       0
          devices                   /dev/sde
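
The OSD IDs reported in step 3 are the ones passed to ceph orch osd rm in step 7. The following is a minimal sketch that extracts those IDs from ceph-volume lvm list output. The function name osd_ids_from_lvm_list is hypothetical, and the sketch assumes the "====== osd.N =======" header format shown in the examples above.

```shell
# Hypothetical helper: read "ceph-volume lvm list" output on stdin and
# print one OSD ID per line, taken from the "====== osd.N =======" headers.
osd_ids_from_lvm_list() {
    sed -n 's/^[[:space:]]*=\{2,\} osd\.\([0-9][0-9]*\) =\{2,\}[[:space:]]*$/\1/p'
}
```

For the failed DB device in this procedure, ceph-volume lvm list /dev/sdh | osd_ids_from_lvm_list prints 2 and 5, which matches the IDs used with ceph orch osd rm 2 5 --zap --replace in step 7.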

6.14. Stopping the removal of the OSDs using the Ceph Orchestrator

You can stop the removal of only the OSDs that are queued for removal. This resets the OSD to its initial state and takes it off the removal queue.

If the OSD is in the process of removal, then you cannot stop the process.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Hosts are added to the cluster.
  • Monitor, Manager and OSD daemons are deployed on the cluster.
  • The OSD removal process is initiated.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. Check the device and the node from which the OSD removal was initiated:

    Example

    [ceph: root@host01 /]# ceph osd tree

  3. Stop the removal of the queued OSD:

    Syntax

    ceph orch osd rm stop OSD_ID

    Example

    [ceph: root@host01 /]# ceph orch osd rm stop 0

  4. Check the status of the OSD removal:

    Example

    [ceph: root@host01 /]# ceph orch osd rm status

Verification

  • Verify the details of the devices and the nodes from which the Ceph OSDs were queued for removal:

    Example

    [ceph: root@host01 /]# ceph osd tree

6.15. Activating the OSDs using the Ceph Orchestrator

You can activate the OSDs in the cluster in cases where the operating system of the host was reinstalled.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Hosts are added to the cluster.
  • Monitor, Manager and OSD daemons are deployed on the storage cluster.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. After the operating system of the host is reinstalled, activate the OSDs:

    Syntax

    ceph cephadm osd activate HOSTNAME

    Example

    [ceph: root@host01 /]# ceph cephadm osd activate host03

Verification

  • List the service:

    Example

    [ceph: root@host01 /]# ceph orch ls

  • List the hosts, daemons, and processes:

    Syntax

    ceph orch ps --service_name=SERVICE_NAME

    Example

    [ceph: root@host01 /]# ceph orch ps --service_name=osd

6.16. Observing the data migration

When you add an OSD to or remove an OSD from the CRUSH map, Ceph begins rebalancing the data by migrating placement groups to the new or existing OSD(s). You can observe the data migration using the ceph -w command.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • An OSD was recently added or removed.

Procedure

  1. Observe the data migration:

    Example

    [ceph: root@host01 /]# ceph -w

  2. Watch as the placement group states change from active+clean to active with some degraded objects, and finally back to active+clean when the migration completes.
  3. To exit the utility, press Ctrl + C.
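
To check for completion from a script rather than by watching ceph -w, a PG summary line can be tested. The following is a minimal sketch under a stated assumption: it expects a summary line of the form "N pgs: M active+clean; ...", such as the one ceph pg stat prints on recent releases. The exact output format can differ between releases, and the function name all_pgs_active_clean is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if a PG summary line read from stdin
# reports every PG as exactly active+clean, otherwise fail (exit 1).
# Assumed input format: "N pgs: M active+clean[, K other+state ...]; ..."
all_pgs_active_clean() {
    awk '
    {
        total = -1
        clean = 0
        n = split($0, tok, /[ ,;]+/)
        for (i = 1; i < n; i++) {
            if (tok[i + 1] == "pgs:") total = tok[i] + 0
            # Only the exact state counts; for example,
            # active+clean+scrubbing is not treated as clean here.
            if (tok[i + 1] == "active+clean") clean = tok[i] + 0
        }
        if (total > 0 && total == clean) ok = 1
    }
    END { exit (ok ? 0 : 1) }'
}
```

A polling loop could then look like: until ceph pg stat | all_pgs_active_clean; do sleep 30; done.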

6.17. Recalculating the placement groups

Placement groups (PGs) define the spread of any pool data across the available OSDs. A placement group is built according to the redundancy scheme in use. For 3-way replication, each placement group uses three different OSDs. For erasure-coded pools, the number of OSDs to use is defined by the number of chunks.

When you define a pool, the number of placement groups determines how finely the data is spread across all available OSDs. The higher the number, the more evenly the capacity load can be balanced. However, because placement groups must also be handled when data is reconstructed, choose the number carefully up front. A tool is available to support the calculation.

During the lifetime of a storage cluster, a pool might grow beyond the initially anticipated limits. As the number of drives grows, a recalculation is recommended. The number of placement groups per OSD should be around 100. As you add more OSDs to the storage cluster, the number of PGs per OSD decreases. For example, starting with 120 drives in the storage cluster and setting the pg_num of the pool to 4000 results in 100 PGs per OSD with a replication factor of three (4000 × 3 / 120 = 100). Over time, when the cluster grows to ten times the number of OSDs, the number of PGs per OSD drops to only ten. Because a small number of PGs per OSD tends to distribute capacity unevenly, consider adjusting the PGs per pool.
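
The arithmetic in the example above can be reproduced with a short shell function. This is a minimal sketch for a single pool: the function name pgs_per_osd is hypothetical, and the formula (pg_num multiplied by the number of copies, divided by the number of OSDs) is the one implied by the figures in the paragraph. For an erasure-coded pool, the number of chunks would take the place of the replication factor.

```shell
# Hypothetical helper: approximate PGs per OSD for a single pool.
#   $1 = pg_num of the pool
#   $2 = copies per PG (replication factor, or number of chunks for EC pools)
#   $3 = number of OSDs in the storage cluster
pgs_per_osd() {
    local pg_num=$1 copies=$2 osds=$3
    # Integer division; the result is rounded down.
    echo $(( pg_num * copies / osds ))
}

pgs_per_osd 4000 3 120     # prints 100
pgs_per_osd 4000 3 1200    # prints 10 (ten times as many OSDs)
```

When a cluster hosts several pools, the per-pool results add up, so the target of around 100 applies to the sum.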

You can adjust the number of placement groups online. Changing the PG numbers is not only a recalculation but also involves relocating data, which is a lengthy process. However, data availability is maintained at all times.

Avoid very high numbers of PGs per OSD, because reconstruction of all PGs on a failed OSD starts at once. Reconstructing in a timely manner requires a high number of IOPS, which might not be available. The result is deep I/O queues and high latency that render the storage cluster unusable, or long healing times.

Additional Resources

  • See the PG calculator for calculating the values by a given use case.
  • See the Erasure Code Pools chapter in the Red Hat Ceph Storage Strategies Guide for more information.