Red Hat Ceph Storage 5: Replacing a shared DB device for multiple OSDs


Replacing a db_device shared by multiple OSDs using the Ceph Orchestrator

When a shared db_device disk fails, you can replace the physical storage device, but you must also redeploy all OSDs that used the replaced device.

NOTE: After removing OSDs, if the drives the OSDs were deployed on once again become available, the Orchestrator might automatically try to deploy more OSDs on these drives if they match an existing drivegroup specification. With multiple OSDs sharing a db_device, the Orchestrator is only able to configure and partition the db_device for the OSDs whose devices have already been cleaned and are reusable.
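
If you are not sure which OSD specification applies to these drives, you can export the currently applied OSD service specifications before you start. The specification shown below is only an illustration; your service_id, placement, and device filters depend on how the OSDs were originally deployed:

    [ceph: root@node /]# ceph orch ls osd --export
    service_type: osd
    service_id: osd_spec_default     # example name; yours will differ
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 1                # example filter for the data disks
      db_devices:
        rotational: 0                # example filter for the shared DB device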

Prerequisites

  • A running Red Hat Ceph Storage cluster
  • Hosts are added to the cluster
  • Monitor, Manager, and OSD daemons are deployed on the storage cluster
  • A new db_device that replaces the removed db_device must be located on the same host from which the db_device was removed (see the availability check below this list)
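
Before you begin, you can confirm that the replacement db_device is detected and reported as available on the host. The host name below is only an example; the replacement device must be listed as available for the Orchestrator to use it:

    [ceph: root@node /]# ceph orch device ls node.example.com --refresh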

Procedure

  • Log into the Cephadm shell:

    cephadm shell 
    
  • Dump and save a mapping of your OSD configurations for future reference:

    [ceph: root@node /]# ceph osd metadata -f plain | grep device_paths
    "device_paths": "sde=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:1,sdi=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:1",
    "device_paths": "sde=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:1,sdf=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:1",
    "device_paths": "sdd=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:2,sdg=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:2",
    "device_paths": "sdd=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:2,sdh=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:2",
    "device_paths": "sdd=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:2,sdk=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:2",
    "device_paths": "sdc=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:3,sdl=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:3",
    "device_paths": "sdc=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:3,sdj=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:3",
    "device_paths": "sdc=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:3,sdm=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:3",
    [.. output omitted ..]
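
    To keep this mapping available for later comparison, you can also redirect the output to a file on the host. Run the following from the host, outside the Cephadm shell, so that the file is stored on the host rather than inside the temporary container; the file name is only an example:

    [root@node ~]# cephadm shell -- ceph osd metadata -f plain | grep device_paths > /root/osd-device-paths.txt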
    
  • Check the node and the OSDs that have to be replaced:

    [ceph: root@node /]# ceph osd tree
    ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
    -1         0.77112  root default                            
    -3         0.77112      host node                           
     0    hdd  0.09639          osd.0    down   1.00000  1.00000
     1    hdd  0.09639          osd.1    down   1.00000  1.00000
     2    hdd  0.09639          osd.2      up   1.00000  1.00000
     3    hdd  0.09639          osd.3      up   1.00000  1.00000
     4    hdd  0.09639          osd.4      up   1.00000  1.00000
     5    hdd  0.09639          osd.5      up   1.00000  1.00000
     6    hdd  0.09639          osd.6      up   1.00000  1.00000
     7    hdd  0.09639          osd.7      up   1.00000  1.00000
     [.. output omitted ..]
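
    To confirm which data and DB devices the OSDs that are down were using, you can look up their metadata and compare it with the mapping dumped earlier. In this example, osd.0 maps to the first entry of the saved mapping:

    [ceph: root@node /]# ceph osd metadata 0 | grep device_paths
    "device_paths": "sde=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:1,sdi=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:1",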
    
  • Remove the OSDs that have to be replaced:

    [ceph: root@node /]# ceph osd rm 0
    removed osd.0
    [ceph: root@node /]# ceph osd rm 1
    removed osd.1
    
  • Check the status of the OSD removal:

    [ceph: root@node /]# ceph orch osd rm status  
    No OSD remove/replace operations reported
    
  • Stop the Orchestrator from applying any existing OSD specification:

    [ceph: root@node /]# ceph orch pause
    [ceph: root@node /]# ceph orch status
    Backend: cephadm
    Available: Yes
    Paused: Yes
    
  • Zap the devices of the OSDs that have been removed:

    [ceph: root@node /]# ceph orch device zap node.example.com /dev/sdi --force
    zap successful for /dev/sdi on node.example.com
    [ceph: root@node /]# ceph orch device zap node.example.com /dev/sdf --force
    zap successful for /dev/sdf on node.example.com
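
    If the replacement db_device is not brand new and still contains partitions or LVM metadata from earlier use, it might also need to be zapped so that the Orchestrator can partition it for the redeployed OSDs. The device name below is only a placeholder for your replacement db_device:

    [ceph: root@node /]# ceph orch device zap node.example.com /dev/nvme0n1 --force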
    
  • Resume the Orchestrator from pause mode:

    [ceph: root@node /]# ceph orch resume 
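
    After the Orchestrator is resumed, it redeploys the removed OSDs according to the matching OSD specification. You can follow the progress of the new OSD daemons, for example, with:

    [ceph: root@node /]# ceph orch ps --daemon-type osd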
    
  • Verify that the replaced OSDs are deployed and up:

    [ceph: root@node /]# ceph osd tree
    ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
    -1         0.77112  root default                            
    -3         0.77112      host node                           
     0    hdd  0.09639          osd.0      up   1.00000  1.00000
     1    hdd  0.09639          osd.1      up   1.00000  1.00000
     2    hdd  0.09639          osd.2      up   1.00000  1.00000
     3    hdd  0.09639          osd.3      up   1.00000  1.00000
     4    hdd  0.09639          osd.4      up   1.00000  1.00000
     5    hdd  0.09639          osd.5      up   1.00000  1.00000
     6    hdd  0.09639          osd.6      up   1.00000  1.00000
     7    hdd  0.09639          osd.7      up   1.00000  1.00000
     [.. output omitted ..]
    
  • Verify that the db_device for the newly deployed OSDs is the replacement db_device:

    [ceph: root@node /]# ceph osd metadata 0 | grep bluefs_db_devices
    "bluefs_db_devices": "nvme0n1",
    [ceph: root@node /]# ceph osd metadata 1 | grep bluefs_db_devices
    "bluefs_db_devices": "nvme0n1",
    
