12.3. Removing the old disk from the system and installing the replacement disk

On the container host with the OSD that you want to replace, remove the old disk from the system and install the replacement disk.

Prerequisites:

The ceph-volume command is present in the Ceph container but is not installed on the overcloud node. Create an alias so that the ceph-volume command runs the ceph-volume binary inside the Ceph container. Then use the ceph-volume command to clean the new disk and add it as an OSD.

Procedure

  1. Ensure that the failed OSD is not running. In this example, the OSD ID is 27:

    systemctl stop ceph-osd@27
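  2. Identify the image ID of the Ceph container image and store it in a variable called IMG. This command assumes that only one image name contains ceph, so check that IMG holds a single ID before you continue:

    IMG=$(podman images | grep ceph | awk '{print $3}')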
  3. Create an alias for the ceph-volume command that runs the ceph-volume entry point inside the $IMG Ceph container and mounts the relevant host directories:

    alias ceph-volume="podman run --rm --privileged --net=host --ipc=host -v /run/lock/lvm:/run/lock/lvm:z -v /var/run/udev/:/var/run/udev/:z -v /dev:/dev -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=ceph-volume $IMG --cluster ceph"
  4. Verify that the aliased command runs successfully:

    ceph-volume lvm list
  5. Check that your new OSD device is not already part of LVM. Use the pvdisplay command to inspect the device, and ensure that the VG Name field is empty. Replace <NEW_DEVICE> with the /dev/* path of your new OSD device:

    [root@overcloud-computehci-2 ~]# pvdisplay <NEW_DEVICE>
      --- Physical volume ---
      PV Name               /dev/sdj
      VG Name               ceph-0fb0de13-fc8e-44c8-99ea-911e343191d2
      PV Size               50.00 GiB / not usable 1.00 GiB
      Allocatable           yes (but full)
      PE Size               1.00 GiB
      Total PE              49
      Free PE               0
      Allocated PE          49
      PV UUID               kOO0If-ge2F-UH44-6S1z-9tAv-7ypT-7by4cp
    [root@overcloud-computehci-2 ~]#

    If the VG Name field is not empty, then the device belongs to a volume group that you must remove.

  6. If the device belongs to a volume group, use the lvdisplay command to check whether the volume group contains a logical volume. Replace <VOLUME_GROUP> with the value of the VG Name field that you retrieved with the pvdisplay command:

    [root@overcloud-computehci-2 ~]# lvdisplay | grep <VOLUME_GROUP>
      LV Path                /dev/ceph-0fb0de13-fc8e-44c8-99ea-911e343191d2/osd-data-a0810722-7673-43c7-8511-2fd9db1dbbc6
      VG Name                ceph-0fb0de13-fc8e-44c8-99ea-911e343191d2
    [root@overcloud-computehci-2 ~]#

    If the output contains an LV Path field, then the volume group contains a logical volume that you must remove.

  7. If the new device is part of a volume group, remove any logical volume in it, remove the volume group, and then wipe the LVM physical volume label from the device:

    • Replace <LV_PATH> with the value of the LV Path field.
    • Replace <VOLUME_GROUP> with the value of the VG Name field.
    • Replace <NEW_DEVICE> with the /dev/* path of your new OSD device.

      [root@overcloud-computehci-2 ~]# lvremove --force <LV_PATH>
        Logical volume "osd-data-a0810722-7673-43c7-8511-2fd9db1dbbc6" successfully removed
      [root@overcloud-computehci-2 ~]# vgremove --force <VOLUME_GROUP>
        Volume group "ceph-0fb0de13-fc8e-44c8-99ea-911e343191d2" successfully removed
      [root@overcloud-computehci-2 ~]# pvremove <NEW_DEVICE>
        Labels on physical volume "/dev/sdj" successfully wiped.
  8. Use the ceph-volume lvm zap command to ensure that the new OSD device is clean. In the following example, the device is /dev/sdj:

    [root@overcloud-computehci-2 ~]# ceph-volume lvm zap /dev/sdj
    --> Zapping: /dev/sdj
    --> --destroy was not specified, but zapping a whole device will remove the partition table
    Running command: /usr/sbin/wipefs --all /dev/sdj
    Running command: /bin/dd if=/dev/zero of=/dev/sdj bs=1M count=10
     stderr: 10+0 records in
    10+0 records out
    10485760 bytes (10 MB, 10 MiB) copied, 0.010618 s, 988 MB/s
    --> Zapping successful for: <Raw Device: /dev/sdj>
    [root@overcloud-computehci-2 ~]#
  9. Create the new OSD on the new device and reuse the existing OSD ID. Pass --no-systemd so that ceph-volume does not attempt to start the OSD, because you cannot start the OSD from within the container:

    ceph-volume lvm create --osd-id 27 --data /dev/sdj --no-systemd
  10. Start the OSD outside of the container:

    systemctl start ceph-osd@27
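
The command in step 2 assumes that exactly one image in the podman images output contains the string ceph. If several images match, IMG holds more than one ID and the alias in step 3 does not work as intended. The following bash sketch makes that assumption explicit. pick_ceph_image_id is a hypothetical helper name, not part of Ceph or podman. It reads podman images output on standard input, takes the IMAGE ID column as step 2 does, and fails instead of returning several IDs.

```shell
# Hypothetical helper: reads `podman images` output on stdin and prints the
# IMAGE ID (third column) of the one image whose line mentions "ceph".
# Returns 1, with a message on stderr, if zero or several distinct IDs match.
pick_ceph_image_id() {
  local ids count
  # Same filter as step 2, with duplicate IDs (one image, several tags) removed.
  ids=$(grep ceph | awk '{print $3}' | sort -u)
  # Count non-empty lines; an empty result counts as 0.
  count=$(printf '%s' "$ids" | grep -c .)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one ceph image, found $count" >&2
    return 1
  fi
  printf '%s\n' "$ids"
}
```

On the host, you might call it as IMG=$(podman images | pick_ceph_image_id) and stop if it returns a non-zero status.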
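
Steps 5 to 7 rely on reading the VG Name and LV Path fields by eye. If you want to script that check, the following bash sketch extracts the same fields from the command output. pv_vg_name and lv_paths_in_vg are hypothetical helper names, and they assume the pvdisplay and lvdisplay output layout shown in steps 5 and 6. That layout is human-readable output, not a stable interface. If the device is not a physical volume at all, pvdisplay reports an error on standard error and prints no fields, so pv_vg_name prints nothing.

```shell
# Hypothetical helper: reads `pvdisplay <device>` output on stdin and prints
# the VG Name value. Prints nothing if the field is empty or missing.
pv_vg_name() {
  awk '$1 == "VG" && $2 == "Name" && $3 != "" { print $3 }'
}

# Hypothetical helper: reads `lvdisplay` output on stdin and prints every
# LV Path that lives in the volume group given as $1 (paths of the form
# /dev/<vg>/<lv>), one per line.
lv_paths_in_vg() {
  awk -v vg="$1" '$1 == "LV" && $2 == "Path" && index($3, "/dev/" vg "/") == 1 { print $3 }'
}
```

For example, vg=$(pvdisplay /dev/sdj | pv_vg_name) leaves vg empty for a device that is not in a volume group. When vg is set, lvdisplay | lv_paths_in_vg "$vg" lists the paths to pass to lvremove in step 7.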
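
By default, bash expands aliases only in interactive shells, so the alias from step 3 is not available if you put this procedure in a script. The following bash sketch replaces the alias with a shell function, ceph_volume (a hypothetical name), that runs the same podman command. It also wraps steps 1, 8, 9, and 10 in a hypothetical replace_osd_disk function. It assumes that IMG holds a single image ID from step 2 and that you have already checked and cleaned the device as described in steps 5 to 7. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical function equivalent of the step 3 alias. Unlike an alias,
# it also works in non-interactive scripts. Assumes IMG is a single image ID.
ceph_volume() {
  podman run --rm --privileged --net=host --ipc=host \
    -v /run/lock/lvm:/run/lock/lvm:z \
    -v /var/run/udev/:/var/run/udev/:z \
    -v /dev:/dev \
    -v /etc/ceph:/etc/ceph:z \
    -v /var/lib/ceph/:/var/lib/ceph/:z \
    -v /var/log/ceph/:/var/log/ceph/:z \
    --entrypoint=ceph-volume "$IMG" --cluster ceph "$@"
}

# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: replace_osd_disk OSD_ID DEVICE
# Runs steps 1, 8, 9, and 10 in order and stops at the first failure.
replace_osd_disk() {
  local osd_id=$1 dev=$2
  run systemctl stop "ceph-osd@${osd_id}" || return
  run ceph_volume lvm zap "$dev" || return
  # --no-systemd: ceph-volume cannot start the OSD from inside the container.
  run ceph_volume lvm create --osd-id "$osd_id" --data "$dev" --no-systemd || return
  run systemctl start "ceph-osd@${osd_id}"
}
```

For example, DRY_RUN=1 replace_osd_disk 27 /dev/sdj prints the four commands for OSD 27 and /dev/sdj. You can review them before you run the function without DRY_RUN.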