Chapter 6. Using the ceph-volume Utility to Deploy OSDs

The ceph-volume utility is a single-purpose command-line tool for deploying logical volumes as OSDs. It uses a plugin-type framework to deploy OSDs with different device technologies. The ceph-volume utility follows a workflow similar to that of the ceph-disk utility, with a predictable and robust way of preparing, activating, and starting OSDs. Currently, the ceph-volume utility supports only the lvm plugin, with plans to support other technologies in the future.

Important

The ceph-disk command is deprecated.

6.1. Using the ceph-volume LVM Plugin

The lvm subcommand stores OSD metadata in LVM tags. By querying those tags later, it can rediscover the devices associated with each OSD so that they can be activated. This includes support for LVM-based technologies such as dm-cache.

The use of dm-cache is transparent to ceph-volume, which treats a dm-cache device like any other logical volume. The performance gains and losses when using dm-cache depend on the specific workload. Generally, random and sequential reads see an increase in performance at smaller block sizes, while random and sequential writes see a decrease in performance at larger block sizes.

To use the LVM plugin, add lvm as a subcommand to the ceph-volume command:

ceph-volume lvm

The lvm subcommand has the following three subcommands:

  • prepare
  • activate
  • create

Note

Using the create subcommand combines the prepare and activate subcommands into one subcommand. See the create subcommand section for more details.

6.1.1. Preparing OSDs

The prepare subcommand prepares an OSD backend object store and consumes logical volumes for both the OSD data and journal. There is no default object storage type: you must set either the --filestore or the --bluestore option at preparation time. Starting with Red Hat Ceph Storage 3.2, support for the BlueStore object storage type is available. The prepare subcommand does not create or modify the logical volumes, except for adding some extra metadata using LVM tags.

LVM tags make volumes easier to discover later, and help identify them as part of a Ceph system and what role they have. The ceph-volume lvm prepare command adds the following LVM tags:

  • cluster_fsid
  • data_device
  • journal_device
  • encrypted
  • osd_fsid
  • osd_id
  • journal_uuid
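The tags can be read back with the standard LVM tools. The following is a minimal sketch, not part of ceph-volume itself: get_ceph_tag is a hypothetical helper, and it assumes the tags are stored with a "ceph." prefix in name=value form (for example ceph.osd_id=0), as upstream ceph-volume does. Verify the tag format on your release before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: read one ceph-volume tag from a comma-separated LVM tag string.
# Assumption: tags look like "ceph.osd_id=0,ceph.osd_fsid=<uuid>".

# get_ceph_tag TAGS NAME -> prints the value of ceph.NAME, or fails if absent.
get_ceph_tag() {
    local tags=$1 name=$2 tag
    local IFS=','                      # split the tag string on commas
    for tag in $tags; do
        case $tag in
            "ceph.$name="*)
                printf '%s\n' "${tag#*=}"   # strip everything up to the first "="
                return 0
                ;;
        esac
    done
    return 1
}

# Typical use on an OSD node (requires root and the lvm2 tools):
#   tags=$(lvs --noheadings -o lv_tags example_vg/data_lv | tr -d ' ')
#   get_ceph_tag "$tags" osd_id
```

The lookup is kept separate from the lvs call so that the parsing can be checked without a live volume group.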

The prepare process is very strict. It requires two logical volumes that are ready for use and that meet the minimum size for the OSD data and journal. The journal device can be either a logical volume or a partition.

Here is the prepare workflow process:

  1. Accept logical volumes for data and journal
  2. Generate a UUID for the OSD
  3. Ask the Ceph Monitor to get an OSD identifier reusing the generated UUID
  4. OSD data directory is created and data volume mounted
  5. Journal is symlinked from data volume to journal location
  6. The monmap is fetched for activation
  7. Device is mounted and the data directory is populated by ceph-osd
  8. LVM tags are assigned to the OSD data and journal volumes

To prepare a simple OSD deployment using LVM, run the following command as the root user on an OSD node:

ceph-volume lvm prepare --bluestore --data $VG_NAME/$LV_NAME

For example:

# ceph-volume lvm prepare --bluestore --data example_vg/data_lv
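Because the prepare subcommand does not create logical volumes, the volume group and logical volume named in the example must already exist. The following sketch shows one way to build them with the standard LVM tools. The disk /dev/sdb is a hypothetical empty device, and the run and make_data_lv helpers are illustrative names only. The script prints the commands by default; set DRY_RUN=0 to execute them, which overwrites the disk.

```shell
#!/usr/bin/env bash
# Sketch: build the logical volume that "ceph-volume lvm prepare" will consume.
# /dev/sdb is a hypothetical empty disk; adjust all names for your node.

DRY_RUN=${DRY_RUN:-1}   # default: print the commands instead of running them

# run CMD... -> print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# make_data_lv DISK VG LV -> one logical volume spanning the whole disk.
make_data_lv() {
    local disk=$1 vg=$2 lv=$3
    run pvcreate "$disk"                       # initialize the disk for LVM
    run vgcreate "$vg" "$disk"                 # create the volume group
    run lvcreate -l 100%FREE -n "$lv" "$vg"    # allocate all free extents
}

make_data_lv /dev/sdb example_vg data_lv
```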

For BlueStore, you can also specify the --block.db and --block.wal options if you want to place the RocksDB database and its write-ahead log on separate devices.
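As an illustration, the sketch below assembles a prepare command line with the optional --block.db and --block.wal arguments. The function name bluestore_prepare_cmd and the logical volumes example_vg/db_lv and example_vg/wal_lv are hypothetical; only the ceph-volume options themselves come from this section. The function prints the command rather than running it, so the result can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: compose a BlueStore prepare command with optional DB and WAL devices.
# The LV names passed in are placeholders; they must exist before you run the output.

# bluestore_prepare_cmd DATA [DB] [WAL] -> prints the command line.
bluestore_prepare_cmd() {
    local data=$1 db=${2:-} wal=${3:-}
    local cmd="ceph-volume lvm prepare --bluestore --data $data"
    [ -n "$db" ]  && cmd="$cmd --block.db $db"     # separate RocksDB device
    [ -n "$wal" ] && cmd="$cmd --block.wal $wal"   # separate write-ahead log device
    printf '%s\n' "$cmd"
}

bluestore_prepare_cmd example_vg/data_lv example_vg/db_lv example_vg/wal_lv
```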

Here is an example of using FileStore with a partition as a journal device:

# ceph-volume lvm prepare --filestore --data example_vg/data_lv --journal /dev/sdc1

A partition used as a journal must have a PARTUUID that the blkid command can discover. This allows the partition to be identified correctly regardless of its device name or path.
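A quick way to confirm this before running prepare is to ask blkid for the PARTUUID value directly. The following is a small sketch; has_partuuid is a hypothetical helper name, and /dev/sdc1 is the partition from the example above.

```shell
#!/usr/bin/env bash
# Sketch: verify that a partition exposes a PARTUUID through blkid.

# has_partuuid PARTITION -> succeeds when blkid reports a non-empty PARTUUID.
has_partuuid() {
    local part=$1 uuid
    # -s PARTUUID limits output to that tag; -o value prints only its value.
    uuid=$(blkid -s PARTUUID -o value "$part" 2>/dev/null) || return 1
    [ -n "$uuid" ]
}

# Usage (as root):
#   has_partuuid /dev/sdc1 && echo "/dev/sdc1 can be used as a journal"
```

Partitions on a GPT-labeled disk normally carry a PARTUUID. If the check fails, inspect the partition table before using the partition as a journal.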

Important

The ceph-volume LVM plugin does not create partitions on a raw disk device. If you want to use a partition as the OSD journal device, you must create the partition before running ceph-volume.

6.1.2. Activating OSDs

Once the prepare process is done, the OSD is ready to be activated. The activation process enables a systemd unit at boot time, which uses the OSD identifier and its UUID to mount the correct devices and start the matching OSD.

Here is the activate workflow process:

  1. Requires both OSD id and OSD uuid
  2. Enable the systemd unit with matching OSD id and OSD uuid
  3. The systemd unit will ensure all devices are ready and mounted
  4. The matching ceph-osd systemd unit will get started

To activate an OSD, run the following command as the root user on an OSD node:

ceph-volume lvm activate --filestore $OSD_ID $OSD_UUID

For example:

# ceph-volume lvm activate --filestore 0 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8

Note

There are no side-effects when running this command multiple times.
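After activation, the systemd units involved can be checked by name. The sketch below assumes the unit naming used by upstream ceph-volume, that is ceph-volume@lvm-<id>-<uuid> for the activation unit and ceph-osd@<id> for the daemon. This naming is an assumption that this chapter does not state, so confirm it on your release. The helper osd_unit_names is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: derive the systemd unit names for an activated OSD.
# Assumption: units follow the upstream ceph-volume naming scheme.

# osd_unit_names OSD_ID OSD_UUID -> prints the activation unit, then the daemon unit.
osd_unit_names() {
    local id=$1 uuid=$2
    printf 'ceph-volume@lvm-%s-%s\n' "$id" "$uuid"
    printf 'ceph-osd@%s\n' "$id"
}

# Usage (as root), checking both units for OSD 0:
#   for unit in $(osd_unit_names 0 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8); do
#       systemctl is-enabled "$unit"
#   done
```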

6.1.3. Creating OSDs

The create subcommand wraps the two-step process of deploying a new OSD into a single subcommand: it calls the prepare subcommand and then the activate subcommand. The reason to run prepare and activate separately is to introduce new OSDs into a storage cluster gradually and avoid rebalancing large amounts of data at once. With create, the process is the same, except that the OSD becomes up and in immediately after completion.

Run the following command as the root user on an OSD node:

ceph-volume lvm create --filestore --data $VG_NAME/$LV_NAME --journal $JOURNAL_DEVICE

For example:

# ceph-volume lvm create --filestore --data example_vg/data_lv --journal example_vg/journal_lv

6.1.4. Using batch mode

The batch subcommand automates the creation of multiple OSDs when single devices are provided. The ceph-volume command decides the best method for creating the OSDs based on drive type. The best method depends on the object store format, BlueStore or FileStore.

BlueStore is the default object store type for OSDs. When using BlueStore, OSD optimization depends on three different scenarios based on the devices being used. If all devices are traditional hard drives, then one OSD per device is created. If all devices are solid state drives, then two OSDs per device are created. If there is a mix of traditional hard drives and solid state drives, then data is put on the traditional hard drives, and the BlueStore database (block.db) is created as large as possible on the solid state drive.

Note

The batch subcommand does not support creating a separate logical volume for the write-ahead log (block.wal) device.

BlueStore example

# ceph-volume lvm batch --bluestore /dev/sda /dev/sdb /dev/nvme0n1
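The three BlueStore scenarios described above can be expressed as a small decision function. This is only a sketch of the documented rules, not the logic that ceph-volume actually runs. The helper names is_rotational and bluestore_batch_plan are hypothetical, the drive-type check relies on the Linux sysfs rotational flag, and the mixed case assumes one OSD per hard drive.

```shell
#!/usr/bin/env bash
# Sketch: predict the BlueStore layout that batch mode is documented to choose.

# is_rotational DEVICE -> succeeds for spinning disks (Linux sysfs flag is 1).
is_rotational() {
    local name=${1##*/}    # /dev/sda -> sda
    [ "$(cat "/sys/block/$name/queue/rotational" 2>/dev/null)" = 1 ]
}

# bluestore_batch_plan HDD_COUNT SSD_COUNT -> prints the expected layout.
bluestore_batch_plan() {
    local hdd=$1 ssd=$2
    if [ "$hdd" -gt 0 ] && [ "$ssd" -gt 0 ]; then
        echo "$hdd OSDs: data on HDD, block.db on SSD"
    elif [ "$hdd" -gt 0 ]; then
        echo "$hdd OSDs: one per HDD"
    elif [ "$ssd" -gt 0 ]; then
        echo "$((ssd * 2)) OSDs: two per SSD"
    else
        echo "no devices"
        return 1
    fi
}

# Example: two hard drives and one solid state drive, as in the command above.
bluestore_batch_plan 2 1
```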

When using FileStore, OSD optimization depends on two different scenarios based on the devices being used. If all devices are traditional hard drives, or all are solid state drives, then one OSD per device is created, with the journal collocated on the same device. If there is a mix of traditional hard drives and solid state drives, then data is put on the traditional hard drives, and the journal is created on the solid state drive. The journal is sized using the parameters specified in the Ceph configuration file, by default ceph.conf, with a default journal size of 5 GB.

FileStore example

# ceph-volume lvm batch --filestore /dev/sda /dev/sdb