Chapter 6. Using the ceph-volume Utility to Deploy OSDs
The ceph-volume utility is a single-purpose command-line tool for deploying logical volumes as OSDs. It uses a plugin-type framework to deploy OSDs with different device technologies. The ceph-volume utility follows a workflow similar to that of the ceph-disk utility, with a predictable and robust way of preparing, activating, and starting OSDs. Currently, the ceph-volume utility supports only the lvm plugin, with plans to support other technologies in the future.
The ceph-disk command is deprecated.
6.1. Using the ceph-volume LVM Plugin
By making use of LVM tags, the lvm subcommand can store and later rediscover, by querying them, the devices associated with OSDs so that they can be activated. This includes support for LVM-based technologies such as dm-cache.
For ceph-volume, the use of dm-cache is transparent: ceph-volume treats dm-cache like a logical volume. The performance gains and losses when using dm-cache depend on the specific workload. Generally, random and sequential reads see an increase in performance at smaller block sizes, while random and sequential writes see a decrease in performance at larger block sizes.
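As a rough illustration of tag-based discovery, the sketch below pulls one value out of a comma-separated LVM tag string, such as the lv_tags field that the lvs command can report with -o lv_name,lv_tags. The helper name get_lv_tag and the sample tag keys (ceph.osd_id, ceph.type) are assumptions made for this example; check the tags on your own volumes before relying on specific key names.

```shell
# Hypothetical helper: print the value of one key from a tag string
# shaped like "key1=value1,key2=value2".
get_lv_tag() {
    tags=$1
    key=$2
    # Put each tag on its own line, then print the text after "key=".
    printf '%s\n' "$tags" | tr ',' '\n' |
        awk -v key="$key" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }'
}

# Example with a made-up tag string (not real cluster output):
get_lv_tag "ceph.osd_id=0,ceph.type=data" "ceph.osd_id"   # prints: 0
```

On a real node, the tag string would come from LVM itself rather than from a literal.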
To use the LVM plugin, add lvm as a subcommand to the ceph-volume command.
There are three subcommands to the lvm subcommand, as follows:
- prepare
- activate
- create
The create subcommand combines the prepare and activate subcommands into one subcommand. See the create subcommand section for more details.
6.1.1. Preparing OSDs
The prepare subcommand prepares an OSD backend object store and consumes logical volumes for both the OSD data and journal. There is no default object storage type: either the --filestore or the --bluestore option must be set at preparation time. Starting with Red Hat Ceph Storage 3.2, support for the BlueStore object storage type is available. The prepare subcommand does not create or modify the logical volumes, except for adding some extra metadata using LVM tags.
LVM tags make volumes easier to discover later, and help identify them as part of a Ceph system and what role they have. The ceph-volume lvm prepare command adds a set of LVM tags to the volumes it prepares.
The prepare process is very strict: it requires two logical volumes that are ready for use, and each must meet the minimum size for OSD data and journal. The journal device can be either a logical volume or a partition.
Here is the prepare workflow process:
- Accept logical volumes for data and journal
- Generate a UUID for the OSD
- Ask the Ceph Monitor to get an OSD identifier reusing the generated UUID
- OSD data directory is created and data volume mounted
- Journal is symlinked from data volume to journal location
- monmap is fetched for activation
- Device is mounted and the data directory is populated by ceph-osd
- LVM tags are assigned to the OSD data and journal volumes
Do the following step on an OSD node, as the root user, to prepare a simple OSD deployment using LVM:
ceph-volume lvm prepare --bluestore --data $VG_NAME/$LV_NAME
# ceph-volume lvm prepare --bluestore --data example_vg/data_lv
For BlueStore, you can also specify the --block.db and --block.wal options if you want to use a separate device for RocksDB.
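The following sketch shows one way the BlueStore options could be assembled into a single prepare command. It only prints the command instead of running it, so it can be reviewed first. The helper name build_bluestore_prepare and the volume names example_vg/db_lv and example_vg/wal_lv are hypothetical and used only for illustration.

```shell
# Hypothetical helper: print (not run) a BlueStore prepare command.
# $1 = data volume as VG/LV
# $2 = optional device for block.db
# $3 = optional device for block.wal
build_bluestore_prepare() {
    data=$1
    db=${2:-}
    wal=${3:-}
    if [ -z "$data" ]; then
        echo "usage: build_bluestore_prepare VG/LV [DB_DEVICE] [WAL_DEVICE]" >&2
        return 1
    fi
    cmd="ceph-volume lvm prepare --bluestore --data $data"
    if [ -n "$db" ]; then
        cmd="$cmd --block.db $db"
    fi
    if [ -n "$wal" ]; then
        cmd="$cmd --block.wal $wal"
    fi
    echo "$cmd"
}

# Example with illustrative volume names:
build_bluestore_prepare example_vg/data_lv example_vg/db_lv
```

Printing the command keeps the sketch safe to try on a machine without Ceph installed; run the printed command as the root user on an OSD node only after checking it.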
Here is an example of using FileStore with a partition as a journal device:
# ceph-volume lvm prepare --filestore --data example_vg/data_lv --journal /dev/sdc1
When using a partition, it must contain a PARTUUID discoverable by the blkid command, so that it can be identified correctly regardless of the device name or path.
The ceph-volume LVM plugin does not create partitions on a raw disk device. Create the partition before using it as the OSD journal device.
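Before passing a partition as a journal device, it may be worth confirming that blkid can report its PARTUUID. The sketch below is one possible check; the helper name has_partuuid is hypothetical, and the device path in the usage comment is only an example.

```shell
# Hypothetical helper: succeed only if blkid reports a non-empty
# PARTUUID for the given partition.
has_partuuid() {
    # -s PARTUUID limits output to that tag; -o value prints only its value.
    partuuid=$(blkid -s PARTUUID -o value "$1" 2>/dev/null) || return 1
    [ -n "$partuuid" ]
}

# Usage (run as root; the device path is an example):
# has_partuuid /dev/sdc1 && echo "partition has a PARTUUID"
```

If the check fails, the partition was probably not created on a partition table that assigns a PARTUUID, or the path does not refer to a partition.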
6.1.2. Activating OSDs
Once the prepare process is done, the OSD is ready to go active. The activation process enables a systemd unit at boot time, which allows the correct OSD identifier and its UUID to be enabled and mounted.
Here is the activate workflow process:
- Requires both OSD id and OSD uuid
- Enable the systemd unit with matching OSD id and OSD uuid
- The systemd unit will ensure all devices are ready and mounted
- The ceph-osd systemd unit will get started
Do the following step on an OSD node, as the root user, to activate an OSD:
ceph-volume lvm activate --filestore $OSD_ID $OSD_UUID
# ceph-volume lvm activate --filestore 0 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8
There are no side-effects when running this command multiple times.
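Because activation requires both the OSD id and the OSD uuid, a small wrapper can catch malformed arguments before the command is run. The sketch below validates both values and prints the resulting command; the helper name build_activate is hypothetical, and the --filestore flag simply mirrors the example above.

```shell
# Hypothetical helper: validate the arguments, then print the activate command.
# $1 = OSD id (non-negative integer), $2 = OSD uuid (8-4-4-4-12 hexadecimal)
build_activate() {
    osd_id=$1
    osd_uuid=$2
    case $osd_id in
        ''|*[!0-9]*)
            echo "OSD id must be a non-negative integer" >&2
            return 1
            ;;
    esac
    if ! printf '%s\n' "$osd_uuid" |
        grep -Eq '^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$'; then
        echo "OSD uuid must be in 8-4-4-4-12 hexadecimal form" >&2
        return 1
    fi
    echo "ceph-volume lvm activate --filestore $osd_id $osd_uuid"
}

# Example using the values shown above:
build_activate 0 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8
```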
6.1.3. Creating OSDs
The create subcommand wraps the two-step process of deploying a new OSD, calling the prepare subcommand and then the activate subcommand, into a single subcommand. The reason to use prepare and then activate separately is to introduce new OSDs into a storage cluster gradually and avoid large amounts of data being rebalanced. With create, the process is the same, except that the OSD becomes up and in immediately after completion.
Do the following step, for FileStore, on an OSD node, as the root user:
ceph-volume lvm create --filestore --data $VG_NAME/$LV_NAME --journal $JOURNAL_DEVICE
# ceph-volume lvm create --filestore --data example_vg/data_lv --journal example_vg/journal_lv
Do the following step, for BlueStore, on an OSD node, as the root user:
ceph-volume lvm create --bluestore --data <device>
# ceph-volume lvm create --bluestore --data /dev/sda
The batch subcommand automates the creation of multiple OSDs when single devices are provided. The ceph-volume command decides the best method for creating the OSDs based on drive type. This best method depends on the object store format, BlueStore or FileStore.
BlueStore is the default object store type for OSDs. When using BlueStore, OSD optimization depends on three different scenarios based on the devices being used. If all devices are traditional hard drives, then one OSD per device is created. If all devices are solid state drives, then two OSDs per device are created. If there is a mix of traditional hard drives and solid state drives, then data is put on the traditional hard drives, and the block.db is created as large as possible on the solid state drive.
The batch subcommand does not support creating a separate logical volume for the write-ahead log (block.wal).
# ceph-volume lvm batch --bluestore /dev/sda /dev/sdb /dev/nvme0n1
When using FileStore, OSD optimization depends on two different scenarios based on the devices being used. If all devices are traditional hard drives, or all are solid state drives, then one OSD per device is created, with the journal collocated on the same device. If there is a mix of traditional hard drives and solid state drives, then data is put on the traditional hard drives, and the journal is created on the solid state drive using the sizing parameters specified in the Ceph configuration file (by default, ceph.conf), with a default journal size of 5 GB.
# ceph-volume lvm batch --filestore /dev/sda /dev/sdb
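The layout rules described above for both object stores can be summarized as a small decision function. This is only a restatement of the text as a sketch, not the logic that ceph-volume itself runs; the helper name batch_layout and its arguments are hypothetical.

```shell
# Hypothetical helper: describe the layout the text above attributes to
# "ceph-volume lvm batch" for a given object store and device mix.
# $1 = bluestore|filestore, $2 = number of hard drives, $3 = number of SSDs
batch_layout() {
    store=$1
    hdds=$2
    ssds=$3
    if [ "$hdds" -eq 0 ] && [ "$ssds" -eq 0 ]; then
        echo "no devices given" >&2
        return 1
    fi
    case $store in
        bluestore)
            if [ "$ssds" -eq 0 ]; then
                echo "one OSD per hard drive"
            elif [ "$hdds" -eq 0 ]; then
                echo "two OSDs per solid state drive"
            else
                echo "data on hard drives, block.db on solid state drives"
            fi
            ;;
        filestore)
            if [ "$hdds" -eq 0 ] || [ "$ssds" -eq 0 ]; then
                echo "one OSD per device, journal collocated"
            else
                echo "data on hard drives, journal on solid state drives"
            fi
            ;;
        *)
            echo "unknown object store: $store" >&2
            return 1
            ;;
    esac
}

# Two hard drives and one solid state drive with BlueStore:
batch_layout bluestore 2 1
```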