Chapter 10. Using NVMe with LVM Optimally
Summary
The procedures below demonstrate how to deploy Ceph for Object Gateway usage optimally when using high speed NVMe based SSDs (this applies to SATA SSDs too). Journals and bucket indexes will be placed together on high speed storage devices, which can increase performance compared to having all journals on one device. This configuration requires setting osd_scenario to lvm.
Procedures for two example configurations are provided:
- One NVMe device and at least four HDDs using one bucket index: see Using One NVMe Device
- Two NVMe devices and at least four HDDs using two bucket indexes: see Using Two NVMe Devices
Details
The most basic Ceph setup uses the osd_scenario setting of collocated. This stores the OSD data and its journal together on one storage device (they are "co-located"). Typical server configurations include both HDDs and SSDs. Since HDDs are usually larger than SSDs, in a collocated configuration an HDD would be chosen to utilize the most storage space, putting both the OSD data and the journal on it alone. However, the journal should ideally be on a faster SSD. Another option is using the osd_scenario setting of non-collocated. This allows configuration of dedicated devices for journals, so you can put the OSD data on HDDs and the journals on SSDs.
In addition to OSD data and journals, when using Object Gateway a bucket index needs to be stored on a device. In this case Ceph is often configured so that HDDs hold the OSD data, one SSD holds the journals, and another SSD holds the bucket indexes. This can create highly imbalanced situations where the SSD with all the journals becomes saturated while the SSD with the bucket indexes is underutilized.
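For reference, a non-collocated layout is typically expressed in the Ansible group_vars/osds.yml file along the following lines. This is a minimal illustrative sketch, not part of this chapter's example: the device paths are placeholders, and each dedicated_devices entry pairs with the devices entry at the same position.
osd_scenario: non-collocated
devices:                 # OSD data on HDDs
  - /dev/sdc
  - /dev/sdd
dedicated_devices:       # journals on the SSD, one entry per data device
  - /dev/nvme0n1
  - /dev/nvme0n1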
The solution is to set osd_scenario to lvm and use Logical Volume Manager (LVM) to divide up single SSD devices for more than one purpose. This allows journals and bucket indexes to exist side by side on a single device. Most importantly, it allows journals to exist on more than one SSD, spreading the intense IO data transfer of the journals across more than one device.
The normal Ansible playbooks provided by the ceph-ansible RPM used to install Ceph (site.yml, osds.yml, and so on) do not support using one device for more than one purpose. In the future the normal Ansible playbooks will support using one device for more than one purpose. In the meantime, the lv-create.yml playbook and its lv_vars.yaml variables file are provided to facilitate creating the required Logical Volumes (LVs) for optimal SSD usage. After lv-create.yml is run, site.yml can be run normally and it will use the newly created LVs.
Important: These procedures only apply to the FileStore storage backend, not the newer BlueStore storage backend.
10.1. Using One NVMe Device
Follow this procedure to deploy Ceph for Object Gateway usage with one NVMe device.
10.1.1. Purge Any Existing Ceph Cluster
If Ceph is already configured, purge it in order to start over. An Ansible playbook file named purge-cluster.yml is provided for this purpose.
$ ansible-playbook purge-cluster.yml
For more information on how to use purge-cluster.yml
see Purging a Ceph Cluster by Using Ansible in the Installation Guide for Red Hat Enterprise Linux or Installation Guide for Ubuntu depending on your chosen Linux distribution.
Purging the cluster may not be enough to prepare the servers for redeploying Ceph using the following procedures. Any file system, GPT, RAID, or other signatures on storage devices used by Ceph may cause problems. Instructions to remove any signatures using wipefs
are provided under Run The lv-create.yml Ansible Playbook.
10.1.2. Configure The Cluster for Normal Installation
Setting aside any NVMe and/or LVM considerations, configure the cluster as you would normally but stop before running ansible-playbook site.yml
. Afterwards, the cluster installation configuration will be adjusted specifically for optimal NVMe/LVM usage to support the Object Gateway. Only at that time should ansible-playbook site.yml
be run.
To configure the cluster for normal installation consult the Installation Guide for Red Hat Enterprise Linux or Installation Guide for Ubuntu depending on your chosen Linux distribution. In particular, complete the steps in Installing a Red Hat Ceph Storage Cluster through Step 9 creating an Ansible log directory. Stop before Step 10 when ansible-playbook site.yml
is run.
Do not run ansible-playbook site.yml
until all the steps after this and before Install Ceph for NVMe and Verify Success have been completed.
10.1.3. Identify The NVMe and HDD Devices
Use lsblk
to identify the NVMe and HDD devices connected to the server. Example output from lsblk
is listed below:
[root@c04-h05-6048r ~]# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 465.8G 0 disk ├─sda1 8:1 0 4G 0 part │ └─md1 9:1 0 4G 0 raid1 [SWAP] ├─sda2 8:2 0 512M 0 part │ └─md0 9:0 0 512M 0 raid1 /boot └─sda3 8:3 0 461.3G 0 part └─md2 9:2 0 461.1G 0 raid1 / sdb 8:16 0 465.8G 0 disk ├─sdb1 8:17 0 4G 0 part │ └─md1 9:1 0 4G 0 raid1 [SWAP] ├─sdb2 8:18 0 512M 0 part │ └─md0 9:0 0 512M 0 raid1 /boot └─sdb3 8:19 0 461.3G 0 part └─md2 9:2 0 461.1G 0 raid1 / sdc 8:32 0 1.8T 0 disk sdd 8:48 0 1.8T 0 disk sde 8:64 0 1.8T 0 disk sdf 8:80 0 1.8T 0 disk sdg 8:96 0 1.8T 0 disk sdh 8:112 0 1.8T 0 disk sdi 8:128 0 1.8T 0 disk sdj 8:144 0 1.8T 0 disk sdk 8:160 0 1.8T 0 disk sdl 8:176 0 1.8T 0 disk sdm 8:192 0 1.8T 0 disk sdn 8:208 0 1.8T 0 disk sdo 8:224 0 1.8T 0 disk sdp 8:240 0 1.8T 0 disk sdq 65:0 0 1.8T 0 disk sdr 65:16 0 1.8T 0 disk sds 65:32 0 1.8T 0 disk sdt 65:48 0 1.8T 0 disk sdu 65:64 0 1.8T 0 disk sdv 65:80 0 1.8T 0 disk sdw 65:96 0 1.8T 0 disk sdx 65:112 0 1.8T 0 disk sdy 65:128 0 1.8T 0 disk sdz 65:144 0 1.8T 0 disk sdaa 65:160 0 1.8T 0 disk sdab 65:176 0 1.8T 0 disk sdac 65:192 0 1.8T 0 disk sdad 65:208 0 1.8T 0 disk sdae 65:224 0 1.8T 0 disk sdaf 65:240 0 1.8T 0 disk sdag 66:0 0 1.8T 0 disk sdah 66:16 0 1.8T 0 disk sdai 66:32 0 1.8T 0 disk sdaj 66:48 0 1.8T 0 disk sdak 66:64 0 1.8T 0 disk sdal 66:80 0 1.8T 0 disk nvme0n1 259:0 0 745.2G 0 disk nvme1n1 259:1 0 745.2G 0 disk
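If the sizes alone do not make it obvious which devices are rotational, lsblk can also report the rotational flag directly. This is an optional check, not part of the original procedure:
# lsblk -d -o NAME,SIZE,ROTA,TYPE
A ROTA value of 1 indicates an HDD; 0 indicates an SSD or NVMe device.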
In this example the following raw block devices will be used:
NVMe devices
-
/dev/nvme0n1
HDD devices
-
/dev/sdc
-
/dev/sdd
-
/dev/sde
-
/dev/sdf
The file lv_vars.yaml
configures logical volume creation on the chosen devices. It creates journals on NVMe, an NVMe based bucket index, and HDD based OSDs. The actual creation of logical volumes is initiated by lv-create.yml
, which reads lv_vars.yaml
.
That file should only have one NVMe device referenced in it at a time. For information on using Ceph with two NVMe devices optimally see Using Two NVMe Devices.
10.1.4. Add The Devices to lv_vars.yaml
As root, navigate to the /usr/share/ceph-ansible/ directory:
# cd /usr/share/ceph-ansible
As root, copy the lv_vars.yaml file to the current directory:
# cp infrastructure-playbooks/vars/lv_vars.yaml .
Edit the file so it includes the following lines:
nvme_device: /dev/nvme0n1
hdd_devices:
  - /dev/sdc
  - /dev/sdd
  - /dev/sde
  - /dev/sdf
10.1.5. Run The lv-create.yml Ansible Playbook
The purpose of the lv-create.yml playbook is to create logical volumes for the Object Gateway bucket index and journals on a single NVMe device. It does this by using osd_scenario=lvm as opposed to osd_scenario=non-collocated. The lv-create.yml Ansible playbook makes it easier to configure Ceph in this way by automating some of the complex LVM creation and configuration.
As root, copy the lv-create.yml Ansible playbook to the current directory:
# cp infrastructure-playbooks/lv-create.yml .
Ensure the storage devices are raw
Before running lv-create.yml to create the logical volumes on the NVMe devices and HDD devices, ensure there are no file system, GPT, RAID, or other signatures on them.
If they are not raw, when you run lv-create.yml it may fail with the following error:
device /dev/sdc excluded by a filter
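To check whether a device still carries signatures without modifying it, run wipefs with no options; it prints a table of detected signatures, and empty output means the device is raw. For example:
# wipefs /dev/sdc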
Wipe storage device signatures (optional)
If the devices have signatures you can use wipefs to erase them.
An example of using wipefs to erase the devices is shown below:
[root@c04-h01-6048r ~]# wipefs -a /dev/sdc
/dev/sdc: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sdc: 8 bytes were erased at offset 0x1d19ffffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/sdc: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sdc: calling ioclt to re-read partition table: Success
[root@c04-h01-6048r ~]# wipefs -a /dev/sdd
/dev/sdd: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sdd: 8 bytes were erased at offset 0x1d19ffffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/sdd: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sdd: calling ioclt to re-read partition table: Success
[root@c04-h01-6048r ~]# wipefs -a /dev/sde
/dev/sde: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sde: 8 bytes were erased at offset 0x1d19ffffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/sde: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sde: calling ioclt to re-read partition table: Success
[root@c04-h01-6048r ~]# wipefs -a /dev/sdf
/dev/sdf: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sdf: 8 bytes were erased at offset 0x1d19ffffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/sdf: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sdf: calling ioclt to re-read partition table: Success
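Because wipefs accepts more than one device on the command line, the four HDDs from this example could also be wiped in a single invocation:
# wipefs -a /dev/sdc /dev/sdd /dev/sde /dev/sdf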
Run the lv-teardown.yml Ansible playbook
Always run lv-teardown.yml before running lv-create.yml.
As root, copy the lv-teardown.yml Ansible playbook to the current directory:
# cp infrastructure-playbooks/lv-teardown.yml .
Run the lv-teardown.yml Ansible playbook:
$ ansible-playbook lv-teardown.yml
Warning: Proceed with caution when running the lv-teardown.yml Ansible playbook. It destroys data. Ensure you have backups of any important data.
Run the lv-create.yml Ansible playbook:
$ ansible-playbook lv-create.yml
Once lv-create.yml completes without error, continue to the next section to verify it worked properly.
10.1.6. Verify LVM Configuration
Review lv-created.log
Once the lv-create.yml Ansible playbook completes successfully, configuration information will be written to lv-created.log. Later this information will be copied into group_vars/osds.yml. Open lv-created.log and look for information similar to the below example:
- data: ceph-bucket-index-1
  data_vg: ceph-nvme-vg-nvme0n1
  journal: ceph-journal-bucket-index-1-nvme0n1
  journal_vg: ceph-nvme-vg-nvme0n1
- data: ceph-hdd-lv-sdc
  data_vg: ceph-hdd-vg-sdc
  journal: ceph-journal-sdc
  journal_vg: ceph-nvme-vg-nvme0n1
- data: ceph-hdd-lv-sdd
  data_vg: ceph-hdd-vg-sdd
  journal: ceph-journal-sdd
  journal_vg: ceph-nvme-vg-nvme0n1
- data: ceph-hdd-lv-sde
  data_vg: ceph-hdd-vg-sde
  journal: ceph-journal-sde
  journal_vg: ceph-nvme-vg-nvme0n1
- data: ceph-hdd-lv-sdf
  data_vg: ceph-hdd-vg-sdf
  journal: ceph-journal-sdf
  journal_vg: ceph-nvme-vg-nvme0n1
Review LVM configuration
Based on the example of one NVMe device and four HDDs the following Logical Volumes (LVs) should be created:
One journal LV per HDD placed on NVMe (four LVs on /dev/nvme0n1)
One data LV per HDD placed on each HDD (one LV per HDD)
One journal LV for bucket index placed on NVMe (one LV on /dev/nvme0n1)
One data LV for bucket index placed on NVMe (one LV on /dev/nvme0n1)
The LVs can be seen in lsblk and lvscan output. In the example explained above, there should be ten LVs for Ceph. As a rough sanity check you could count the Ceph LVs to make sure there are at least ten, but ideally you would make sure the appropriate LVs were created on the right storage devices (NVMe vs HDD).
Example output from
lsblk
is shown below:[root@c04-h01-6048r ~]# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 465.8G 0 disk ├─sda1 8:1 0 4G 0 part │ └─md1 9:1 0 4G 0 raid1 [SWAP] ├─sda2 8:2 0 512M 0 part │ └─md0 9:0 0 512M 0 raid1 /boot └─sda3 8:3 0 461.3G 0 part └─md2 9:2 0 461.1G 0 raid1 / sdb 8:16 0 465.8G 0 disk ├─sdb1 8:17 0 4G 0 part │ └─md1 9:1 0 4G 0 raid1 [SWAP] ├─sdb2 8:18 0 512M 0 part │ └─md0 9:0 0 512M 0 raid1 /boot └─sdb3 8:19 0 461.3G 0 part └─md2 9:2 0 461.1G 0 raid1 / sdc 8:32 0 1.8T 0 disk └─ceph--hdd--vg--sdc-ceph--hdd--lv--sdc 253:6 0 1.8T 0 lvm sdd 8:48 0 1.8T 0 disk └─ceph--hdd--vg--sdd-ceph--hdd--lv--sdd 253:7 0 1.8T 0 lvm sde 8:64 0 1.8T 0 disk └─ceph--hdd--vg--sde-ceph--hdd--lv--sde 253:8 0 1.8T 0 lvm sdf 8:80 0 1.8T 0 disk └─ceph--hdd--vg--sdf-ceph--hdd--lv--sdf 253:9 0 1.8T 0 lvm sdg 8:96 0 1.8T 0 disk sdh 8:112 0 1.8T 0 disk sdi 8:128 0 1.8T 0 disk sdj 8:144 0 1.8T 0 disk sdk 8:160 0 1.8T 0 disk sdl 8:176 0 1.8T 0 disk sdm 8:192 0 1.8T 0 disk sdn 8:208 0 1.8T 0 disk sdo 8:224 0 1.8T 0 disk sdp 8:240 0 1.8T 0 disk sdq 65:0 0 1.8T 0 disk sdr 65:16 0 1.8T 0 disk sds 65:32 0 1.8T 0 disk sdt 65:48 0 1.8T 0 disk sdu 65:64 0 1.8T 0 disk sdv 65:80 0 1.8T 0 disk sdw 65:96 0 1.8T 0 disk sdx 65:112 0 1.8T 0 disk sdy 65:128 0 1.8T 0 disk sdz 65:144 0 1.8T 0 disk sdaa 65:160 0 1.8T 0 disk sdab 65:176 0 1.8T 0 disk sdac 65:192 0 1.8T 0 disk sdad 65:208 0 1.8T 0 disk sdae 65:224 0 1.8T 0 disk sdaf 65:240 0 1.8T 0 disk sdag 66:0 0 1.8T 0 disk sdah 66:16 0 1.8T 0 disk sdai 66:32 0 1.8T 0 disk sdaj 66:48 0 1.8T 0 disk sdak 66:64 0 1.8T 0 disk sdal 66:80 0 1.8T 0 disk nvme0n1 259:0 0 745.2G 0 disk ├─ceph--nvme--vg--nvme0n1-ceph--journal--bucket--index--1--nvme0n1 253:0 0 5.4G 0 lvm ├─ceph--nvme--vg--nvme0n1-ceph--journal--sdc 253:1 0 5.4G 0 lvm ├─ceph--nvme--vg--nvme0n1-ceph--journal--sdd 253:2 0 5.4G 0 lvm ├─ceph--nvme--vg--nvme0n1-ceph--journal--sde 253:3 0 5.4G 0 lvm ├─ceph--nvme--vg--nvme0n1-ceph--journal--sdf 253:4 0 5.4G 0 lvm └─ceph--nvme--vg--nvme0n1-ceph--bucket--index--1 253:5 0 718.4G 0 lvm nvme1n1 259:1 0 745.2G 0 disk
Example lvscan output is below:
[root@c04-h01-6048r ~]# lvscan
  ACTIVE   '/dev/ceph-hdd-vg-sdf/ceph-hdd-lv-sdf' [<1.82 TiB] inherit
  ACTIVE   '/dev/ceph-hdd-vg-sde/ceph-hdd-lv-sde' [<1.82 TiB] inherit
  ACTIVE   '/dev/ceph-hdd-vg-sdd/ceph-hdd-lv-sdd' [<1.82 TiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme0n1/ceph-journal-bucket-index-1-nvme0n1' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme0n1/ceph-journal-sdc' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme0n1/ceph-journal-sdd' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme0n1/ceph-journal-sde' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme0n1/ceph-journal-sdf' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme0n1/ceph-bucket-index-1' [<718.36 GiB] inherit
  ACTIVE   '/dev/ceph-hdd-vg-sdc/ceph-hdd-lv-sdc' [<1.82 TiB] inherit
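Beyond counting the LVs, the lvs command can list them grouped by volume group, which makes it easy to confirm that the journal and bucket index LVs landed in the NVMe volume group while each data LV landed in its own HDD volume group. This is an optional check:
# lvs -o lv_name,vg_name,lv_size --sort vg_name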
10.1.7. Edit The osds.yml and all.yml Ansible Files
Copy the previously mentioned configuration information from lv-created.log into group_vars/osds.yml under the lvm_volumes: line.
Set osd_scenario: to lvm:
osd_scenario: lvm
Set osd_objectstore: filestore in all.yml and osds.yml.
The osds.yml file should look similar to this:
# Variables here are applicable to all host groups NOT roles
osd_objectstore: filestore
osd_scenario: lvm
lvm_volumes:
  - data: ceph-bucket-index-1
    data_vg: ceph-nvme-vg-nvme0n1
    journal: ceph-journal-bucket-index-1-nvme0n1
    journal_vg: ceph-nvme-vg-nvme0n1
  - data: ceph-hdd-lv-sdc
    data_vg: ceph-hdd-vg-sdc
    journal: ceph-journal-sdc
    journal_vg: ceph-nvme-vg-nvme0n1
  - data: ceph-hdd-lv-sdd
    data_vg: ceph-hdd-vg-sdd
    journal: ceph-journal-sdd
    journal_vg: ceph-nvme-vg-nvme0n1
  - data: ceph-hdd-lv-sde
    data_vg: ceph-hdd-vg-sde
    journal: ceph-journal-sde
    journal_vg: ceph-nvme-vg-nvme0n1
  - data: ceph-hdd-lv-sdf
    data_vg: ceph-hdd-vg-sdf
    journal: ceph-journal-sdf
    journal_vg: ceph-nvme-vg-nvme0n1
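Because the lvm_volumes: entries are copied in by hand, it is worth confirming that group_vars/osds.yml is still valid YAML and that the playbook still parses before installing. One quick, optional sanity check (no output from the first command means the file parsed cleanly):
$ python -c 'import yaml; yaml.safe_load(open("group_vars/osds.yml"))'
$ ansible-playbook -i hosts site.yml --syntax-check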
10.1.8. Install Ceph for NVMe and Verify Success
After configuring Ceph for installation to use NVMe with LVM optimally, install it.
Run the site.yml Ansible playbook to install Ceph:
$ ansible-playbook -v -i hosts site.yml
Verify Ceph is running properly after install completes
# ceph -s
# ceph osd tree
Example ceph -s output showing Ceph is running properly:
# ceph -s
  cluster:
    id:     15d31a8c-3152-4fa2-8c4e-809b750924cd
    health: HEALTH_WARN
            Reduced data availability: 32 pgs inactive

  services:
    mon: 3 daemons, quorum b08-h03-r620,b08-h05-r620,b08-h06-r620
    mgr: b08-h03-r620(active), standbys: b08-h05-r620, b08-h06-r620
    osd: 35 osds: 35 up, 35 in

  data:
    pools:   4 pools, 32 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             32 unknown
Example
ceph osd tree
output showing Ceph is running properly:[root@c04-h01-6048r ~]# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 55.81212 root default -15 7.97316 host c04-h01-6048r 13 hdd 1.81799 osd.13 up 1.00000 1.00000 20 hdd 1.81799 osd.20 up 1.00000 1.00000 26 hdd 1.81799 osd.26 up 1.00000 1.00000 32 hdd 1.81799 osd.32 up 1.00000 1.00000 6 ssd 0.70119 osd.6 up 1.00000 1.00000 -3 7.97316 host c04-h05-6048r 12 hdd 1.81799 osd.12 up 1.00000 1.00000 17 hdd 1.81799 osd.17 up 1.00000 1.00000 23 hdd 1.81799 osd.23 up 1.00000 1.00000 29 hdd 1.81799 osd.29 up 1.00000 1.00000 2 ssd 0.70119 osd.2 up 1.00000 1.00000 -13 7.97316 host c04-h09-6048r 11 hdd 1.81799 osd.11 up 1.00000 1.00000 16 hdd 1.81799 osd.16 up 1.00000 1.00000 22 hdd 1.81799 osd.22 up 1.00000 1.00000 27 hdd 1.81799 osd.27 up 1.00000 1.00000 4 ssd 0.70119 osd.4 up 1.00000 1.00000 -5 7.97316 host c04-h13-6048r 10 hdd 1.81799 osd.10 up 1.00000 1.00000 15 hdd 1.81799 osd.15 up 1.00000 1.00000 21 hdd 1.81799 osd.21 up 1.00000 1.00000 28 hdd 1.81799 osd.28 up 1.00000 1.00000 1 ssd 0.70119 osd.1 up 1.00000 1.00000 -9 7.97316 host c04-h21-6048r 8 hdd 1.81799 osd.8 up 1.00000 1.00000 18 hdd 1.81799 osd.18 up 1.00000 1.00000 25 hdd 1.81799 osd.25 up 1.00000 1.00000 30 hdd 1.81799 osd.30 up 1.00000 1.00000 5 ssd 0.70119 osd.5 up 1.00000 1.00000 -11 7.97316 host c04-h25-6048r 9 hdd 1.81799 osd.9 up 1.00000 1.00000 14 hdd 1.81799 osd.14 up 1.00000 1.00000 33 hdd 1.81799 osd.33 up 1.00000 1.00000 34 hdd 1.81799 osd.34 up 1.00000 1.00000 0 ssd 0.70119 osd.0 up 1.00000 1.00000 -7 7.97316 host c04-h29-6048r 7 hdd 1.81799 osd.7 up 1.00000 1.00000 19 hdd 1.81799 osd.19 up 1.00000 1.00000 24 hdd 1.81799 osd.24 up 1.00000 1.00000 31 hdd 1.81799 osd.31 up 1.00000 1.00000 3 ssd 0.70119 osd.3 up 1.00000 1.00000
Ceph is now set up to use one NVMe device and LVM optimally for Object Storage Gateway.
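Optionally, on an OSD node you can also confirm which logical volumes back each OSD, and therefore that every journal sits on the NVMe volume group, with ceph-volume:
# ceph-volume lvm list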
10.2. Using Two NVMe Devices
Follow this procedure to deploy Ceph for Object Gateway usage with two NVMe devices.
10.2.1. Purge Any Existing Ceph Cluster
If Ceph is already configured, purge it in order to start over. An Ansible playbook file named purge-cluster.yml is provided for this purpose.
$ ansible-playbook purge-cluster.yml
For more information on how to use purge-cluster.yml
see Purging a Ceph Cluster by Using Ansible in the Installation Guide for Red Hat Enterprise Linux or Installation Guide for Ubuntu depending on your chosen Linux distribution.
Purging the cluster may not be enough to prepare the servers for redeploying Ceph using the following procedures. Any file system, GPT, RAID, or other signatures on storage devices used by Ceph may cause problems. Instructions to remove any signatures using wipefs
are provided under Run The lv-create.yml Ansible Playbook.
10.2.2. Configure The Cluster for Normal Installation
Setting aside any NVMe and/or LVM considerations, configure the cluster as you would normally but stop before running ansible-playbook site.yml
. Afterwards, the cluster installation configuration will be adjusted specifically for optimal NVMe/LVM usage to support the Object Gateway. Only at that time should ansible-playbook site.yml
be run.
To configure the cluster for normal installation consult the Installation Guide for Red Hat Enterprise Linux or Installation Guide for Ubuntu depending on your chosen Linux distribution. In particular, complete the steps in Installing a Red Hat Ceph Storage Cluster through Step 9 creating an Ansible log directory. Stop before Step 10 when ansible-playbook site.yml
is run.
Do not run ansible-playbook site.yml
until all the steps after this and before Install Ceph for NVMe and Verify Success have been completed.
10.2.3. Identify The NVMe and HDD Devices
Use lsblk
to identify the NVMe and HDD devices connected to the server. Example output from lsblk
is listed below:
[root@c04-h09-6048r ~]# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 465.8G 0 disk ├─sda1 8:1 0 512M 0 part /boot └─sda2 8:2 0 465.3G 0 part ├─vg_c04--h09--6048r-lv_root 253:0 0 464.8G 0 lvm / └─vg_c04--h09--6048r-lv_swap 253:1 0 512M 0 lvm [SWAP] sdb 8:16 0 465.8G 0 disk sdc 8:32 0 1.8T 0 disk sdd 8:48 0 1.8T 0 disk sde 8:64 0 1.8T 0 disk sdf 8:80 0 1.8T 0 disk sdg 8:96 0 1.8T 0 disk sdh 8:112 0 1.8T 0 disk sdi 8:128 0 1.8T 0 disk sdj 8:144 0 1.8T 0 disk sdk 8:160 0 1.8T 0 disk sdl 8:176 0 1.8T 0 disk sdm 8:192 0 1.8T 0 disk sdn 8:208 0 1.8T 0 disk sdo 8:224 0 1.8T 0 disk sdp 8:240 0 1.8T 0 disk sdq 65:0 0 1.8T 0 disk sdr 65:16 0 1.8T 0 disk sds 65:32 0 1.8T 0 disk sdt 65:48 0 1.8T 0 disk sdu 65:64 0 1.8T 0 disk sdv 65:80 0 1.8T 0 disk sdw 65:96 0 1.8T 0 disk sdx 65:112 0 1.8T 0 disk sdy 65:128 0 1.8T 0 disk sdz 65:144 0 1.8T 0 disk sdaa 65:160 0 1.8T 0 disk sdab 65:176 0 1.8T 0 disk sdac 65:192 0 1.8T 0 disk sdad 65:208 0 1.8T 0 disk sdae 65:224 0 1.8T 0 disk sdaf 65:240 0 1.8T 0 disk sdag 66:0 0 1.8T 0 disk sdah 66:16 0 1.8T 0 disk sdai 66:32 0 1.8T 0 disk sdaj 66:48 0 1.8T 0 disk sdak 66:64 0 1.8T 0 disk sdal 66:80 0 1.8T 0 disk nvme0n1 259:1 0 745.2G 0 disk nvme1n1 259:0 0 745.2G 0 disk
In this example the following raw block devices will be used:
NVMe devices
-
/dev/nvme0n1
-
/dev/nvme1n1
HDD devices
-
/dev/sdc
-
/dev/sdd
-
/dev/sde
-
/dev/sdf
The file lv_vars.yaml
configures logical volume creation on the chosen devices. It creates journals on NVMe, an NVMe based bucket index, and HDD based OSDs. The actual creation of logical volumes is initiated by lv-create.yml
, which reads lv_vars.yaml
.
That file should only have one NVMe device referenced in it at a time. It should also only reference the HDD devices to be associated with that particular NVMe device. For servers that contain more than one NVMe device, edit lv_vars.yaml for each NVMe device and run lv-create.yml repeatedly, once per NVMe device. This is explained below.
In the example this means lv-create.yml
will first be run on /dev/nvme0n1
and then again on /dev/nvme1n1
.
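In other words, the example is two passes of the same edit, teardown, and create cycle. The detailed steps follow in the next sections; in outline:
Pass 1: edit lv_vars.yaml to reference /dev/nvme0n1 with /dev/sdc and /dev/sdd, then run:
$ ansible-playbook lv-teardown.yml
$ ansible-playbook lv-create.yml
Copy the contents of lv-created.log into group_vars/osds.yml under lvm_volumes:.
Pass 2: edit lv_vars.yaml to reference /dev/nvme1n1 with /dev/sde and /dev/sdf, then run:
$ ansible-playbook lv-teardown.yml
$ ansible-playbook lv-create.yml
Append the new lv-created.log contents under lvm_volumes:.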
10.2.4. Add The Devices to lv_vars.yaml
As root, navigate to the /usr/share/ceph-ansible/ directory:
# cd /usr/share/ceph-ansible
As root, copy the lv_vars.yaml file to the current directory:
# cp infrastructure-playbooks/vars/lv_vars.yaml .
For the first run edit the file so it includes the following lines:
nvme_device: /dev/nvme0n1
hdd_devices:
  - /dev/sdc
  - /dev/sdd
The journal size, number of bucket indexes, their sizes and names, and the bucket indexes' journal names can all be adjusted in lv_vars.yaml
. See the comments within the file for more information.
10.2.5. Run The lv-create.yml Ansible Playbook
The purpose of the lv-create.yml playbook is to create logical volumes for the Object Gateway bucket index and journals on a single NVMe device. It does this by using osd_scenario=lvm as opposed to osd_scenario=non-collocated. The lv-create.yml Ansible playbook makes it easier to configure Ceph in this way by automating some of the complex LVM creation and configuration.
As root, copy the lv-create.yml Ansible playbook to the current directory:
# cp infrastructure-playbooks/lv-create.yml .
Ensure the storage devices are raw
Before running lv-create.yml to create the logical volumes on the NVMe devices and HDD devices, ensure there are no file system, GPT, RAID, or other signatures on them.
If they are not raw, when you run lv-create.yml it may fail with the following error:
device /dev/sdc excluded by a filter
Wipe storage device signatures (optional)
If the devices have signatures you can use wipefs to erase them.
An example of using wipefs to erase the devices is shown below:
[root@c04-h01-6048r ~]# wipefs -a /dev/sdc
/dev/sdc: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sdc: 8 bytes were erased at offset 0x1d19ffffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/sdc: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sdc: calling ioclt to re-read partition table: Success
[root@c04-h01-6048r ~]# wipefs -a /dev/sdd
/dev/sdd: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sdd: 8 bytes were erased at offset 0x1d19ffffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/sdd: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sdd: calling ioclt to re-read partition table: Success
[root@c04-h01-6048r ~]# wipefs -a /dev/sde
/dev/sde: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sde: 8 bytes were erased at offset 0x1d19ffffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/sde: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sde: calling ioclt to re-read partition table: Success
[root@c04-h01-6048r ~]# wipefs -a /dev/sdf
/dev/sdf: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sdf: 8 bytes were erased at offset 0x1d19ffffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/sdf: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sdf: calling ioclt to re-read partition table: Success
Run the lv-teardown.yml Ansible playbook
Always run lv-teardown.yml before running lv-create.yml.
As root, copy the lv-teardown.yml Ansible playbook to the current directory:
# cp infrastructure-playbooks/lv-teardown.yml .
Run the lv-teardown.yml Ansible playbook:
$ ansible-playbook lv-teardown.yml
Warning: Proceed with caution when running the lv-teardown.yml Ansible playbook. It destroys data. Ensure you have backups of any important data.
Run the lv-create.yml Ansible playbook:
$ ansible-playbook lv-create.yml
10.2.6. Copy First NVMe LVM Configuration
Review lv-created.log
Once the lv-create.yml Ansible playbook completes successfully, configuration information will be written to lv-created.log. Open lv-created.log and look for information similar to the below example:
- data: ceph-bucket-index-1
  data_vg: ceph-nvme-vg-nvme0n1
  journal: ceph-journal-bucket-index-1-nvme0n1
  journal_vg: ceph-nvme-vg-nvme0n1
- data: ceph-hdd-lv-sdc
  data_vg: ceph-hdd-vg-sdc
  journal: ceph-journal-sdc
  journal_vg: ceph-nvme-vg-nvme0n1
- data: ceph-hdd-lv-sdd
  data_vg: ceph-hdd-vg-sdd
  journal: ceph-journal-sdd
  journal_vg: ceph-nvme-vg-nvme0n1
Copy this information into group_vars/osds.yml under lvm_volumes:, as shown in the sketch below.
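After this first copy, the lvm_volumes: section of group_vars/osds.yml should look roughly like the following sketch, built from the lv-created.log example above (your LV and VG names may differ):
lvm_volumes:
  - data: ceph-bucket-index-1
    data_vg: ceph-nvme-vg-nvme0n1
    journal: ceph-journal-bucket-index-1-nvme0n1
    journal_vg: ceph-nvme-vg-nvme0n1
  - data: ceph-hdd-lv-sdc
    data_vg: ceph-hdd-vg-sdc
    journal: ceph-journal-sdc
    journal_vg: ceph-nvme-vg-nvme0n1
  - data: ceph-hdd-lv-sdd
    data_vg: ceph-hdd-vg-sdd
    journal: ceph-journal-sdd
    journal_vg: ceph-nvme-vg-nvme0n1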
10.2.7. Run The lv-create.yml Playbook on NVMe Device Two
The following instructions are abbreviated steps to set up a second NVMe device. Consult the related steps above for further context if needed.
Modify lv_vars.yaml to use the second NVMe device and its associated HDDs.
Following the previous example, lv_vars.yaml will now have the following devices set:
nvme_device: /dev/nvme1n1
hdd_devices:
  - /dev/sde
  - /dev/sdf
Run lv-teardown.yml:
$ ansible-playbook lv-teardown.yml
Run lv-create.yml again:
$ ansible-playbook lv-create.yml
10.2.8. Copy Second NVMe LVM Configuration
Review lv-created.log
Once the lv-create.yml Ansible playbook completes successfully, configuration information will be written to lv-created.log. Open lv-created.log and look for information similar to the below example:
- data: ceph-bucket-index-1
  data_vg: ceph-nvme-vg-nvme1n1
  journal: ceph-journal-bucket-index-1-nvme1n1
  journal_vg: ceph-nvme-vg-nvme1n1
- data: ceph-hdd-lv-sde
  data_vg: ceph-hdd-vg-sde
  journal: ceph-journal-sde
  journal_vg: ceph-nvme-vg-nvme1n1
- data: ceph-hdd-lv-sdf
  data_vg: ceph-hdd-vg-sdf
  journal: ceph-journal-sdf
  journal_vg: ceph-nvme-vg-nvme1n1
Copy this information into group_vars/osds.yml, below the information already entered under lvm_volumes:.
10.2.9. Verify LVM Configuration
Review LVM Configuration
Based on the example of two NVMe devices and four HDDs the following Logical Volumes (LVs) should be created:
One journal LV per HDD placed on both NVMe devices (two LVs on /dev/nvme0n1, two on /dev/nvme1n1)
One data LV per HDD placed on each HDD (one LV per HDD)
One journal LV per bucket index placed on NVMe (one LV on /dev/nvme0n1, one LV on /dev/nvme1n1)
One data LV per bucket index placed on both NVMe devices (one LV on /dev/nvme0n1, one LV on /dev/nvme1n1)
The LVs can be seen in lsblk and lvscan output. In the example explained above, there should be twelve LVs for Ceph. As a rough sanity check you could count the Ceph LVs to make sure there are at least twelve, but ideally you would make sure the appropriate LVs were created on the right storage devices (NVMe vs HDD).
Example output from
lsblk
is shown below:[root@c04-h01-6048r ~]# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 465.8G 0 disk ├─sda1 8:1 0 4G 0 part │ └─md1 9:1 0 4G 0 raid1 [SWAP] ├─sda2 8:2 0 512M 0 part │ └─md0 9:0 0 512M 0 raid1 /boot └─sda3 8:3 0 461.3G 0 part └─md2 9:2 0 461.1G 0 raid1 / sdb 8:16 0 465.8G 0 disk ├─sdb1 8:17 0 4G 0 part │ └─md1 9:1 0 4G 0 raid1 [SWAP] ├─sdb2 8:18 0 512M 0 part │ └─md0 9:0 0 512M 0 raid1 /boot └─sdb3 8:19 0 461.3G 0 part └─md2 9:2 0 461.1G 0 raid1 / sdc 8:32 0 1.8T 0 disk └─ceph--hdd--vg--sdc-ceph--hdd--lv--sdc 253:4 0 1.8T 0 lvm sdd 8:48 0 1.8T 0 disk └─ceph--hdd--vg--sdd-ceph--hdd--lv--sdd 253:5 0 1.8T 0 lvm sde 8:64 0 1.8T 0 disk └─ceph--hdd--vg--sde-ceph--hdd--lv--sde 253:10 0 1.8T 0 lvm sdf 8:80 0 1.8T 0 disk └─ceph--hdd--vg--sdf-ceph--hdd--lv--sdf 253:11 0 1.8T 0 lvm sdg 8:96 0 1.8T 0 disk sdh 8:112 0 1.8T 0 disk sdi 8:128 0 1.8T 0 disk sdj 8:144 0 1.8T 0 disk sdk 8:160 0 1.8T 0 disk sdl 8:176 0 1.8T 0 disk sdm 8:192 0 1.8T 0 disk sdn 8:208 0 1.8T 0 disk sdo 8:224 0 1.8T 0 disk sdp 8:240 0 1.8T 0 disk sdq 65:0 0 1.8T 0 disk sdr 65:16 0 1.8T 0 disk sds 65:32 0 1.8T 0 disk sdt 65:48 0 1.8T 0 disk sdu 65:64 0 1.8T 0 disk sdv 65:80 0 1.8T 0 disk sdw 65:96 0 1.8T 0 disk sdx 65:112 0 1.8T 0 disk sdy 65:128 0 1.8T 0 disk sdz 65:144 0 1.8T 0 disk sdaa 65:160 0 1.8T 0 disk sdab 65:176 0 1.8T 0 disk sdac 65:192 0 1.8T 0 disk sdad 65:208 0 1.8T 0 disk sdae 65:224 0 1.8T 0 disk sdaf 65:240 0 1.8T 0 disk sdag 66:0 0 1.8T 0 disk sdah 66:16 0 1.8T 0 disk sdai 66:32 0 1.8T 0 disk sdaj 66:48 0 1.8T 0 disk sdak 66:64 0 1.8T 0 disk sdal 66:80 0 1.8T 0 disk nvme0n1 259:0 0 745.2G 0 disk ├─ceph--nvme--vg--nvme0n1-ceph--journal--bucket--index--1--nvme0n1 253:0 0 5.4G 0 lvm ├─ceph--nvme--vg--nvme0n1-ceph--journal--sdc 253:1 0 5.4G 0 lvm ├─ceph--nvme--vg--nvme0n1-ceph--journal--sdd 253:2 0 5.4G 0 lvm └─ceph--nvme--vg--nvme0n1-ceph--bucket--index--1 253:3 0 729.1G 0 lvm nvme1n1 259:1 0 745.2G 0 disk ├─ceph--nvme--vg--nvme1n1-ceph--journal--bucket--index--1--nvme1n1 253:6 0 5.4G 0 lvm ├─ceph--nvme--vg--nvme1n1-ceph--journal--sde 253:7 0 5.4G 0 lvm ├─ceph--nvme--vg--nvme1n1-ceph--journal--sdf 253:8 0 5.4G 0 lvm └─ceph--nvme--vg--nvme1n1-ceph--bucket--index--1 253:9 0 729.1G 0 lvm
Example output from lvscan is shown below:
[root@c04-h01-6048r ~]# lvscan
  ACTIVE   '/dev/ceph-hdd-vg-sde/ceph-hdd-lv-sde' [<1.82 TiB] inherit
  ACTIVE   '/dev/ceph-hdd-vg-sdc/ceph-hdd-lv-sdc' [<1.82 TiB] inherit
  ACTIVE   '/dev/ceph-hdd-vg-sdf/ceph-hdd-lv-sdf' [<1.82 TiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme1n1/ceph-journal-bucket-index-1-nvme1n1' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme1n1/ceph-journal-sde' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme1n1/ceph-journal-sdf' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme1n1/ceph-bucket-index-1' [<729.10 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme0n1/ceph-journal-bucket-index-1-nvme0n1' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme0n1/ceph-journal-sdc' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme0n1/ceph-journal-sdd' [5.37 GiB] inherit
  ACTIVE   '/dev/ceph-nvme-vg-nvme0n1/ceph-bucket-index-1' [<729.10 GiB] inherit
  ACTIVE   '/dev/ceph-hdd-vg-sdd/ceph-hdd-lv-sdd' [<1.82 TiB] inherit
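As an optional cross-check, summarizing the volume groups shows the expected distribution: each NVMe volume group should report four LVs (two HDD journals, one bucket index journal, and one bucket index), and each HDD volume group should report one:
# vgs -o vg_name,pv_count,lv_count,vg_size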
10.2.10. Edit The osds.yml and all.yml Ansible Files
Set osd_objectstore to filestore
In addition to adding the second set of information from lv-created.log into osds.yml, osd_objectstore also needs to be set to filestore in both the osds.yml and all.yml files.
The line should look like this in both osds.yml and all.yml:
osd_objectstore: filestore
Set osd_scenario to lvm in osds.yml
The osds.yml file should look similar to the following example:
# Variables here are applicable to all host groups NOT roles
osd_objectstore: filestore
osd_scenario: lvm
lvm_volumes:
  - data: ceph-bucket-index-1
    data_vg: ceph-nvme-vg-nvme0n1
    journal: ceph-journal-bucket-index-1-nvme0n1
    journal_vg: ceph-nvme-vg-nvme0n1
  - data: ceph-hdd-lv-sdc
    data_vg: ceph-hdd-vg-sdc
    journal: ceph-journal-sdc
    journal_vg: ceph-nvme-vg-nvme0n1
  - data: ceph-hdd-lv-sdd
    data_vg: ceph-hdd-vg-sdd
    journal: ceph-journal-sdd
    journal_vg: ceph-nvme-vg-nvme0n1
  - data: ceph-bucket-index-1
    data_vg: ceph-nvme-vg-nvme1n1
    journal: ceph-journal-bucket-index-1-nvme1n1
    journal_vg: ceph-nvme-vg-nvme1n1
  - data: ceph-hdd-lv-sde
    data_vg: ceph-hdd-vg-sde
    journal: ceph-journal-sde
    journal_vg: ceph-nvme-vg-nvme1n1
  - data: ceph-hdd-lv-sdf
    data_vg: ceph-hdd-vg-sdf
    journal: ceph-journal-sdf
    journal_vg: ceph-nvme-vg-nvme1n1
10.2.11. Install Ceph for NVMe and Verify Success
Run the site.yml Ansible playbook to install Ceph:
$ ansible-playbook -v -i hosts site.yml
Verify Ceph is running properly after install completes
# ceph -s
# ceph osd tree
Example ceph -s output showing Ceph is running properly:
# ceph -s
  cluster:
    id:     9ba22f4c-b53f-4c49-8c72-220aaf567c2b
    health: HEALTH_WARN
            Reduced data availability: 32 pgs inactive

  services:
    mon: 3 daemons, quorum b08-h03-r620,b08-h05-r620,b08-h06-r620
    mgr: b08-h03-r620(active), standbys: b08-h05-r620, b08-h06-r620
    osd: 42 osds: 42 up, 42 in

  data:
    pools:   4 pools, 32 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             32 unknown
Example
ceph osd tree
output showing Ceph is running properly:[root@c04-h01-6048r ~]# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 60.86740 root default -7 8.69534 host c04-h01-6048r 10 hdd 1.81799 osd.10 up 1.00000 1.00000 13 hdd 1.81799 osd.13 up 1.00000 1.00000 21 hdd 1.81799 osd.21 up 1.00000 1.00000 27 hdd 1.81799 osd.27 up 1.00000 1.00000 6 ssd 0.71169 osd.6 up 1.00000 1.00000 15 ssd 0.71169 osd.15 up 1.00000 1.00000 -3 8.69534 host c04-h05-6048r 7 hdd 1.81799 osd.7 up 1.00000 1.00000 20 hdd 1.81799 osd.20 up 1.00000 1.00000 29 hdd 1.81799 osd.29 up 1.00000 1.00000 38 hdd 1.81799 osd.38 up 1.00000 1.00000 4 ssd 0.71169 osd.4 up 1.00000 1.00000 25 ssd 0.71169 osd.25 up 1.00000 1.00000 -22 8.69534 host c04-h09-6048r 17 hdd 1.81799 osd.17 up 1.00000 1.00000 31 hdd 1.81799 osd.31 up 1.00000 1.00000 35 hdd 1.81799 osd.35 up 1.00000 1.00000 39 hdd 1.81799 osd.39 up 1.00000 1.00000 5 ssd 0.71169 osd.5 up 1.00000 1.00000 34 ssd 0.71169 osd.34 up 1.00000 1.00000 -9 8.69534 host c04-h13-6048r 8 hdd 1.81799 osd.8 up 1.00000 1.00000 11 hdd 1.81799 osd.11 up 1.00000 1.00000 30 hdd 1.81799 osd.30 up 1.00000 1.00000 32 hdd 1.81799 osd.32 up 1.00000 1.00000 0 ssd 0.71169 osd.0 up 1.00000 1.00000 26 ssd 0.71169 osd.26 up 1.00000 1.00000 -19 8.69534 host c04-h21-6048r 18 hdd 1.81799 osd.18 up 1.00000 1.00000 23 hdd 1.81799 osd.23 up 1.00000 1.00000 36 hdd 1.81799 osd.36 up 1.00000 1.00000 40 hdd 1.81799 osd.40 up 1.00000 1.00000 3 ssd 0.71169 osd.3 up 1.00000 1.00000 33 ssd 0.71169 osd.33 up 1.00000 1.00000 -16 8.69534 host c04-h25-6048r 16 hdd 1.81799 osd.16 up 1.00000 1.00000 22 hdd 1.81799 osd.22 up 1.00000 1.00000 37 hdd 1.81799 osd.37 up 1.00000 1.00000 41 hdd 1.81799 osd.41 up 1.00000 1.00000 1 ssd 0.71169 osd.1 up 1.00000 1.00000 28 ssd 0.71169 osd.28 up 1.00000 1.00000 -5 8.69534 host c04-h29-6048r 9 hdd 1.81799 osd.9 up 1.00000 1.00000 12 hdd 1.81799 osd.12 up 1.00000 1.00000 19 hdd 1.81799 osd.19 up 1.00000 1.00000 24 hdd 1.81799 osd.24 up 1.00000 1.00000 2 ssd 0.71169 osd.2 up 1.00000 1.00000 14 ssd 0.71169 osd.14 up 1.00000 1.00000
Ceph is now set up to use two NVMe devices and LVM optimally for Object Storage Gateway.