Chapter 5. Customizing the Ceph Storage cluster
Director deploys containerized Red Hat Ceph Storage using a default configuration. You can customize Ceph Storage by overriding the default settings.
Prerequisites
To deploy containerized Ceph Storage, you must include the /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml file during overcloud deployment. This environment file defines the following resources:
- CephAnsibleDisksConfig - This resource maps the Ceph Storage node disk layout. For more information, see Section 5.3, “Mapping the Ceph Storage node disk layout”.
- CephConfigOverrides - This resource applies all other custom settings to your Ceph Storage cluster.
Use these resources to override any defaults that the director sets for containerized Ceph Storage.
Procedure
Enable the Red Hat Ceph Storage 4 Tools repository:
$ sudo subscription-manager repos --enable=rhceph-4-tools-for-rhel-8-x86_64-rpms
Install the ceph-ansible package on the undercloud:
$ sudo dnf install ceph-ansible
To customize your Ceph Storage cluster, define custom parameters in a new environment file, for example, /home/stack/templates/ceph-config.yaml. You can apply Ceph Storage cluster settings with the following syntax in the parameter_defaults section of your environment file:
parameter_defaults:
  CephConfigOverrides:
    section:
      KEY: VALUE
Note
You can apply the CephConfigOverrides parameter to the [global] section of the ceph.conf file, as well as to any other section, such as [osd], [mon], and [client]. If you specify a section, the key:value data goes into the specified section. If you do not specify a section, the data goes into the [global] section by default. For information about Ceph Storage configuration, customization, and supported parameters, see Red Hat Ceph Storage Configuration Guide.
Replace KEY and VALUE with the Ceph cluster settings that you want to apply. For example, in the global section, max_open_files is the KEY and 131072 is the corresponding VALUE:
parameter_defaults:
  CephConfigOverrides:
    global:
      max_open_files: 131072
    osd:
      osd_scrub_during_recovery: false
This configuration results in the following settings in the configuration file of your Ceph cluster:
[global]
max_open_files = 131072
[osd]
osd_scrub_during_recovery = false
5.1. Setting ceph-ansible group variables
The ceph-ansible tool is a playbook used to install and manage Ceph Storage clusters.
The ceph-ansible tool has a group_vars directory that defines configuration options and the default settings for those options. Use the group_vars directory to set Ceph Storage parameters.
For information about the group_vars directory, see Installing a Red Hat Ceph Storage cluster in the Installation Guide.
To change the variable defaults in director, use the CephAnsibleExtraConfig parameter to pass the new values in heat environment files. For example, to set the ceph-ansible group variable journal_size to 40960, create an environment file with the following journal_size definition:
parameter_defaults:
  CephAnsibleExtraConfig:
    journal_size: 40960
Change ceph-ansible group variables with the override parameters; do not edit group variables directly in the /usr/share/ceph-ansible directory on the undercloud.
5.2. Ceph containers for Red Hat OpenStack Platform with Ceph Storage
A Ceph container is required to configure Red Hat OpenStack Platform (RHOSP) to use Ceph, even with an external Ceph cluster. To be compatible with Red Hat Enterprise Linux 8, RHOSP 16.0 requires Red Hat Ceph Storage 4. The Ceph Storage 4 container is hosted at registry.redhat.io, a registry which requires authentication.
You can use the heat environment parameter ContainerImageRegistryCredentials to authenticate with registry.redhat.io, as described in Container image preparation parameters.
5.3. Mapping the Ceph Storage node disk layout
When you deploy containerized Ceph Storage, you must map the disk layout and specify dedicated block devices for the Ceph OSD service. You can perform this mapping in the environment file that you created earlier to define your custom Ceph parameters: /home/stack/templates/ceph-config.yaml.
Use the CephAnsibleDisksConfig resource in parameter_defaults to map your disk layout. This resource uses the following variables:
Variable | Required? | Default value (if unset) | Description |
---|---|---|---|
osd_scenario | Yes | lvm | The … |
devices | Yes | NONE. Variable must be set. | A list of block devices that you want to use for OSDs on the node. |
dedicated_devices | Yes (only if …) | devices | A list of block devices that maps each entry in the … |
dmcrypt | No | false | Sets whether data stored on OSDs is encrypted (true) or unencrypted (false). |
osd_objectstore | No | bluestore | Sets the storage back end used by Ceph. |
5.3.1. Using BlueStore
To specify the block devices that you want to use as Ceph OSDs, use a variation of the following snippet:
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
      - /dev/nvme0n1
    osd_scenario: lvm
    osd_objectstore: bluestore
Because /dev/nvme0n1 is in a higher performing device class, the example parameter defaults produce three OSDs that run on /dev/sdb, /dev/sdc, and /dev/sdd. The three OSDs use /dev/nvme0n1 as a BlueStore WAL device. The ceph-volume tool does this by using the batch subcommand. The same setup is duplicated for each Ceph Storage node and assumes uniform hardware. If the BlueStore WAL data resides on the same disks as the OSDs, change the parameter defaults as follows:
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
    osd_scenario: lvm
    osd_objectstore: bluestore
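The following Python sketch illustrates how the devices are split in the BlueStore examples. It makes simplifying assumptions:
- Devices flagged as rotational hold OSD data.
- When both device classes are present, non-rotational devices hold the BlueStore WAL.

The function plan_batch and its hand-written rotational map are hypothetical. The real ceph-volume batch subcommand detects device classes itself and handles more cases. Treat this only as an illustration of the outcome described in the text.

```python
"""Hypothetical model of the device split in the BlueStore examples.

Not ceph-volume's actual algorithm: the rotational flags are supplied
by hand here, whereas the real tool inspects the devices itself.
"""


def plan_batch(devices, rotational):
    """Return (data_devices, wal_devices) for one node.

    rotational maps each device path to True (spinning disk) or False
    (SSD/NVMe). If the devices are all in one class, every device
    becomes an OSD and the WAL stays on the same disk as the data.
    """
    slow = [d for d in devices if rotational[d]]
    fast = [d for d in devices if not rotational[d]]
    if slow and fast:
        return slow, fast
    return list(devices), []


# First example: three HDDs plus one NVMe device.
mixed = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/nvme0n1"]
flags = {"/dev/sdb": True, "/dev/sdc": True, "/dev/sdd": True, "/dev/nvme0n1": False}
data, wal = plan_batch(mixed, flags)
print("OSD data:", data)
print("WAL:", wal)
```

With the first example's device list, the sketch reports three OSD data devices and /dev/nvme0n1 as the WAL device. With the second list, it reports no separate WAL device.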
5.3.2. Referring to devices with persistent names
On some nodes, disk paths such as /dev/sdb and /dev/sdc might not point to the same block device after a reboot. If this is the case with your CephStorage nodes, specify each disk with the /dev/disk/by-path/ symlink to ensure that the block device mapping is consistent throughout deployments:
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:10:0
      - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:11:0
    dedicated_devices:
      - /dev/nvme0n1
      - /dev/nvme0n1
Because you must set the list of OSD devices before overcloud deployment, it might not be possible to identify and set the PCI path of disk devices. In this case, gather the /dev/disk/by-path/ symlink data for block devices during introspection.
In the following example, the first command downloads the introspection data for the server b08-h03-r620-hci from the undercloud Object Storage service (swift) and saves the data in a file called b08-h03-r620-hci.json. The second command greps for “by-path”. The output of this command contains the unique /dev/disk/by-path values that you can use to identify disks.
(undercloud) [stack@b08-h02-r620 ironic]$ openstack baremetal introspection data save b08-h03-r620-hci | jq . > b08-h03-r620-hci.json
(undercloud) [stack@b08-h02-r620 ironic]$ grep by-path b08-h03-r620-hci.json
        "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0",
        "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:1:0",
        "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0",
        "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:4:0",
        "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:5:0",
        "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:6:0",
        "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:7:0",
        "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0",
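The grep output can list the same path more than once; /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0 appears twice in the example. The following Python sketch is a possible alternative to grep that returns each by_path value once, in first-seen order. It walks the entire saved JSON document instead of assuming where the introspection data stores the by_path fields. The function name collect_by_path is hypothetical.

```python
"""Collect unique /dev/disk/by-path values from saved introspection data.

Walks the whole JSON document, so it does not assume a particular
layout for where the by_path fields are stored.
"""
import json
import sys


def collect_by_path(node):
    """Return by_path string values in first-seen order, without duplicates."""
    found = []

    def walk(item):
        if isinstance(item, dict):
            for key, value in item.items():
                if key == "by_path" and isinstance(value, str):
                    if value not in found:
                        found.append(value)
                else:
                    walk(value)
        elif isinstance(item, list):
            for value in item:
                walk(value)

    walk(node)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 collect_by_path.py b08-h03-r620-hci.json
    with open(sys.argv[1]) as handle:
        for path in collect_by_path(json.load(handle)):
            print(path)
```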
For more information about naming conventions for storage devices, see Overview of persistent naming attributes in the Managing storage devices guide.
For details about each journaling scenario and disk mapping for containerized Ceph Storage, see the OSD Scenarios section of the project documentation for ceph-ansible.
5.4. Assigning custom attributes to different Ceph pools
By default, Ceph pools created with director have the same number of placement groups (pg_num and pgp_num) and the same sizes. You can use either of the methods described in Chapter 5, Customizing the Ceph Storage cluster, to override these settings globally, which applies the same values to all pools.
You can also apply different attributes to each Ceph pool. To do so, use the CephPools parameter:
parameter_defaults:
  CephPools:
    - name: POOL
      pg_num: 128
      application: rbd
Replace POOL with the name of the pool that you want to configure, and set pg_num to the number of placement groups. This overrides the default pg_num for the specified pool.
If you use the CephPools parameter, you must also specify the application type. The application type for Compute, Block Storage, and Image Storage should be rbd, as shown in the examples, but depending on what the pool is used for, you might need to specify a different application type. For example, the application type for the gnocchi metrics pool is openstack_gnocchi. For more information, see Enable Application in the Storage Strategies Guide.
If you do not use the CephPools
parameter, director sets the appropriate application type automatically, but only for the default pool list.
You can also create new custom pools through the CephPools parameter. For example, to add a pool called custompool:
parameter_defaults:
  CephPools:
    - name: custompool
      pg_num: 128
      application: rbd
This creates a new custom pool in addition to the default pools.
For typical pool configurations of common Ceph use cases, see the Ceph Placement Groups (PGs) per Pool Calculator. This calculator is normally used to generate the commands for configuring your Ceph pools manually. In this deployment, director configures the pools based on your specifications.
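To get a rough starting value for pg_num before you consult the calculator, you can use the following Python sketch. It assumes the commonly cited rule of thumb of about 100 PGs per OSD. That target is multiplied by the number of OSDs and the pool's share of the data, divided by the replica count, and rounded to the nearest power of two. The function suggest_pg_num and its parameters are hypothetical. The calculator's own rounding rules may differ, so treat the result as an estimate only.

```python
"""Rough pg_num estimate for a single pool (rule-of-thumb sketch)."""
import math


def suggest_pg_num(num_osds, replica_size, data_share=1.0, target_pgs_per_osd=100):
    """Estimate pg_num for one pool.

    num_osds: OSDs that will hold the pool.
    replica_size: number of copies (pool size).
    data_share: fraction of total data expected in this pool (0..1].
    target_pgs_per_osd: assumed target, commonly about 100.
    """
    raw = target_pgs_per_osd * num_osds * data_share / replica_size
    if raw < 1:
        return 1
    # Round to the nearest power of two; pg_num is conventionally a power of two.
    return 2 ** round(math.log2(raw))


if __name__ == "__main__":
    print(suggest_pg_num(num_osds=3, replica_size=3))
```

For example, 3 OSDs with 3 replicas give a raw value of 100, which rounds to the pg_num of 128 used in the examples in this section.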
Red Hat Ceph Storage 3 (Luminous) introduced a hard limit on the maximum number of PGs that an OSD can have. The mon_max_pg_per_osd parameter sets this limit, which is 200 by default. Do not override this parameter with a value greater than 200. If there is a problem because the Ceph PG number exceeds the maximum, adjust the pg_num per pool to address the problem, not the mon_max_pg_per_osd parameter.
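Before deployment, you can check planned pools against this limit with the following Python sketch. It assumes that each PG is stored on size OSDs, so the average number of PG replicas per OSD is the sum of pg_num × size over all pools, divided by the number of OSDs. The pool names, sizes, and helper functions are hypothetical examples. Uneven placement can put individual OSDs above the average.

```python
"""Check planned pools against the default mon_max_pg_per_osd limit."""

MON_MAX_PG_PER_OSD = 200  # default limit cited in this section


def pgs_per_osd(pools, num_osds):
    """Average PG replicas per OSD: sum(pg_num * size) / num_osds."""
    total = sum(pool["pg_num"] * pool["size"] for pool in pools)
    return total / num_osds


def check_pools(pools, num_osds, limit=MON_MAX_PG_PER_OSD):
    """Return the average, or raise ValueError if it exceeds the limit."""
    average = pgs_per_osd(pools, num_osds)
    if average > limit:
        raise ValueError(
            f"{average:.0f} PGs per OSD exceeds {limit}; lower pg_num on one or more pools"
        )
    return average


if __name__ == "__main__":
    # Hypothetical plan: five pools of 128 PGs each, 3 replicas, 12 OSDs.
    plan = [{"name": n, "pg_num": 128, "size": 3}
            for n in ("images", "vms", "volumes", "backups", "metrics")]
    print(check_pools(plan, num_osds=12))  # 1920 / 12 = 160.0
```

The same five pools on 9 OSDs give about 213 PGs per OSD, which exceeds the limit. In that case, lower pg_num for one or more pools instead of raising mon_max_pg_per_osd.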
5.5. Mapping the disk layout to non-homogeneous Ceph Storage nodes
By default, all nodes of a role that host Ceph OSDs (indicated by the OS::TripleO::Services::CephOSD service in roles_data.yaml), for example CephStorage or ComputeHCI nodes, use the global devices and dedicated_devices lists set in Section 5.3, “Mapping the Ceph Storage node disk layout”. This assumes that all of these servers have homogeneous hardware. If a subset of these servers does not have homogeneous hardware, director needs to be aware that each of these servers has different devices and dedicated_devices lists. This is known as a node-specific disk configuration.
To pass a node-specific disk configuration to director, you must pass a heat environment file, such as node-spec-overrides.yaml, to the openstack overcloud deploy command. The file content must identify each server by a machine-unique UUID and a list of local variables to override the global variables.
You can extract the machine-unique UUID from each individual server or from the Ironic database.
To locate the UUID for an individual server, log in to the server and enter the following command:
dmidecode -s system-uuid
To extract the UUID from the Ironic database, enter the following command on the undercloud:
openstack baremetal introspection data save NODE-ID | jq .extra.system.product.uuid
If undercloud.conf does not set inspection_extras = true before you install or upgrade the undercloud and run introspection, the machine-unique UUID is not in the Ironic database.
The machine-unique UUID is not the Ironic UUID.
A valid node-spec-overrides.yaml file might look like the following:
parameter_defaults:
  NodeDataLookup: {"32E87B4C-C4A7-418E-865B-191684A6883B": {"devices": ["/dev/sdc"]}}
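To reduce typing errors, you can generate the file from saved introspection data. The following Python sketch shows one way to do it. It assumes that the JSON was saved with openstack baremetal introspection data save NODE-ID and contains the extra.system.product.uuid value, which requires inspection_extras = true. The function names node_data_lookup and render_environment are hypothetical. The generated file puts the JSON on the lines after the two header lines, so the jq check described in this section still applies.

```python
"""Build a NodeDataLookup entry and environment file (illustrative sketch)."""
import json
import textwrap


def node_data_lookup(introspection, devices, dedicated_devices=None):
    """Map the machine-unique UUID to node-specific device lists."""
    uuid = introspection["extra"]["system"]["product"]["uuid"]
    overrides = {"devices": list(devices)}
    if dedicated_devices is not None:
        # devices and dedicated_devices must be the same length.
        if len(dedicated_devices) != len(devices):
            raise ValueError("devices and dedicated_devices must be the same length")
        overrides["dedicated_devices"] = list(dedicated_devices)
    return {uuid: overrides}


def render_environment(lookup):
    """Two YAML header lines, then indented JSON (valid as a YAML flow mapping)."""
    body = json.dumps(lookup, indent=2)
    return "parameter_defaults:\n  NodeDataLookup:\n" + textwrap.indent(body, "    ") + "\n"


if __name__ == "__main__":
    saved = {"extra": {"system": {"product": {"uuid": "32E87B4C-C4A7-418E-865B-191684A6883B"}}}}
    print(render_environment(node_data_lookup(saved, ["/dev/sdc"])), end="")
```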
All lines after the first two lines must be valid JSON. An easy way to verify that the JSON is valid is to use the jq command:
- Remove the first two lines (parameter_defaults: and NodeDataLookup:) from the file temporarily.
- Enter cat node-spec-overrides.yaml | jq .
As the node-spec-overrides.yaml file grows, you can also use jq to ensure that the embedded JSON is valid. For example, because the devices and dedicated_devices lists must be the same length, use the following commands to verify that they are the same length before you start the deployment:
(undercloud) [stack@b08-h02-r620 tht]$ cat node-spec-c05-h17-h21-h25-6048r.yaml | jq '.[] | .devices | length'
33
30
33
(undercloud) [stack@b08-h02-r620 tht]$ cat node-spec-c05-h17-h21-h25-6048r.yaml | jq '.[] | .dedicated_devices | length'
33
30
33
(undercloud) [stack@b08-h02-r620 tht]$
In the preceding example, the node-spec-c05-h17-h21-h25-6048r.yaml file defines three servers in rack c05 in which slots h17, h21, and h25 are missing disks. A more complicated example is included at the end of this section.
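The JSON validation and length checks can also be scripted, which avoids editing the header lines by hand. The following Python sketch is one possible approach. It assumes that the JSON starts immediately after the NodeDataLookup: key, either on the same line or on the following lines, and that no comments appear after that point. The helper names load_lookup and length_mismatches are hypothetical.

```python
"""Validate a node-spec-overrides file before deployment (sketch)."""
import json
import sys


def load_lookup(text):
    """Parse the JSON that follows 'NodeDataLookup:'.

    Raises ValueError if the key is missing, or json.JSONDecodeError
    (a ValueError subclass) if the JSON is invalid.
    """
    marker = "NodeDataLookup:"
    start = text.index(marker) + len(marker)
    return json.loads(text[start:])


def length_mismatches(lookup):
    """Return {uuid: (len(devices), len(dedicated_devices))} for mismatched nodes."""
    problems = {}
    for uuid, spec in lookup.items():
        devices = spec.get("devices", [])
        dedicated = spec.get("dedicated_devices")
        if dedicated is not None and len(dedicated) != len(devices):
            problems[uuid] = (len(devices), len(dedicated))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_node_spec.py node-spec-overrides.yaml
    with open(sys.argv[1]) as handle:
        mismatches = length_mismatches(load_lookup(handle.read()))
    for uuid, (n_dev, n_ded) in mismatches.items():
        print(f"{uuid}: {n_dev} devices vs {n_ded} dedicated_devices")
    sys.exit(1 if mismatches else 0)
```

The script exits with a non-zero status when any node has mismatched lists, so you can run it as a gate before you run openstack overcloud deploy.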
After you validate the JSON, add back the two lines that make it a valid environment YAML file (parameter_defaults: and NodeDataLookup:), and include the file with -e in the deployment command.
In the following example, the updated heat environment file uses NodeDataLookup for Ceph deployment. All of the servers had a devices list with 35 disks, except for one server that was missing a disk. This environment file overrides the default devices list for only that single node and gives it the list of 34 disks that it must use instead of the global list.
parameter_defaults:
  # c05-h01-6048r is missing scsi-0:2:35:0 (00000000-0000-0000-0000-0CC47A6EFD0C)
  NodeDataLookup: {
    "00000000-0000-0000-0000-0CC47A6EFD0C": {
      "devices": [
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:1:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:32:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:2:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:3:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:4:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:5:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:6:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:33:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:7:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:8:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:34:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:9:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:10:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:11:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:12:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:13:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:14:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:15:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:16:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:17:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:18:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:19:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:20:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:21:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:22:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:23:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:24:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:25:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:26:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:27:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:28:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:29:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:30:0",
        "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:31:0"
      ],
      "dedicated_devices": [
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
        "/dev/disk/by-path/pci-0000:84:00.0-nvme-1"
      ]
    }
  }
5.6. Increasing the restart delay for large Ceph clusters
During deployment, Ceph services, such as OSDs and Monitors, are restarted, and the deployment does not continue until the service is running again. Ansible waits 15 seconds (the delay) and checks 5 times for the service to start (the retries). If the service does not restart, the deployment stops so that the operator can intervene.
Depending on the size of the Ceph cluster, you might need to increase the retry or delay values. The exact names of these parameters and their defaults are as follows:
health_mon_check_retries: 5
health_mon_check_delay: 15
health_osd_check_retries: 5
health_osd_check_delay: 15
Procedure
Update the CephAnsibleExtraConfig parameter to change the default delay and retry values:
parameter_defaults:
  CephAnsibleExtraConfig:
    health_osd_check_delay: 40
    health_osd_check_retries: 30
    health_mon_check_delay: 20
    health_mon_check_retries: 10
This example makes the cluster check 30 times and wait 40 seconds between each check for the Ceph OSDs, and check 10 times and wait 20 seconds between each check for the Ceph MONs.
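To size these values, it helps to compare the total waiting time before and after the change. The following Python sketch assumes that the worst-case wait for one service is roughly retries × delay seconds and ignores the time that each check itself takes. The helper max_wait_seconds is hypothetical. The tuned values match the CephAnsibleExtraConfig example.

```python
"""Estimate the worst-case wait for a restarted Ceph service (approximation)."""


def max_wait_seconds(retries, delay):
    # Approximation: ignores how long each individual health check runs.
    return retries * delay


DEFAULTS = {"osd": (5, 15), "mon": (5, 15)}  # (retries, delay) defaults
TUNED = {"osd": (30, 40), "mon": (10, 20)}   # values from the example

for service in ("osd", "mon"):
    before = max_wait_seconds(*DEFAULTS[service])
    after = max_wait_seconds(*TUNED[service])
    print(f"{service}: {before}s -> {after}s")
```

This prints osd: 75s -> 1200s and mon: 75s -> 200s. In other words, the example allows up to about 20 minutes for OSDs to come back.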
To incorporate the changes, pass the updated YAML file with -e to the openstack overcloud deploy command.
5.7. Overriding Ansible environment variables
The Red Hat OpenStack Platform Workflow service (mistral) uses Ansible to configure Ceph Storage, but you can customize the Ansible environment by using Ansible environment variables.
Procedure
To override an ANSIBLE_* environment variable, use the CephAnsibleEnvironmentVariables heat template parameter.
This example configuration increases the number of forks and SSH retries:
parameter_defaults:
  CephAnsibleEnvironmentVariables:
    ANSIBLE_SSH_RETRIES: '6'
    DEFAULT_FORKS: '35'
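The values in the example are quoted. A likely reason is that process environment variables are always strings. The following Python sketch illustrates this requirement by converting every value before merging it into an environment mapping. The helper build_ansible_env is hypothetical and is not how director passes these variables to Ansible.

```python
"""Illustrate that environment variable values must be strings (sketch)."""
import os


def build_ansible_env(overrides, base=None):
    """Return a copy of the base environment with overrides applied as strings."""
    env = dict(os.environ if base is None else base)
    for name, value in overrides.items():
        env[name] = str(value)  # environment values cannot be ints
    return env


if __name__ == "__main__":
    env = build_ansible_env({"ANSIBLE_SSH_RETRIES": 6, "DEFAULT_FORKS": "35"}, base={})
    print(env)
```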
For more information about Ansible environment variables, see Ansible Configuration Settings.
For more information about how to customize your Ceph Storage cluster, see Customizing the Ceph Storage cluster.