Chapter 5. Customizing the Ceph Storage cluster

Director deploys containerized Red Hat Ceph Storage using a default configuration. You can customize Ceph Storage by overriding the default settings.

Prerequisites

To deploy containerized Ceph Storage, you must include the /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml environment file during overcloud deployment. This environment file defines the resources that director uses for the containerized Ceph Storage deployment.

Procedure

  1. Enable the Red Hat Ceph Storage 3 Tools repository:

    $ sudo subscription-manager repos --enable=rhel-7-server-rhceph-3-tools-rpms
  2. Install the ceph-ansible package on your undercloud:

    $ sudo yum install ceph-ansible
  3. To customize your Ceph Storage cluster, define custom parameters in a new environment file, for example, /home/stack/templates/ceph-config.yaml. You can apply Ceph Storage cluster settings with the following syntax in the parameter_defaults section of your environment file:

    parameter_defaults:
      CephConfigOverrides:
        section:
          KEY: VALUE
    Note

    You can apply the CephConfigOverrides parameter to the [global] section of the ceph.conf file, as well as any other section, such as [osd], [mon], and [client]. If you specify a section, the key:value data goes into the specified section. If you do not specify a section, the data goes into the [global] section by default. For information about Ceph Storage configuration, customization, and supported parameters, see Red Hat Ceph Storage Configuration Guide.

  4. Replace KEY and VALUE with the Ceph cluster settings that you want to apply. For example, in the global section, max_open_files is the KEY and 131072 is the corresponding VALUE:

    parameter_defaults:
      CephConfigOverrides:
        global:
          max_open_files: 131072
        osd:
          osd_scrub_during_recovery: false

    This configuration results in the following settings defined in the configuration file of your Ceph cluster:

    [global]
    max_open_files = 131072
    [osd]
    osd_scrub_during_recovery = false
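
After you define your overrides, include the ceph-ansible environment file and your custom environment file when you deploy the overcloud. The following is a minimal sketch of the deployment command; any other arguments and environment files from your existing deployment stay in place:

$ openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/templates/ceph-config.yaml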

5.1. Setting ceph-ansible group variables

The ceph-ansible tool is a set of Ansible playbooks that installs and manages Ceph Storage clusters. Its default settings are defined as group variables in the group_vars directory.

For information about the group_vars directory, see 3.2. Installing a Red Hat Ceph Storage Cluster in the Red Hat Ceph Storage Installation Guide for Red Hat Enterprise Linux.

To change the variable defaults in director, use the CephAnsibleExtraConfig parameter to pass the new values in heat environment files. For example, to set the ceph-ansible group variable journal_size to 40960, create an environment file with the following journal_size definition:

parameter_defaults:
  CephAnsibleExtraConfig:
    journal_size: 40960
Important

Change ceph-ansible group variables with the override parameters; do not edit group variables directly in the /usr/share/ceph-ansible directory on the undercloud.

5.2. Mapping the Ceph Storage node disk layout

When you deploy containerized Ceph Storage, you must map the disk layout and specify dedicated block devices for the Ceph OSD service. You can perform this mapping in the environment file you created earlier to define your custom Ceph parameters: /home/stack/templates/ceph-config.yaml.

Use the CephAnsibleDisksConfig resource in parameter_defaults to map your disk layout. This resource uses the following variables:

osd_scenario
  Required: Yes
  Default value (if unset): lvm. NOTE: For new deployments using Ceph 3.2 and later, lvm is the default. For Ceph 3.1 and earlier, the default is collocated.
  Description: With Ceph 3.2, lvm allows ceph-ansible to use ceph-volume to configure OSDs and BlueStore WAL devices. With Ceph 3.1, the value sets the journaling scenario, such as whether OSDs are created with journals that are either:
  - co-located on the same device for filestore (collocated), or
  - stored on dedicated devices for filestore (non-collocated).

devices
  Required: Yes
  Default value (if unset): NONE. Variable must be set.
  Description: A list of block devices to be used on the node for OSDs.

dedicated_devices
  Required: Yes (only if osd_scenario is non-collocated)
  Default value (if unset): devices
  Description: A list of block devices that maps each entry under devices to a dedicated journaling block device. Use this variable only when osd_scenario=non-collocated.

dmcrypt
  Required: No
  Default value (if unset): false
  Description: Sets whether data stored on OSDs is encrypted (true) or not (false).

osd_objectstore
  Required: No
  Default value (if unset): bluestore. NOTE: For new deployments using Ceph 3.2 and later, bluestore is the default. For Ceph 3.1 and earlier, the default is filestore.
  Description: Sets the storage back end used by Ceph.

Important

If you deployed your Ceph cluster with a version of ceph-ansible older than 3.3 and osd_scenario is set to collocated or non-collocated, OSD reboot failure can occur due to a device naming discrepancy. For more information about this fault, see https://bugzilla.redhat.com/show_bug.cgi?id=1670734. For information about a workaround, see https://access.redhat.com/solutions/3702681.

5.2.1. Using BlueStore in Ceph 3.2 and later

Note

New deployments of OpenStack Platform 13 must use bluestore. Current deployments that use filestore must continue using filestore, as described in Using FileStore in Ceph 3.1 and earlier. Migrations from filestore to bluestore are not supported by default in RHCS 3.x.

Procedure

  1. To specify the block devices to be used as Ceph OSDs, use a variation of the following:

    parameter_defaults:
      CephAnsibleDisksConfig:
        devices:
          - /dev/sdb
          - /dev/sdc
          - /dev/sdd
          - /dev/nvme0n1
        osd_scenario: lvm
        osd_objectstore: bluestore
  2. Because /dev/nvme0n1 is in a higher performing device class (it is an SSD and the other devices are HDDs), the example parameter defaults produce three OSDs that run on /dev/sdb, /dev/sdc, and /dev/sdd. The three OSDs use /dev/nvme0n1 as a BlueStore WAL device. The ceph-volume tool does this by using the batch subcommand (a preview sketch of this behavior follows this procedure). The same configuration is duplicated for each Ceph Storage node and assumes uniform hardware. If the BlueStore WAL data resides on the same disks as the OSDs, change the parameter defaults in the following way:

    parameter_defaults:
      CephAnsibleDisksConfig:
        devices:
          - /dev/sdb
          - /dev/sdc
          - /dev/sdd
        osd_scenario: lvm
        osd_objectstore: bluestore
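
As a reference for the ceph-volume behavior described in step 2, the following is a sketch of how you could preview the same layout manually with the batch subcommand on a Ceph Storage node. The --report flag only prints the plan without creating anything; director runs ceph-volume for you through ceph-ansible, so running this command is optional and for illustration only.

# Preview (without creating anything) how ceph-volume would lay out OSDs across the example devices
$ ceph-volume lvm batch --report --bluestore /dev/sdb /dev/sdc /dev/sdd /dev/nvme0n1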

5.2.2. Using FileStore in Ceph 3.1 and earlier

Important

The default journaling scenario is set to osd_scenario=collocated, which has lower hardware requirements consistent with most testing environments. In a typical production environment, however, journals are stored on dedicated devices, osd_scenario=non-collocated, to accommodate heavier I/O workloads. For more information, see Identifying a Performance Use Case in the Red Hat Ceph Storage Hardware Selection Guide.

Procedure

  1. List each block device to be used by the OSDs as a simple list under the devices variable, for example:

    devices:
      - /dev/sda
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
  2. If osd_scenario=non-collocated, you must also map each entry in devices to a corresponding entry in dedicated_devices (the full environment-file structure is sketched after this procedure). For example, the following snippet in /home/stack/templates/ceph-config.yaml:

    osd_scenario: non-collocated
    devices:
      - /dev/sda
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
    
    dedicated_devices:
      - /dev/sdf
      - /dev/sdf
      - /dev/sdg
      - /dev/sdg
    Result

    Each Ceph Storage node in the resulting Ceph cluster has the following characteristics:

    • /dev/sda has /dev/sdf1 as its journal
    • /dev/sdb has /dev/sdf2 as its journal
    • /dev/sdc has /dev/sdg1 as its journal
    • /dev/sdd has /dev/sdg2 as its journal
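
For clarity, the snippet in step 2 sits under the CephAnsibleDisksConfig parameter in the environment file, in the same way as the earlier examples. A minimal sketch of the full structure with the same example devices:

parameter_defaults:
  CephAnsibleDisksConfig:
    osd_scenario: non-collocated
    devices:
      - /dev/sda
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
    dedicated_devices:
      - /dev/sdf
      - /dev/sdf
      - /dev/sdg
      - /dev/sdg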

5.2.3. Referring to devices with persistent names

Procedure

  1. In some nodes, disk paths, such as /dev/sdb and /dev/sdc, might not point to the same block device during reboots. If this is the case with your Ceph Storage nodes, specify each disk with the /dev/disk/by-path/ symlink to ensure that the block device mapping is consistent throughout deployments:

    parameter_defaults:
      CephAnsibleDisksConfig:
        devices:
          - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:10:0
          - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:11:0
        dedicated_devices:
          - /dev/nvme0n1
          - /dev/nvme0n1
  2. Optional: Because you must set the list of OSD devices before overcloud deployment, it might not be possible to identify and set the PCI path of disk devices. In this case, gather the /dev/disk/by-path/ symlink data for block devices during introspection.

    In the following example, run the first command to download the introspection data from the undercloud Object Storage service (swift) for the server, b08-h03-r620-hci, and save the data in a file called b08-h03-r620-hci.json. Run the second command to grep for “by-path”. The output of this command contains the unique /dev/disk/by-path values that you can use to identify disks.

    (undercloud) [stack@b08-h02-r620 ironic]$ openstack baremetal introspection data save b08-h03-r620-hci | jq . > b08-h03-r620-hci.json
    (undercloud) [stack@b08-h02-r620 ironic]$ grep by-path b08-h03-r620-hci.json
            "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0",
            "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:1:0",
            "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0",
            "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:4:0",
            "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:5:0",
            "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:6:0",
            "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:7:0",
            "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0",

For more information about naming conventions for storage devices, see Persistent Naming in the Red Hat Enterprise Linux (RHEL) Managing storage devices guide.

Warning

osd_scenario: lvm is used in the example to default new deployments to bluestore as configured by ceph-volume; this is only available with ceph-ansible 3.2 or later and Ceph Luminous or later. The parameters to support filestore with ceph-ansible 3.2 are backwards compatible. Therefore, in existing FileStore deployments, do not change the osd_objectstore or osd_scenario parameters.

5.2.4. Creating a valid JSON file automatically from Bare Metal service introspection data

When you customize devices in a Ceph Storage deployment by manually including node-specific overrides, you can inadvertently introduce errors. The director tools directory contains a utility named make_ceph_disk_list.py that you can use to create a valid JSON environment file automatically from Bare Metal service (ironic) introspection data.

Procedure

  1. Export the introspection data from the Bare Metal service database for the Ceph Storage nodes you want to deploy:

    openstack baremetal introspection data save oc0-ceph-0 > ceph0.json
    openstack baremetal introspection data save oc0-ceph-1 > ceph1.json
    ...
  2. Copy the utility to the stack user’s home directory on the undercloud, and then use it to generate a node_data_lookup.json file that you can pass to the openstack overcloud deploy command:

    ./make_ceph_disk_list.py -i ceph*.json -o node_data_lookup.json -k by_path
    • The -i option can take an expression such as *.json or a list of files as input.
    • The -k option defines the key of the ironic disk data structure used to identify the OSD disks.

      Note

      Red Hat does not recommend using name because it produces a list of devices such as /dev/sdd, which may not always point to the same device on reboot. Instead, Red Hat recommends that you use by_path, which is the default option if -k is not specified.

      Note

      You can only define NodeDataLookup once during a deployment, so pass the introspection data file to all nodes that host Ceph OSDs. The Bare Metal service reserves one of the available disks on the system as the root disk. The utility always excludes the root disk from the list of generated devices.

  3. Run the ./make_ceph_disk_list.py --help command to see other available options.
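
Then include the generated file in your deployment command. The following is a minimal sketch, in which <other_environment_files> stands for the environment files you already pass:

$ openstack overcloud deploy --templates \
  -e <other_environment_files> \
  -e node_data_lookup.json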

5.2.5. Mapping the disk layout to non-homogeneous Ceph Storage nodes

Important

Using non-homogeneous Ceph Storage nodes can cause performance issues, such as unpredictable performance loss. Although you can configure non-homogeneous Ceph Storage nodes in your Red Hat OpenStack Platform environment, Red Hat does not recommend it.

By default, all nodes that host Ceph OSDs use the global devices and dedicated_devices lists that you set in Section 5.2, “Mapping the Ceph Storage node disk layout”.

This default configuration is appropriate when all Ceph OSD nodes have homogeneous hardware. However, if a subset of these servers does not have homogeneous hardware, you must define a node-specific disk configuration in director.

Note

To identify nodes that host Ceph OSDs, inspect the roles_data.yaml file and identify all roles that include the OS::TripleO::Services::CephOSD service.

To define a node-specific configuration, create a custom environment file, for example node-spec-overrides.yaml, that identifies each server by its machine unique UUID and lists the local variables that override the global variables. Then include the environment file in the openstack overcloud deploy command.

You can extract the machine unique UUID either on each individual server or from the Ironic database.

To locate the UUID for an individual server, log in to the server and run the following command:

dmidecode -s system-uuid

To extract the UUID from the Ironic database, run the following command on the undercloud:

openstack baremetal introspection data save NODE-ID | jq .extra.system.product.uuid
Warning

If the undercloud.conf file does not have inspection_extras = true prior to undercloud installation or upgrade and introspection, then the machine unique UUID will not be in the Ironic database.
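
For reference, the following is a minimal sketch of the relevant undercloud.conf setting; it must be in place before you install or upgrade the undercloud and run introspection:

[DEFAULT]
inspection_extras = true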

Important

The machine unique UUID is not the Ironic UUID.

A valid node-spec-overrides.yaml file may look like the following:

parameter_defaults:
  NodeDataLookup: |
    {"32E87B4C-C4A7-418E-865B-191684A6883B": {"devices": ["/dev/sdc"]}}

All lines after the first two lines must be valid JSON. An easy way to verify that the JSON is valid is to use the jq command. For example:

  1. Remove the first two lines (parameter_defaults: and NodeDataLookup: |) from the file temporarily.
  2. Run cat node-spec-overrides.yaml | jq .
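
A one-line equivalent of this check, assuming the file begins with exactly those two lines:

$ tail -n +3 node-spec-overrides.yaml | jq .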

As the node-spec-overrides.yaml file grows, you can also use jq to ensure that the embedded JSON remains valid. For example, because the devices and dedicated_devices lists should be the same length, use the following commands to verify that they are the same length before you start the deployment.

(undercloud) [stack@b08-h02-r620 tht]$ cat node-spec-c05-h17-h21-h25-6048r.yaml | jq '.[] | .devices | length'
33
30
33
(undercloud) [stack@b08-h02-r620 tht]$ cat node-spec-c05-h17-h21-h25-6048r.yaml | jq '.[] | .dedicated_devices | length'
33
30
33
(undercloud) [stack@b08-h02-r620 tht]$

In this example, the node-spec-c05-h17-h21-h25-6048r.yaml file describes three servers in rack c05 (slots h17, h21, and h25) that are missing disks. A more complicated example is included at the end of this section.

After you validate the JSON syntax, ensure that you repopulate the first two lines of the environment file and use the -e option to include the file in the deployment command.
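
For example, the following sketch shows the deployment command with the node-specific file included; the path assumes you created the file in /home/stack/templates/, and <other_environment_files> stands for the environment files you already pass:

$ openstack overcloud deploy --templates \
  -e <other_environment_files> \
  -e /home/stack/templates/node-spec-overrides.yaml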

In the following example, the updated environment file uses NodeDataLookup for Ceph deployment. All of the servers have a devices list of 35 disks, except one server, which is missing a disk.

Use the following example environment file to override the default devices list for the node that has only 34 disks and to specify the list of disks that it should use instead of the global list.

parameter_defaults:
  # c05-h01-6048r is missing scsi-0:2:35:0 (00000000-0000-0000-0000-0CC47A6EFD0C)
  NodeDataLookup: |
    {
    "00000000-0000-0000-0000-0CC47A6EFD0C": {
      "devices": [
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:1:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:32:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:2:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:3:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:4:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:5:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:6:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:33:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:7:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:8:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:34:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:9:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:10:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:11:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:12:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:13:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:14:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:15:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:16:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:17:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:18:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:19:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:20:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:21:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:22:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:23:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:24:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:25:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:26:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:27:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:28:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:29:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:30:0",
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:31:0"
        ],
      "dedicated_devices": [
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
    "/dev/disk/by-path/pci-0000:84:00.0-nvme-1"
        ]
      }
    }

5.3. Controlling resources that are available to Ceph Storage containers

When you colocate Ceph Storage containers and Red Hat OpenStack Platform containers on the same server, the containers can compete for memory and CPU resources.

To control the amount of memory or CPU that Ceph Storage containers can use, define the CPU and memory limits as shown in the following example:

parameter_defaults:
  CephAnsibleExtraConfig:
    ceph_mds_docker_cpu_limit: 4
    ceph_mgr_docker_cpu_limit: 1
    ceph_mon_docker_cpu_limit: 1
    ceph_osd_docker_cpu_limit: 4
    ceph_mds_docker_memory_limit: 64438m
    ceph_mgr_docker_memory_limit: 64438m
    ceph_mon_docker_memory_limit: 64438m
Note

The limits shown are for example only. Actual values can vary based on your environment.

Warning

The default value for all of the memory limits specified in the example is the total host memory on the system. For example, ceph-ansible uses "{{ ansible_memtotal_mb }}m".

Warning

The ceph_osd_docker_memory_limit parameter is intentionally excluded from the example. Do not use the ceph_osd_docker_memory_limit parameter. For more information, see Reserving Memory Resources for Ceph in the Hyper-Converged Infrastructure Guide.

If the server on which the containers are colocated does not have sufficient memory or CPU, or if your design requires physical isolation, you can use composable services to deploy Ceph Storage containers to additional nodes. For more information, see Composable Services and Custom Roles in the Advanced Overcloud Customization guide.

5.4. Overriding Ansible environment variables

The Red Hat OpenStack Platform Workflow service (mistral) uses Ansible to configure Ceph Storage, but you can customize the Ansible environment by using Ansible environment variables.

Procedure

To override an ANSIBLE_* environment variable, use the CephAnsibleEnvironmentVariables heat template parameter.

This example configuration increases the number of forks and SSH retries:

parameter_defaults:
  CephAnsibleEnvironmentVariables:
    ANSIBLE_SSH_RETRIES: '6'
    DEFAULT_FORKS: '35'

For more information about Ansible environment variables, see Ansible Configuration Settings.
