Chapter 7. Deploying second-tier Ceph storage on OpenStack

Using OpenStack director, you can deploy different Red Hat Ceph Storage performance tiers by adding new Ceph nodes dedicated to a specific tier in a Ceph cluster.

For example, you can add new object storage daemon (OSD) nodes with SSD drives to an existing Ceph cluster to create a Block Storage (cinder) backend exclusively for storing data on these nodes. A user creating a new Block Storage volume can then choose the desired performance tier: either HDDs or the new SSDs.

This type of deployment requires Red Hat OpenStack Platform director to pass a customized CRUSH map to ceph-ansible. The CRUSH map allows you to split OSD nodes based on disk performance, but you can also use this feature for mapping physical infrastructure layout.

The following sections demonstrate how to deploy four nodes where two of the nodes use SSDs and the other two use HDDs. The example is kept simple to communicate a repeatable pattern. However, a production deployment should use more nodes and more OSDs to be supported as per the Red Hat Ceph Storage hardware selection guide.

7.1. Create a CRUSH map

The CRUSH map allows you to put OSD nodes into a CRUSH root. By default, a “default” root is created and all OSD nodes are included in it.

Inside a given root, you define the physical topology, rack, rooms, and so forth, and then place the OSD nodes in the desired hierarchy (or bucket). By default, no physical topology is defined; a flat design is assumed as if all nodes are in the same rack.

two tier crush map

See Crush Administration in the Storage Strategies Guide for details about creating a custom CRUSH map.

7.2. Mapping the OSDs

Complete the following step to map the OSDs.

Procedure

  1. Declare the OSDs/journal mapping:

    parameter_defaults:
      CephAnsibleDisksConfig:
       devices:
       - /dev/sda
       - /dev/sdb
       dedicated_devices:
       - /dev/sdc
       - /dev/sdc
       osd_scenario: non-collocated
       journal_size: 8192

7.3. Setting the replication factor

Complete the following step to set the replication factor.

Note

This is normally supported only for full SSD deployment. See Red Hat Ceph Storage: Supported configurations.

Procedure

  1. Set the default replication factor to two. This example splits four nodes into two different roots.

    parameter_defaults:
      CephPoolDefaultSize: 2
Note

If you upgrade a deployment that uses gnocchi as the backend, you might encounter deployment timeout. To prevent this timeout, use the following CephPool definition to customize the gnocchi pool:

parameter_defaults
    CephPools:  {"name": metrics, "pg_num": 128, "pgp_num": 128, "size": 1}

7.4. Defining the CRUSH hierarchy

Director provides the data for the CRUSH hierarchy, but ceph-ansible actually passes that data by getting the CRUSH mapping through the Ansible inventory file. Unless you keep the default root, you must specify the location of the root for each node.

For example if node lab-ceph01 (provisioning IP 172.16.0.26) is placed in rack1 inside the fast_root, the Ansible inventory should resemble the following:

172.16.0.26:
osd_crush_location: {host: lab-ceph01, rack: rack1, root: fast_root}

When you use director to deploy Ceph, you don’t actually write the Ansible inventory; it is generated for you. Therefore, you must use NodeDataLookup to append the data.

NodeDataLookup works by specifying the system product UUID stored on the motherboard of the systems. The Bare Metal service (ironic) also stores this information after the introspection phase.

To create a CRUSH map that supports second-tier storage, complete the following steps:

Procedure

  1. Run the following commands to retrieve the UUIDs of the four nodes:

    for ((x=1; x<=4; x++)); \
    { echo "Node overcloud-ceph0${x}"; \
    openstack baremetal introspection data save overcloud-ceph0${x} | jq .extra.system.product.uuid; }
    Node overcloud-ceph01
    "32C2BC31-F6BB-49AA-971A-377EFDFDB111"
    Node overcloud-ceph02
    "76B4C69C-6915-4D30-AFFD-D16DB74F64ED"
    Node overcloud-ceph03
    "FECF7B20-5984-469F-872C-732E3FEF99BF"
    Node overcloud-ceph04
    "5FFEFA5F-69E4-4A88-B9EA-62811C61C8B3"
    Note

    In the example, overcloud-ceph0[1-4] are the Ironic nodes names; they will be deployed as lab-ceph0[1–4] (via HostnameMap.yaml).

  2. Specify the node placement as follows:

    RootRackNode

    standard_root

    rack1_std

    overcloud-ceph01 (lab-ceph01)

    rack2_std

    overcloud-ceph02 (lab-ceph02)

    fast_root

    rack1_fast

    overcloud-ceph03 (lab-ceph03)

    rack2_fast

    overcloud-ceph04 (lab-ceph04)

    Note

    You cannot have two buckets with the same name. Even if lab-ceph01 and lab-ceph03 are in the same physical rack, you cannot have two buckets called rack1. Therefore, we named them rack1_std and rack1_fast.

    Note

    This example demonstrates how to create a specific route called “standard_root” to illustrate multiple custom roots. However, you could have kept the HDDs OSD nodes in the default root.

  3. Use the following NodeDataLookup syntax:

    NodeDataLookup: {"SYSTEM_UUID": {"osd_crush_location": {"root": "$MY_ROOT", "rack": "$MY_RACK", "host": "$OVERCLOUD_NODE_HOSTNAME"}}}
    Note

    You must specify the system UUID and then the CRUSH hierarchy from top to bottom. Also, the host parameter must point to the node’s overcloud host name, not the Bare Metal service (ironic) node name. To match the example configuration, enter the following:

    parameter_defaults:
      NodeDataLookup: {"32C2BC31-F6BB-49AA-971A-377EFDFDB111": {"osd_crush_location": {"root": "standard_root", "rack": "rack1_std", "host": "lab-ceph01"}},
         "76B4C69C-6915-4D30-AFFD-D16DB74F64ED": {"osd_crush_location": {"root": "standard_root", "rack": "rack2_std", "host": "lab-ceph02"}},
         "FECF7B20-5984-469F-872C-732E3FEF99BF": {"osd_crush_location": {"root": "fast_root", "rack": "rack1_fast", "host": "lab-ceph03"}},
         "5FFEFA5F-69E4-4A88-B9EA-62811C61C8B3": {"osd_crush_location": {"root": "fast_root", "rack": "rack2_fast", "host": "lab-ceph04"}}}
  4. Enable CRUSH map management at the ceph-ansible level:

    parameter_defaults:
      CephAnsibleExtraConfig:
        create_crush_tree: true
  5. Use scheduler hints to ensure the Bare Metal service node UUIDs correctly map to the hostnames:

    parameter_defaults:
      CephStorageCount: 4
      OvercloudCephStorageFlavor: ceph-storage
      CephStorageSchedulerHints:
        'capabilities:node': 'ceph-%index%'
  6. Tag the Bare Metal service nodes with the corresponding hint:

    openstack baremetal node set --property capabilities='profile:ceph-storage,node:ceph-0,boot_option:local' overcloud-ceph01
    
    openstack baremetal node set --property capabilities=profile:ceph-storage,'node:ceph-1,boot_option:local' overcloud-ceph02
    
    openstack baremetal node set --property capabilities='profile:ceph-storage,node:ceph-2,boot_option:local' overcloud-ceph03
    
    openstack baremetal node set --property capabilities='profile:ceph-storage,node:ceph-3,boot_option:local' overcloud-ceph04
    Note

    For more information about predictive placement, see Assigning Specific Node IDs in the Advanced Overcloud Customization guide.

7.5. Defining CRUSH map rules

Rules define how the data is written on a cluster. After the CRUSH map node placement is complete, define the CRUSH rules.

Procedure

  1. Use the following syntax to define the CRUSH rules:

    parameter_defaults:
      CephAnsibleExtraConfig:
        crush_rules:
          - name: $RULE_NAME
            root: $ROOT_NAME
            type: $REPLICAT_DOMAIN
            default: true/false
    Note

    Setting the default parameter to true means that this rule will be used when you create a new pool without specifying any rule. There may only be one default rule.

    In the following example, rule standard points to the OSD nodes hosted on the standard_root with one replicate per rack. Rule fast points to the OSD nodes hosted on the standard_root with one replicate per rack:

    parameter_defaults:
      CephAnsibleExtraConfig:
        crush_rule_config: true
        crush_rules:
          - name: standard
            root: standard_root
            type: rack
            default: true
          - name: fast
            root: fast_root
            type: rack
            default: false
    Note

    You must set crush_rule_config to true.

7.6. Configuring OSP pools

Ceph pools are configured with a CRUSH rules that define how to store data. This example features all built-in OSP pools using the standard_root (the standard rule) and a new pool using fast_root (the fast rule).

Procedure

  1. Use the following syntax to define or change a pool property:

    - name: $POOL_NAME
         pg_num: $PG_COUNT
         rule_name: $RULE_NAME
         application: rbd
  2. List all OSP pools and set the appropriate rule (standard, in this case), and create a new pool called tier2 that uses the fast rule. This pool will be used by Block Storage (cinder).

    parameter_defaults:
      CephPools:
        - name: tier2
          pg_num: 64
          rule_name: fast
          application: rbd
    
        - name: volumes
          pg_num: 64
          rule_name: standard
          application: rbd
    
        - name: vms
          pg_num: 64
          rule_name: standard
          application: rbd
    
        - name: backups
          pg_num: 64
          rule_name: standard
          application: rbd
    
        - name: images
          pg_num: 64
          rule_name: standard
          application: rbd
    
        - name: metrics
          pg_num: 64
          rule_name: standard
          application: openstack_gnocchi

7.7. Configuring Block Storage to use the new pool

Add the Ceph pool to the cinder.conf file to enable Block Storage (cinder) to consume it:

Procedure

  1. Update cinder.conf as follows:

    parameter_defaults:
      CinderRbdExtraPools:
        - tier2

7.8. Verifying customized CRUSH map

After the openstack overcloud deploy command creates or updates the overcloud, complete the following step to verify that the customized CRUSH map was correctly applied.

Note

Be careful if you move a host from one route to another.

Procedure

  1. Connect to a Ceph monitor node and run the following command:

    # ceph osd tree
    ID WEIGHT  TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -7 0.39996 root standard_root
    -6 0.19998     rack rack1_std
    -5 0.19998         host lab-ceph02
     1 0.09999             osd.1            up  1.00000          1.00000
     4 0.09999             osd.4            up  1.00000          1.00000
    -9 0.19998     rack rack2_std
    -8 0.19998         host lab-ceph03
     0 0.09999             osd.0            up  1.00000          1.00000
     3 0.09999             osd.3            up  1.00000          1.00000
    -4 0.19998 root fast_root
    -3 0.19998     rack rack1_fast
    -2 0.19998         host lab-ceph01
     2 0.09999             osd.2            up  1.00000          1.00000
     5 0.09999             osd.5            up  1.00000          1.00000