Chapter 4. Configuring Resource Isolation on Hyper-Converged Nodes

With the Red Hat OpenStack Platform implementation of HCI, the director creates hyper-converged nodes by colocating Ceph OSD and Compute services. However, without any further tuning this colocation also risks resource contention between Ceph and Compute services, as neither is aware of the other's presence on the same host. Resource contention can result in degradation of service. This, in turn, offsets any benefits provided by hyper-convergence.

To prevent contention, you need to configure resource isolation for both Ceph and Compute services. The following subsections describe how to do so.

4.1. Reserve CPU and Memory Resources for Compute

By default, the Compute service parameters do not take into account the colocation of Ceph OSD services on the same node. To maintain stability and maximize the number of possible instances, hyper-converged nodes need to be tuned to account for this colocation. To do so, set resource constraints for the Compute service on hyper-converged nodes. You can configure these constraints through a plan environment file.

Plan environment files define workflows, which the director can execute through the OpenStack Workflow (Mistral) service. The director also provides a default plan environment file specifically for configuring resource constraints on hyper-converged nodes, namely:

/usr/share/openstack-tripleo-heat-templates/plan-samples/plan-environment-derived-params.yaml

To invoke this plan environment file during deployment, pass it to your openstack overcloud deploy command with the -p parameter, as shown in the example command after the following list. This plan environment file directs OpenStack Workflow to:

  1. Retrieve hardware introspection data (collected during Inspecting the Hardware of Nodes),
  2. Calculate optimal CPU and memory constraints for Compute on hyper-converged nodes based on that data, and
  3. Autogenerate the necessary parameters to configure those constraints.
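
For example, a deploy command that invokes this plan environment file might look like the following. This is a sketch only; replace the -e argument with the environment files used by your own deployment:

openstack overcloud deploy --templates \
  -p /usr/share/openstack-tripleo-heat-templates/plan-samples/plan-environment-derived-params.yaml \
  -e <your other environment files>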

The ~/plan-samples/plan-environment-derived-params.yaml plan environment file defines several CPU and memory allocation workload profiles under hci_profile_config. The hci_profile parameter sets which workload profile is enabled; for example, if you are using NFV, set hci_profile: nfv_default.

You can also define a custom profile in your own plan environment file using the same syntax. For example, to define a new profile named my_workload:
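
The following sketch assumes that your custom plan environment file mirrors the structure of the default plan-environment-derived-params.yaml file (the workflow_parameters and tripleo.derive_params.v1.derive_parameters keys); the workload values shown are illustrative only:

workflow_parameters:
  tripleo.derive_params.v1.derive_parameters:
    hci_profile_config:
      my_workload:
        average_guest_memory_size_in_mb: 4096
        average_guest_cpu_utilization_percentage: 75
    hci_profile: my_workload

Setting hci_profile: my_workload then enables the new profile in the same way that hci_profile: nfv_default enables the NFV profile.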

OpenStack Workflow uses the average_guest_memory_size_in_mb and average_guest_cpu_utilization_percentage parameters in each workload profile to calculate values for the reserved_host_memory and cpu_allocation_ratio settings of Compute. These values are calculated based on Red Hat recommendations, and are similar to the calculations made manually in previous releases (in particular, in Reserve CPU and Memory Resources for Compute).
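
As a rough illustration of the kind of arithmetic involved, consider a node with 256 GB of RAM, 56 cores, and 10 OSDs, hosting guests that average 2 GB of RAM and 50% CPU utilization. The per-OSD budget of 5 GB of RAM and one core, and the 0.5 GB of overhead per guest, are illustrative assumptions only; the calculations that the workflow is based on are described in Section A.2, “Compute CPU and Memory Calculator”:

# Memory: reserve RAM for the OSDs plus the overhead of each expected guest
ceph_memory          = 10 OSDs x 5 GB               = 50 GB
memory_for_guests    = 256 GB - 50 GB               = 206 GB
expected_guests      = 206 GB / (2 GB + 0.5 GB)     = 82 (rounded down)
reserved_host_memory = (50 GB + 82 x 0.5 GB) x 1024 = 93184 MB (approximately)

# CPU: divide the cores left over after the OSDs by average guest utilization
non_ceph_cores       = 56 - (10 OSDs x 1 core)      = 46
guest_vcpus          = 46 / 0.50                    = 92
cpu_allocation_ratio = 92 / 56                      = 1.6 (approximately)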

4.1.1. Override Calculated Settings for Memory or CPU Allocation

You can use another environment file to override the Compute settings that OpenStack Workflow defines automatically. This is useful if you want to override only one of reserved_host_memory and cpu_allocation_ratio and let OpenStack Workflow define the other. Consider the following snippet:

parameter_defaults:
  ComputeHCIParameters:
    NovaReservedHostMemory: 181000  1
  ComputeHCIExtraConfig:
    nova::cpu_allocation_ratio: 8.2  2
1
The NovaReservedHostMemory parameter sets how much RAM should be reserved for the Ceph OSD services and per-guest instance overhead on hyper-converged nodes.
2
The nova::cpu_allocation_ratio parameter sets the ratio that the Compute scheduler uses when choosing which Compute nodes to deploy instances on.

The ComputeHCIParameters and ComputeHCIExtraConfig hooks apply their nested parameters to all nodes that use the ComputeHCI role (namely, all hyper-converged nodes). For more information about manually determining optimal values for NovaReservedHostMemory and nova::cpu_allocation_ratio, see Section A.2, “Compute CPU and Memory Calculator”.
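
For example, you might save the snippet above in a file such as ~/templates/compute-hci-overrides.yaml (a hypothetical name) and add it to the deploy command shown earlier with an additional -e argument:

openstack overcloud deploy --templates \
  -p /usr/share/openstack-tripleo-heat-templates/plan-samples/plan-environment-derived-params.yaml \
  -e ~/templates/compute-hci-overrides.yaml \
  -e <your other environment files>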

4.2. Reduce Ceph Backfill and Recovery Operations

When a Ceph OSD is removed, Ceph uses backfill and recovery operations to rebalance the cluster. Ceph does this to keep multiple copies of data according to the placement group policy. These operations use system resources. If a Ceph cluster is under load, its performance drops as it diverts resources to backfill and recovery.

To mitigate this performance effect during OSD removal, you can reduce the priority of backfill and recovery operations. Keep in mind that the trade-off is that fewer data replicas exist for a longer time, which puts the data at a slightly greater risk.

To configure the priority of backfill and recovery operations, add an environment file named ceph-backfill-recovery.yaml to ~/templates containing the following:

parameter_defaults:
  CephConfigOverrides:
    osd_recovery_op_priority: 3  1
    osd_recovery_max_active: 3  2
    osd_max_backfills: 1  3
1
The osd_recovery_op_priority sets the priority for recovery operations, relative to the OSD client OP priority.
2
The osd_recovery_max_active sets the number of active recovery requests per OSD at one time. More requests accelerate recovery, but they place an increased load on the cluster. Set this to 1 if you want to reduce latency.
3
The osd_max_backfills sets the maximum number of backfills allowed to or from a single OSD.
Important

The values used in this sample are the current defaults. You do not need to add ceph-backfill-recovery.yaml to your deployment unless you plan to use different values.

4.3. Reserving Memory Resources for Ceph

In hyper-converged deployments, there is contention for memory resources between Compute (nova) and Ceph processes. Hyper-converged deployments of Red Hat OpenStack Platform with Red Hat Ceph Storage (RHCS) should use ceph-ansible 3.2 or newer, because it automatically tunes Ceph memory settings. BlueStore is the recommended back end for hyper-converged deployments because of its memory handling features.

WARNING
Red Hat does not recommend directly overriding the ceph_osd_docker_memory_limit parameter.

As of ceph-ansible 3.2, the ceph_osd_docker_memory_limit is set automatically to the maximum memory of the host, as discovered by Ansible, regardless of whether the FileStore or BlueStore back end is used.

The osd_memory_target parameter is the preferred way to reduce memory growth by Ceph OSDs, and it was introduced for BlueStore in RHCS 3.2. Ceph-ansible sets this parameter automatically and adjusts the setting for hyper-converged infrastructure (HCI) deployments when the is_hci parameter is set to true, as shown in the following example:

parameter_defaults:
  CephAnsibleExtraConfig:
    is_hci: true

Save this setting in /home/stack/templates/storage-container-config.yaml.

4.4. Reserving CPU Resources for Ceph

In hyper-converged deployments, there is contention for CPU resources between Compute (nova) and Ceph processes. By default, ceph-ansible limits each OSD to one vCPU by using the --cpu-quota option of the docker run command. The following example overrides the default so that two vCPUs are available for each OSD:

parameter_defaults:
  CephAnsibleExtraConfig:
    ceph_osd_docker_cpu_limit: 2

If more than one CPU per OSD is required, set the ceph_osd_docker_cpu_limit to the desired limit and save it in /home/stack/templates/storage-container-config.yaml.

The value used in this example shows how to tune CPU resources per OSD. The optimal value varies based on hardware and workload. See the Red Hat Ceph Storage Hardware Guide for guidelines, and always test workloads before moving them into production.