Appendix A. Appendix

A.1. Scaling

To scale HCI nodes up or down, the same principles (and, for the most part, the same methods) as for scaling Compute or Ceph Storage nodes apply. Be mindful of the caveats described in the following subsections.

A.1.1. Scaling Up

To scale up HCI nodes in a pure HCI environment (that is, when all Compute nodes are hyper-converged), use the same methods as for scaling up Compute nodes. See Adding Additional Nodes (from Director Installation and Usage) for details.

The same methods apply for scaling up HCI nodes in a mixed HCI environment (when the overcloud features both hyper-converged and normal Compute nodes). When you tag new nodes, remember to use the right flavor (in this case, osdcompute). See Section 3.2, “Creating and Assigning a New Flavor”.
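
For example, after you register and introspect a new node, you can tag it into the osdcompute profile from the undercloud, as in the following sketch. The node UUID is a placeholder, and the full procedure (including updating the node counts for your deployment) is described in the guides referenced above:

$ openstack baremetal node set --property capabilities='profile:osdcompute,boot_option:local' <node-uuid>

After tagging the node, re-run the same openstack overcloud deploy command (with the same environment files) that you used for the initial deployment so that the director provisions the new node.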

A.1.2. Scaling Down

The process for scaling down HCI nodes (in both pure and mixed HCI environments) can be summarized as follows:

  1. Disable and rebalance the Ceph OSD services on the HCI node. This step is necessary because the director does not automatically rebalance the Red Hat Ceph Storage cluster when you remove HCI or Ceph Storage nodes. See the example sketch after this procedure.

    See Scaling Down and Replacing Ceph Storage Nodes (from Deploying an Overcloud with Containerized Red Hat Ceph). Do not follow the steps in that guide for removing the node, as you need to migrate instances and disable the Compute services on the node first.

  2. Migrate the instances from the HCI nodes. See Migrating instances from a Compute node for instructions.
  3. Disable the Compute services on the nodes to prevent them from being used to spawn new instances.
  4. Remove the node from the overcloud.

For the third and fourth steps (disabling Compute services and removing the node), see Removing Compute Nodes (from Director Installation and Usage).
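
As an illustration of steps 1 and 3, the commands might look like the following. This is a minimal sketch only: the OSD ID (0) and the hostname (overcloud-osdcompute-0.localdomain) are placeholders, and the Ceph commands assume you run them from a node with the ceph CLI and admin keyring (with containerized Ceph the exact invocation differs). Follow the referenced guides for the authoritative procedure.

$ ceph osd out 0
$ ceph -w    # watch until rebalancing finishes and the cluster reports a healthy state
$ openstack compute service set --disable overcloud-osdcompute-0.localdomain nova-compute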

A.2. Compute CPU and Memory Calculator

With this release, you can use OpenStack Workflow to automatically set suitable CPU and memory allocation settings for hyper-converged nodes. However, in some cases you may want OpenStack Workflow to set only one of these values (either CPU or memory) and set the other yourself. To do so, override the calculated value manually, as described in Section 4.1.1, “Override Calculated Settings for Memory or CPU Allocation”.
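
For reference, an override of this kind is typically supplied through a custom environment file passed to the openstack overcloud deploy command. The snippet below is a sketch only: the hiera keys under ExtraConfig and the sample values (taken from the calculations later in this appendix) are assumptions here; Section 4.1.1 documents the exact syntax to use.

parameter_defaults:
  ExtraConfig:
    nova::compute::reserved_host_memory: 75000   # in MB; see Section A.2.1
    nova::cpu_allocation_ratio: 8.2              # see Section A.2.2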

You can use the following script to calculate suitable baseline NovaReservedHostMemory and cpu_allocation_ratio values for your hyper-converged nodes.

nova_mem_cpu_calc.py

The following subsections describe both settings in greater detail.

A.2.1. NovaReservedHostMemory

The NovaReservedHostMemory parameter sets the amount of memory (in MB) to reserve for the host node. To determine an appropriate value for hyper-converged nodes, assume that each OSD consumes 3 GB of memory. Given a node with 256 GB of memory and 10 OSDs, you can allocate 30 GB of memory for Ceph, leaving 226 GB for Compute. With that much memory, a node can host, for example, 113 instances that use 2 GB of memory each.

However, you still need to consider the additional overhead that the hypervisor requires per instance. Assuming this overhead is 0.5 GB per instance, the same node can only host 90 instances (226 GB divided by 2.5 GB). The amount of memory to reserve for the host node (that is, memory the Compute service should not use) is:

(In * Ov) + (Os * RA)

Where:

  • In: number of instances
  • Ov: amount of overhead memory needed per instance
  • Os: number of OSDs on the node
  • RA: amount of RAM that each OSD should have

With 90 instances, this gives us (90 * 0.5) + (10 * 3) = 75 GB. The Compute service expects this value in MB, namely 75000.

The following Python code performs this computation, using the example values above:

# Example values from the scenario above (256 GB RAM, 10 OSDs, 3 GB per OSD, 2 GB guests, 0.5 GB overhead)
MB_per_GB = 1000
mem, osds, GB_per_OSD = 256, 10, 3
average_guest_size, GB_overhead_per_guest = 2, 0.5

left_over_mem = mem - (GB_per_OSD * osds)
number_of_guests = int(left_over_mem /
    (average_guest_size + GB_overhead_per_guest))
nova_reserved_mem_MB = MB_per_GB * (
    (GB_per_OSD * osds) +
    (number_of_guests * GB_overhead_per_guest))

A.2.2. cpu_allocation_ratio

The Compute scheduler uses cpu_allocation_ratio when choosing which Compute node to deploy an instance on. By default, this ratio is 16.0 (that is, 16:1). This means that if there are 56 cores on a node, the Compute scheduler schedules enough instances to consume 896 vCPUs before it considers the node unable to host any more.

To determine a suitable cpu_allocation_ratio for a hyper-converged node, assume that each Ceph OSD uses at least one core (more may be required if the workload is I/O-intensive and the node has no SSDs). On a node with 56 cores and 10 OSDs, this leaves 46 cores for Compute. If each instance used 100 per cent of the CPU it receives, the ratio would simply be the number of instance vCPUs divided by the number of cores; that is, 46 / 56 = 0.8. However, because instances do not normally consume 100 per cent of their allocated CPUs, you can raise the cpu_allocation_ratio by taking the anticipated utilization into account when determining the number of required guest vCPUs.

So, if we can predict that instances will, on average, use only 10 per cent (0.1) of their allocated vCPUs, then the number of vCPUs for instances can be expressed as 46 / 0.1 = 460. When this value is divided by the number of cores (56), the ratio increases to approximately 8.

The following Python code performs this computation, using the example values above:

# Example values from the scenario above (56 cores, 10 OSDs, 10% average guest CPU utilization)
cores, osds = 56, 10
cores_per_OSD = 1.0
average_guest_util = 0.1 # 10%
nonceph_cores = cores - (cores_per_OSD * osds)
guest_vCPUs = nonceph_cores / average_guest_util
cpu_allocation_ratio = guest_vCPUs / cores

Tip

You can also use the nova_mem_cpu_calc.py script to compute baseline values for both reserved_host_memory and cpu_allocation_ratio. See Section A.2, “Compute CPU and Memory Calculator” for more details.