Appendix A. Appendix

A.1. Scaling

To scale HCI nodes up or down, the same principles (and for the most part, methods) for scaling Compute or Ceph Storage nodes apply. Be mindful of the following caveats:

A.1.1. Scaling Up

To scale up HCI nodes in a pure HCI environment, use the same methods for scaling up Compute nodes. See Adding Additional Nodes (from Director Installation and Usage) for details.

The same methods apply for scaling up HCI nodes in a mixed HCI environment. When you tag new nodes, remember to use the right flavor (in this case, osdcompute). See Section 3.2.3, “Creating and Assigning a New Flavor”.

A.1.2. Scaling Down

The process for scaling down HCI nodes (in both pure and mixed HCI environments) can be summarized as follows:

  1. Disable and rebalance the Ceph OSD services on the HCI node. This step is necessary because the director does not automatically rebalance the Red Hat Ceph Storage cluster when you remove HCI or Ceph Storage nodes.

    See Scaling Down and Replacing Ceph Storage Nodes (from Red Hat Ceph Storage for the Overcloud). Do not follow the steps here for removing the node, as you will need to migrate instances and disable the Compute services on the node first.

  2. Migrate the instances from the HCI nodes. See Migrating Instances for instructions.
  3. Disable the Compute services on the nodes to prevent them from being used to spawn new instances.
  4. Remove the node from the overcloud.

For the third and fourth step (disabling Compute services and removing the node), see Removing Compute Nodes from Director Installation and Usage.

A.2. Upstream Tools

Heat templates, environment files, scripts, and other resources relevant to hyper-convergence in OpenStack are available from the following upstream Github repository:

https://github.com/RHsyseng/hci/tree/master/custom-templates

This repository features the scripts in Section A.2.1, “Compute CPU and Memory Calculator” and Section A.2.2, “Custom Script to Configure NUMA Pinning for Ceph OSD Services”.

To use these scripts, clone the repository directly to your undercloud:

$ git clone https://github.com/RHsyseng/hci

A.2.1. Compute CPU and Memory Calculator

The Section 4.1, “Reserve CPU and Memory Resources for Compute” section explains how to determine suitable values for reserved_host_memory_mb and cpu_allocation_ratio on hyper-converged nodes. You can also use the following Python script (also available from the repository in Section A.2, “Upstream Tools”) to perform the computations for you:

#!/usr/bin/env python
# Filename:                nova_mem_cpu_calc.py
# Supported Langauge(s):   Python 2.7.x
# Time-stamp:              <2017-03-10 20:31:18 jfulton>
# -------------------------------------------------------
# This program was originally written by Ben England
# -------------------------------------------------------
# Calculates cpu_allocation_ratio and reserved_host_memory
# for nova.conf based on on the following inputs:
#
# input command line parameters:
# 1 - total host RAM in GB
# 2 - total host cores
# 3 - Ceph OSDs per server
# 4 - average guest size in GB
# 5 - average guest CPU utilization (0.0 to 1.0)
#
# It assumes that we want to allow 3 GB per OSD
# (based on prior Ceph Hammer testing)
# and that we want to allow an extra 1/2 GB per Nova (KVM guest)
# based on test observations that KVM guests' virtual memory footprint
# was actually significantly bigger than the declared guest memory size
# This is more of a factor for small guests than for large guests.
# -------------------------------------------------------
import sys
from sys import argv

NOTOK = 1  # process exit status signifying failure
MB_per_GB = 1000

GB_per_OSD = 3
GB_overhead_per_guest = 0.5  # based on measurement in test environment
cores_per_OSD = 1.0  # may be a little low in I/O intensive workloads

def usage(msg):
  print msg
  print(
    ("Usage: %s Total-host-RAM-GB Total-host-cores OSDs-per-server " +
     "Avg-guest-size-GB Avg-guest-CPU-util") % sys.argv[0])
  sys.exit(NOTOK)

if len(argv) < 5: usage("Too few command line params")
try:
  mem = int(argv[1])
  cores = int(argv[2])
  osds = int(argv[3])
  average_guest_size = int(argv[4])
  average_guest_util = float(argv[5])
except ValueError:
  usage("Non-integer input parameter")

average_guest_util_percent = 100 * average_guest_util

# print inputs
print "Inputs:"
print "- Total host RAM in GB: %d" % mem
print "- Total host cores: %d" % cores
print "- Ceph OSDs per host: %d" % osds
print "- Average guest memory size in GB: %d" % average_guest_size
print "- Average guest CPU utilization: %.0f%%" % average_guest_util_percent

# calculate operating parameters based on memory constraints only
left_over_mem = mem - (GB_per_OSD * osds)
number_of_guests = int(left_over_mem /
                       (average_guest_size + GB_overhead_per_guest))
nova_reserved_mem_MB = MB_per_GB * (
                        (GB_per_OSD * osds) +
                        (number_of_guests * GB_overhead_per_guest))
nonceph_cores = cores - (cores_per_OSD * osds)
guest_vCPUs = nonceph_cores / average_guest_util
cpu_allocation_ratio = guest_vCPUs / cores

# display outputs including how to tune Nova reserved mem

print "\nResults:"
print "- number of guests allowed based on memory = %d" % number_of_guests
print "- number of guest vCPUs allowed = %d" % int(guest_vCPUs)
print "- nova.conf reserved_host_memory = %d MB" % nova_reserved_mem_MB
print "- nova.conf cpu_allocation_ratio = %f" % cpu_allocation_ratio

if nova_reserved_mem_MB > (MB_per_GB * mem * 0.8):
    print "ERROR: you do not have enough memory to run hyperconverged!"
    sys.exit(NOTOK)

if cpu_allocation_ratio < 0.5:
    print "WARNING: you may not have enough CPU to run hyperconverged!"

if cpu_allocation_ratio > 16.0:
    print(
        "WARNING: do not increase VCPU overcommit ratio " +
        "beyond OSP8 default of 16:1")
    sys.exit(NOTOK)

print "\nCompare \"guest vCPUs allowed\" to \"guests allowed based on memory\" for actual guest count"

A.2.2. Custom Script to Configure NUMA Pinning for Ceph OSD Services

The Configure Ceph NUMA Pinning section describes the creation of a script that pins the Ceph OSD services to an available NUMA socket. This script, ~/templates/numa-systemd-osd.sh, should:

  • Take the network interface used for Ceph network traffic
  • Use lstopo to determine that interface’s NUMA socket
  • Configure numactl to start the OSD service with a NUMA policy that prefers the NUMA node of the Ceph network’s interface
  • Restart each Ceph OSD daemon sequentially so that the service runs with the new NUMA option
Important

The numa-systemd-osd.sh script will also attempt to install NUMA configuration tools. As such, the overcloud nodes must also be registered with Red Hat, as described in Registering the Nodes (from Red Hat Ceph Storage for the Overcloud).

The following script does all of these for Mixed HCI deployments, assuming that the hostname of each hyper-converged node uses either ceph or osd-compute. Edit the top-level IF statement accordingly if you are customizing the hostnames of hyper-converged nodes:

#!/usr/bin/env bash
{
if [[ `hostname` = *"ceph"* ]] || [[ `hostname` = *"osd-compute"* ]]; then  # 1

    # Verify the passed network interface exists
    if [[ ! $(ip add show $OSD_NUMA_INTERFACE) ]]; then
	exit 1
    fi

    # If NUMA related packages are missing, then install them
    # If packages are baked into image, no install attempted
    for PKG in numactl hwloc; do
	if [[ ! $(rpm -q $PKG) ]]; then
	    yum install -y $PKG
	    if [[ ! $? ]]; then
		echo "Unable to install $PKG with yum"
		exit 1
	    fi
	fi
    done

    if [[ ! $(lstopo-no-graphics | tr -d [:punct:] | egrep "NUMANode|$OSD_NUMA_INTERFACE") ]];
    then
	echo "No NUMAnodes found. Exiting."
	exit 1
    fi

    # Find the NUMA socket of the $OSD_NUMA_INTERFACE
    declare -A NUMASOCKET
    while read TYPE SOCKET_NUM NIC ; do
	if [[ "$TYPE" == "NUMANode" ]]; then
	    NUMASOCKET=$(echo $SOCKET_NUM | sed s/L//g);
	fi
	if [[ "$NIC" == "$OSD_NUMA_INTERFACE" ]]; then
	    # because $NIC is the $OSD_NUMA_INTERFACE,
	    # the NUMASOCKET has been set correctly above
	    break # so stop looking
	fi
    done < <(lstopo-no-graphics | tr -d [:punct:] | egrep "NUMANode|$OSD_NUMA_INTERFACE")

    if [[ -z $NUMASOCKET ]]; then
	echo "No NUMAnode found for $OSD_NUMA_INTERFACE. Exiting."
	exit 1
    fi

    UNIT='/usr/lib/systemd/system/ceph-osd@.service'
    # Preserve the original ceph-osd start command
    CMD=$(crudini --get $UNIT Service ExecStart)

    if [[ $(echo $CMD | grep numactl) ]]; then
	echo "numactl already in $UNIT. No changes required."
	exit 0
    fi

    # NUMA control options to append in front of $CMD
    NUMA="/usr/bin/numactl -N $NUMASOCKET --preferred=$NUMASOCKET"

    # Update the unit file to start with numactl
    # TODO: why doesn't a copy of $UNIT in /etc/systemd/system work with numactl?
    crudini --verbose --set $UNIT Service ExecStart "$NUMA $CMD"

    # Reload so updated file is used
    systemctl daemon-reload

    # Restart OSDs with NUMA policy (print results for log)
    OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
    for OSD_ID in $OSD_IDS; do
	echo -e "\nStatus of OSD $OSD_ID before unit file update\n"
	systemctl status ceph-osd@$OSD_ID
	echo -e "\nRestarting OSD $OSD_ID..."
	systemctl restart ceph-osd@$OSD_ID
	echo -e "\nStatus of OSD $OSD_ID after unit file update\n"
	systemctl status ceph-osd@$OSD_ID
    done
fi
}  2>&1 > /root/post_deploy_heat_output.txt
1
The top-level IF statement assumes that the hostname of each hyper-converged node contains either ceph or osd-compute. Edit the IF statement accordingly if you are customizing the hostnames of hyper-converged nodes.

Likewise, on a Pure HCI deployment all Compute nodes are hyper-converged. As such, on a Pure HCI deployment change the top-level IF statement to:

if [[ `hostname` = *"compute"* ]]; then