Chapter 3. Red Hat OpenStack deployment best practices

Review the following best practices when you plan and prepare to deploy OpenStack. You can apply one or more of these practices in your environment.

3.1. Red Hat OpenStack deployment preparation

Before you deploy Red Hat OpenStack Platform (RHOSP), review the following list of deployment preparation tasks. You can apply one or more of the deployment preparation tasks in your environment:

Set a subnet range for introspection to accommodate the maximum number of overcloud nodes on which you perform introspection at a time
When you use director to deploy and configure RHOSP, use CIDR notation for the control plane network to accommodate all overcloud nodes that you add now or in the future.
Set the root password on your overcloud image to allow console access to the overcloud image

Use the console to troubleshoot failed deployments when networking is set incorrectly. For more information, see Installing virt-customize to the director and Setting the Root Password in the Partner Integration Guide. Adhere to the information security policies of your organization for password management when you implement this recommendation.

Alternatively, you can use the userdata_root_password.yaml template to configure the root password by using the NodeUserData parameter. You can find the template in /usr/share/openstack-tripleo-heat-templates/firstboot/userdata_root_password.yaml.

The following example uses the template to configure the NodeUserData parameter:

resource_registry:
  OS::TripleO::NodeUserData: /usr/share/openstack-tripleo-heat-templates/firstboot/userdata_root_password.yaml
parameter_defaults:
  NodeRootPassword: '<password>'
Use scheduler hints to assign hardware to a role
  • Use scheduler hints to assign hardware to a role, such as Controller, Compute, CephStorage, and others. Scheduler hints make it easier to identify deployment issues that affect only a specific piece of hardware. See the example after this list.
  • Do not use profile tagging when you use scheduler hints.
  • In performance testing, use identical hardware for specific roles to reduce variability in testing and performance results.
  • For more information, see Assigning Specific Node IDs in the Advanced Overcloud Customization Guide.
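
The following example is a minimal sketch of how you might combine node capabilities with scheduler hints to pin hardware to the Controller role. The node UUID and capability value are placeholders; adapt them to your environment:

$ openstack baremetal node set <node_uuid> \
    --property capabilities='node:controller-0,boot_option:local'

Then match the tagged nodes to the role in an environment file:

parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'controller-%index%'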
Set the World Wide Name (WWN) as the root disk hint for each node to prevent nodes from using the wrong disk during deployment and booting
When nodes contain multiple disks, use the introspection data to set the WWN as the root disk hint for each node. This prevents the node from using the wrong disk during deployment and booting. For more information, see Defining the Root Disk for Multi-Disk Clusters in the Director Installation and Usage guide.
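
For example, after you identify the WWN of the intended root disk from the introspection data, you might set it as the root device hint. The node UUID and WWN are placeholders:

$ openstack baremetal node set <node_uuid> \
    --property root_device='{"wwn": "<disk_wwn>"}'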
Enable the Bare Metal service (ironic) automated cleaning on nodes that have more than one disk

Use the Bare Metal service automated cleaning to erase metadata on nodes that have more than one disk and are likely to have multiple boot loaders. Nodes might become inconsistent with the boot disk due to the presence of multiple boot loaders on disks, which can lead to node deployment failure when the node attempts to pull the metadata by using the wrong URL.

To enable the Bare Metal service automated cleaning, on the undercloud node, edit the undercloud.conf file and add the following line:

clean_nodes = true
Limit the number of nodes for Bare Metal (ironic) introspection

If you perform introspection on all nodes at the same time, failures might occur due to network constraints. Perform introspection on up to 50 nodes at a time.

Ensure that the dhcp_start and dhcp_end range in the undercloud.conf file is large enough for the number of nodes that you expect to have in the environment.

If there are not enough available IP addresses, do not introspect more nodes at one time than the range can accommodate; this limits the number of simultaneous introspection operations. To allow the introspection DHCP leases to expire, wait a few minutes after introspection completes before you issue more IP addresses.
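
The following undercloud.conf snippet is a minimal sketch of a control plane subnet definition with a DHCP range sized for the environment. The addresses are examples only, and the exact option layout can vary between RHOSP versions:

[ctlplane-subnet]
cidr = 192.168.24.0/24
dhcp_start = 192.168.24.5
dhcp_end = 192.168.24.120
inspection_iprange = 192.168.24.121,192.168.24.170
gateway = 192.168.24.1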

Prepare Ceph for different types of configurations

The following list is a set of recommendations for different types of configurations:

  • All-flash OSD configuration

    Each OSD requires additional CPUs according to the IOPS capacity of the device type, so Ceph IOPS are CPU-limited at a lower number of OSDs. This is especially true for NVMe SSDs, which can have two orders of magnitude higher IOPS capacity than traditional HDDs. For SATA/SAS SSDs, expect one order of magnitude greater random IOPS per OSD than HDDs, but only about a two to four times increase in sequential IOPS. Because it is possible to supply fewer CPU resources to Ceph than the OSD devices need, plan CPU capacity accordingly.

  • Hyper Converged Infrastructure (HCI)

    Reserve at least half of your CPU capacity, memory, and network bandwidth for the OpenStack Compute (nova) guests. Ensure that you have enough CPU capacity and memory to support both the OpenStack Compute (nova) guests and Ceph Storage. Observe memory consumption, because Ceph Storage memory consumption is not elastic. On a multi-socket system, limit Ceph CPU consumption by NUMA-pinning Ceph to a single socket, for example, with the numactl -N 0 -p 0 command. Do not hard-pin Ceph memory consumption to one socket.

  • Latency-sensitive applications such as NFV

    If possible, place Ceph on the same CPU socket as the network card that Ceph uses and limit the network card interrupts to that CPU socket, while the network application runs on a different NUMA socket and network card.

    If you use dual bootloaders, use disk-by-path for the OSD map. This gives you consistent deployments, unlike device names, which can change across reboots. The following snippet is an example of the CephAnsibleDisksConfig parameter for a disk-by-path mapping.

    CephAnsibleDisksConfig:
      osd_scenario: non-collocated
      devices:
        - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:0:0
        - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:1:0
      dedicated_devices:
        - /dev/nvme0n1
        - /dev/nvme0n1
      journal_size: 512

3.2. Red Hat OpenStack deployment configuration

Review the following list of recommendations for your Red Hat OpenStack Platform (RHOSP) deployment configuration:

Validate the heat templates with a small scale deployment
Deploy a small environment that consists of at least three Controller nodes, one Compute node, and three Ceph Storage nodes. You can use this configuration to ensure that all of your heat templates are correct.
Disable telemetry notifications on the undercloud

You can disable telemetry notifications on the undercloud for the following OpenStack services to decrease the RabbitMQ queue:

  • Compute (nova)
  • Networking (neutron)
  • Orchestration (heat)
  • Identity (keystone)

To disable the notifications, in the /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml template, set the notification driver setting to noop.
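
For reference, the relevant setting in that environment file looks similar to the following. The exact contents of the file can vary between RHOSP versions:

parameter_defaults:
  NotificationDriver: 'noop'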

Limit the number of nodes that are provisioned at the same time

Fifty is the typical number of servers that fit in an average enterprise-level rack; therefore, you can deploy an average of one rack of nodes at one time.

To minimize the debugging necessary to diagnose issues with the deployment, deploy no more than 50 nodes at one time. However, if you want to deploy a higher number of nodes, Red Hat has successfully tested up to 100 nodes simultaneously.

To scale Compute nodes in batches, use the openstack overcloud deploy command with the --limit option. This can result in saved time and lower resource consumption on the undercloud.

Note

The --limit option is in Technology Preview.

Use this option with a comma-separated list of tags from the config-download playbook to run the deployment with a specific set of config-download tasks.

Disable unused NICs

If the overcloud has any unused NICs during the deployment, you must define the unused interfaces in the NIC configuration templates and set the interfaces to use_dhcp: false and defroute: false.

If you do not define unused interfaces, there might be routing issues and IP allocation problems during introspection and scaling operations. By default, unused NICs are set to BOOTPROTO=dhcp, which means that they consume IP addresses that are needed for PXE provisioning. This can reduce the pool of available IP addresses for your nodes.
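
The following NIC configuration snippet is a minimal sketch of how you might define an unused interface. The interface name is a placeholder:

- type: interface
  name: <unused_nic>
  use_dhcp: false
  defroute: false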

Power off unused Bare Metal Provisioning (ironic) nodes
Ensure that you power off any unused Bare Metal Provisioning (ironic) nodes in maintenance mode. Red Hat has identified cases where nodes from previous deployments are left in maintenance mode in a powered-on state. This can occur with Bare Metal automated cleaning, where a node that fails cleaning is set to maintenance mode. Bare Metal Provisioning does not track the power state of nodes in maintenance mode and incorrectly reports the power state as off. This can cause problems with ongoing deployments. When you redeploy after a failed deployment, ensure that you power off all unused nodes by using the power management device of each node.
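
For example, you can power off an unused node with the Bare Metal Provisioning CLI. The node UUID is a placeholder:

$ openstack baremetal node power off <node_uuid>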

3.3. Tuning the undercloud

Review this section when you plan to scale your RHOSP deployment and apply tuning to your default undercloud settings.

If you use the Telemetry service (ceilometer), improve the performance of the service

Because the Telemetry service is CPU-intensive, telemetry is not enabled by default in RHOSP 16.2. If you want to use Telemetry, you can improve the performance of the service.

For more information, see Configuration recommendations for the Telemetry service in the Deployment Recommendations for Specific Red Hat OpenStack Platform Services Guide.

Separate the provisioning and configuration processes
  • To create only the stack and associated RHOSP resources, you can run the deployment command with the --stack-only option.
  • Red Hat recommends separating the stack and config-download steps when deploying more than 100 nodes.

Include any environment files that are required for your overcloud:

$ openstack overcloud deploy \
  --templates \
  -e <environment-file1.yaml> \
  -e <environment-file2.yaml> \
  ...
  --stack-only
  • After you have provisioned the stack, you can enable SSH access for the tripleo-admin user from the undercloud to the overcloud. The config-download process uses the tripleo-admin user to perform the Ansible-based configuration:

    $ openstack overcloud admin authorize
  • To disable the overcloud stack creation and run only the config-download workflow to apply the software configuration, you can run the deployment command with the --config-download-only option. Include any environment files that are required for your overcloud:

    $ openstack overcloud deploy \
     --templates \
     -e <environment-file1.yaml> \
     -e <environment-file2.yaml> \
      ...
     --config-download-only
  • To limit the config-download playbook execution to a specific node or set of nodes, you can use the --limit option.
  • The --limit option can be used to separate nodes into different roles, to limit the number of nodes to deploy, or to separate nodes with a specific hardware type. For scale up operations, when you want to apply software configuration on the new nodes only, use the --limit option with the --config-download-only option.

    $ openstack overcloud deploy \
    --templates \
    -e <environment-file1.yaml> \
    -e <environment-file2.yaml> \
    ...
    --config-download-only --config-download-timeout <timeout> --limit <Undercloud>,<Controller>,<Compute-1>,<Compute-2>

    If you use the --limit option, always include <Controller> and <Undercloud> in the list. Tasks that use the external_deploy_steps interface, for example, all Ceph configurations, are only executed if <Undercloud> is included in the options list. All external_deploy_steps tasks run on the undercloud.

    For example, if you run a scale-up task to add a compute node that requires a connection to Ceph and you do not include <Undercloud> in the list, the Ceph configuration and cephx key files are missing, and the task fails. Do not use the --skip-tags external_deploy_steps option or the task fails.

3.4. Tuning the overcloud

Review the following section when you plan to scale your Red Hat OpenStack Platform (RHOSP) deployment and apply tuning to your default overcloud settings:

Increase OVN OVSDB client probe intervals to prevent failover

Increase OVSDB client probe intervals for large RHOSP deployments. Pacemaker triggers a failover of the ovn-dbs-bundle when it does not get a response from OVN within the configured timeout. To increase the OVN OVSDB client probe intervals to 360 seconds, edit the OVNDBSPacemakerTimeout parameter in your heat templates:

OVNDBSPacemakerTimeout: 360

On each Compute and Controller node, the OVN controller periodically probes the OVN southbound database (SBDB), and resynchronizes if these requests time out. When multiple Compute and Controller nodes are loaded with requests to create resources, the default 60-second timeout values are not sufficient. To increase the OVN SBDB client probe intervals to 180 seconds, edit the OVNRemoteProbeInterval and OVNOpenflowProbeInterval parameters in your heat templates:

ControllerParameters:
  OVNRemoteProbeInterval: 180000
  OVNOpenflowProbeInterval: 180
Note

During RHOSP user and service triggered operations, multiple components can reach their configured timeout values due to resource constraints, such as CPU or memory constraints. This can result in timeout request failures on the HAProxy front end or back end, messaging timeouts, database query failures, cluster instability, and so on. Benchmark your overcloud environment after initial deployment to help identify timeout-related bottlenecks.

Set a high interval between instance network information cache updates

The default interval between instance network information cache updates is 60 seconds: heal_instance_info_cache_interval = 60. To ensure that the system can manage the load that healing of the instance cache puts on the neutron server, modify the value of the heal_instance_info_cache_interval parameter for your environment. For example:

  • For once every ten minutes, set heal_instance_info_cache_interval = 600.
  • For once every hour, set heal_instance_info_cache_interval = 3600.
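
For example, the corresponding setting in nova.conf for a ten-minute interval looks similar to the following. In a director-based deployment, you typically apply this value through your heat templates rather than by editing the file directly:

[DEFAULT]
heal_instance_info_cache_interval = 600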
Warning

Some third-party plugins return inconsistent API database responses under heavy loads. The ML2/OVS and ML2/OVN back ends do not experience this problem. If your deployment uses ML2/OVS or ML2/OVN, you can disable instance network information cache healing by setting the heal_instance_info_cache_interval parameter to -1.

For more information about the parameter, see nova in the Configuration Reference guide.