Chapter 6. Deploying SR-IOV technologies

Special considerations may be needed for networking performance beyond what is possible through virtual networking. SR-IOV allows near bare metal performance by allowing instances from OpenStack direct access to a shared PCIe resource through virtual resources.

6.1. Prerequisites

Note

Do not manually edit values in /etc/tuned/cpu-partitioning-variables.conf that are modified by Director heat templates.

6.2. Configuring SR-IOV

Note

The CPU assignments, memory allocation and NIC configurations of the following examples may differ from your topology and use case.

  1. Generate the built-in ComputeSriov to define nodes in the OpenStack cluster that will run NeutronSriovAgent, NeutronSriovHostConfig and default compute services.

    # openstack overcloud roles generate \
    -o /home/stack/templates/roles_data.yaml \
    Controller ComputeSriov
  2. Include the neutron-sriov.yaml and roles_data.yaml files when generating overcloud_images.yaml so that SR-IOV containers are prepared.

    SERVICES=\
    /usr/share/openstack-tripleo-heat-templates/environments/services
    
    openstack overcloud container image prepare \
    --namespace=registry.access.redhat.com/rhosp13 \
    --push-destination=192.168.24.1:8787 \
    --prefix=openstack- \
    --tag-from-label {version}-{release} \
    -e ${SERVICES}/neutron-sriov.yaml \
    --roles-file /home/stack/templates/roles_data.yaml \
    --output-env-file=/home/stack/templates/overcloud_images.yaml \
    --output-images-file=/home/stack/local_registry_images.yaml

    For more information on container image preparation, see Director Installation and Usage.

  3. To apply the KernelAgs and TunedProfile parameters, include the host-config-and-reboot.yaml file from /usr/share/openstack-tripleo-heat-templates/environments to your deployment script.

    openstack overcloud deploy --templates \
    … \
    -e /usr/share/openstack-tripleo-heat-templates/environments/host-config-and-reboot.yaml \
    ...
  4. Configure the parameters for the SR-IOV nodes under parameter_defaults in accordance with the needs of your cluster, and the configuration of your hardware. These settings are typically added to the network-environment.yaml file.

      NeutronBridgeMappings: 'tenant:br-link0'
      NeutronNetworkType: 'vlan'
      NeutronNetworkVLANRanges: 'tenant:22:22,tenant:25:25'
      NovaPCIPassthrough:
        - devname: "p7p1"
           physical_network: "tenant"
        - devname: "p7p2"
          physical_network: "tenant"
      NeutronPhysicalDevMappings: "tenant:p7p1,tenant:p7p2"
      NeutronSriovNumVFs:
        - p7p1:5
        - p7p2:5
      NeutronTunnelTypes: ''
  5. In the same file, configure role specific parameters for SR-IOV compute nodes.

      ComputeSriovParameters:
        KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=1-19,21-39"
        TunedProfileName: "cpu-partitioning"
        IsolCpusList: "1-19,21-39"
        NovaVcpuPinSet: '1-19,21-39'
        NovaReservedHostMemory: 4096
  6. Configure the SR-IOV enabled interfaces in the compute.yaml network configuration template. Ensure the interfaces are configured as standalone NICs for the purposes of creating SR-IOV virtual functions (VFs):

                 - type: interface
                    name: p7p3
                    mtu: 9000
                    use_dhcp: false
                    defroute: false
                    nm_controlled: true
                    hotplug: true
    
                  - type: interface
                    name: p7p4
                    mtu: 9000
                    use_dhcp: false
                    defroute: false
                    nm_controlled: true
                    hotplug: true
  7. Ensure that the list of default filters includes the value AggregateInstanceExtraSpecsFilter.

    NovaSchedulerDefaultFilters: ['AvailabilityZoneFilter','RamFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','AggregateInstanceExtraSpecsFilter']
  8. Deploy the overcloud.
TEMPLATES_HOME="/usr/share/openstack-tripleo-heat-templates"
CUSTOM_TEMPLATES="/home/stack/templates"

openstack overcloud deploy --templates \
  -r ${CUSTOM_TEMPLATES}/roles_data.yaml \
  -e ${TEMPLATES_HOME}/environments/host-config-and-reboot.yaml \
  -e ${TEMPLATES_HOME}/environments/services/neutron-sriov.yaml \
  -e ${CUSTOM_TEMPLATES}/network-environment.yaml

6.3. Configuring Hardware Offload (Technology Preview)

OpenvSwitch (OVS) hardware offload is a technology preview and not recommended for production deployments. For more information about technology preview features, see Scope of Coverage Details.

OVS hardware offload takes advantage of single root input/output virtualization (SR-IOV), therefore some of the same configuration steps apply.

6.3.1. OpenvSwitch hardware offload

To enable OVS hardware offload, complete the following steps.

Procedure

  1. Generate the ComputeSriov role:

    openstack overcloud roles generate -o roles_data.yaml Controller ComputeSriov
  2. Configure the role specific parameters for the relevant nodes under the parameter_defaults section in the network-environment.yaml file:

    • For VLAN, set the physical_network parameter to the name of the network you create in neutron after deployment. This value should also be in NeutronBridgeMappings.
    • For VXLAN, set the physical_network parameter to the string value null.

      parameter_defaults:
        ComputeSriovParameters:
          KernelArgs: “default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt”
          IsolCpusList: 2-9,21-29,11-19,31-39
          NovaVcpuPinSet: 1-9,21-29,11-19,31-39
          NovaPCIPassthrough:
            - devname: “p7p1”
              physical_network: “tenant”
         NeutronBridgeMappings:
            - “tenant:br-tenant”
  3. Ensure that the list of default filters includes the value NUMATopologyFilter:

      NovaSchedulerDefaultFilters: ['RetryFilter','AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','NUMATopologyFilter']
  4. Configure one or more network interfaces intended for hardware offload in the compute-sriov.yaml configuration file:

      - type: ovs_bridge
        name: br-tenant
        mtu: 9000
        members:
        - type: sriov_pf
          name: p7p1
          numvfs: 5
          mtu: 9000
          primary: true
          promisc: true
          use_dhcp: false
          link_mode: switchdev
    Note

    Do not configure Mellanox network interfaces as a nic-config interface type ovs-vlan because this prevents tunnel endpoints such as VXLAN from passing traffic due to driver limitations.

  5. Include the following files during the deployment of the overcloud:

    • ovs-hw-offload.yaml
    • host-config-and-reboot.yaml

      TEMPLATES_HOME=”/usr/share/openstack-tripleo-heat-templates”
      CUSTOM_TEMPLATES=”/home/stack/templates”
      
      openstack overcloud deploy --templates \
        -r ${CUSTOME_TEMPLATES}/roles_data.yaml \
        -e ${TEMPLATES_HOME}/environments/ovs-hw-offload.yaml \
        -e ${TEMPLATES_HOME}/environments/host-config-and-reboot.yaml \
        -e ${CUSTOME_TEMPLATES}/network-environment.yaml

6.3.2. Verification

  1. Confirm that a pci device has its mode configured as switchdev:

    # devlink dev eswitch show pci/0000:03:00.0
    pci/0000:03:00.0: mode switchdev inline-mode none encap enable
  2. Confirm offload is enabled in OVS:

    # ovs-vsctl get Open_vSwitch . other_config:hw-offload
    “true”

6.4. Deploying an instance for SR-IOV

It is recommended to use host aggregates to separate high performance compute hosts. For information on creating host aggregates and associated flavors for scheduling see Creating host aggregates.

Note

You should use host aggregates to separate CPU pinned instances from unpinned instances. Instances that do not use CPU pinning do not respect the resourcing requirements of instances that use CPU pinning.

Display an instance for SR-IOV by performing the following steps:

  1. Create a flavor.

    # openstack flavor create <flavor> --ram <MB> --disk <GB> --vcpus <#>
  2. Create the network.

    openstack network create net1 --provider-physical-network tenant --provider-network-type vlan --provider-segment <VLAN-ID>
  3. Create the port.

    • Use vnic-type direct to create an SR-IOV VF port.

      # openstack port create --network net1 --vnic-type direct sriov_port
    • Use vnic-type direct-physical to create an SR-IOV PF port.

      # openstack port create --network net1 --vnic-type direct-physical sriov_port
  4. Deploy an instance

    # openstack server create --flavor <flavor> --image <image> --nic port-id=<id> <instance name>

6.5. Creating host aggregates

It is recommended to deploy guests using cpu pinning and hugepages for increased performance. You can schedule high performance instances on a subset of hosts by matching aggregate metadata with flavor metadata.

6.5.1. Procedure

  1. Ensure that the AggregateInstanceExtraSpecFilter value is added to the scheduler_default_filters parameter in the nova.conf configuration file. If you add AggregateInstanceExtraSpecFilter to satisfy this requirement, restart the nova container.
  2. Create an aggregate group for SR-IOV, and add relevant hosts. Define metadata, for example, sriov=true, that matches defined flavor metadata.

    # openstack aggregate create sriov_group
    # openstack aggregate add host sriov_group compute-sriov-0.localdomain
    # openstack aggregate set sriov_group sriov=true
  3. Create a flavor.

    # openstack flavor create <flavor> --ram <MB> --disk <GB> --vcpus <#>
  4. Set additional flavor properties. Note that the defined metadata, sriov=true, matches the defined metadata on the SR-IOV aggregate.

    openstack flavor set --property sriov=true --property hw:cpu_policy=dedicated --property hw:mem_page_size=1GB <flavor>