Chapter 8. Configuring OVS TC-flower hardware offload

In your Red Hat OpenStack Platform (RHOSP) network functions virtualization (NFV) deployment, you can achieve higher performance with Open vSwitch (OVS) TC-flower hardware offload. Hardware offloading diverts networking tasks from the CPU to a dedicated processor on a network interface controller (NIC). These specialized hardware resources provide additional computing power that frees the CPU to perform more valuable computational tasks.

Configuring RHOSP for OVS hardware offload is similar to configuring RHOSP for SR-IOV.

Important

This section includes examples that you must modify for your topology and functional requirements. For more information, see Hardware requirements for NFV.

Prerequisites

  • A RHOSP undercloud.

    You must install and configure the undercloud before you can deploy the overcloud. For more information, see Installing and managing Red Hat OpenStack Platform with director.

    Note

    RHOSP director modifies OVS hardware offload configuration files through the key-value pairs that you specify in director templates and custom environment files. You must not modify the OVS hardware offload configuration files directly.

  • Access to the undercloud host and credentials for the stack user.
  • Ensure that the NICs, their applications, the VF guest, and OVS reside on the same NUMA Compute node.

    Doing so helps to prevent performance degradation from cross-NUMA operations.

  • Access to sudo on the hosts that contain NICs.
  • Ensure that you keep the NIC firmware updated.

    Yum or dnf updates might not complete the firmware update. For more information, see your vendor documentation.

  • Enable security groups and port security on switchdev ports for the connection tracking (conntrack) module to offload OpenFlow flows to hardware.
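
    For example, this is a minimal sketch of enabling port security and attaching a security group to an existing switchdev port; the port and security group names are placeholders:

    $ openstack port set --enable-port-security \
    --security-group <security_group> <port>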

Procedure

Use RHOSP director to install and configure RHOSP in an OVS hardware offload environment. The high-level steps are:

  1. Create a network configuration file, network_data.yaml, to configure the physical network for your overcloud, by following the instructions in Configuring overcloud networking in Installing and managing Red Hat OpenStack Platform with director.
  2. Generate roles and image files.
  3. Configure PCI passthrough devices for OVS hardware offload.
  4. Add role-specific parameters and other configuration overrides.
  5. Create a bare metal nodes definition file.
  6. Create a NIC configuration template for OVS hardware offload.
  7. Provision overcloud networks and VIPs.

    For more information, see Configuring and provisioning overcloud network definitions and Configuring and provisioning network VIPs for the overcloud in the Installing and managing Red Hat OpenStack Platform with director guide.

  8. Provision overcloud bare metal nodes.

    For more information, see Provisioning bare metal nodes for the overcloud in the Installing and managing Red Hat OpenStack Platform with director guide.

  9. Deploy an OVS hardware offload overcloud.

8.1. Generating roles and image files for OVS TC-flower hardware offload

Red Hat OpenStack Platform (RHOSP) director uses roles to assign services to nodes. When configuring RHOSP in an OVS TC-flower hardware offload environment, you create a new role that is based on the default role, Compute, that is provided with your RHOSP installation.

The undercloud installation requires an environment file to determine where to obtain container images and how to store them.

Prerequisites

  • Access to the undercloud host and credentials for the stack user.

Procedure

  1. Log in to the undercloud as the stack user.
  2. Source the stackrc file:

    $ source ~/stackrc
  3. Generate an overcloud role for OVS hardware offload that is based on the Compute role:

    Example

    In this example, a role named ComputeOvsHwOffload is created, based on the Compute role. The roles file that the command generates is named roles_data_compute_ovshwol.yaml:

    $ openstack overcloud roles generate -o \
    roles_data_compute_ovshwol.yaml Controller Compute:ComputeOvsHwOffload
    Note

    If your RHOSP environment includes a mix of OVS-DPDK, SR-IOV, and OVS TC-flower hardware offload technologies, generate just one roles data file, such as roles_data.yaml, that includes all of the roles:

    $ openstack overcloud roles generate -o /home/stack/templates/\
    roles_data.yaml Controller ComputeOvsDpdk ComputeOvsDpdkSriov \
    Compute:ComputeOvsHwOffload
  4. Optional: Change the HostnameFormatDefault: '%stackname%-compute-%index%' name for the ComputeOvsHwOffload role.
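
    Example

    This is a minimal sketch of the override in the generated roles_data_compute_ovshwol.yaml file. The hostname format value shown here is an assumption that you can adapt to your own naming scheme:

    - name: ComputeOvsHwOffload
      ...
      HostnameFormatDefault: '%stackname%-computeovshwol-%index%'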
  5. To generate an images file, run the openstack tripleo container image prepare command with the following inputs:

    • The roles data file that you generated in an earlier step, for example, roles_data_compute_ovshwol.yaml.
    • The SR-IOV environment file appropriate for your Networking service mechanism driver:

      • ML2/OVN environments

        /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-sriov.yaml

      • ML2/OVS environments

        /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-sriov.yaml

        Example

        In this example, the overcloud_images.yaml file is being generated for an ML2/OVN environment:

        $ sudo openstack tripleo container image prepare \
          --roles-file ~/templates/roles_data_compute_ovshwol.yaml \
          -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-sriov.yaml \
          -e ~/containers-prepare-parameter.yaml \
          --output-env-file=/home/stack/templates/overcloud_images.yaml
  6. Note the path and file name of the roles data file and the images file that you have created. You use these files later when you deploy your overcloud.

8.2. Configuring PCI passthrough devices for OVS TC-flower hardware offload

When deploying Red Hat OpenStack Platform for an OVS TC-flower hardware offload environment, you must configure the PCI passthrough devices for the compute nodes in a custom environment file.

Prerequisites

  • Access to the physical server or servers that contain the PCI cards.
  • Access to the undercloud host and credentials for the stack user.

Procedure

  1. Use one of the following commands on the physical server that contains the PCI cards:

    • If your overcloud is deployed:

      $ lspci -nn -s <pci_device_address>

      Sample output

      3b:00.0 Ethernet controller [0200]: Intel Corporation Ethernet
      Controller X710 for 10GbE SFP+ [<vendor_id>:<product_id>] (rev 02)

    • If your overcloud has not been deployed:

      $ openstack baremetal introspection data save <baremetal_node_name> | jq '.inventory.interfaces[] | .name, .vendor, .product'
  2. Note the vendor and product IDs for PCI passthrough devices on the ComputeOvsHwOffload nodes. You will need these IDs in a later step.
  3. Log in to the undercloud as the stack user.
  4. Source the stackrc file:

    $ source ~/stackrc
  5. Create a custom environment YAML file, for example, ovshwol-overrides.yaml. Configure the PCI passthrough devices for the compute nodes by adding the following content to the file:

    parameter_defaults:
      NeutronOVSFirewallDriver: iptables_hybrid
      ComputeOvsHwOffloadParameters:
        IsolCpusList: 2-9,21-29,11-19,31-39
        KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
        OvsHwOffload: true
        TunedProfileName: "cpu-partitioning"
        NeutronBridgeMappings:
          - tenant:br-tenant
        NovaPCIPassthrough:
          - vendor_id: <vendor_id>
            product_id: <product_id>
            address: <address>
            physical_network: "tenant"
          - vendor_id: <vendor_id>
            product_id: <product_id>
            address: <address>
            physical_network: null
        NovaReservedHostMemory: 4096
        NovaComputeCpuDedicatedSet: 1-9,21-29,11-19,31-39
        ...
    Note

    If you are using Mellanox smart NICs, add DerivePciWhitelistEnabled: true under the ComputeOvsHwOffloadParameters parameter. When using OVS hardware offload, the Compute service (nova) scheduler operates similarly to SR-IOV passthrough for instance spawning.

    • Replace <vendor_id> with the vendor ID of the PCI device.
    • Replace <product_id> with the product ID of the PCI device.
    • Replace <address> with the address of the PCI device.
    • Replace <physical_network> with the name of the physical network the PCI device is located on.
    • For VLAN, set the physical_network parameter to the name of the network you create in neutron after deployment. This value should also be in NeutronBridgeMappings.
    • For VXLAN, set the physical_network parameter to null.

      Note

      Do not use the devname parameter when you configure PCI passthrough because the device name of a NIC can change. To create a Networking service (neutron) port on a PF, specify the vendor_id, the product_id, and the PCI device address in NovaPCIPassthrough, and create the port with the --vnic-type direct-physical option. To create a Networking service port on a virtual function (VF), specify the vendor_id and product_id in NovaPCIPassthrough, and create the port with the --vnic-type direct option. The values of the vendor_id and product_id parameters might be different between physical function (PF) and VF contexts.
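
      For example, this is a minimal sketch of the two port-create commands that this note describes; the network and port names are placeholders, and the complete procedure appears in Section 8.8, “Creating an instance in an SR-IOV or an OVS TC-flower hardware offload environment”:

      $ openstack port create --network <network_name> \
      --vnic-type direct-physical <pf_port_name>

      $ openstack port create --network <network_name> --vnic-type direct \
      --binding-profile '{"capabilities": ["switchdev"]}' <vf_port_name>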

  6. In the custom environment file, ensure that PciPassthroughFilter and NUMATopologyFilter are in the list of filters for the NovaSchedulerEnabledFilters parameter. The Compute service (nova) uses this parameter to filter nodes:

    parameter_defaults:
      ...
      NovaSchedulerEnabledFilters:
        - AvailabilityZoneFilter
        - ComputeFilter
        - ComputeCapabilitiesFilter
        - ImagePropertiesFilter
        - ServerGroupAntiAffinityFilter
        - ServerGroupAffinityFilter
        - PciPassthroughFilter
        - NUMATopologyFilter
        - AggregateInstanceExtraSpecsFilter
    Note

    Optional: For details about how to troubleshoot and configure OVS hardware offload issues in RHOSP 17.1 with Mellanox ConnectX-5 NICs, see Troubleshooting Hardware Offload.

  7. Note the path and file name of the custom environment file that you have created. You use this file later when you deploy your overcloud.

8.3. Adding role-specific parameters and configuration overrides for OVS TC-flower hardware offload

You can add role-specific parameters for the ComputeOvsHwOffload nodes and override default configuration values in a custom environment YAML file that Red Hat OpenStack Platform (RHOSP) director uses when deploying your OVS TC-flower hardware offload environment.

Prerequisites

  • Access to the undercloud host and credentials for the stack user.

Procedure

  1. Log in to the undercloud as the stack user.
  2. Source the stackrc file:

    $ source ~/stackrc
  3. Open the custom environment YAML file that you created in Section 8.2, “Configuring PCI passthrough devices for OVS TC-flower hardware offload”, or create a new one.
  4. Add role-specific parameters for the ComputeOvsHwOffload nodes to the custom environment file.

    Example

      ComputeOvsHwOffloadParameters:
        IsolCpusList: 9-63,73-127
        KernelArgs: default_hugepagesz=1GB hugepagesz=1G hugepages=100 amd_iommu=on iommu=pt numa_balancing=disable processor.max_cstate=0 isolcpus=9-63,73-127
        NovaReservedHostMemory: 4096
        NovaComputeCpuSharedSet: 0-8,64-72
        NovaComputeCpuDedicatedSet: 9-63,73-127
        TunedProfileName: "cpu-partitioning"

  5. Add the OvsHwOffload parameter under role-specific parameters with a value of true.

      ComputeOvsHwOffloadParameters:
        IsolCpusList: 9-63,73-127
        KernelArgs: default_hugepagesz=1GB hugepagesz=1G hugepages=100 amd_iommu=on iommu=pt numa_balancing=disable processor.max_cstate=0 isolcpus=9-63,73-127
        NovaReservedHostMemory: 4096
        NovaComputeCpuSharedSet: 0-8,64-72
        NovaComputeCpuDedicatedSet: 9-63,73-127
        TunedProfileName: "cpu-partitioning"
        OvsHwOffload: true
      ...
  6. Review the configuration defaults that RHOSP director uses to configure OVS hardware offload. These defaults are provided in the following files, which vary based on your mechanism driver:

    • ML2/OVN

      /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-sriov.yaml

    • ML2/OVS

      /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-sriov.yaml

  7. If you need to override any of the configuration defaults, add your overrides to the custom environment file.

    In this custom environment file, for example, you can add Nova PCI whitelist values or set the network type.

    Example

    In this example, the Networking service (neutron) network type is set to VLAN and ranges are added for the tenants:

    parameter_defaults:
      NeutronNetworkType: vlan
      NeutronNetworkVLANRanges:
        - tenant:22:22
        - tenant:25:25
      NeutronTunnelTypes: ''
  8. If you created a new custom environment file, note its path and file name. You use this file later when you deploy your overcloud.

8.4. Creating a bare metal nodes definition file for OVS TC-flower hardware offload

Use Red Hat OpenStack Platform (RHOSP) director and a definition file to provision your bare metal nodes for your OVS TC-flower hardware offload environment. In the bare metal nodes definition file, define the quantity and attributes of the bare metal nodes that you want to deploy and assign overcloud roles to these nodes. Also define the network layout of the nodes.

Prerequisites

  • Access to the undercloud host and credentials for the stack user.

Procedure

  1. Log in to the undercloud as the stack user.
  2. Source the stackrc file:

    $ source ~/stackrc
  3. Create a bare metal nodes definition file, such as overcloud-baremetal-deploy.yaml, as instructed in Provisioning bare metal nodes for the overcloud in the Installing and managing Red Hat OpenStack Platform with director guide.
  4. In the bare metal nodes definition file, add a declaration to the Ansible playbook, cli-overcloud-node-kernelargs.yaml.

    The playbook contains kernel arguments to use when you provision bare metal nodes.

    - name: ComputeOvsHwOffload
    ...
      ansible_playbooks:
        - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-kernelargs.yaml
    ...
  5. If you want to set any extra Ansible variables when running the playbook, use the extra_vars property to set them.

    Note

    The variables that you add to extra_vars should be the same role-specific parameters for the ComputeOvsHwOffload nodes that you added to the custom environment file earlier in Section 8.3, “Adding role-specific parameters and configuration overrides for OVS TC-flower hardware offload”.

    Example

    - name: ComputeOvsHwOffload
    ...
      ansible_playbooks:
        - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-kernelargs.yaml
          extra_vars:
            kernel_args: 'default_hugepagesz=1GB hugepagesz=1G hugepages=100 amd_iommu=on iommu=pt isolcpus=9-63,73-127'
            tuned_isolated_cores: '9-63,73-127'
            tuned_profile: 'cpu-partitioning'
            reboot_wait_timeout: 1800

  6. Note the path and file name of the bare metal nodes definition file that you have created. You use this file later when you configure your NICs and as the input file for the overcloud node provision command when you provision your nodes.

8.5. Creating a NIC configuration template for OVS TC-flower hardware offload

Define your NIC configuration templates for an OVS TC-flower hardware offload environment by modifying copies of the sample Jinja2 templates that ship with Red Hat OpenStack Platform (RHOSP).

Prerequisites

  • Access to the undercloud host and credentials for the stack user.
  • Ensure that the NICs, their applications, the VF guest, and OVS reside on the same NUMA Compute node.

    Doing so helps to prevent performance degradation from cross-NUMA operations.

Procedure

  1. Log in to the undercloud as the stack user.
  2. Source the stackrc file:

    $ source ~/stackrc
  3. Copy a sample network configuration template.

    Copy a NIC configuration Jinja2 template from the examples in the /usr/share/ansible/roles/tripleo_network_config/templates/ directory. Choose the one that most closely matches your NIC requirements. Modify it as needed.

  4. In your NIC configuration template, for example, single_nic_vlans.j2, add your PF and VF interfaces. To create VFs, configure the interfaces as standalone NICs.

    Example

    ...
    - type: sriov_pf
      name: enp196s0f0np0
      mtu: 9000
      numvfs: 16
      use_dhcp: false
      defroute: false
      nm_controlled: true
      hotplug: true
      promisc: false
      link_mode: switchdev
    ...

    Note

    The numvfs parameter replaces the NeutronSriovNumVFs parameter in the network configuration templates. Red Hat does not support modification of the NeutronSriovNumVFs parameter or the numvfs parameter after deployment. If you modify either parameter after deployment, the modification might cause a disruption for the running instances that have an SR-IOV port on that PF. In this case, you must hard reboot these instances to make the SR-IOV PCI device available again.
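
    For example, this is a minimal sketch of hard rebooting an affected instance; the instance name is a placeholder:

    $ openstack server reboot --hard <instance_name>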

  5. Add the custom network configuration template to the bare metal nodes definition file that you created in Section 8.4, “Creating a bare metal nodes definition file for OVS TC-flower hardware offload”.

    Example

    - name: ComputeOvsHwOffload
      count: 2
      hostname_format: compute-%index%
      defaults:
        networks:
          - network: internal_api
            subnet: internal_api_subnet
          - network: tenant
            subnet: tenant_subnet
          - network: storage
            subnet: storage_subnet
        network_config:
          template: /home/stack/templates/single_nic_vlans.j2
    ...

  6. In your NIC configuration template, for example, single_nic_vlans.j2, configure one or more network interfaces intended for hardware offload:

      - type: ovs_bridge
        name: br-tenant
        mtu: 9000
        members:
        - type: sriov_pf
          name: p7p1
          numvfs: 5
          mtu: 9000
          primary: true
          promisc: true
          use_dhcp: false
          link_mode: switchdev
    Note
    • Do not use the NeutronSriovNumVFs parameter when configuring OVS hardware offload. The number of virtual functions is specified using the numvfs parameter in a network configuration file used by os-net-config. Red Hat does not support modifying the numvfs setting during update or redeployment.
    • Do not configure Mellanox network interfaces as nic-config interface type ovs-vlan because this prevents tunnel endpoints such as VXLAN from passing traffic due to driver limitations.
  7. Note the path and file name of the NIC configuration template that you have created. You use this file later if you want to partition your NICs.

Next steps

  1. Provision your overcloud networks.

    For more information, see Configuring and provisioning overcloud network definitions in the Installing and managing Red Hat OpenStack Platform with director guide.

  2. Provision your overcloud VIPs.

    For more information, see Configuring and provisioning network VIPs for the overcloud in the Installing and managing Red Hat OpenStack Platform with director guide.

  3. Provision your bare metal nodes.

    For more information, see Provisioning bare metal nodes for the overcloud in the Installing and managing Red Hat OpenStack Platform with director guide.

  4. Deploy your overcloud.

    For more information, see Section 8.6, “Deploying an OVS TC-flower hardware offload overcloud”.

8.6. Deploying an OVS TC-flower hardware offload overcloud

The last step in deploying your Red Hat OpenStack Platform (RHOSP) overcloud in an OVS TC-flower hardware offload environment is to run the openstack overcloud deploy command. Inputs to the command include all of the various overcloud templates and environment files that you constructed.

Prerequisites

  • Access to the undercloud host and credentials for the stack user.
  • Access to sudo on hosts that contain NICs.
  • You have performed all of the steps listed in the earlier procedures in this section and have assembled all of the various heat templates and environment files to use as inputs for the overcloud deploy command.

Procedure

  1. Log in to the undercloud host as the stack user.
  2. Source the stackrc undercloud credentials file:

    $ source ~/stackrc
  3. Enter the openstack overcloud deploy command.

    It is important to list the inputs to the openstack overcloud deploy command in a particular order. The general rule is to specify the default heat template files first followed by your custom environment files and custom templates that contain custom configurations, such as overrides to the default properties.

    Add your inputs to the openstack overcloud deploy command in the following order:

    1. A custom network definition file that contains the specifications for your SR-IOV network on the overcloud, for example, network_data.yaml.

      For more information, see Network definition file configuration options in the Installing and managing Red Hat OpenStack Platform with director guide.

    2. A roles file that contains the Controller and ComputeOvsHwOffload roles that RHOSP director uses to deploy your OVS hardware offload environment.

      Example: roles_data_compute_ovshwol.yaml

      For more information, see Section 8.1, “Generating roles and image files for OVS TC-flower hardware offload”.

    3. An output file from provisioning your overcloud networks.

      Example: overcloud-networks-deployed.yaml

      For more information, see Configuring and provisioning overcloud network definitions in the Installing and managing Red Hat OpenStack Platform with director guide.

    4. An output file from provisioning your overcloud VIPs.

      Example: overcloud-vip-deployed.yaml

      For more information, see Configuring and provisioning network VIPs for the overcloud in the Installing and managing Red Hat OpenStack Platform with director guide.

    5. An output file from provisioning bare-metal nodes.

      Example: overcloud-baremetal-deployed.yaml

      For more information, see Provisioning bare metal nodes for the overcloud in the Installing and managing Red Hat OpenStack Platform with director guide.

    6. An images file that director uses to determine where to obtain container images and how to store them.

      Example: overcloud_images.yaml

      For more information, see Section 8.1, “Generating roles and image files for OVS TC-flower hardware offload”.

    7. An environment file for the Networking service (neutron) mechanism driver and router scheme that your environment uses:

      • ML2/OVN

        • Distributed virtual routing (DVR): neutron-ovn-dvr-ha.yaml
        • Centralized virtual routing: neutron-ovn-ha.yaml
      • ML2/OVS

        • Distributed virtual routing (DVR): neutron-ovs-dvr.yaml
        • Centralized virtual routing: neutron-ovs.yaml
    8. An environment file for SR-IOV, depending on your mechanism driver:

      • ML2/OVN

        • neutron-ovn-sriov.yaml
      • ML2/OVS

        • neutron-sriov.yaml

          Note

          If you also have an OVS-DPDK environment, and want to locate OVS-DPDK and SR-IOV instances on the same node, include the following environment files in your deployment script:

          • ML2/OVN

            neutron-ovn-dpdk.yaml

          • ML2/OVS

            neutron-ovs-dpdk.yaml

    9. One or more custom environment files that contain your configuration for:

      • PCI passthrough devices for the ComputeOvsHwOffload nodes.
      • Role-specific parameters for the ComputeOvsHwOffload nodes.
      • Overrides of default configuration values for the OVS hardware offload environment.

        Example: ovshwol-overrides.yaml

        For more information, see:

      • Section 8.2, “Configuring PCI passthrough devices for OVS TC-flower hardware offload”.
      • Section 8.3, “Adding role-specific parameters and configuration overrides for OVS TC-flower hardware offload”.

        Example

        This excerpt from a sample openstack overcloud deploy command demonstrates the proper ordering of the command’s inputs for an OVS TC-flower hardware offload, ML2/OVN environment that uses DVR:

        $ openstack overcloud deploy \
        --log-file overcloud_deployment.log \
        --templates /usr/share/openstack-tripleo-heat-templates/ \
        --stack overcloud \
        -n /home/stack/templates/network_data.yaml \
        -r /home/stack/templates/roles_data_compute_ovshwol.yaml \
        -e /home/stack/templates/overcloud-networks-deployed.yaml \
        -e /home/stack/templates/overcloud-vip-deployed.yaml \
        -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
        -e /home/stack/templates/overcloud_images.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/services/\
        neutron-ovn-dvr-ha.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/services/\
        neutron-ovn-sriov.yaml \
        -e /home/stack/templates/ovshwol-overrides.yaml
  4. Run the openstack overcloud deploy command.

    When the overcloud creation is finished, RHOSP director provides details to help you access your overcloud.

Next steps

  1. Ensure that the e-switch mode for the NICs is set to switchdev.

    The switchdev mode establishes representor ports on the NIC that are mapped to the VFs.

    Important

    You must enable security groups and port security on switchdev ports for the connection tracking (conntrack) module to offload OpenFlow flows to hardware.

    1. Check the NIC by running this command:

      Example

      In this example, the NIC pci/0000:03:00.0 is queried:

      $ sudo devlink dev eswitch show pci/0000:03:00.0

      Sample output

      You should see output similar to the following:

      pci/0000:03:00.0: mode switchdev inline-mode none encap enable
    2. To set the NIC to switchdev mode, run this command:

      Example

      In this example, the e-switch mode for the NIC pci/0000:03:00.0 is set to switchdev:

      $ sudo devlink dev eswitch set pci/0000:03:00.0 mode switchdev
  2. To allocate a port from a switchdev-enabled NIC, do the following:

    1. Log in as a RHOSP user with the admin role, and create a neutron port with a binding-profile value of capabilities, and disable port security:

      Important

      You must enable security groups and port security on switchdev ports for the connection tracking (conntrack) module to offload OpenFlow flows to hardware.

      $ openstack port create --network private --vnic-type=direct --binding-profile '{"capabilities": ["switchdev"]}' direct_port1 --disable-port-security
    2. Pass this port information when you create the instance.

      You associate the representor port with the instance VF interface and connect the representor port to OVS bridge br-int for one-time OVS data path processing. A VF port representor functions like a software version of a physical “patch panel” front-end.

      For more information about new instance creation, see Section 8.8, “Creating an instance in an SR-IOV or an OVS TC-flower hardware offload environment”.
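
      For example, this is a minimal sketch that passes the port at instance creation; the flavor, image, and instance names are placeholders:

      $ openstack server create --flavor <flavor> --image <image_name> \
      --nic port-id=<port_id> <instance_name>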

  3. Apply the following configuration on the interfaces and the representor ports to ensure that TC-flower pushes the flow programming at the port level:

     $ sudo ethtool -K <device-name> hw-tc-offload on
  4. Adjust the number of channels for each network interface to improve performance.

    A channel includes an interrupt request (IRQ) and the set of queues that trigger the IRQ. When you set the mlx5_core driver to switchdev mode, it defaults to one combined channel, which might not deliver optimal performance.

    On the physical function (PF) representors, enter the following command to adjust the number of CPUs available to the host.

    Example

    In this example, the number of multi-purpose channels is set to 3 on the network interface, enp3s0f0:

    $ sudo ethtool -L enp3s0f0 combined 3
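
    To verify the resulting channel configuration, you can query the interface; the interface name here is the same example interface:

    $ sudo ethtool -l enp3s0f0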

8.7. Creating host aggregates in an SR-IOV or an OVS TC-flower hardware offload environment

For better performance in your Red Hat OpenStack Platform (RHOSP) SR-IOV or OVS TC-flower hardware offload environment, deploy guests that have CPU pinning and huge pages. You can schedule high performance instances on a subset of hosts by matching aggregate metadata with flavor metadata.

Procedure

  1. Create an aggregate group, and add relevant hosts.

    Define metadata, for example, sriov=true, that matches defined flavor metadata.

    $ openstack aggregate create sriov_group
    $ openstack aggregate add host sriov_group compute-sriov-0.localdomain
    $ openstack aggregate set --property sriov=true sriov_group
  2. Create a flavor.

    $ openstack flavor create <flavor> --ram <size_mb> --disk <size_gb> \
    --vcpus <number>
  3. Set additional flavor properties.

    Note that the defined metadata, sriov=true, matches the defined metadata on the SR-IOV aggregate.

    $ openstack flavor set --property sriov=true \
    --property hw:cpu_policy=dedicated \
    --property hw:mem_page_size=1GB <flavor>

Additional resources

  • aggregate in the Command line interface reference
  • flavor in the Command line interface reference

8.8. Creating an instance in an SR-IOV or an OVS TC-flower hardware offload environment

You use several commands to create an instance in a Red Hat OpenStack Platform (RHOSP) SR-IOV or an OVS TC-flower hardware offload environment.

Use host aggregates to separate high performance Compute hosts. For more information, see Section 8.7, “Creating host aggregates in an SR-IOV or an OVS TC-flower hardware offload environment”.

Note

Pinned CPU instances can be located on the same Compute node as unpinned instances. For more information, see Configuring CPU pinning on Compute nodes in the Configuring the Compute service for instance creation guide.

Prerequisites

  • A RHOSP overcloud configured for an SR-IOV or an OVS hardware offload environment.
  • For OVS hardware offload environments, you must have a virtual function (VF) port or a physical function (PF) port from a RHOSP administrator to be able to create an instance.

    OVS hardware offload requires a binding profile to create VFs or PFs. Only RHOSP users with the admin role can use a binding profile.

Procedure

  1. Create a flavor.

    $ openstack flavor create <flavor_name> --ram <size_mb> \
    --disk <size_gb> --vcpus <number>
    Tip

    You can specify the NUMA affinity policy for PCI passthrough devices and SR-IOV interfaces by adding the extra spec hw:pci_numa_affinity_policy to your flavor. For more information, see Flavor metadata in Configuring the Compute service for instance creation.
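
    For example, this is a minimal sketch of adding the extra spec to a flavor; the preferred policy value and the flavor name are example values:

    $ openstack flavor set --property hw:pci_numa_affinity_policy=preferred <flavor_name>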

  2. Create the network and the subnet:

    $ openstack network create <network_name> \
    --provider-physical-network tenant \
    --provider-network-type vlan --provider-segment <vlan_id>
    
    $ openstack subnet create <name> --network <network_name> \
    --subnet-range <ip_address_cidr> --dhcp
  3. If you are not a RHOSP user with the admin role, your RHOSP administrator can provide you with the necessary VF or PF to create an instance. Proceed to step 5.
  4. If you are a RHOSP user with the admin role, you can create VF or PF ports:

    • VF port:

      $ openstack port create --network <network_name> --vnic-type direct \
      --binding-profile '{"capabilities": ["switchdev"]}' <port_name>
    • PF port that is dedicated to a single instance:

      This PF port is a Networking service (neutron) port but is not controlled by the Networking service, and is not visible as a network adapter because it is a PCI device that is passed through to the instance.

      $ openstack port create --network <network_name> \
      --vnic-type direct-physical <port_name>
  5. Create an instance.

    $ openstack server create --flavor <flavor> --image <image_name> \
    --nic port-id=<id> <instance_name>

8.9. Troubleshooting OVS TC-flower hardware offload

When troubleshooting a Red Hat OpenStack Platform (RHOSP) environment that uses OVS TC-flower hardware offload, review the prerequisites and configurations for the network and the interfaces.

Prerequisites

  • Linux Kernel 4.13 or newer
  • OVS 2.8 or newer
  • RHOSP 12 or newer
  • Iproute 4.12 or newer
  • Mellanox NIC firmware, for example FW ConnectX-5 16.21.0338 or newer

For more information about supported prerequisites, see the Red Hat Knowledgebase solution Network Adapter Fast Datapath Feature Support Matrix.

Network configuration

In a HW offload deployment, you can choose one of the following scenarios for your network configuration according to your requirements:

  • You can base guest VMs on VXLAN and VLAN by using either the same set of interfaces attached to a bond, or a different set of NICs for each type.
  • You can bond two ports of a Mellanox NIC by using Linux bond.
  • You can host tenant VXLAN networks on VLAN interfaces on top of a Mellanox Linux bond.

Ensure that individual NICs and bonds are members of an ovs-bridge.

Refer to the following network configuration example:

...
- type: ovs_bridge
   name: br-offload
   mtu: 9000
   use_dhcp: false
   members:
   - type: linux_bond
     name: bond-pf
     bonding_options: "mode=active-backup miimon=100"
     members:
     - type: sriov_pf
       name: p5p1
       numvfs: 3
       primary: true
       promisc: true
       use_dhcp: false
       defroute: false
       link_mode: switchdev
     - type: sriov_pf
       name: p5p2
       numvfs: 3
       promisc: true
       use_dhcp: false
       defroute: false
       link_mode: switchdev
...
 - type: vlan
   vlan_id:
     get_param: TenantNetworkVlanID
   device: bond-pf
   addresses:
   - ip_netmask:
       get_param: TenantIpSubnet
...

The following bonding configurations are supported:

  • active-backup - mode=1
  • active-active or balance-xor - mode=2
  • 802.3ad (LACP) - mode=4

The following bonding configuration is not supported:

  • xmit_hash_policy=layer3+4

Interface configuration

Use the following procedure to verify the interface configuration.

Procedure

  1. During deployment, use the host network configuration tool os-net-config to enable hw-tc-offload.
  2. Enable hw-tc-offload on the sriov_config service any time you reboot the Compute node.
  3. Set the hw-tc-offload parameter to on for the NICs that are attached to the bond:

    Example

    $ ethtool -k ens1f0 | grep tc-offload
    
    hw-tc-offload: on
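
    If hw-tc-offload is off, this is a minimal sketch of enabling it manually; os-net-config normally applies this setting during deployment, and the interface name is an example:

    $ sudo ethtool -K ens1f0 hw-tc-offload on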

Interface mode

Verify the interface mode by using the following procedure.

Procedure

  1. Set the eswitch mode to switchdev for the interfaces you use for HW offload.
  2. Use the host network configuration tool os-net-config to enable eswitch during deployment.
  3. Enable eswitch on the sriov_config service any time you reboot the Compute node.

    Example

    $ devlink dev eswitch show pci/$(ethtool -i ens1f0 | grep bus-info \
    | cut -d ':' -f 2,3,4 | awk '{$1=$1};1')
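
    If the mode is not switchdev, this is a minimal sketch of setting it manually; the PCI address is an example:

    $ sudo devlink dev eswitch set pci/0000:65:00.0 mode switchdev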

Note

The driver of the PF interface is set to "mlx5e_rep", to show that it is a representor of the e-switch uplink port. This does not affect the functionality.

OVS offload state

Use the following procedure to verify the OVS offload state.

  • Verify that hardware offload is enabled in OVS on the Compute node:

    $ ovs-vsctl get Open_vSwitch . other_config:hw-offload
    
    "true"

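    If the value is not true, this is a minimal sketch of enabling hardware offload manually; RHOSP director normally sets this through the OvsHwOffload parameter, and the openvswitch service must be restarted for the change to take effect:

    $ sudo ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
    $ sudo systemctl restart openvswitch
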
VF representor port name

To ensure consistent naming of VF representor ports, os-net-config uses udev rules to rename the ports in the <PF-name>_<VF_id> format.

Procedure

  • After deployment, verify that the VF representor ports are named correctly.

    Example

    $ cat /etc/udev/rules.d/80-persistent-os-net-config.rules

    Sample output

    # This file is autogenerated by os-net-config
    
    SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}!="", ATTR{phys_port_name}=="pf*vf*", ENV{NM_UNMANAGED}="1"
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="0000:65:00.0", NAME="ens1f0"
    SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="98039b7f9e48", ATTR{phys_port_name}=="pf0vf*", IMPORT{program}="/etc/udev/rep-link-name.sh $attr{phys_port_name}", NAME="ens1f0_$env{NUMBER}"
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="0000:65:00.1", NAME="ens1f1"
    SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="98039b7f9e49", ATTR{phys_port_name}=="pf1vf*", IMPORT{program}="/etc/udev/rep-link-name.sh $attr{phys_port_name}", NAME="ens1f1_$env{NUMBER}"

Network traffic flow

HW offloaded network flows function in a similar way to physical switches or routers equipped with application-specific integrated circuit (ASIC) chips.

You can access the ASIC shell of a switch or router to examine the routing table and to perform other debugging tasks. The following procedure uses a Broadcom chipset on a Cumulus Linux switch as an example. Replace the values with values that are appropriate to your environment.

Procedure

  1. To get Broadcom chip table content, use the bcmcmd command.

    $ cl-bcmcmd l2 show

    Sample output

    mac=00:02:00:00:00:08 vlan=2000 GPORT=0x2 modid=0 port=2/xe1
    mac=00:02:00:00:00:09 vlan=2000 GPORT=0x2 modid=0 port=2/xe1 Hit

  2. Inspect the Traffic Control (TC) Layer.

    $ tc -s filter show dev p5p1_1 ingress

    Sample output

    …
    filter block 94 protocol ip pref 3 flower chain 5
    filter block 94 protocol ip pref 3 flower chain 5 handle 0x2
      eth_type ipv4
      src_ip 172.0.0.1
      ip_flags nofrag
      in_hw in_hw_count 1
            action order 1: mirred (Egress Redirect to device eth4) stolen
            index 3 ref 1 bind 1 installed 364 sec used 0 sec
            Action statistics:
            Sent 253991716224 bytes 169534118 pkt (dropped 0, overlimits 0 requeues 0)
            Sent software 43711874200 bytes 30161170 pkt
            Sent hardware 210279842024 bytes 139372948 pkt
            backlog 0b 0p requeues 0
            cookie 8beddad9a0430f0457e7e78db6e0af48
            no_percpu

  3. Examine the in_hw flags and the statistics in this output. The word hardware indicates that the hardware processes the network traffic. If you use tc-policy=none, you can check this output or a tcpdump capture to investigate whether hardware or software handles the packets. You can see a corresponding log message in dmesg or in ovs-vswitchd.log when the driver is unable to offload packets.
  4. For Mellanox NICs, for example, the log entries resemble syndrome messages in dmesg.

    Sample output

    [13232.860484] mlx5_core 0000:3b:00.0: mlx5_cmd_check:756:(pid 131368): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x6b1266)

    In this example, the error code (0x6b1266) represents the following behavior:

    Sample output

    0x6B1266 |  set_flow_table_entry: pop vlan and forward to uplink is not allowed

Systems

Validate your system with the following procedure.

Procedure

  1. Ensure SR-IOV and VT-d are enabled on the system.
  2. Enable IOMMU in Linux by adding intel_iommu=on to kernel parameters, for example, using GRUB.
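
    For example, this is a minimal sketch that appends the IOMMU parameters with grubby on RHEL; on director-deployed nodes, the KernelArgs parameter normally applies these settings for you:

    $ sudo grubby --update-kernel=ALL --args="intel_iommu=on iommu=pt"
    $ sudo reboot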

8.10. Debugging TC-flower hardware offload flow

You can use the following procedure if you encounter the following message in the ovs-vswitchd.log file:

2020-01-31T06:22:11.257Z|00473|dpif_netlink(handler402)|ERR|failed to offload flow: Operation not supported: p6p1_5

Procedure

  1. To enable logging on the offload modules and to get additional log information for this failure, use the following commands on the Compute node:

    ovs-appctl vlog/set dpif_netlink:file:dbg
    # The offload module name changed between OVS versions; use the name that matches your version:
    ovs-appctl vlog/set netdev_tc_offloads:file:dbg
    # or
    ovs-appctl vlog/set netdev_offload_tc:file:dbg
    ovs-appctl vlog/set tc:file:dbg
  2. Inspect the ovs-vswitchd logs again to see additional details about the issue.

    In the following example logs, the offload failed because of an unsupported attribute mark.

     2020-01-31T06:22:11.218Z|00471|dpif_netlink(handler402)|DBG|system@ovs-system: put[create] ufid:61bd016e-eb89-44fc-a17e-958bc8e45fda recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(7),skb_mark(0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=fa:16:3e:d2:f5:f3,dst=fa:16:3e:c4:a3:eb),eth_type(0x0800),ipv4(src=10.1.1.8/0.0.0.0,dst=10.1.1.31/0.0.0.0,proto=1/0,tos=0/0x3,ttl=64/0,frag=no),icmp(type=0/0,code=0/0), actions:set(tunnel(tun_id=0x3d,src=10.10.141.107,dst=10.10.141.124,ttl=64,tp_dst=4789,flags(df|key))),6
    
    2020-01-31T06:22:11.253Z|00472|netdev_tc_offloads(handler402)|DBG|offloading attribute pkt_mark isn't supported
    
    2020-01-31T06:22:11.257Z|00473|dpif_netlink(handler402)|ERR|failed to offload flow: Operation not supported: p6p1_5

Debugging Mellanox NICs

Mellanox has provided a system information script, similar to a Red Hat SOS report.

https://github.com/Mellanox/linux-sysinfo-snapshot/blob/master/sysinfo-snapshot.py

When you run this script, it creates a zip file of the relevant log information, which is useful for support cases.

Procedure

  • You can run this system information script with the following command:

    # ./sysinfo-snapshot.py --asap --asap_tc --ibdiagnet --openstack

You can also install Mellanox Firmware Tools (MFT), mlxconfig, mlxlink, and the OpenFabrics Enterprise Distribution (OFED) drivers.

Useful CLI commands

Use the ethtool utility with the following options to gather diagnostic information:

  • ethtool -l <uplink representor>: View the number of channels
  • ethtool -S <uplink/VFs>: Check statistics
  • ethtool -i <uplink rep>: View driver information
  • ethtool -g <uplink rep>: Check ring sizes
  • ethtool -k <uplink/VFs>: View enabled features

Use the tcpdump utility at the representor and PF ports to similarly check traffic flow.

  • Any changes you make to the link state of the representor port also affect the VF link state.
  • Representor port statistics also present the VF statistics.
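
For example, this is a minimal capture on a VF representor port; the interface name follows the representor naming shown earlier and is an example:

$ sudo tcpdump -nnei ens1f0_0 icmp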

Use the following commands to get useful diagnostic information:

$ ovs-appctl dpctl/dump-flows -m type=offloaded

$ ovs-appctl dpctl/dump-flows -m

$ tc filter show dev ens1_0 ingress

$ tc -s filter show dev ens1_0 ingress

$ tc monitor