Chapter 5. Configuring SR-IOV and DPDK interfaces on the same compute node

This chapter describes how to deploy SR-IOV and DPDK interfaces on the same Compute node.

Note

This guide provides examples for CPU assignments, memory allocation, and NIC configurations that may vary from your topology and use case. See the Network Functions Virtualization Product Guide and the Network Functions Virtualization Planning Guide to understand the hardware and configuration options.

The process to create and deploy SR-IOV and DPDK interfaces on the same Compute node includes:

  • Setting the parameters for the SR-IOV role and OVS-DPDK in the network-environment.yaml file.
  • Configuring the compute.yaml file with an SR-IOV interface and a DPDK interface.
  • Deploying the overcloud with this updated set of roles.
  • Creating the appropriate OpenStack flavor, networks, and ports to support these interface types.

We recommend the following network settings:

  • Use floating IP addresses for the guest instances.
  • Create a router and attach it to the DPDK VXLAN network (the management network).
  • Use SR-IOV for the provider network.
  • Boot the guest instance with two ports attached. We recommend that you use cloud-init to set the default route for the management network of the guest instance. A sample cloud-init snippet appears at the end of this chapter.
  • Add the floating IP address to the booted guest instance.
Note

If needed, use SR-IOV bonding for the guest instance and ensure both SR-IOV interfaces exist on the same NUMA node for optimum performance.

You must install and configure the undercloud before you can deploy the compute node in the overcloud. See the Director Installation and Usage Guide for details.

Note

Ensure that you create an OpenStack flavor that matches this custom role.

5.1. Modifying the first-boot.yaml file

Modify the first-boot.yaml file to set up OVS and DPDK parameters and to configure tuned for CPU affinity.

  1. Add additional resources.

      resources:
        userdata:
          type: OS::Heat::MultipartMime
          properties:
            parts:
            - config: {get_resource: set_ovs_config}
            - config: {get_resource: set_dpdk_params}
            - config: {get_resource: install_tuned}
            - config: {get_resource: compute_kernel_args}
  2. Set the OVS configuration.

      set_ovs_config:
        type: OS::Heat::SoftwareConfig
        properties:
          config:
            str_replace:
              template: |
                #!/bin/bash
                FORMAT=$COMPUTE_HOSTNAME_FORMAT
                if [[ -z $FORMAT ]] ; then
                  FORMAT="compute" ;
                else
                  # Assumption: only %index% and %stackname% are the variables in Host name format
                  FORMAT=$(echo $FORMAT | sed  's/\%index\%//g' | sed 's/\%stackname\%//g') ;
                fi
                if [[ $(hostname) == *$FORMAT* ]] ; then
                  if [ -f /usr/lib/systemd/system/openvswitch-nonetwork.service ]; then
                    ovs_service_path="/usr/lib/systemd/system/openvswitch-nonetwork.service"
                  elif [ -f /usr/lib/systemd/system/ovs-vswitchd.service ]; then
                    ovs_service_path="/usr/lib/systemd/system/ovs-vswitchd.service"
                  fi
                  grep -q "RuntimeDirectoryMode=.*" $ovs_service_path
                  if [ "$?" -eq 0 ]; then
                    sed -i 's/RuntimeDirectoryMode=.*/RuntimeDirectoryMode=0775/' $ovs_service_path
                  else
                    echo "RuntimeDirectoryMode=0775" >> $ovs_service_path
                  fi
                  grep -Fxq "Group=qemu" $ovs_service_path
                  if [ ! "$?" -eq 0 ]; then
                    echo "Group=qemu" >> $ovs_service_path
                  fi
                  grep -Fxq "UMask=0002" $ovs_service_path
                  if [ ! "$?" -eq 0 ]; then
                    echo "UMask=0002" >> $ovs_service_path
                  fi
                  ovs_ctl_path='/usr/share/openvswitch/scripts/ovs-ctl'
                  grep -q "umask 0002 \&\& start_daemon \"\$OVS_VSWITCHD_PRIORITY\"" $ovs_ctl_path
                  if [ ! "$?" -eq 0 ]; then
                    sed -i 's/start_daemon \"\$OVS_VSWITCHD_PRIORITY.*/umask 0002 \&\& start_daemon \"$OVS_VSWITCHD_PRIORITY\" \"$OVS_VSWITCHD_WRAPPER\" \"$@\"/' $ovs_ctl_path
                  fi
                fi
              params:
                $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat}
  3. Set the DPDK parameters.

      set_dpdk_params:
        type: OS::Heat::SoftwareConfig
        properties:
          config:
            str_replace:
              template: |
                #!/bin/bash
                set -x
                get_mask()
                {
                  local list=$1
                  local mask=0
                  declare -a bm
                  max_idx=0
                  for core in $(echo $list | sed 's/,/ /g')
                  do
                      index=$(($core/32))
                      bm[$index]=0
                      if [ $max_idx -lt $index ]; then
                         max_idx=$(($index))
                      fi
                  done
                  for ((i=$max_idx;i>=0;i--));
                  do
                      bm[$i]=0
                  done
                  for core in $(echo $list | sed 's/,/ /g')
                  do
                      index=$(($core/32))
                      temp=$((1<<$(($core % 32))))
                      bm[$index]=$((${bm[$index]} | $temp))
                  done
    
                  printf -v mask "%x" "${bm[$max_idx]}"
                  for ((i=$max_idx-1;i>=0;i--));
                  do
                      printf -v hex "%08x" "${bm[$i]}"
                      mask+=$hex
                  done
                  printf "%s" "$mask"
                }
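                # Worked example (illustrative): with PMD_CORES="1,17,9,25",
                # get_mask sets bits 1, 9, 17, and 25, and prints 2020202
                # (0x02020202), which is applied as the pmd-cpu-mask below.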
    
                FORMAT=$COMPUTE_HOSTNAME_FORMAT
                if [[ -z $FORMAT ]] ; then
                  FORMAT="compute" ;
                else
                  # Assumption: only %index% and %stackname% are the variables in Host name format
                  FORMAT=$(echo $FORMAT | sed  's/\%index\%//g' | sed 's/\%stackname\%//g') ;
                fi
                if [[ $(hostname) == *$FORMAT* ]] ; then
                  pmd_cpu_mask=$( get_mask $PMD_CORES )
                  host_cpu_mask=$( get_mask $LCORE_LIST )
                  socket_mem=$(echo $SOCKET_MEMORY | sed s/\'//g )
                  ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
                  ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=$socket_mem
                  ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=$pmd_cpu_mask
                  ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=$host_cpu_mask
                fi
              params:
                $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat}
                $LCORE_LIST: {get_param: HostCpusList}
                $PMD_CORES: {get_param: NeutronDpdkCoreList}
                $SOCKET_MEMORY: {get_param: NeutronDpdkSocketMemory}
  4. Set the tuned configuration to provide CPU affinity.

      install_tuned:
        type: OS::Heat::SoftwareConfig
        properties:
          config:
            str_replace:
              template: |
                #!/bin/bash
                FORMAT=$COMPUTE_HOSTNAME_FORMAT
                if [[ -z $FORMAT ]] ; then
                  FORMAT="compute" ;
                else
                  # Assumption: only %index% and %stackname% are the variables in Host name format
                  FORMAT=$(echo $FORMAT | sed  's/\%index\%//g' | sed 's/\%stackname\%//g') ;
                fi
                if [[ $(hostname) == *$FORMAT* ]] ; then
                  # Install the tuned package
                  yum install -y tuned-profiles-cpu-partitioning
    
                  tuned_conf_path="/etc/tuned/cpu-partitioning-variables.conf"
                  if [ -n "$TUNED_CORES" ]; then
                    grep -q "^isolated_cores" $tuned_conf_path
                    if [ "$?" -eq 0 ]; then
                      sed -i 's/^isolated_cores=.*/isolated_cores=$TUNED_CORES/' $tuned_conf_path
                    else
                      echo "isolated_cores=$TUNED_CORES" >> $tuned_conf_path
                    fi
                    tuned-adm profile cpu-partitioning
                  fi
                fi
              params:
                $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat}
                $TUNED_CORES: {get_param: HostIsolatedCoreList}
  5. Set the kernel arguments.

      compute_kernel_args:
        type: OS::Heat::SoftwareConfig
        properties:
          config:
            str_replace:
              template: |
                #!/bin/bash
                FORMAT=$COMPUTE_HOSTNAME_FORMAT
                if [[ -z $FORMAT ]] ; then
                  FORMAT="compute" ;
                else
                  # Assumption: only %index% and %stackname% are the variables in Host name format
                  FORMAT=$(echo $FORMAT | sed  's/\%index\%//g' | sed 's/\%stackname\%//g') ;
                fi
                if [[ $(hostname) == *$FORMAT* ]] ; then
                  sed 's/^\(GRUB_CMDLINE_LINUX=".*\)"/\1 $KERNEL_ARGS isolcpus=$TUNED_CORES"/g' -i /etc/default/grub ;
                  grub2-mkconfig -o /etc/grub2.cfg
                  reboot
                fi
              params:
                $KERNEL_ARGS: {get_param: ComputeKernelArgs}
                $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat}
                $TUNED_CORES: {get_param: HostIsolatedCoreList}
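
With the example ComputeKernelArgs and HostIsolatedCoreList values defined later in this chapter, the modified line in /etc/default/grub would look similar to the following illustrative result; existing arguments on the line are preserved:

    GRUB_CMDLINE_LINUX="... default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=1,2,3,4,5,6,7,9,10,17,18,19,20,21,22,23,11,12,13,14,15,25,26,27,28,29,30,31"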

5.2. Configuring tuned for CPU affinity

This example uses the sample post-install.yaml file.

  1. Set the tuned configuration to enable CPU affinity.

      resources:
        ExtraDeployments:
          type: OS::Heat::StructuredDeployments
          properties:
            servers:  {get_param: servers}
            config: {get_resource: ExtraConfig}
            # Do this on CREATE/UPDATE (which is actually the default)
            actions: ['CREATE', 'UPDATE']
    
        ExtraConfig:
          type: OS::Heat::SoftwareConfig
          properties:
            group: script
            config:
              str_replace:
                template: |
                  #!/bin/bash
    
                  set -x
                  FORMAT=$COMPUTE_HOSTNAME_FORMAT
                  if [[ -z $FORMAT ]] ; then
                    FORMAT="compute" ;
                  else
                    # Assumption: only %index% and %stackname% are the variables in Host name format
                    FORMAT=$(echo $FORMAT | sed  's/\%index\%//g' | sed 's/\%stackname\%//g') ;
                  fi
                  if [[ $(hostname) == *$FORMAT* ]] ; then
                    tuned_service=/usr/lib/systemd/system/tuned.service
                    grep -q "network.target" $tuned_service
                    if [ "$?" -eq 0 ]; then
                      sed -i '/After=.*/s/network.target//g' $tuned_service
                    fi
                    grep -q "Before=.*network.target" $tuned_service
                    if [ ! "$?" -eq 0 ]; then
                      grep -q "Before=.*" $tuned_service
                      if [ "$?" -eq 0 ]; then
                        sed -i 's/^\(Before=.*\)/\1 network.target openvswitch.service/g' $tuned_service
                      else
                        sed -i '/After/i Before=network.target openvswitch.service' $tuned_service
                      fi
                    fi
                    systemctl daemon-reload
                  fi
                params:
                  $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat}
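
After this script runs and systemd reloads the unit files, tuned is ordered to start before networking and Open vSwitch, so that CPU partitioning is in place before the PMD threads start. The resulting ordering directive in tuned.service would look similar to the following illustrative line; the exact unit file contents vary by tuned version:

    Before=network.target openvswitch.service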

5.3. Defining the SR-IOV and OVS-DPDK parameters

Modify the network-environment.yaml file to configure SR-IOV and OVS-DPDK role-specific parameters:

  1. Add the resource mapping for the OVS-DPDK and SR-IOV services to the network-environment.yaml file along with the network configuration for these nodes:

      resource_registry:
        # Specify the relative/absolute path to the config files that you want to use to override the default.
        OS::TripleO::Compute::Net::SoftwareConfig: nic-configs/compute.yaml
        OS::TripleO::Controller::Net::SoftwareConfig: nic-configs/controller.yaml
        OS::TripleO::NodeUserData: first-boot.yaml
        OS::TripleO::NodeExtraConfigPost: post-install.yaml
  2. Define the flavors:

      OvercloudControlFlavor: controller
      OvercloudComputeFlavor: compute
  3. Define the tunnel type:

      # The tunnel type for the tenant network (vxlan or gre). Set to '' to disable tunneling.
      NeutronTunnelTypes: 'vxlan'
      # The tenant network type for Neutron (vlan or vxlan).
      NeutronNetworkType: 'vlan'
  4. Configure the parameters for SR-IOV:

      NeutronSupportedPCIVendorDevs: ['8086:154d', '8086:10ed']
      NovaPCIPassthrough:
        - devname: "ens2f1"
          physical_network: "tenant"
    
      NeutronPhysicalDevMappings: "tenant:ens2f1"
      NeutronSriovNumVFs: "ens2f1:5"
      NeutronEnableIsolatedMetadata: true
      NeutronEnableForceMetadata: true
      # Global MTU.
      NeutronGlobalPhysnetMtu: 9000
      # Configure the classname of the firewall driver to use for implementing security groups.
      NeutronOVSFirewallDriver: openvswitch
  5. Configure the parameters for OVS-DPDK:

      ########################
      # OVS DPDK configuration
      ## NeutronDpdkCoreList and NeutronDpdkMemoryChannels are REQUIRED settings.
      ## Attempting to deploy DPDK without appropriate values will cause deployment to fail or lead to unstable deployments.
      # List of cores to be used for DPDK Poll Mode Driver
      NeutronDpdkCoreList: "'1,17,9,25'"
      # Number of memory channels to be used for DPDK
      NeutronDpdkMemoryChannels: "4"
      # Memory allocated per NUMA node for DPDK, in MB
      NeutronDpdkSocketMemory: "'1024,1024'"
      # The DPDK driver type
      NeutronDpdkDriverType: "vfio-pci"
      # The vhost-user socket directory for OVS
      NeutronVhostuserSocketDir: "/var/run/openvswitch"
    
      ########################
      # Additional settings
      ########################
      # Reserved RAM for host processes
      NovaReservedHostMemory: 2048
      # A list or range of physical CPU cores to reserve for virtual machine processes.
      # Example: NovaVcpuPinSet: ['4-12','^8'] will reserve cores from 4-12 excluding 8
      NovaVcpuPinSet: "2,3,4,5,6,7,18,19,20,21,22,23,10,11,12,13,14,15,26,27,28,29,30,31"
      # An array of filters used by Nova to filter a node. These filters will be applied in the order they are listed,
      # so place your most restrictive filters first to make the filtering process more efficient.
      NovaSchedulerDefaultFilters: "RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter,NUMATopologyFilter"
      # Kernel arguments for Compute node
      ComputeKernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on"
      # A list or range of physical CPU cores to be tuned.
      # The given args will be appended to the tuned cpu-partitioning profile.
      HostIsolatedCoreList: "1,2,3,4,5,6,7,9,10,17,18,19,20,21,22,23,11,12,13,14,15,25,26,27,28,29,30,31"
      # List of logical cores to be used by ovs-dpdk processes (dpdk-lcore-mask)
      HostCpusList: "'0,16,8,24'"
    Note

    For the DPDK PMD, you must assign at least one CPU, with its sibling thread, on each NUMA node, whether or not DPDK NICs are present on that node. Otherwise, creating guest instances can fail. To check the per-NUMA-node core layout, see the example at the end of this section.

  6. Configure the remainder of the network-environment.yaml file to override the default parameters from the neutron-ovs-dpdk-agent.yaml and neutron-sriov-agent.yaml files as needed for your OpenStack deployment.

See the Network Functions Virtualization Planning Guide for details on how to determine the best values for the OVS-DPDK parameters that you set in the network-environment.yaml file to optimize your OpenStack network for OVS-DPDK.
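
To confirm which cores belong to which NUMA node before you choose these core lists, inspect the topology of the Compute node. The following output is for an illustrative two-socket host that matches the example core lists in this section:

    # lscpu | grep NUMA
    NUMA node(s):          2
    NUMA node0 CPU(s):     0-7,16-23
    NUMA node1 CPU(s):     8-15,24-31

With this topology, NeutronDpdkCoreList "1,17,9,25" places PMD threads on both NUMA nodes: cores 1 and 17 on node 0, and cores 9 and 25 on node 1.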

5.4. Configuring the Compute node for SR-IOV and DPDK interfaces

This example uses the sample compute.yaml file to support SR-IOV and DPDK interfaces.

  1. Create the control plane Linux bond for an isolated network:

      type: linux_bond
      name: bond_api
      bonding_options: "mode=active-backup"
      use_dhcp: false
      dns_servers: {get_param: DnsServers}
      members:
        -
          type: interface
          name: nic3
          primary: true
        -
          type: interface
          name: nic4
  2. Assign VLANs to this Linux bond:

      type: vlan
      vlan_id: {get_param: InternalApiNetworkVlanID}
      device: bond_api
      addresses:
        -
          ip_netmask: {get_param: InternalApiIpSubnet}
  3. Set a bridge with a DPDK port to link to the controller:

      type: ovs_user_bridge
      name: br-link0
      ovs_extra:
        -
          str_replace:
            template: set port br-link0 tag=_VLAN_TAG_
            params:
              _VLAN_TAG_: {get_param: TenantNetworkVlanID}
      addresses:
        -
          ip_netmask: {get_param: TenantIpSubnet}
      use_dhcp: false
      members:
        -
          type: ovs_dpdk_port
          name: dpdk0
          mtu: 9000
          ovs_extra:
          - set interface $DEVICE mtu_request=$MTU
          members:
            -
              type: interface
              name: nic5
              primary: true
    Note

    To include multiple DPDK devices, repeat the ovs_dpdk_port section for each DPDK device that you want to add. A minimal sketch of a second DPDK bridge follows this procedure.

    Note

    When using OVS-DPDK, all bridges on the same Compute node should be of type ovs_user_bridge. The director may accept the configuration, but Red Hat OpenStack Platform does not support mixing ovs_bridge and ovs_user_bridge on the same node.

  4. Create the SR-IOV interface to the Controller:

      - type: interface
        name: ens2f1
        mtu: 9000
        use_dhcp: false
        defroute: false
        nm_controlled: true
        hotplug: true
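
The following is a minimal sketch of a second DPDK bridge, referenced in the note in step 3. It assumes a second DPDK NIC at nic6; the bridge and port names are illustrative:

      - type: ovs_user_bridge
        name: br-link1
        use_dhcp: false
        members:
          -
            type: ovs_dpdk_port
            name: dpdk1
            mtu: 9000
            ovs_extra:
            - set interface $DEVICE mtu_request=$MTU
            members:
              -
                type: interface
                name: nic6
                primary: true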

5.5. Deploying the overcloud

The following example defines the overcloud_deploy.sh Bash script that deploys both OVS-DPDK and SR-IOV:

#!/bin/bash

openstack overcloud deploy \
--templates \
-e /usr/share/openstack-tripleo-heat-templates/environments/neutron-ovs-dpdk.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml \
-e /home/stack/ospd-10-vxlan-vlan-dpdk-sriov-ctlplane-bonding/network-environment.yaml
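
Run the script as the stack user on the undercloud host with the undercloud credentials loaded. A typical invocation, assuming the default stackrc location, is:

    $ source ~/stackrc
    $ sh overcloud_deploy.sh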

5.6. Creating a flavor and deploying an instance with SR-IOV and DPDK interfaces

After you configure SR-IOV and DPDK interfaces on the same Compute node, create a flavor and deploy an instance by performing the following steps:

  1. Create a flavor:

    # openstack flavor create --vcpus 6 --ram 4096 --disk 40 compute

    Where:

    • 6 is the number of vCPUs.
    • 4096 is the memory size in MB.
    • 40 is the disk size in GB (the default is 0).
    • compute is the flavor name.
  2. Set the flavor to use large pages:

    # openstack flavor set compute --property hw:mem_page_size=1GB
  3. Create the external network:

    # openstack network create --external external
  4. Create the networks for SR-IOV and DPDK:

    # openstack network create net-dpdk
    # openstack network create net-sriov
    # openstack subnet create --subnet-range <cidr/prefix> --network net-dpdk net-dpdk-subnet
    # openstack subnet create --subnet-range <cidr/prefix> --network net-sriov net-sriov-subnet
  5. Create the SR-IOV port.

    1. Use vnic-type direct to create an SR-IOV VF port:

      # openstack port create --network net-sriov --vnic-type direct sriov_port
    2. Use vnic-type direct-physical to create an SR-IOV PF port:

      # openstack port create --network net-sriov --vnic-type direct-physical sriov_port
  6. Create a router and attach it to the DPDK VXLAN network:

    # openstack router create router1
    # openstack router add subnet router1 net-dpdk-subnet
  7. Create a floating IP address for the guest instance. You associate it with the instance after deployment, as shown after this procedure:

    # openstack floating ip create --floating-ip-address FLOATING-IP external
  8. Deploy an instance:

    # openstack server create --flavor compute --image rhel_7.3 --nic port-id=sriov_port --nic net-id=NET_DPDK_ID vm1

Where:

  • compute is the flavor name or ID.
  • rhel_7.3 is the image (name or ID) used to create an instance.
  • sriov_port is the name of the port created in the previous step.
  • NET_DPDK_ID is the DPDK network ID.
  • vm1 is the name of the instance.
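
The floating IP address created in step 7 still needs to be associated with the instance. One way to do this, assuming the example names above, is:

    # openstack server add floating ip vm1 FLOATING-IP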

You have now deployed an instance that uses an SR-IOV interface and a DPDK interface on the same Compute node.

Note

For instances with more interfaces, you can use cloud-init. See Table 3.1 in Create an Instance for details.
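
As an example of the earlier recommendation to set the default route with cloud-init, a minimal user-data file might look like the following sketch. The interface name and gateway address are placeholders that depend on your deployment:

    #cloud-config
    runcmd:
      # Illustrative only: route traffic through the management (DPDK VXLAN)
      # network; replace eth0 and 172.16.0.1 with your interface and gateway.
      - ip route replace default via 172.16.0.1 dev eth0

Pass this file to the instance at boot time with the --user-data option of the openstack server create command.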