Chapter 7. Planning your OVS-DPDK deployment

To optimize your Open vSwitch with Data Plane Development Kit (OVS-DPDK) deployment for NFV, you should understand how OVS-DPDK uses the Compute node hardware (CPU, NUMA nodes, memory, NICs) and the considerations for determining the individual OVS-DPDK parameters based on your Compute node.

Important

When using OVS-DPDK and the OVS native firewall (a stateful firewall based on conntrack), you can track only packets that use ICMPv4, ICMPv6, TCP, and UDP protocols. OVS marks all other types of network traffic as invalid.

See NFV performance considerations for a high-level introduction to CPUs and NUMA topology.

7.1. OVS-DPDK with CPU partitioning and NUMA topology

OVS-DPDK partitions the hardware resources for host, guests, and itself. The OVS-DPDK Poll Mode Drivers (PMDs) run DPDK active loops, which require dedicated CPU cores. Therefore you must allocate some CPUs, and huge pages, to OVS-DPDK.

A sample partitioning includes 16 cores per NUMA node on dual-socket Compute nodes. The traffic requires additional NICs because you cannot share NICs between the host and OVS-DPDK.

OpenStack NFV Hardware Capacities 464931 0118 OVS DPDK
Note

You must reserve DPDK PMD threads on both NUMA nodes, even if a NUMA node does not have an associated DPDK NIC.

For optimum OVS-DPDK performance, reserve a block of memory local to the NUMA node. Choose NICs associated with the same NUMA node that you use for memory and CPU pinning. Ensure that both bonded interfaces are from NICs on the same NUMA node.

7.2. Workflows and derived parameters

This feature is available in this release as a Technology Preview, and therefore is not fully supported by Red Hat. It should only be used for testing, and should not be deployed in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.

You can use the Red Hat OpenStack Platform Workflow (mistral) service to derive parameters based on the capabilities of your available bare-metal nodes. Workflows use a YAML file to define a set of tasks and actions to perform. You can use a pre-defined workbook, derive_params.yaml, in the directory tripleo-common/workbooks/. This workbook provides workflows to derive each supported parameter from the results of Bare Metal introspection. The derive_params.yaml workflows use the formulas from tripleo-common/workbooks/derive_params_formulas.yaml to calculate the derived parameters.

Note

You can modify derive_params_formulas.yaml to suit your environment.

The derive_params.yaml workbook assumes all nodes for a particular composable role have the same hardware specifications. The workflow considers the flavor-profile association and nova placement scheduler to match nodes associated with a role, then uses the introspection data from the first node that matches the role.

For more information about Workflows, see Troubleshooting Workflows and Executions

You can use the -p or --plan-environment-file option to add a custom plan_environment.yaml file, containing a list of workbooks and any input values, to the openstack overcloud deploy command. The resultant workflows merge the derived parameters back into the custom plan_environment.yaml, where they are available for the overcloud deployment.

For details on how to use the --plan-environment-file option in your deployment, see Plan Environment Metadata.

7.3. Derived OVS-DPDK parameters

The workflows in derive_params.yaml derive the DPDK parameters associated with the role that uses the ComputeNeutronOvsDpdk service.

The workflows can automatically derive the following parameters for OVS-DPDK. The NovaVcpuPinSet parameter is now deprecated, and is replaced by NovaComputeCpuDedicatedSet for dedicated, pinned workflows:

  • IsolCpusList
  • KernelArgs
  • NovaReservedHostMemory
  • NovaComputeCpuDedicatedSet
  • OvsDpdkSocketMemory
  • OvsPmdCoreList
Note

To avoid errors, you must configure role-specific tagging for role-specific parameters.

The OvsDpdkMemoryChannels parameter cannot be derived from the introspection memory bank data because the format of memory slot names are inconsistent across different hardware environments.

In most cases, the default number of OvsDpdkMemoryChannels is four. Consult your hardware manual to determine the number of memory channels per socket, and update the default number with this value.

For more information about workflow parameters, see Section 8.1, “Deriving DPDK parameters with workflows”.

7.4. Calculating OVS-DPDK parameters manually

This section describes how OVS-DPDK uses parameters within the director network_environment.yaml heat templates to configure the CPU and memory for optimum performance. Use this information to evaluate the hardware support on your Compute nodes and how to partition the hardware to optimize your OVS-DPDK deployment.

Note

For more information on an how to generate these values with the derived_parameters.yaml workflow instead, see Overview of workflows and derived parameters.

Note

Always pair CPU sibling threads, or logical CPUs, together in the physical core when allocating CPU cores.

For details on how to determine the CPU and NUMA nodes on your Compute nodes, see Discovering your NUMA node topology. Use this information to map CPU and other parameters to support the host, guest instance, and OVS-DPDK process needs.

7.4.1. CPU parameters

OVS-DPDK uses the following parameters for CPU partitioning:

OvsPmdCoreList

Provides the CPU cores that are used for the DPDK poll mode drivers (PMD). Choose CPU cores that are associated with the local NUMA nodes of the DPDK interfaces. Use OvsPmdCoreList for the pmd-cpu-mask value in OVS. Use the following recommendations for OvsPmdCoreList:

  • Pair the sibling threads together.
  • Performance depends on the number of physical cores allocated for this PMD Core list. On the NUMA node which is associated with DPDK NIC, allocate the required cores.
  • For NUMA nodes with a DPDK NIC, determine the number of physical cores required based on the performance requirement, and include all the sibling threads or logical CPUs for each physical core.
  • For NUMA nodes without DPDK NICs, allocate the sibling threads or logical CPUs of any physical core except the first physical core of the NUMA node.
Note

You must reserve DPDK PMD threads on both NUMA nodes, even if a NUMA node does not have an associated DPDK NIC.

NovaComputeCpuDedicatedSet

A comma-separated list or range of physical host CPU numbers to which processes for pinned instance CPUs can be scheduled. For example, NovaComputeCpuDedicatedSet: [4-12,^8,15] reserves cores from 4-12 and 15, excluding 8.

  • Exclude all cores from the OvsPmdCoreList.
  • Include all remaining cores.
  • Pair the sibling threads together.
NovaComputeCpuSharedSet
A comma-separated list or range of physical host CPU numbers used to determine the host CPUs for instance emulator threads.
IsolCpusList

A set of CPU cores isolated from the host processes. IsolCpusList is the isolated_cores value in the cpu-partitioning-variable.conf file for the tuned-profiles-cpu-partitioning component. Use the following recommendations for IsolCpusList:

  • Match the list of cores in OvsPmdCoreList and NovaComputeCpuDedicatedSet.
  • Pair the sibling threads together.
DerivePciWhitelistEnabled

To reserve virtual functions (VF) for VMs, use the NovaPCIPassthrough parameter to create a list of VFs passed through to Nova. VFs excluded from the list remain available for the host.

For each VF in the list, populate the address parameter with a regular expression that resolves to the address value.

The following is an example of the manual list creation process. If NIC partitioning is enabled in a device named eno2, list the PCI addresses of the VFs with the following command:

[heat-admin@compute-0 ~]$ ls -lh /sys/class/net/eno2/device/ | grep virtfn
lrwxrwxrwx. 1 root root    0 Apr 16 09:58 virtfn0 -> ../0000:18:06.0
lrwxrwxrwx. 1 root root    0 Apr 16 09:58 virtfn1 -> ../0000:18:06.1
lrwxrwxrwx. 1 root root    0 Apr 16 09:58 virtfn2 -> ../0000:18:06.2
lrwxrwxrwx. 1 root root    0 Apr 16 09:58 virtfn3 -> ../0000:18:06.3
lrwxrwxrwx. 1 root root    0 Apr 16 09:58 virtfn4 -> ../0000:18:06.4
lrwxrwxrwx. 1 root root    0 Apr 16 09:58 virtfn5 -> ../0000:18:06.5
lrwxrwxrwx. 1 root root    0 Apr 16 09:58 virtfn6 -> ../0000:18:06.6
lrwxrwxrwx. 1 root root    0 Apr 16 09:58 virtfn7 -> ../0000:18:06.7

In this case, the VFs 0, 4, and 6 are used by eno2 for NIC Partitioning. Manually configure NovaPCIPassthrough to include VFs 1-3, 5, and 7, and consequently exclude VFs 0,4, and 6, as in the following example:

NovaPCIPassthrough:
  - physical_network: "sriovnet2"
  address: {"domain": ".*", "bus": "18", "slot": "06", "function": "[1-3]"}
  - physical_network: "sriovnet2"
  address: {"domain": ".*", "bus": "18", "slot": "06", "function": "[5]"}
  - physical_network: "sriovnet2"
  address: {"domain": ".*", "bus": "18", "slot": "06", "function": "[7]"}

7.4.2. Memory parameters

OVS-DPDK uses the following memory parameters:

OvsDpdkMemoryChannels

Maps memory channels in the CPU per NUMA node. OvsDpdkMemoryChannels is the other_config:dpdk-extra="-n <value>" value in OVS. Observe the following recommendations for OvsDpdkMemoryChannels:

  • Use dmidecode -t memory or your hardware manual to determine the number of memory channels available.
  • Use ls /sys/devices/system/node/node* -d to determine the number of NUMA nodes.
  • Divide the number of memory channels available by the number of NUMA nodes.
NovaReservedHostMemory

Reserves memory in MB for tasks on the host. NovaReservedHostMemory is the reserved_host_memory_mb value for the Compute node in nova.conf. Observe the following recommendation for NovaReservedHostMemory:

  • Use the static recommended value of 4096 MB.
OvsDpdkSocketMemory

Specifies the amount of memory in MB to pre-allocate from the hugepage pool, per NUMA node. OvsDpdkSocketMemory is the other_config:dpdk-socket-mem value in OVS. Observe the following recommendations for OvsDpdkSocketMemory:

  • Provide as a comma-separated list.
  • For a NUMA node without a DPDK NIC, use the static recommendation of 1024 MB (1GB)
  • Calculate the OvsDpdkSocketMemory value from the MTU value of each NIC on the NUMA node.
  • The following equation approximates the value for OvsDpdkSocketMemory:

    • MEMORY_REQD_PER_MTU = (ROUNDUP_PER_MTU + 800) * (4096 * 64) Bytes

      • 800 is the overhead value.
      • 4096 * 64 is the number of packets in the mempool.
  • Add the MEMORY_REQD_PER_MTU for each of the MTU values set on the NUMA node and add another 512 MB as buffer. Round the value up to a multiple of 1024.

Sample Calculation - MTU 2000 and MTU 9000

DPDK NICs dpdk0 and dpdk1 are on the same NUMA node 0, and configured with MTUs 9000, and 2000 respectively. The sample calculation to derive the memory required is as follows:

  1. Round off the MTU values to the nearest multiple of 1024 bytes.

    The MTU value of 9000 becomes 9216 bytes.
    The MTU value of 2000 becomes 2048 bytes.
  2. Calculate the required memory for each MTU value based on these rounded byte values.

    Memory required for 9000 MTU = (9216 + 800) * (4096*64) = 2625634304
    Memory required for 2000 MTU = (2048 + 800) * (4096*64) = 746586112
  3. Calculate the combined total memory required, in bytes.

    2625634304 + 746586112 + 536870912 = 3909091328 bytes.

    This calculation represents (Memory required for MTU of 9000) + (Memory required for MTU of 2000) + (512 MB buffer).

  4. Convert the total memory required into MB.

    3909091328 / (1024*1024) = 3728 MB.
  5. Round this value up to the nearest 1024.

    3724 MB rounds up to 4096 MB.
  6. Use this value to set OvsDpdkSocketMemory.

        OvsDpdkSocketMemory: "4096,1024"

Sample Calculation - MTU 2000

DPDK NICs dpdk0 and dpdk1 are on the same NUMA node 0, and each are configured with MTUs of 2000. The sample calculation to derive the memory required is as follows:

  1. Round off the MTU values to the nearest multiple of 1024 bytes.

    The MTU value of 2000 becomes 2048 bytes.
  2. Calculate the required memory for each MTU value based on these rounded byte values.

    Memory required for 2000 MTU = (2048 + 800) * (4096*64) = 746586112
  3. Calculate the combined total memory required, in bytes.

    746586112 + 536870912 = 1283457024 bytes.

    This calculation represents (Memory required for MTU of 2000) + (512 MB buffer).

  4. Convert the total memory required into MB.

    1283457024 / (1024*1024) = 1224 MB.
  5. Round this value up to the nearest multiple of 1024.

    1224 MB rounds up to 2048 MB.
  6. Use this value to set OvsDpdkSocketMemory.

        OvsDpdkSocketMemory: "2048,1024"

7.4.3. Networking parameters

OvsDpdkDriverType
Sets the driver type used by DPDK. Use the default value of vfio-pci.
NeutronDatapathType
Datapath type for OVS bridges. DPDK uses the default value of netdev.
NeutronVhostuserSocketDir
Sets the vhost-user socket directory for OVS. Use /var/lib/vhost_sockets for vhost client mode.

7.4.4. Other parameters

NovaSchedulerDefaultFilters
Provides an ordered list of filters that the Compute node uses to find a matching Compute node for a requested guest instance.
VhostuserSocketGroup
Sets the vhost-user socket directory group. The default value is qemu. Set VhostuserSocketGroup to hugetlbfs so that the ovs-vswitchd and qemu processes can access the shared huge pages and unix socket that configures the virtio-net device. This value is role-specific and should be applied to any role leveraging OVS-DPDK.
KernelArgs

Provides multiple kernel arguments to /etc/default/grub for the Compute node at boot time. Add the following values based on your configuration:

  • hugepagesz: Sets the size of the huge pages on a CPU. This value can vary depending on the CPU hardware. Set to 1G for OVS-DPDK deployments (default_hugepagesz=1GB hugepagesz=1G). Use this command to check for the pdpe1gb CPU flag that confirms your CPU supports 1G.

    lshw -class processor | grep pdpe1gb
  • hugepages count: Sets the number of huge pages available based on available host memory. Use most of your available memory, except NovaReservedHostMemory. You must also configure the huge pages count value within the flavor of your Compute nodes.
  • iommu: For Intel CPUs, add "intel_iommu=on iommu=pt"
  • isolcpus: Sets the CPU cores for tuning. This value matches IsolCpusList.

For more information about CPU isolation, see the Red Hat Knowledgebase solution OpenStack CPU isolation guidance for RHEL 8 and RHEL 9

7.4.5. Instance extra specifications

Before deploying instances in an NFV environment, create a flavor that utilizes CPU pinning, huge pages, and emulator thread pinning.

hw:cpu_policy
When this parameter is set to dedicated, the guest uses pinned CPUs. Instances created from a flavor with this parameter set have an effective overcommit ratio of 1:1. The default value is shared.
hw:mem_page_size

Set this parameter to a valid string of a specific value with standard suffix (For example, 4KB, 8MB, or 1GB). Use 1GB to match the hugepagesz boot parameter. Calculate the number of huge pages available for the virtual machines by subtracting OvsDpdkSocketMemory from the boot parameter. The following values are also valid:

  • small (default) - The smallest page size is used
  • large - Only use large page sizes. (2MB or 1GB on x86 architectures)
  • any - The compute driver can attempt to use large pages, but defaults to small if none available.
hw:emulator_threads_policy
Set the value of this parameter to share so that emulator threads are locked to CPUs that you’ve identified in the heat parameter, NovaComputeCpuSharedSet. If an emulator thread is running on a vCPU with the poll mode driver (PMD) or real-time processing, you can experience negative effects, such as packet loss.

7.5. Two NUMA node example OVS-DPDK deployment

The Compute node in the following example includes two NUMA nodes:

  • NUMA 0 has cores 0-7. The sibling thread pairs are (0,1), (2,3), (4,5), and (6,7)
  • NUMA 1 has cores 8-15. The sibling thread pairs are (8,9), (10,11), (12,13), and (14,15).
  • Each NUMA node connects to a physical NIC, namely NIC1 on NUMA 0, and NIC2 on NUMA 1.
OpenStack NFV NUMA Nodes 453316 0717 ECE OVS DPDK Deployment
Note

Reserve the first physical cores or both thread pairs on each NUMA node (0,1 and 8,9) for non-datapath DPDK processes.

This example also assumes a 1500 MTU configuration, so the OvsDpdkSocketMemory is the same for all use cases:

OvsDpdkSocketMemory: "1024,1024"

NIC 1 for DPDK, with one physical core for PMD

In this use case, you allocate one physical core on NUMA 0 for PMD. You must also allocate one physical core on NUMA 1, even though DPDK is not enabled on the NIC for that NUMA node. The remaining cores are allocated for guest instances. The resulting parameter settings are:

OvsPmdCoreList: "2,3,10,11"
NovaComputeCpuDedicatedSet: "4,5,6,7,12,13,14,15"

NIC 1 for DPDK, with two physical cores for PMD

In this use case, you allocate two physical cores on NUMA 0 for PMD. You must also allocate one physical core on NUMA 1, even though DPDK is not enabled on the NIC for that NUMA node. The remaining cores are allocated for guest instances. The resulting parameter settings are:

OvsPmdCoreList: "2,3,4,5,10,11"
NovaComputeCpuDedicatedSet: "6,7,12,13,14,15"

NIC 2 for DPDK, with one physical core for PMD

In this use case, you allocate one physical core on NUMA 1 for PMD. You must also allocate one physical core on NUMA 0, even though DPDK is not enabled on the NIC for that NUMA node. The remaining cores are allocated for guest instances. The resulting parameter settings are:

OvsPmdCoreList: "2,3,10,11"
NovaComputeCpuDedicatedSet: "4,5,6,7,12,13,14,15"

NIC 2 for DPDK, with two physical cores for PMD

In this use case, you allocate two physical cores on NUMA 1 for PMD. You must also allocate one physical core on NUMA 0, even though DPDK is not enabled on the NIC for that NUMA node. The remaining cores are allocated for guest instances. The resulting parameter settings are:

OvsPmdCoreList: "2,3,10,11,12,13"
NovaComputeCpuDedicatedSet: "4,5,6,7,14,15"

NIC 1 and NIC2 for DPDK, with two physical cores for PMD

In this use case, you allocate two physical cores on each NUMA node for PMD. The remaining cores are allocated for guest instances. The resulting parameter settings are:

OvsPmdCoreList: "2,3,4,5,10,11,12,13"
NovaComputeCpuDedicatedSet: "6,7,14,15"

7.6. Topology of an NFV OVS-DPDK deployment

This example deployment shows an OVS-DPDK configuration and consists of two virtual network functions (VNFs) with two interfaces each:

  • The management interface, represented by mgt.
  • The data plane interface.

In the OVS-DPDK deployment, the VNFs operate with inbuilt DPDK that supports the physical interface. OVS-DPDK enables bonding at the vSwitch level. For improved performance in your OVS-DPDK deployment, it is recommended that you separate kernel and OVS-DPDK NICs. To separate the management (mgt) network, connected to the Base provider network for the virtual machine, ensure you have additional NICs. The Compute node consists of two regular NICs for the Red Hat OpenStack Platform API management that can be reused by the Ceph API but cannot be shared with any OpenStack project.

NFV OVS-DPDK deployment

NFV OVS-DPDK topology

The following image shows the topology for OVS-DPDK for NFV. It consists of Compute and Controller nodes with 1 or 10 Gbps NICs, and the director node.

NFV OVS-DPDK Topology