Chapter 5. TX Drops on Instance VHU Interfaces with Open vSwitch DPDK

Use this procedure to troubleshoot transmit drops on instance vhost-user (VHU) interface.

5.1. Symptom

Packets go from the vswitch to the guest using the virtio transport without passing through the kernel or qemu processes. This is done by exchanging packets with the VHU interface.

The VHU is mostly implemented by DPDK librte_vhost that also offers functions to send or receive batches of packets. The backend of VHU is a virtio ring provided by qemu to exchange packets with the virtual machine. The virtio ring has a special format comprised of descriptors and buffers.

The TX/RX (transmit/receive) statistics are for OpenvSwitch (OVS). This means that transmit statistics relate directly to receive statistics for the VM.

If the VM does not process packets fast enough, the OVS TX queue overflows and drops packets.

5.1.1. Explanation for Packet Drops

A saturated virtio ring causes TX drops on the vhost-user device. The virtio ring is located in the guest’s memory and it works like a queue where the vhost-user pushes packets and the VM consumes them. If the VM is not fast enough to consume the packets, the virtio ring runs out of buffers and the vhost-user drops packets.

Use the Perf and Ftrace tools to troubleshoot packet drops.

  • Use Perf to count the number of scheduler switches, which could show whether the qemu thread preempted.
  • Use Ftrace to show the reason for preemption, as well as how long it took.

Reasons for preemption include:

  • Time Interrupt (kernel ticks):

    These add the cost of at least two context switches. The timer interrupt can also run read-copy update (RCU) callbacks which can take an unpredictable amount of time.

  • CPU power management and hyperthreading

You can find these tools in the following packages:

  • PERF: perf rpm in rhel-7-server-rpms/7Server/x86_64. For more information, see About Perf
  • FTRACE: trace-cmd info rhel-7-server-rpms/7Server/x86_64. For more information, see About Ftrace

5.1.2. Explanation for other drops

Prior to OVS 2.9, vHost user ports were created in dpdkvhostuser mode. In this mode, OVS acts as the vhost server, and QEMU acts as the client. When an instance goes down or restarts, the vhost user port on the OVS bridge, still active, drops packets destined for the VM. This increases the tx_drop_counter:

In the following example, the VM was stopped with nova stop <UUID>:

[root@overcloud-compute-0 network-scripts]# ovs-vsctl list interface vhubd172106-73 | grep _state
admin_state         : up
link_state          : down

This is similar to what happens when the kernel port is shut down with ip link set dev <br internal port name> down and frames are dropped in userspace.

When the VM is up, it connects to the same vhu socket and will start emptying the virtio ring buffer. TX is no longer interrupted and normal network traffic resumes.

5.1.3. Increasing the TX and RX queue lengths for DPDK

You can change TX and RX queue lengths for DPDK with the following OpenStack director template modifications:

NovaComputeExtraConfig:
        nova::compute::libvirt::rx_queue_size: '"1024"'
        nova::compute::libvirt::tx_queue_size: '"1024"'

The following example shows validation checks:

[root@overcloud-compute-1 ~]# ovs-vsctl get interface vhu9a9b0feb-2e status
{features="0x0000000150208182", mode=client, num_of_vrings="2", numa="0",
socket="/var/lib/vhost_sockets/vhu9a9b0feb-2e", status=connected, "vring_0_size"="1024",
"vring_1_size"="1024"}

[root@overcloud-compute-1 ~]# virsh dumpxml instance-00000017 | grep rx
<driver rx_queue_size='1024' tx_queue_size='1024'/>
<driver rx_queue_size='1024' tx_queue_size='1024'/>

Due to kernel limitations, you cannot increase the queue size beyond 1024.

Note

If you plan for PXE boot to be available for neutron networks over DPDK, you must verify that the PXE version supports 1024 bytes.

5.2. Diagnosis

You can see TX drops towards the vhost user ports when the guest cannot receive packets. TCP is designed to recover from packet loss, which occurs in normal network conditions. NFVi has strict requirements with less tolerance for packet drops.

Use DPDK-accelerated OVS, as the kernel datapath is too slow for NFVi. Additionally, it is important to deploy DPDK-enabled guests that can match the packet processing speed of the host.

5.3. Solution

Ensure that the vCPUs allocated to the VM are only processing tasks for the guests.

  • Check that the cluster was deployed with the heat following template parameters:

    • IsolcpusList: Removes CPUs from scheduling
    • NovaVcpuPinSet: Assigns CPUs for pinning
    • NovaComputeCpuSharedSet: Allocates CPUs for emulator thread pinning

Example:

    parameter_defaults:
      ComputeOvsDpdkParameters:
        KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=2-19,22-39"
        IsolCpusList: "2-19,22-39"
        NovaVcpuPinSet: ['4-19,24-39']
        NovaReservedHostMemory: 4096
        OvsDpdkSocketMemory: "3072,1024"
        OvsDpdkMemoryChannels: "4"
        OvsDpdkCoreList: "0,20,1,21"
        OvsPmdCoreList: "2,22,3,23"
        NovaComputeCpuSharedSet: [0,20,1,21]
  • Ensure that VMs are deployed with a flavor that takes advantage of pinned CPUs and the emulator pool set.

Example:

openstack flavor create --ram <size_mb> --disk <size_gb> -\
-vcpus <vcpus> --property dpdk=true \
--property hw:mem_page_size=1G \
--property hw:cpu_policy=dedicated \
--property hw:emulator_threads_policy=share <flavor>

If you allocate completely dedicated CPU resources to the instance and still observe network packet loss, ensure that the instance is properly tuned and DPDK enabled.