Validating DPDK performance on OpenShift

This article describes how to build and deploy a traffic-generating application inside a container. The traffic generator validates Data Plane Development Kit (DPDK) line-rate performance on OpenShift Container Platform. It exercises the following elements:
- Hardware networking
- Cloud elements
- Physical and virtual functions
- A test application

Traffic testing environment

The following diagram shows the components of a traffic-testing environment:

DPDK test environment

  • Traffic generator: An application that can generate high-volume packet traffic.
  • SR-IOV-supporting NIC: A network interface card compatible with Single Root I/O Virtualization. The card runs a number of virtual functions on a physical interface.
  • Physical Function (PF): A PCI Express (PCIe) function of a network adapter that supports the single root I/O virtualization (SR-IOV) interface.
  • Virtual Function (VF): A lightweight PCIe function on a network adapter that supports Single Root I/O virtualization (SR-IOV). The VF is associated with the PCIe Physical Function (PF) on the network adapter, and represents a virtualized instance of the network adapter.
  • Switch: Network switch. Nodes can also be connected back-to-back.
  • testpmd: An example application included with DPDK. You can use testpmd to test DPDK in packet-forwarding mode. testpmd is also an example of how to build a fully fledged application using the DPDK SDK.
  • worker 0 and worker 1: OpenShift Container Platform nodes.
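As a reference point for the throughput figures later in this article, the theoretical line rate for minimum-size frames follows from the link speed and the fixed per-frame on-wire overhead. The following calculation is my own addition, not part of the original test setup:

```python
# Theoretical line rate for minimum-size frames on a 25 Gb/s link.
# On the wire, every Ethernet frame carries 20 extra bytes:
# 7B preamble + 1B start-of-frame delimiter + 12B inter-frame gap.
LINK_BPS = 25e9          # 25 Gb/s port speed
FRAME_BYTES = 64         # minimum Ethernet frame, including the 4B FCS
WIRE_OVERHEAD = 20       # preamble + SFD + inter-frame gap

wire_bits = (FRAME_BYTES + WIRE_OVERHEAD) * 8   # 672 bits per frame on the wire
max_pps = LINK_BPS / wire_bits

print(f"{max_pps / 1e6:.2f} Mpps")  # 37.20 Mpps line rate at 64B frames
```

This is the number a 25 Gb/s port can carry at most with 64-byte packets, so a generator result in the mid-20 Mpps range (as shown later) is below, but in the same order as, line rate.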

Running the validation

For this stage of the development of the test environment, you must complete the following steps:
1. Build the TRex container image
2. Deploy an OpenShift Container Platform cluster
3. Create TRex configuration from template
4. Define a TRex pod

Building the TRex container image

To build the TRex container image, use the following Containerfile. Build it with a tool such as podman build or docker build:

FROM quay.io/centos/centos:stream8

ARG TREX_VERSION=2.87
ENV TREX_VERSION ${TREX_VERSION}

# install requirements
RUN dnf install -y --nodocs git wget procps python3 vim python3-pip pciutils gettext https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm && dnf clean all
RUN dnf install -y --nodocs hostname iproute net-tools ethtool nmap iputils perf numactl sysstat htop rdma-core-devel libibverbs libibverbs-devel && dnf clean all

# install trex server
WORKDIR /opt/
RUN wget --no-check-certificate https://trex-tgn.cisco.com/trex/release/v${TREX_VERSION}.tar.gz && \
   tar -xzf v${TREX_VERSION}.tar.gz && \
   mv v${TREX_VERSION} trex && \
   rm v${TREX_VERSION}.tar.gz

WORKDIR /opt/trex

Deploying an OpenShift Container Platform cluster

You must deploy and configure both the Node Tuning Operator and the SR-IOV Network Operator. For more information, see the OpenShift Container Platform 4.10 documentation and the Additional resources section at the end of this article.

Example performance profile

The following code block illustrates a typical performance profile:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  globallyDisableIrqLoadBalancing: true
  cpu:
    isolated: 21-51,73-103
    reserved: 0-20,52-72
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      size: 1G
  numa:
    topologyPolicy: "single-numa-node"
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""

Description
- isolated: Defines the isolated CPUs for guaranteed workloads.
- defaultHugepagesSize: Defines the default hugepages size, typically set to 1G.
- topologyPolicy: Defines the topology policy. single-numa-node requires all resources to be allocated from a single NUMA node; if that is not possible, pod deployment is blocked.
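The isolated and reserved ranges must not overlap. The following snippet is my own sanity check of the ranges used in the example profile; parse_cpu_list is a helper I wrote for illustration:

```python
def parse_cpu_list(spec):
    """Expand a kernel-style CPU list such as '21-51,73-103' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus

isolated = parse_cpu_list("21-51,73-103")  # from the example profile
reserved = parse_cpu_list("0-20,52-72")

# The two sets must be disjoint, or the profile is invalid.
assert isolated.isdisjoint(reserved)
print(len(isolated), len(reserved))  # 62 42
```

Together the two sets cover CPUs 0-103, i.e. all 104 logical CPUs of the example worker node.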

Example SR-IOV network policy

The following code block illustrates a typical SR-IOV network policy:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  needVhostNet: true
  nicSelector:
    pfNames: ["ens3f0"]
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 5
  priority: 99
  resourceName: dpdk_nic_1
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-2
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  nicSelector:
    pfNames: ["ens3f1"]
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 5
  priority: 99
  resourceName: dpdk_nic_2

Note: For Mellanox NICs, use deviceType: netdevice and isRdma: true.

Example SR-IOV network

The following code block illustrates a typical SR-IOV network:

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-network-1-vlan
  namespace: openshift-sriov-network-operator
spec:
  ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.1.0/24"}]],"dataDir":
    "/run/my-orchestrator/container-ipam-state-1"}'
  networkNamespace: dpdk
  spoofChk: "on"
  trust: "on"
  vlan: 2004
  resourceName: dpdk_nic_1
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-network-2
  namespace: openshift-sriov-network-operator
spec:
  ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.2.0/24"}]],"dataDir":
    "/run/my-orchestrator/container-ipam-state-2"}'
  networkNamespace: dpdk
  spoofChk: "on"
  trust: "on"
  resourceName: dpdk_nic_2

Note: The VLAN tag on the first network is shown as an example; it is not mandatory.
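Note also that the ipam field is a JSON string embedded inside YAML, which is easy to get wrong. The following check is my own addition, using only the Python standard library:

```python
import json

# The ipam value from the first SriovNetwork above.
ipam = ('{"type": "host-local","ranges": [[{"subnet": "10.0.1.0/24"}]],'
        '"dataDir": "/run/my-orchestrator/container-ipam-state-1"}')

cfg = json.loads(ipam)  # raises ValueError if the JSON is malformed
assert cfg["type"] == "host-local"
assert cfg["ranges"][0][0]["subnet"] == "10.0.1.0/24"
print("ipam JSON is well-formed")
```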

Defining a TRex pod

The following YAML file defines a TRex pod:

---
apiVersion: v1
kind: Namespace
metadata:
  name: dpdk
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: trex-info-for-config
  namespace: dpdk
data:
  PORT_BANDWIDTH_GB: "25"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: trex-config-template
  namespace: dpdk
data:
  trex_cfg.yaml : |
    - port_limit: 2
      version: 2
      interfaces:
        - "${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1}"
        - "${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_2}"
      port_bandwidth_gb: ${PORT_BANDWIDTH_GB}
      port_info:
        - ip: 10.10.10.2
          default_gw: 10.10.10.1
        - ip: 10.10.20.2
          default_gw: 10.10.20.1
      platform:
        master_thread_id: $MASTER
        latency_thread_id: $LATENCY
        dual_if:
          - socket: ${SOCKET}
            threads: [${CPU}]
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: trex-tests
  namespace: dpdk
data:
  testpmd.py : |
    from trex_stl_lib.api import *

    from testpmd_addr import *

    # Wild local MACs
    mac_localport0='50:00:00:00:00:01'
    mac_localport1='50:00:00:00:00:02'

    class STLS1(object):

        def __init__ (self):
            self.fsize  =64; # the size of the packet
            self.number = 0

        def create_stream (self, direction = 0):

            size = self.fsize - 4; # HW will add 4 bytes ethernet FCS
            dport = 1026 + self.number
            self.number = self.number + 1
            if direction == 0:
                base_pkt =  Ether(dst=mac_telco0,src=mac_localport0)/IP(src="16.0.0.1",dst=ip_telco0)/UDP(dport=15,sport=1026)
            else:
                base_pkt =  Ether(dst=mac_telco1,src=mac_localport1)/IP(src="16.1.0.1",dst=ip_telco1)/UDP(dport=16,sport=1026)
            #pad = max(0, size - len(base_pkt)) * 'x'
            pad = (60 - len(base_pkt)) * 'x'

            return STLStream(
                packet =
                STLPktBuilder(
                    pkt = base_pkt / pad
                ),
                mode = STLTXCont())

        def create_stats_stream (self, rate_pps = 1000, pgid = 7, direction = 0):

            size = self.fsize - 4; # HW will add 4 bytes ethernet FCS
            dport = 1026 + self.number # dport was undefined in the original listing
            if direction == 0:
                base_pkt =  Ether(dst=mac_telco0,src=mac_localport0)/IP(src="17.0.0.1",dst=ip_telco0)/UDP(dport=dport,sport=1026)
            else:
                base_pkt =  Ether(dst=mac_telco1,src=mac_localport1)/IP(src="17.1.0.1",dst=ip_telco1)/UDP(dport=dport,sport=1026)
            pad = max(0, size - len(base_pkt)) * 'x'

            return STLStream(
                packet =
                STLPktBuilder(
                    pkt = base_pkt / pad
                ),
                mode = STLTXCont(pps = rate_pps),
                flow_stats = STLFlowLatencyStats(pg_id = pgid))
            #flow_stats = STLFlowStats(pg_id = pgid))

        def get_streams (self, direction = 0, **kwargs):
            # create multiple streams, one stream per core...
            s = []
            for i in range(14):
                s.append(self.create_stream(direction = direction))
            #if direction == 0:
            #    s.append(self.create_stats_stream(rate_pps=1000, pgid=10, direction = direction))
            #else:
            #    s.append(self.create_stats_stream(rate_pps=1000, pgid=11, direction = direction))

            return s

    # dynamic load - used for trex console or simulator
    def register():
        return STLS1()

  testpmd_addr.py: |
    # wild second XL710 mac
    mac_telco0 = '60:00:00:00:00:01'
    # we don't care about the IP in this phase
    ip_telco0  = '10.0.0.1'
    # wild first XL710 mac
    mac_telco1 = '60:00:00:00:00:02'
    ip_telco1 = '10.1.1.1'
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      {
       "name": "dpdk-network-1-vlan",
       "mac": "50:00:00:00:00:01",
       "namespace": "dpdk"
      },
      {
       "name": "dpdk-network-2",
       "mac": "50:00:00:00:00:02",
       "namespace": "dpdk"
      }
    ]'
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
    irq-load-balancing.crio.io: "disable"
  labels:
    app: trex
  name: trex
  namespace: dpdk
spec:
  runtimeClassName: performance-performance
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - dpdk
          topologyKey: kubernetes.io/hostname
  containers:
    - command:
        - /bin/bash
        - -c
        - sleep INF
      image: <TREX-IMAGE>
      imagePullPolicy: Always
      name: trex
      envFrom:
        - configMapRef:
            name: trex-info-for-config
      resources:
        limits:
          cpu: "16"
          hugepages-1Gi: 8Gi
          memory: 1Gi
        requests:
          cpu: "16"
          hugepages-1Gi: 8Gi
          memory: 1Gi
      securityContext:

        capabilities:
          add:
            - IPC_LOCK
            - SYS_RESOURCE
            - NET_RAW
            - NET_ADMIN
        runAsUser: 0
      volumeMounts:
        - name: trex-config-template
          mountPath: /opt/templates/
        - name: trex-tests
          mountPath: /opt/tests/
        - mountPath: /mnt/huge
          name: hugepages
        - name: modules
          mountPath: /lib/modules
  terminationGracePeriodSeconds: 5
  volumes:
    - name: modules
      hostPath:
        path: /lib/modules
    - configMap:
        name: trex-info-for-config
      name: trex-info-for-config
    - name: trex-config-template
      configMap:
        name: trex-config-template
    - name: trex-tests
      configMap:
        name: trex-tests
    - emptyDir:
        medium: HugePages
      name: hugepages

Description
- PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1: The PCI address of the VF allocated for the first network.
- PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_2: The PCI address of the VF allocated for the second network.
- trex-config-template: The template configuration. Create the final configuration from it before starting the TRex process.
- TREX-IMAGE: The TRex container image built earlier.
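Inside the running pod, the SR-IOV device plugin exposes each allocated VF through an environment variable named after the resource. As a quick illustration (a sketch I added; the variable names follow the resourceName values defined earlier):

```python
import os

# The SR-IOV device plugin injects one variable per allocated VF, e.g.
# PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1=0000:d8:02.1
pci_vars = {k: v for k, v in os.environ.items()
            if k.startswith("PCIDEVICE_OPENSHIFT_IO")}

for name, addr in sorted(pci_vars.items()):
    print(f"{name} = {addr}")
```

Running env | grep PCIDEVICE_OPENSHIFT_IO in the pod shell gives the same information.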

Create TRex configuration from template

Access the TRex pod with oc -n dpdk rsh trex, then copy the template file with cp /opt/templates/trex_cfg.yaml /etc/trex_cfg.yaml and edit the file using the following parameters:

- port_limit: 2
  version: 2
  interfaces: ["{PCIDEVICE_1}","{PCIDEVICE_2}"]
  port_bandwidth_gb: {PORT_BANDWIDTH_GB}
  port_info:
    - ip: 10.10.10.2
      default_gw: 10.10.10.1
    - ip: 10.10.20.2
      default_gw: 10.10.20.1
  platform:
    master_thread_id: {MASTER}
    latency_thread_id: {LATENCY}
    dual_if:
      - socket: {SOCKET}
        threads: [{CPUS}]

Replace the parameters with the following values:

PCIDEVICE_1: The PCI address allocated to the TRex pod for the first network. Check it by running env | grep PCIDEVICE_OPENSHIFT_IO.

PCIDEVICE_2: The PCI address allocated to the TRex pod for the second network. Check it by running env | grep PCIDEVICE_OPENSHIFT_IO.

PORT_BANDWIDTH_GB: The port speed in Gb/s (example: 25).

MASTER: The master/UI CPU. Check the CPUs available to the pod with cat /sys/fs/cgroup/cpuset/cpuset.cpus. Also check for sibling CPUs; for example, for cpu0, run cat /sys/devices/system/cpu/cpu0/topology/core_cpus_list.

LATENCY: The latency-measurement CPU. It is better to use the sibling of the selected MASTER CPU.

SOCKET: The NUMA node where all the allocated CPUs are located. You can check it with lscpu.

CPUS: All the CPUs from cat /sys/fs/cgroup/cpuset/cpuset.cpus, excluding the ones used by MASTER and LATENCY.
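This CPU bookkeeping can be scripted. The sketch below is my own helper, not part of TRex: it expands a cpuset string such as the one printed by cat /sys/fs/cgroup/cpuset/cpuset.cpus and splits it into MASTER, LATENCY, and worker CPUs, assuming the latency CPU is the hyperthread sibling of the master. The example cpuset string is hypothetical:

```python
def parse_cpuset(spec):
    """Expand a cpuset string such as '21,23,25-35' into a sorted list of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def split_roles(cpuset, sibling_of_master):
    """Assign MASTER, LATENCY (the master's sibling), and the worker CPUS."""
    cpus = parse_cpuset(cpuset)
    master = cpus[0]
    latency = sibling_of_master  # from /sys/devices/system/cpu/cpuN/topology/core_cpus_list
    workers = [c for c in cpus if c not in (master, latency)]
    return master, latency, workers

# Hypothetical 16-CPU pod allocation; the sibling of CPU 21 is assumed to be 23.
master, latency, workers = split_roles(
    "21,23,25,27,29,31,33,35,73,75,77,79,81,83,85,87", 23)
print(master, latency, len(workers))  # 21 23 14
```

The 14 worker CPUs go into threads, and their count must match the -c option passed to TRex later.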

Output Example:

- port_limit: 2
  version: 2
  interfaces:
    - "0000:d8:02.1"
    - "0000:d8:0a.3"
  port_bandwidth_gb: 25
  port_info:
    - ip: 10.10.10.2
      default_gw: 10.10.10.1
    - ip: 10.10.20.2
      default_gw: 10.10.20.1
  platform:
    master_thread_id: 21
    latency_thread_id: 23
    dual_if:
      - socket: 1
        threads: [25,27,29,31,33,35,73,75,77,79,81,83,85,87]

Then you can launch TRex with the following command. The -c 14 option matches the 14 worker threads listed in dual_if:

./t-rex-64 --no-ofed-check --no-hw-flow-stat -i -c 14

Test configuration

The test is injected into the container using a ConfigMap. With this method, you do not need to recreate the pod when you change the code: apply the ConfigMap again and Kubernetes mounts the new files.
The following is an example testpmd.py:

from trex_stl_lib.api import *

from testpmd_addr import *

# Wild local MACs
mac_localport0='50:00:00:00:00:01'
mac_localport1='50:00:00:00:00:02'

class STLS1(object):

    def __init__ (self):
        self.fsize  = 64; # the size of the packet
        self.number = 0

    def create_stream (self, direction = 0):

        size = self.fsize - 4; # HW will add 4 bytes ethernet FCS
        dport = 1026 + self.number
        self.number = self.number + 1
        if direction == 0:
            base_pkt = Ether(dst=mac_telco0,src=mac_localport0)/IP(src="16.0.0.1",dst=ip_telco0)/UDP(dport=15,sport=1026)
        else:
            base_pkt = Ether(dst=mac_telco1,src=mac_localport1)/IP(src="16.1.0.1",dst=ip_telco1)/UDP(dport=16,sport=1026)
        pad = (60 - len(base_pkt)) * 'x'

        return STLStream(
            packet =
            STLPktBuilder(
                pkt = base_pkt / pad
            ),
            mode = STLTXCont())

    def create_stats_stream (self, rate_pps = 1000, pgid = 7, direction = 0):

        size = self.fsize - 4; # HW will add 4 bytes ethernet FCS
        dport = 1026 + self.number # dport was undefined in the original listing
        if direction == 0:
            base_pkt = Ether(dst=mac_telco0,src=mac_localport0)/IP(src="17.0.0.1",dst=ip_telco0)/UDP(dport=dport,sport=1026)
        else:
            base_pkt = Ether(dst=mac_telco1,src=mac_localport1)/IP(src="17.1.0.1",dst=ip_telco1)/UDP(dport=dport,sport=1026)
        pad = max(0, size - len(base_pkt)) * 'x'

        return STLStream(
            packet =
            STLPktBuilder(
                pkt = base_pkt / pad
            ),
            mode = STLTXCont(pps = rate_pps),
            flow_stats = STLFlowLatencyStats(pg_id = pgid))

    def get_streams (self, direction = 0, **kwargs):
        # create multiple streams, one stream per core...
        s = []
        for i in range(14):
            s.append(self.create_stream(direction = direction))
        return s

# dynamic load - used for trex console or simulator
def register():
    return STLS1()

The following is an example testpmd_addr.py:

# wild second XL710 mac
mac_telco0 = '60:00:00:00:00:01'
# we don't care about the IP in this phase
ip_telco0  = '10.0.0.1'
# wild first XL710 mac
mac_telco1 = '60:00:00:00:00:02'
ip_telco1 = '10.1.1.1'

Commands for TRex

From a new terminal, access the TRex pod with oc -n dpdk rsh trex and start the TRex console. You can then load the traffic generator stream.

./trex-console
> tui
> # start -f /opt/tests/testpmd.py -m <number-of-packets> -p <ports to use>
> start -f /opt/tests/testpmd.py -m 24mpps -p 0
> stop -a

This example runs the traffic generator, sending 24 million packets per second out of port 0 and receiving them on port 1:

Example output

Global Statistics

connection   : localhost, Port 4501                  total_tx_L2 : 12.76 Gbps
version      : STL @ v2.87                           total_tx_L1 : 16.74 Gbps
cpu_util.    : 29.65% @ 14 cores (14 per dual port)  total_rx    : 12.09 Gbps
rx_cpu_util. : 0.0% / 0 pps                          total_pps   : 24.91 Mpps
async_util.  : 0.02% / 10.48 Kbps                    drop_rate   : 0 bps
total_cps.   : 0 cps                                 queue_full  : 0 pkts

Port Statistics

   port| 0 | 1 |   total
-----------+-------------------+-------------------+------------------
owner  |  root |  root |
link   |UP |UP |
state  |  IDLE |  TRANSMITTING |
speed  |   25 Gb/s |   25 Gb/s |
CPU util.  |  0.0% |29.65% |
-- |   |   |
Tx bps L2  | 0 bps |12.76 Gbps |12.76 Gbps
Tx bps L1  | 0 bps |16.74 Gbps |16.74 Gbps
Tx pps | 0 pps |24.91 Mpps |24.91 Mpps
Line Util. |   0 % |   66.97 % |
---|   |   |
Rx bps |12.09 Gbps | 0 bps |12.09 Gbps
Rx pps |20.99 Mpps | 0 pps |20.99 Mpps
----   |   |   |
opackets   |2654147973 |1223919476 |3878067449
ipackets   |1030488553 |2012583907 |3043072460
obytes |  180482062450 |   78330844946 |  258812907396
ibytes |   74195178964 |  136855711844 |  211050890808
tx-pkts|2.65 Gpkts |1.22 Gpkts |3.88 Gpkts
rx-pkts|1.03 Gpkts |2.01 Gpkts |3.04 Gpkts
tx-bytes   | 180.48 GB |  78.33 GB | 258.81 GB
rx-bytes   |   74.2 GB | 136.86 GB | 211.05 GB
-----  |   |   |
oerrors| 0 | 0 | 0
ierrors| 0 | 0 | 0
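The reported throughput figures are internally consistent: at 24.91 Mpps with 64-byte frames, the L2 rate is pps × 64 B × 8, the L1 rate adds 20 bytes of preamble and inter-frame gap per packet, and line utilization is the L1 rate over the 25 Gb/s port speed. This cross-check is my own addition, using the numbers from the output above:

```python
pps = 24.91e6        # total_pps from the output above
FRAME = 64           # bytes per frame, including the FCS
OVERHEAD = 20        # preamble + SFD + inter-frame gap, counted only at L1
PORT_GBPS = 25.0

l2_gbps = pps * FRAME * 8 / 1e9
l1_gbps = pps * (FRAME + OVERHEAD) * 8 / 1e9
line_util = 100 * l1_gbps / PORT_GBPS

print(f"L2 {l2_gbps:.2f} Gbps, L1 {l1_gbps:.2f} Gbps, util {line_util:.2f} %")
# L2 12.75 Gbps, L1 16.74 Gbps, util 66.96 %  (matches the report up to rounding)
```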

On the other side of the traffic-testing environment is a machine that processes the packets and sends them back to the TRex machine on port 1. Possible implementations include testpmd, VPP, and OVS-DPDK.

Additional resources
