Chapter 22. SR-IOV support for virtual networking

First introduced in RHEL OpenStack Platform 6, single root I/O virtualization (SR-IOV) support was extended to virtual machine networking. SR-IOV enables OpenStack to put aside the previous requirement for virtual bridges, and instead extends the physical NIC’s capabilities directly through to the instance. In addition, support for IEEE 802.1BR allows virtual NICs to integrate with, and be managed by, the physical switch.

Note

For information on Network Function Virtualization (NFV), see the NFV Configuration Guide.

22.1. Configure SR-IOV in your Red Hat OpenStack Platform deployment

SR-IOV adds support for the concept of a virtual function which, while presented as a PCI device on the hardware, is a virtual interface that is provided by the physical function.

This chapter contains procedures for configuring SR-IOV to pass a physical NIC through to a virtual instance. These steps assume a deployment using a Controller node, an OpenStack Networking (neutron) node, and multiple Compute (nova) nodes.

Note: Virtual machine instances using SR-IOV virtual function (VF) ports and virtual machine instances using regular ports (for example, ports linked to an Open vSwitch bridge) can communicate with each other across the network, provided that the appropriate L2 configuration (flat or VLAN) is in place. At present, there is a limitation where instances using SR-IOV ports and instances using regular vSwitch ports that reside on the same Compute node cannot communicate with each other if they share the same Physical Function (PF) on the network adapter.

22.2. Create Virtual Functions on the Compute node

Perform these steps on all Compute nodes with supported hardware.

Note: Please refer to this article for details on supported drivers.

This procedure configures a system to pass through an Intel 82576 network device. Virtual Functions are also created, which can then be used by instances for SR-IOV access to the device.

1. Ensure that Intel VT-d or AMD IOMMU is enabled in the system’s BIOS. Refer to the machine’s BIOS configuration menu, or other means available from the manufacturer.

2. Ensure that Intel VT-d or AMD IOMMU is enabled in the operating system:

  • For Intel VT-d systems, refer to the procedure here.
  • For AMD IOMMU systems, refer to the procedure here.

3. Run the lspci command to ensure the network device is recognized by the system:

[root@compute ~]# lspci | grep 82576

The network device is included in the results:

03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

4. Perform these steps to activate Virtual Functions on the Compute node:

4a. Remove the kernel module. This will allow it to be configured in the next step:

[root@compute ~]# modprobe -r igb

Note: In step 4, use the kernel module that corresponds to your SR-IOV-capable NIC, rather than igb, if your NIC uses a different driver (for example, ixgbe or mlx4_core). Confirm the driver by running the ethtool command. In this example, em1 is the PF to use:

[root@compute ~]# ethtool -i em1 | grep ^driver
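
The driver name appears in the output; for the igb example used here, it would resemble the following:

driver: igb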

4b. Start the module with max_vfs set to 7 (or up to the maximum supported).

[root@compute ~]# modprobe igb max_vfs=7

4c. Make the Virtual Functions persistent:

[root@compute ~]# echo "options igb max_vfs=7" >>/etc/modprobe.d/igb.conf

Note: For Red Hat Enterprise Linux 7, to make the aforementioned changes persistent, rebuild the initial ramdisk image after completing step 4.
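
For example, a minimal sketch that rebuilds the initramfs for the currently running kernel with dracut (add --kver <version> to target a different kernel):

[root@compute ~]# dracut --force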

Note: Regarding the persistence of the settings in steps 4c and 4d: the modprobe configuration enables Virtual Functions on all NICs that use the same kernel module, and makes the change persist through system reboots. It is possible to enable VFs for only a specific NIC; however, there are some possible issues that can result. For example, this command enables VFs for the enp4s0f1 interface:

# echo 7 > /sys/class/net/enp4s0f1/device/sriov_numvfs

However, this setting will not persist after a reboot. A possible workaround is to add this to rc.local, but this has its own limitation, as described in the note below:

# chmod +x /etc/rc.d/rc.local
# echo "echo 7 > /sys/class/net/enp4s0f1/device/sriov_numvfs" >> /etc/rc.local

Note: Since the addition of systemd, Red Hat Enterprise Linux starts services in parallel, rather than in series. This means that rc.local no longer executes at a predictable point in the boot process. As a result, unexpected behavior can occur, and this configuration is not recommended.

4d. Activate Intel VT-d in the kernel by appending the intel_iommu=pt and igb.max_vfs=7 parameters to the kernel command line. If you always intend to boot the kernel with these parameters, you can change your current settings directly. Alternatively, you can create a custom menu entry with these parameters, in which case your system boots with them by default, but you can still boot the kernel without them if needed.

To change your current kernel command line parameters, run the following command:

[root@compute ~]# grubby --update-kernel=ALL --args="intel_iommu=pt igb.max_vfs=7"

For more information on using grubby, see Configuring GRUB 2 Using the grubby Tool in the System Administrator’s Guide.

Note: If using a Dell PowerEdge R630 node, you will need to use intel_iommu=on instead of intel_iommu=pt. You can enable this using grubby:

# grubby --update-kernel=ALL --args="intel_iommu=on"

To create a custom menu entry:

i. Find the default entry in grub:

[root@compute ~]# grub2-editenv list
saved_entry=Red Hat Enterprise Linux Server (3.10.0-123.9.2.el7.x86_64) 7.0 (Maipo)

ii. Create the custom menu entry in /etc/grub.d/40_custom:

  a. Copy the desired menuentry, starting with the value of saved_entry, from /boot/grub2/grub.cfg to /etc/grub.d/40_custom. The entry begins with the line starting with "menuentry" and ends with a line containing "}".
  b. Change the title after menuentry.
  c. Add intel_iommu=pt igb.max_vfs=7 to the end of the line starting with linux16.

For example:

menuentry 'Red Hat Enterprise Linux Server, with Linux 3.10.0-123.el7.x86_64 - SRIOV' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-123.el7.x86_64-advanced-4718717c-73ad-4f5f-800f-f415adfccd01' {
    load_video
    set gfxpayload=keep
    insmod gzio
    insmod part_msdos
    insmod ext2
    set root='hd0,msdos2'
    if [ x$feature_platform_search_hint = xy ]; then
      search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos2 --hint-efi=hd0,msdos2 --hint-baremetal=ahci0,msdos2 --hint='hd0,msdos2'  5edd1db4-1ebc-465c-8212-552a9c97456e
    else
      search --no-floppy --fs-uuid --set=root 5edd1db4-1ebc-465c-8212-552a9c97456e
    fi
    linux16 /vmlinuz-3.10.0-123.el7.x86_64 root=UUID=4718717c-73ad-4f5f-800f-f415adfccd01 ro vconsole.font=latarcyrheb-sun16 biosdevname=0 crashkernel=auto  vconsole.keymap=us nofb console=ttyS0,115200 LANG=en_US.UTF-8 intel_iommu=pt igb.max_vfs=7
    initrd16 /initramfs-3.10.0-123.el7.x86_64.img
}

iii. Update grub.cfg to apply the changed configuration:

[root@compute ~]# grub2-mkconfig -o /boot/grub2/grub.cfg

iv. Change the default entry:

[root@compute ~]# grub2-set-default 'Red Hat Enterprise Linux Server, with Linux 3.10.0-123.el7.x86_64 - SRIOV'

v. Create the dist.conf configuration file.

Note: Before performing this step, review the section describing the effects of allow_unsafe_interrupts: Review the allow_unsafe_interrupts setting.

[root@compute ~]# echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/dist.conf

5. Reboot the server to apply the new kernel parameters:

[root@compute ~]# systemctl reboot
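
Optionally, after the node comes back up, confirm that the new parameters appear on the running kernel command line; for example:

[root@compute ~]# grep iommu /proc/cmdline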

6. Review the SR-IOV kernel module on the Compute node. Confirm that the module has been loaded by running lsmod:

[root@compute ~]#  lsmod |grep igb

The filtered results will include the necessary module:

igb    87592  0
dca    6708    1 igb

7. Review the PCI vendor ID. Make a note of the PCI vendor ID (in vendor_id:product_id format) of your network adapter. Extract this from the output of the lspci command using the -nn flag. For example:

[root@compute ~]# lspci  -nn | grep -i 82576
05:00.0 Ethernet controller [0200]: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
05:00.1 Ethernet controller [0200]: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
05:10.0 Ethernet controller [0200]: Intel Corporation 82576 Virtual Function [8086:10ca] (rev 01)

Note: These values may differ depending on your network adapter hardware.

8. Review the new Virtual Functions. Use lspci to list the newly created VFs:

[root@compute ~]# lspci | grep 82576

The results will now include the device plus the Virtual Functions:

0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

22.3. Configure SR-IOV on the Network Node

OpenStack Networking (neutron) uses an ML2 mechanism driver to support SR-IOV. Perform these steps on the Network node to configure the SR-IOV driver. In this procedure, you add the mechanism driver, ensure that vlan is among the enabled drivers, and then define the VLAN ranges:

1. Enable sriovnicswitch in the /etc/neutron/plugins/ml2/ml2_conf.ini file. For example, this configuration enables the SR-IOV mechanism driver alongside Open vSwitch.

Note: sriovnicswitch does not support the current interface drivers for DHCP Agent, so openvswitch (or other mechanism driver with VLAN support) is a requirement when using sriovnicswitch.

[ml2]
tenant_network_types = vlan
type_drivers = vlan
mechanism_drivers = openvswitch, sriovnicswitch
[ml2_type_vlan]
network_vlan_ranges = physnet1:15:20
  • network_vlan_ranges - In this example, physnet1 is used as the network label, followed by the specified VLAN range of 15-20.

Note: The mechanism driver sriovnicswitch currently supports only the flat and vlan drivers. However, enabling sriovnicswitch does not limit you to only having flat or vlan tenant networks. VXLAN and GRE, among others, can still be used for instances that are not using SR-IOV ports.

2. Optional - The default supported vendor_id/product_id pairs are 15b3:1004 and 8086:10ca. Specify your NIC’s vendor and product IDs if they differ from these defaults. In addition, you need to modify this list if PF passthrough is used. For example:

[ml2_sriov]
supported_pci_vendor_devs = 15b3:1004,8086:10ca

3. Restart the neutron-server service to apply the configuration:

[root@network ~]# systemctl restart neutron-server.service

22.4. Configure SR-IOV on the Controller Node

1. To allow proper scheduling of SR-IOV devices, the Compute scheduler needs to use FilterScheduler with the PciPassthroughFilter filter. Apply this configuration in the nova.conf file on the Controller node. For example:

scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,CoreFilter,PciPassthroughFilter

2. Restart the Compute scheduler to apply the change:

[root@compute ~]# systemctl restart openstack-nova-scheduler.service

22.5. Configure SR-IOV in Compute

On all Compute nodes, associate the available VFs with each physical network:

1. Define the entries in the nova.conf file. This example adds the VF network matching enp5s0f1, and tags physical_network as physnet1, the network label previously configured in network_vlan_ranges.

pci_passthrough_whitelist={"devname": "enp5s0f1", "physical_network":"physnet1"}

This example adds the PF network matching vendor ID 8086 and product ID 10ac, and tags physical_network as physnet1:

pci_passthrough_whitelist = {"vendor_id": "8086", "product_id": "10ac", "physical_network": "physnet1"}

PCI passthrough whitelist entries use the following syntax:

["device_id": "<id>",] ["product_id": "<id>",]
["address": "[[[[<domain>]:]<bus>]:][<slot>][.[<function>]]" |
"devname": "Ethernet Interface Name",]
"physical_network":"Network label string"
  • id - The id setting accepts the * wildcard value, or a valid vendor or product ID. You can use lspci -nn to list the IDs of installed devices.
  • address - The address value uses the same syntax as displayed by lspci using the -s switch.
  • devname - The devname is a valid network interface name. You can list the available names using ifconfig -a. This entry must correspond to either a PF or VF value that is associated with a vNIC. If the device defined by the address or devname corresponds to an SR-IOV PF, all the VFs under the PF will match the entry. It is possible to associate 0 or more tags with an entry.
  • physical_network - When using SR-IOV networking, "physical_network" is used to define the physical network that devices are attached to.

You can configure multiple whitelist entries per host. The fields vendor_id, product_id, and address or devname will be matched against PCI devices that are returned as a result of querying libvirt.
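
For example, a host with two PFs attached to different physical networks could define two entries (a sketch for illustration; the enp5s0f2 interface name and the physnet2 label are assumptions):

pci_passthrough_whitelist={"devname": "enp5s0f1", "physical_network":"physnet1"}
pci_passthrough_whitelist={"devname": "enp5s0f2", "physical_network":"physnet2"}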

2. Apply the changes by restarting the nova-compute service:

[root@compute ~]# systemctl restart openstack-nova-compute

22.6. Enable the OpenStack Networking SR-IOV agent

1. Install the sriov-nic-agent package in order to complete the following steps:

[root@compute ~]# yum install openstack-neutron-sriov-nic-agent

2. Enable NoopFirewallDriver in the /etc/neutron/plugins/ml2/openvswitch_agent.ini file:

[root@compute ~]# openstack-config --set /etc/neutron/plugins/ml2/openvswitch_agent.ini securitygroup firewall_driver neutron.agent.firewall.NoopFirewallDriver
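
The command should result in the following entry in the file:

[securitygroup]
firewall_driver = neutron.agent.firewall.NoopFirewallDriver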

3. Add mappings to the /etc/neutron/plugins/ml2/sriov_agent.ini file. In this example, physnet1 is the physical network, and enp4s0f1 is the physical function. Leave exclude_devices blank to allow the agent to manage all associated VFs.

[sriov_nic]
physical_device_mappings = physnet1:enp4s0f1
exclude_devices =

4. Optional - Exclude VFs To exclude specific VFs from agent configuration, list them in the sriov_nic section. For example:

exclude_devices = eth1:0000:07:00.2; 0000:07:00.3, eth2:0000:05:00.1; 0000:05:00.2

5. Start the OpenStack Networking SR-IOV agent:

[root@compute ~]# systemctl enable neutron-sriov-nic-agent.service
[root@compute ~]# systemctl start neutron-sriov-nic-agent.service
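
Optionally, confirm that the agent is running and reporting state; for example:

[root@network ~]# neutron agent-list | grep -i sriov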

22.7. Configure an instance to use the SR-IOV port

Overview of SR-IOV functions

Using SR-IOV, you can give an instance direct access to a NIC by using Physical Functions (PFs) and Virtual Functions (VFs). PFs use VFs to allow multiple instances to have direct access to the same PCI card. As a result, the PCI card can be thought of as being logically partitioned into VFs for use by multiple instances. SR-IOV is thereby different from PCI passthrough, which only allows one instance to have exclusive access to the PCI device.

Note

An SR-IOV NIC cannot have both its PF and its VFs bound to instances at the same time. Due to memory address protection, an instance should not have control of the PF while other instances are using its VFs. In other words, unless a single instance uses the card by binding directly to the PF (almost equivalent to PCI passthrough), in most cases neutron passes VFs to instances and lets the host control the PF.

Limitations of Virtual Functions

  • Physical Function passthrough might be more appropriate for certain use cases. For example:

    • Residential vCPE
    • BNG/BRAS (IPoE or PPPoE)
    • PE for VPLS or VLL
    • VPN L3 (using multiple VLANs by port)
    • Other use cases where specific traffic handling is required, depending on level 2 encapsulation. For example, QinQ encapsulation for MAN networks.
  • When using VFs, your server NIC port may end up blocking traffic. This is expected behavior that helps mitigate spoofing attacks from instances sharing VFs from the same NIC. As a result, your NIC vendor might recommend allowing promiscuous unicast, disabling antispoofing, and disabling ingress VLAN filtering.
  • A NIC should be configured to use either physical functions or virtual functions; not both at the same time.

Example configuration

In this example, the SR-IOV port is added to the web network.

1. Retrieve the list of available networks:

[root@network ~]# neutron net-list
+--------------------------------------+---------+------------------------------------------------------+
| id                                   | name    | subnets                                              |
+--------------------------------------+---------+------------------------------------------------------+
| 3c97eb09-957d-4ed7-b80e-6f052082b0f9 | corp    | 78328449-796b-49cc-96a8-1daba7a910be 172.24.4.224/28 |
| 721d555e-c2e8-4988-a66f-f7cbe493afdb | web     | 140e936e-0081-4412-a5ef-d05bacf3d1d7 10.0.0.0/24     |
+--------------------------------------+---------+------------------------------------------------------+

The result lists the networks that have been created in OpenStack Networking, and includes subnet details.

2. Create the port inside the web network:

[root@network ~]# neutron port-create web --name sr-iov --binding:vnic_type direct
Created a new port:
+-----------------------+---------------------------------------------------------------------------------+
| Field                 | Value                                                                           |
+-----------------------+---------------------------------------------------------------------------------+
| admin_state_up        | True                                                                            |
| allowed_address_pairs |                                                                                 |
| binding:host_id       |                                                                                 |
| binding:profile       | {}                                                                              |
| binding:vif_details   | {}                                                                              |
| binding:vif_type      | unbound                                                                         |
| binding:vnic_type     | normal                                                                          |
| device_id             |                                                                                 |
| device_owner          |                                                                                 |
| fixed_ips             | {"subnet_id": "140e936e-0081-4412-a5ef-d05bacf3d1d7", "ip_address": "10.0.0.2"} |
| id                    | a2122b4d-c9a9-4a40-9b67-ca514ea10a1b                                            |
| mac_address           | fa:16:3e:b1:53:b3                                                               |
| name                  | sr-iov                                                                          |
| network_id            | 721d555e-c2e8-4988-a66f-f7cbe493afdb                                            |
| security_groups       | 3f06b19d-ec28-427b-8ec7-db2699c63e3d                                            |
| status                | DOWN                                                                            |
| tenant_id             | 7981849293f24ed48ed19f3f30e69690                                                |
+-----------------------+---------------------------------------------------------------------------------+

3. Create an instance using the new port.

Create a new instance named webserver01, and configure it to use the new port, using the port ID from the previous output in the id field:

Note: You can retrieve a list of available images and their UUIDs using the glance image-list command.

[root@compute ~]# nova boot --flavor m1.tiny --image 59a66200-45d2-4b21-982b-d06bc26ff2d0  --nic port-id=a2122b4d-c9a9-4a40-9b67-ca514ea10a1b  webserver01

Your new instance webserver01 has been created and configured to use the SR-IOV port.
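
Optionally, you can verify the port binding once the instance is active by showing the port and checking its binding fields; for example, using the port ID from the earlier output:

[root@network ~]# neutron port-show a2122b4d-c9a9-4a40-9b67-ca514ea10a1b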

22.8. Review the allow_unsafe_interrupts setting

Platform support for interrupt remapping is required to fully isolate a guest with assigned devices from the host. Without such support, the host may be vulnerable to interrupt injection attacks from a malicious guest. In an environment where guests are trusted, the administrator can opt in to still allow PCI device assignment by using the allow_unsafe_interrupts option. Review whether you need to enable allow_unsafe_interrupts on your host. If the IOMMU on the host supports interrupt remapping, there is no need to enable this feature.

1. Use dmesg to confirm whether your host supports IOMMU interrupt remapping:

[root@compute ~]# dmesg |grep ecap

If bit 3 of the ecap value is 1, the IOMMU supports interrupt remapping. For example, the value 0xf020ff ends in binary ...1111, so bit 3 is set and interrupt remapping is supported.

2. Confirm whether IRQ remapping is enabled:

[root@compute ~]# dmesg |grep "Enabled IRQ"
[    0.033413] Enabled IRQ remapping in x2apic mode

Note: "IRQ remapping" can be disabled manually by adding intremap=off to grub.conf.

3. If the host’s IOMMU does not support interrupt remapping, you will need to enable allow_unsafe_assigned_interrupts=1 in the kvm module.
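
For example, a minimal sketch that sets the module option persistently through a modprobe configuration file (the file name is an arbitrary choice):

[root@compute ~]# echo "options kvm allow_unsafe_assigned_interrupts=1" > /etc/modprobe.d/kvm.conf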

22.9. Add a Physical Function to an Instance

You can configure Compute to expose Physical Functions (PFs) to instances. This use case helps NFV applications by granting instances full control over the physical port, allowing them to use some of the functionality not available to Virtual Functions (VF). These instances can then bypass some of the limitations certain cards impose on VFs, or can exclusively use the full bandwidth of the port.

In the event that none of the child VFs are assigned to instances, Compute can allow a free PF to be assigned to an instance. Once a PF is assigned, none of its VFs are available. The VFs become available again once the instance is shut down and the PF becomes free. This also works in reverse: Compute will prevent a PF from being assigned if one of its VFs is already assigned.

22.9.1. Configure Compute for Physical Functions

Expose the physical functions by adding device_type: type-PF to your nova.conf whitelist. For example:

pci_passthrough_whitelist={"product_id":"10ed", "vendor_id":"8086", "physical_network":"physnet1",'device_type': 'type-PF}
22.9.2. Configure Physical Functions

You can manage SR-IOV PFs as if they were neutron ports. Neutron supports the vnic_type of direct-physical, with the resulting vNIC then used by nova to select a PF on a host and perform passthrough to a guest (using the new VIF type). As a result, nova will update the neutron port with the MAC address of the selected PF on the host.

For example, to create a PF on a network called Network1:

$ neutron port-create Network1 --name pf-port --binding:vnic_type direct-physical

You can then attach the resulting port to an instance for PF access.
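
For example, following the same pattern as the earlier VF example (the image UUID and port ID are placeholders, and the instance name is arbitrary):

$ nova boot --flavor m1.tiny --image <image-uuid> --nic port-id=<pf-port-id> pf-instance01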

22.9.3. Expose SR-IOV physical functions VLAN Tags to Guests

Solutions that require access to physical functions may also need to manipulate network settings in the same way as they would with virtual functions. Although network awareness for passed-through physical functions has been implemented for some time, the VLAN tags set on the associated neutron port were previously ignored. The latest release of Red Hat OpenStack Platform changes this behavior and sends the VLAN tag information to the instances.

The following example shows how the guest operating system receives the VLAN-related information:

{
  "devices": [
    {
        "type": "nic",
        "bus": "pci",
        "address": "0000:00:02.0",
        "mac": "01:22:22:42:22:21",
        "tags": ["nfvfunc1"],
        "vlan": 1000
    },
    ...
  ]
}

You can use the provided information to set up the network connection, for example by creating a corresponding configuration in an ifcfg-<name> file.
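
A minimal sketch of such a file, assuming the PF appears in the guest as eth0 and the VLAN tag is 1000 (the interface name and addresses are illustrative):

# /etc/sysconfig/network-scripts/ifcfg-eth0.1000
DEVICE=eth0.1000
VLAN=yes
BOOTPROTO=none
IPADDR=192.168.100.10
NETMASK=255.255.255.0
ONBOOT=yes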

While it is possible to control the traffic going to virtual functions by using the provided VLAN tags to filter out any unwanted packets, there is no such mechanism in Compute or Networking that can provide the same functionality for physical functions, because the operating system user has no control over either the physical interface or the physical connection to the switch.

It is therefore important that administrators manually set up the physical network, mapped to the whitelisted PCI device, to allow only traffic that is intended for the particular user or tenant. For instance, you can achieve the necessary network separation and secure the setup by configuring the top-of-rack switches to map specific PCI devices to physical networks.

22.10. Additional considerations

  • When selecting a vNIC type, note that vnic_type=macvtap is not currently supported.
  • VM migration with SR-IOV attached instances is not supported.
  • Security Groups cannot currently be used with SR-IOV enabled ports.