Chapter 22. SR-IOV support for virtual networking

First introduced in RHEL OpenStack Platform 6, single root I/O virtualization (SR-IOV) support was extended to virtual machine networking. SR-IOV enables OpenStack to put aside the previous requirement for virtual bridges, and instead extends the physical NIC’s capabilities directly through to the instance. In addition, support for IEEE 802.1br allows virtual NICs to integrate with, and be managed by, the physical switch.

22.1. Configure SR-IOV in your RHEL OpenStack Platform deployment

SR-IOV adds support for the concept of a virtual function which, while presented as a PCI device on the hardware, is a virtual interface that is provided by the physical function.

This chapter contains procedures for configuring SR-IOV to pass a physical NIC through to a virtual instance. These steps assume a deployment using a Controller node, an OpenStack Networking (neutron) node, and multiple Compute (nova) nodes.

Note

Virtual machine instances can use SR-IOV ports or regular vSwitch ports. If a flat or VLAN L2 configuration is in place, SR-IOV ports and regular vSwitch ports can communicate with each other across the network or from different physical functions on the same compute node. If the instances both reside on the same compute node and share the physical function on the network adapter, they can only communicate if both use the same type of port (both use SR-IOV or both use regular vSwitch).

22.2. Create Virtual Functions on the Compute node

Perform these steps on all Compute nodes with supported hardware.

Note: Please refer to this article for details on supported drivers.

This procedure configures a system to passthrough an Intel 82576 network device. Virtual Functions are also created, which can then be used by instances for SR-IOV access to the device.

1. Ensure that Intel VT-d or AMD IOMMU are enabled in the system’s BIOS. Refer to the machine’s BIOS configuration menu, or other means available from the manufacturer.

2. Ensure that Intel VT-d or AMD IOMMU are enabled in the operating system:

  • For Intel VT-d systems, refer to the procedure here.
  • For AMD IOMMU systems, refer to the procedure here.

3. Run the lspci command to ensure the network device is recognized by the system:

[root@compute ~]# lspci | grep 82576

The network device is included in the results:

03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

4. Perform these steps to activate Virtual Functions on the Compute node:

4a. Remove the kernel module. This will allow it to be configured in the next step:

[root@compute ~]# modprobe -r igb

Note: The module used by the SRIOV-supported NIC should be used in step 4 , rather than igb for other NICs (for example, ixgbe or mlx4_core). Confirm the driver by running the ethtool command. In this example, em1 is the PF (Physical Function) we want to use:

[root@compute ~]# ethtool -i em1 | grep ^driver

4b. Start the module with max_vfs set to 7 (or up to the maximum supported).

[root@compute ~]# modprobe igb max_vfs=7

4c. Make the Virtual Functions persistent:

[root@compute ~]# echo "options igb max_vfs=7" >>/etc/modprobe.d/igb.conf

Note: For Red Hat Enterprise Linux 7, to make the aforementioned changes persistent, rebuild the initial ramdisk image after completing step 4.

Note: Regarding the persistence of the settings in steps 4c. and 4d.: The modprobe command enables Virtual Functions on all NICs that use the same kernel module, and makes the change persist through system reboots. It is possible to enable VFs for only a specific NIC, however there are some possible issues that can result. For example, this command enables VFs for the enp4s0f1 interface:

# echo 7 > /sys/class/net/enp4s0f1/device/sriov_numvfs

However, this setting will not persist after a reboot. A possible workaround is to add this to rc.local, but this has its own limitation, as described in the note below:

# chmod +x /etc/rc.d/rc.local
# echo "echo 7 > /sys/class/net/enp4s0f1/device/sriov_numvfs" >> /etc/rc.local

Note: Since the addition of systemd, Red Hat Enterprise Linux starts services in parallel, rather than in series. This means that rc.local no longer executes at a predictable point in the boot process. As a result, unexpected behavior can occur, and this configuration is not recommended.

4d. Activate Intel VT-d in the kernel by appending the intel_iommu=pt and igb.max_vfs=7 parameters to the kernel command line. You can either change your current settings if you are going to always boot the kernel this way, or you can create a custom menu entry with these parameters, in which case your system will boot with these parameters by default, but you will also be able to boot the kernel without these parameters if need be.

To change your current kernel command line parameters, run the following command:

[root@compute ~]# grubby --update-kernel=ALL --args="intel_iommu=pt igb.max_vfs=7"

For more information on using grubby, see Configuring GRUB 2 Using the grubby Tool in the System Administrator’s Guide.

Note: If using a Dell Power Edge R630 node, you will need to use intel_iommu=on instead of intel_iommu=pt. You can enable this using grubby:

# grubby --update-kernel=ALL --args="intel_iommu=on"

To create a custom menu entry:

i. Find the default entry in grub:

[root@compute ~]# grub2-editenv list
saved_entry=Red Hat Enterprise Linux Server (3.10.0-123.9.2.el7.x86_64) 7.0 (Maipo)

ii. a. Copy the desired menuentry starting with the value of saved_entry from /boot/grub2/grub.cfg to /etc/grub.d/40_custom. The entry begins with the line starting with "menuentry" and ends with a line containing "}" b. Change the title after menuentry c. Add intel_iommu=pt igb.max_vfs=7 to the end of the line starting with linux16.

For example:

menuentry 'Red Hat Enterprise Linux Server, with Linux 3.10.0-123.el7.x86_64 - SRIOV' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-123.el7.x86_64-advanced-4718717c-73ad-4f5f-800f-f415adfccd01' {
    load_video
    set gfxpayload=keep
    insmod gzio
    insmod part_msdos
    insmod ext2
    set root='hd0,msdos2'
    if [ x$feature_platform_search_hint = xy ]; then
      search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos2 --hint-efi=hd0,msdos2 --hint-baremetal=ahci0,msdos2 --hint='hd0,msdos2'  5edd1db4-1ebc-465c-8212-552a9c97456e
    else
      search --no-floppy --fs-uuid --set=root 5edd1db4-1ebc-465c-8212-552a9c97456e
    fi
    linux16 /vmlinuz-3.10.0-123.el7.x86_64 root=UUID=4718717c-73ad-4f5f-800f-f415adfccd01 ro vconsole.font=latarcyrheb-sun16 biosdevname=0 crashkernel=auto  vconsole.keymap=us nofb console=ttyS0,115200 LANG=en_US.UTF-8 intel_iommu=pt igb.max_vfs=7
    initrd16 /initramfs-3.10.0-123.el7.x86_64.img
}

iii. Update grub.cfg to apply the change config file:

[root@compute ~]# grub2-mkconfig -o /boot/grub2/grub.cfg

iv. Change the default entry:

[root@compute ~]# grub2-set-default 'Red Hat Enterprise Linux Server, with Linux 3.10.0-123.el7.x86_64 - SRIOV'

v. Create the dist.conf configuration file.

Note: Before performing this step, review the section describing the effects of allow_unsafe_interrupts: Review the allow_unsafe_interrupts setting.

[root@compute ~]# echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/dist.conf

5. Reboot the server to apply the new kernel parameters:

[root@compute ~]# systemctl reboot

6. Review the SR-IOV kernel module on the Compute node. Confirm that the module has been loaded by running lsmod:

[root@compute ~]#  lsmod |grep igb

The filtered results will include the necessary module:

igb    87592  0
dca    6708    1 igb

7. Review the PCI vendor ID Make a note of the PCI vendor ID (in vendor_id:product_id format) of your network adapter. Extract this from the output of the lspci command using the -nn flag. For example:

[root@compute ~]# lspci  -nn | grep -i 82576
05:00.0 Ethernet controller [0200]: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
05:00.1 Ethernet controller [0200]: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
05:10.0 Ethernet controller [0200]: Intel Corporation 82576 Virtual Function [8086:10ca] (rev 01)

Note: This parameter may differ depending on your network adapter hardware.

8. Review the new Virtual Functions Use lspci to list the newly-created VFs:

[root@compute ~]# lspci | grep 82576

The results will now include the device plus the Virtual Functions:

0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection(rev 01)
0b:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

22.3. Configure SR-IOV on the Network Node

OpenStack Networking (neutron) uses a ML2 mechanism driver to support SR-IOV. Perform these steps on the Network node to configure the SR-IOV driver. In this procedure, you add the mechanism driver, ensure that vlan is among the enabled drivers, and then define the VLAN ranges:

1. Enable sriovnicswitch in the /etc/neutron/plugins/ml2/ml2_conf.ini file. For example, this configuration enables the SR-IOV mechanism driver alongside Open vSwitch.

Note

sriovnicswitch does not support the current interface drivers for DHCP Agent. To use DHCP-assigned addresses with SR-IOV, configure the neutron-dhcp-agent on the network nodes in such a way that the nodes will use the openvswitch interface (or other mechanism driver with VLAN support).

[ml2]
tenant_network_types = vlan
type_drivers = vlan
mechanism_drivers = openvswitch, sriovnicswitch
[ml2_type_vlan]
network_vlan_ranges = physnet1:15:20
  • network_vlan_ranges - In this example, physnet1 is used as the network label, followed by the specified VLAN range of 15-20.

Note: The mechanism driver sriovnicswitch currently supports only the flat and vlan drivers. However, enabling sriovnicswitch does not limit you to only having flat or vlan tenant networks. VXLAN and GRE, among others, can still be used for instances that are not using SR-IOV ports.

2. Optional - If you require VF link state and admin state management, and your vendor supports these features, then enable this option in the /etc/neutron/plugins/ml2/sriov_agent.ini file:

[root@network ~]# openstack-config --set /etc/neutron/plugins/ml2/sriov_agent.ini ml2_sriov agent_required True

3. Optional - The supported vendor_id/product_id couples are 15b3:1004, 8086:10ca. Specify your NIC vendor’s product ID if it differs from these. For example:

[ml2_sriov]
supported_pci_vendor_devs = 15b3:1004,8086:10ca

4. Configure neutron-server.service to use the ml2_conf_sriov.ini file. For example:

[root@network ~]# vi /usr/lib/systemd/system/neutron-server.service

[Service]
Type=notify
User=neutron
ExecStart=/usr/bin/neutron-server --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-file /etc/neutron/plugins/ml2/sriov_agent.ini  --log-file /var/log/neutron/server.log

5. Restart the neutron-server service to apply the configuration:

[root@network ~]# systemctl restart neutron-server.service

22.4. Configure SR-IOV on the Controller Node

1. To allow proper scheduling of SR-IOV devices, the Compute scheduler needs to use FilterScheduler with the PciPassthroughFilter filter. Apply this configuration in the nova.conf file on the Controller node. For example:

scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,CoreFilter, PciPassthroughFilter

2. Restart the Compute scheduler to apply the change:

[root@compute ~]# systemctl restart openstack-nova-scheduler.service

22.5. Configure SR-IOV in Compute

On all Compute nodes, associate the available VFs with each physical network:

1. Define the entries in the nova.conf file. This example adds the VF network matching enp5s0f1, and tags physical_network as physnet1, the network label previously configured in network_vlan_ranges.

pci_passthrough_whitelist={"devname": "enp5s0f1", "physical_network":"physnet1"}

This example adds the PF network matching vendor ID 8086, and tags physical_network as physnet1: ~ pci_passthrough_whitelist = \{"vendor_id": "8086","product_id": "10ac", "physical_network":"physnet1"} ~

PCI passthrough whitelist entries use the following syntax:

["device_id": "<id>",] ["product_id": "<id>",]
["address": "[[[[<domain>]:]<bus>]:][<slot>][.[<function>]]" |
"devname": "Ethernet Interface Name",]
"physical_network":"Network label string"
  • id - The id setting accepts the * wildcard value, or a valid device/product id. You can use lspci to list the valid device names.
  • address - The address value uses the same syntax as displayed by lspci using the -s switch.
  • devname - The devname is a valid PCI device name. You can list the available names using ifconfig -a. This entry must correspond to either a PF or VF value that is associated with a vNIC. If the device defined by the address or devname corresponds to a SR-IOV PF, all the VFs under the PF will match the entry. It is possible to associate 0 or more tags with an entry.
  • physical_network - When using SR-IOV networking, "physical_network" is used to define the physical network that devices are attached to.

You can configure multiple whitelist entries per host. The fields device_id, product_id, and address or devname will be matched against PCI devices that are returned as a result of querying libvirt.

2. Apply the changes by restarting the nova-compute service:

[root@compute ~]# systemctl restart openstack-nova-compute

22.6. Enable the OpenStack Networking SR-IOV agent

The OpenStack Networking SR-IOV agent enables management of the admin_state port. This agent integrates with the network adapter, allowing administrators to toggle the up/down administrative state of Virtual Functions.

In addition, if agent_required=True has been configured on the OpenStack Networking (neutron) server, you must run the OpenStack Networking SR-IOV Agent on each Compute node.

Note: Not all NIC vendors currently support port status management using this agent.

1. Install the sriov-nic-agent package in order to complete the following steps:

[root@compute ~]# yum install openstack-neutron-sriov-nic-agent

2. Enable NoopFirewallDriver in the /etc/neutron/plugins/ml2/openvswitch_agent.ini file:

[root@compute ~]# openstack-config --set /etc/neutron/plugins/ml2/openvswitch_agent.ini securitygroup firewall_driver neutron.agent.firewall.NoopFirewallDriver

3. Add mappings to the /etc/neutron/plugins/ml2/sriov_agent.ini file. In this example, physnet1 is the physical network, and enp4s0f1 is the physical function. Leave exclude_devices blank to allow the agent to manage all associated VFs.

[sriov_nic]
physical_device_mappings = physnet1:enp4s0f1
exclude_devices =

4. Optional - Exclude VFs To exclude specific VFs from agent configuration, list them in the sriov_nic section. For example:

exclude_devices = eth1:0000:07:00.2; 0000:07:00.3, eth2:0000:05:00.1; 0000:05:00.2

5. Configure neutron-sriov-nic-agent.service to use the ml2_conf_sriov.ini file. For example:

[root@compute ~]# vi /usr/lib/systemd/system/neutron-sriov-nic-agent.service

[Service]
Type=simple
User=neutron
ExecStart=/usr/bin/neutron-sriov-nic-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --log-file /var/log/neutron/sriov-nic-agent.log --config-file /etc/neutron/plugins/ml2/sriov_agent.ini

6. Start the OpenStack Networking SR-IOV agent:

[root@compute ~]# systemctl enable neutron-sriov-nic-agent.service
[root@compute ~]# systemctl start neutron-sriov-nic-agent.service

22.7. Configure an instance to use the SR-IOV port

In this example, the SR-IOV port is added to the web network.

1. Retrieve the list of available networks

[root@network ~]# neutron net-list
+--------------------------------------+---------+------------------------------------------------------+
| id                                   | name    | subnets                                              |
+--------------------------------------+---------+------------------------------------------------------+
| 3c97eb09-957d-4ed7-b80e-6f052082b0f9 | corp    | 78328449-796b-49cc-96a8-1daba7a910be 172.24.4.224/28 |
| 721d555e-c2e8-4988-a66f-f7cbe493afdb | web     | 140e936e-0081-4412-a5ef-d05bacf3d1d7 10.0.0.0/24     |
+--------------------------------------+---------+------------------------------------------------------+

The result lists the networks that have been created in OpenStack Networking, and includes subnet details.

2. Create the port inside the web network

[root@network ~]# neutron port-create web --name sr-iov --binding:vnic-type direct
Created a new port:
+-----------------------+---------------------------------------------------------------------------------+
| Field                 | Value                                                                           |
+-----------------------+---------------------------------------------------------------------------------+
| admin_state_up        | True                                                                            |
| allowed_address_pairs |                                                                                 |
| binding:host_id       |                                                                                 |
| binding:profile       | {}                                                                              |
| binding:vif_details   | {}                                                                              |
| binding:vif_type      | unbound                                                                         |
| binding:vnic_type     | normal                                                                          |
| device_id             |                                                                                 |
| device_owner          |                                                                                 |
| fixed_ips             | {"subnet_id": "140e936e-0081-4412-a5ef-d05bacf3d1d7", "ip_address": "10.0.0.2"} |
| id                    | a2122b4d-c9a9-4a40-9b67-ca514ea10a1b                                            |
| mac_address           | fa:16:3e:b1:53:b3                                                               |
| name                  | sr-iov                                                                          |
| network_id            | 721d555e-c2e8-4988-a66f-f7cbe493afdb                                            |
| security_groups       | 3f06b19d-ec28-427b-8ec7-db2699c63e3d                                            |
| status                | DOWN                                                                            |
| tenant_id             | 7981849293f24ed48ed19f3f30e69690                                                |
+-----------------------+---------------------------------------------------------------------------------+
Note

SR-IOV ports allow the virtual machine to access a virtual network using SR-IOV virtual functions. For the SR-IOV ports, you can either enable or disable the source MAC spoof checking. This feature is useful for bonding configurations inside guests.

To send a L2 packet from a source that is different from the source MAC address of the packet, run the following steps:

  1. Enable the ML2 port security extension driver in the /etc/neutron/plugins/ml2/ml2_conf.ini file:

    extension_drivers = port_security
  2. Restart the neutron-server:

    # systemctl restart neutron-server
  3. Update the OpenStack Networking port as follows:

    # neutron port-update --no-security-groups  <port-id>
  4. Enable the spookchk control for SR-IOV ports by removing the option to disable the MAC spoofing option:

    # neutron port-update <port_id> ---port-security-enabled=False

3. Create an instance using the new port Create a new instance named webserver01, and configure it to use the new port, using the port ID from the previous output in the id field:

Note: You can retrieve a list of available images and their UUIDs using the glance image-list command.

[root@compute ~]# nova boot --flavor m1.tiny --image 59a66200-45d2-4b21-982b-d06bc26ff2d0  --nic port-id=a2122b4d-c9a9-4a40-9b67-ca514ea10a1b  webserver01

Your new instance webserver01 has been created and configured to use the SR-IOV port.

22.8. Review the allow_unsafe_interrupts setting

Platform support for interrupt remapping is required to fully isolate a guest with assigned devices from the host. Without such support, the host may be vulnerable to interrupt injection attacks from a malicious guest. In an environment where guests are trusted, the admin may opt-in to still allow PCI device assignment using the allow_unsafe_interrupts option. Review whether you need to enable allow_unsafe_interrupts on your host. If the IOMMU on the host supports interrupt remapping, then there is no need to enable this feature.

1. Use dmesg to confirm whether your host supports IOMMU interrupt remapping:

[root@compute ~]# dmesg |grep ecap

If bit 3 of the ecap (0xf020ff → …​1111) is 1 , this indicates that the IOMMU supports interrupt remapping.

2. Confirm whether IRQ remapping is enabled:

[root@compute ~]# dmesg |grep "Enabled IRQ"
[    0.033413] Enabled IRQ remapping in x2apic mode

Note: "IRQ remapping" can be disabled manually by adding intremap=off to grub.conf.

3. If the host’s IOMMU does not support interrupt remapping, you will need to enable allow_unsafe_assigned_interrupts=1 in the kvm module.

22.9. Additional considerations

  • When selecting a vNIC type, note that vnic_type=macvtap is not currently supported.
  • VM migration with SR-IOV attached instances is not supported.
  • Security Groups can not currently be used with SR-IOV enabled ports.