Chapter 22. SR-IOV support for virtual networking
First introduced in RHEL OpenStack Platform 6, single root I/O virtualization (SR-IOV) support was extended to virtual machine networking. SR-IOV enables OpenStack to put aside the previous requirement for virtual bridges, and instead extends the physical NIC’s capabilities directly through to the instance. In addition, support for IEEE 802.1br allows virtual NICs to integrate with, and be managed by, the physical switch.
For information on Network Function Virtualization (NFV), see the NFV Configuration Guide.
22.1. Configure SR-IOV in your Red Hat OpenStack Platform deployment
SR-IOV adds support for the concept of a virtual function which, while presented as a PCI device on the hardware, is a virtual interface that is provided by the physical function.
This chapter contains procedures for configuring SR-IOV to pass a physical NIC through to a virtual instance. These steps assume a deployment using a Controller node, an OpenStack Networking (neutron) node, and multiple Compute (nova) nodes.
Note: Virtual machine instances using SR-IOV virtual function (VF) ports and virtual machine instances using regular ports (for example, linked to an Open vSwitch bridge) can communicate with each other across the network, provided that the appropriate L2 configuration (flat or VLAN) is in place. At present, there is a limitation where instances using SR-IOV ports and instances using regular vSwitch ports that reside on the same Compute node cannot communicate with each other if they share the same Physical Function (PF) on the network adapter.
22.2. Create Virtual Functions on the Compute node
Perform these steps on all Compute nodes with supported hardware.
Note: Please refer to this article for details on supported drivers.
This procedure configures a system to pass an Intel 82576 network device through to virtual instances. Virtual Functions are also created, which can then be used by instances for SR-IOV access to the device.
1. Ensure that Intel VT-d or AMD IOMMU are enabled in the system’s BIOS. Refer to the machine’s BIOS configuration menu, or other means available from the manufacturer.
2. Ensure that Intel VT-d or AMD IOMMU are enabled in the operating system:
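For example, one way to confirm this on an Intel system is to check the kernel log for IOMMU initialization messages (a minimal check; AMD systems log AMD-Vi messages rather than DMAR, and the exact output varies by platform and kernel version):
[root@compute ~]# dmesg | grep -i -e dmar -e iommu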
3. Run the lspci command to ensure the network device is recognized by the system:
[root@compute ~]# lspci | grep 82576
The network device is included in the results:
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
4. Perform these steps to activate Virtual Functions on the Compute node:
4a. Remove the kernel module. This will allow it to be configured in the next step:
[root@compute ~]# modprobe -r igb
Note: In step 4, use the kernel module that matches your SR-IOV-supported NIC, rather than igb (for example, ixgbe or mlx4_core for other NICs). Confirm the driver by running the ethtool command. In this example, em1 is the PF to use:
[root@compute ~]# ethtool -i em1 | grep ^driver
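The command reports the kernel driver in use for the interface. For the Intel 82576 used in this example, the output would be similar to the following (illustrative):
driver: igb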
4b. Start the module with max_vfs set to 7 (or up to the maximum supported).
[root@compute ~]# modprobe igb max_vfs=7
4c. Make the Virtual Functions persistent:
[root@compute ~]# echo "options igb max_vfs=7" >>/etc/modprobe.d/igb.conf
Note: For Red Hat Enterprise Linux 7, to make the aforementioned changes persistent, rebuild the initial ramdisk image after completing step 4.
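A minimal example of rebuilding the initial ramdisk for the running kernel (assuming the default dracut configuration):
[root@compute ~]# dracut -f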
Note: Regarding the persistence of the settings in steps 4c and 4d: the modprobe command enables Virtual Functions on all NICs that use the same kernel module, and makes the change persist through system reboots. It is possible to enable VFs for only a specific NIC; however, there are some possible issues that can result. For example, this command enables VFs for the enp4s0f1 interface:
# echo 7 > /sys/class/net/enp4s0f1/device/sriov_numvfs
However, this setting will not persist after a reboot. A possible workaround is to add this to rc.local, but this has its own limitation, as described in the note below:
# chmod +x /etc/rc.d/rc.local
# echo "echo 7 > /sys/class/net/enp4s0f1/device/sriov_numvfs" >> /etc/rc.local
Note: Since the addition of systemd, Red Hat Enterprise Linux starts services in parallel, rather than in series. This means that rc.local no longer executes at a predictable point in the boot process. As a result, unexpected behavior can occur, and this configuration is not recommended.
4d. Activate Intel VT-d in the kernel by appending the intel_iommu=pt and igb.max_vfs=7 parameters to the kernel command line. You can either change your current settings if you are going to always boot the kernel this way, or you can create a custom menu entry with these parameters, in which case your system will boot with these parameters by default, but you will also be able to boot the kernel without these parameters if need be.
• To change your current kernel command line parameters, run the following command:
[root@compute ~]# grubby --update-kernel=ALL --args="intel_iommu=pt igb.max_vfs=7"
For more information on using grubby, see Configuring GRUB 2 Using the grubby Tool in the System Administrator’s Guide.
Note: If using a Dell PowerEdge R630 node, you will need to use intel_iommu=on instead of intel_iommu=pt. You can enable this using grubby:
# grubby --update-kernel=ALL --args="intel_iommu=on"
• To create a custom menu entry:
i. Find the default entry in grub:
[root@compute ~]# grub2-editenv list
saved_entry=Red Hat Enterprise Linux Server (3.10.0-123.9.2.el7.x86_64) 7.0 (Maipo)
ii. Copy the desired menuentry, starting with the value of saved_entry, from /boot/grub2/grub.cfg to /etc/grub.d/40_custom. The entry begins with the line starting with "menuentry" and ends with a line containing "}". Then:
a. Change the title after menuentry.
b. Add intel_iommu=pt igb.max_vfs=7 to the end of the line starting with linux16.
For example:
menuentry 'Red Hat Enterprise Linux Server, with Linux 3.10.0-123.el7.x86_64 - SRIOV' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-123.el7.x86_64-advanced-4718717c-73ad-4f5f-800f-f415adfccd01' {
    load_video
    set gfxpayload=keep
    insmod gzio
    insmod part_msdos
    insmod ext2
    set root='hd0,msdos2'
    if [ x$feature_platform_search_hint = xy ]; then
        search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos2 --hint-efi=hd0,msdos2 --hint-baremetal=ahci0,msdos2 --hint='hd0,msdos2' 5edd1db4-1ebc-465c-8212-552a9c97456e
    else
        search --no-floppy --fs-uuid --set=root 5edd1db4-1ebc-465c-8212-552a9c97456e
    fi
    linux16 /vmlinuz-3.10.0-123.el7.x86_64 root=UUID=4718717c-73ad-4f5f-800f-f415adfccd01 ro vconsole.font=latarcyrheb-sun16 biosdevname=0 crashkernel=auto vconsole.keymap=us nofb console=ttyS0,115200 LANG=en_US.UTF-8 intel_iommu=pt igb.max_vfs=7
    initrd16 /initramfs-3.10.0-123.el7.x86_64.img
}
iii. Update grub.cfg to apply the changed configuration file:
[root@compute ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
iv. Change the default entry:
[root@compute ~]# grub2-set-default 'Red Hat Enterprise Linux Server, with Linux 3.10.0-123.el7.x86_64 - SRIOV'
v. Create the dist.conf configuration file.
Note: Before performing this step, review Section 22.8, "Review the allow_unsafe_interrupts setting", which describes the effects of allow_unsafe_interrupts.
[root@compute ~]# echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/dist.conf
5. Reboot the server to apply the new kernel parameters:
[root@compute ~]# systemctl reboot
6. Review the SR-IOV kernel module on the Compute node. Confirm that the module has been loaded by running lsmod:
[root@compute ~]# lsmod |grep igb
The filtered results will include the necessary module:
igb    87592  0
dca     6708  1 igb
7. Review the PCI vendor ID
Make a note of the PCI vendor ID (in vendor_id:product_id format) of your network adapter. Extract this from the output of the lspci command using the -nn flag. For example:
[root@compute ~]# lspci -nn | grep -i 82576
05:00.0 Ethernet controller [0200]: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
05:00.1 Ethernet controller [0200]: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
05:10.0 Ethernet controller [0200]: Intel Corporation 82576 Virtual Function [8086:10ca] (rev 01)
Note: This parameter may differ depending on your network adapter hardware.
8. Review the new Virtual Functions
Use lspci to list the newly created VFs:
[root@compute ~]# lspci | grep 82576
The results will now include the device plus the Virtual Functions:
0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
22.3. Configure SR-IOV on the Network Node
OpenStack Networking (neutron) uses an ML2 mechanism driver to support SR-IOV. Perform these steps on the Network node to configure the SR-IOV driver. In this procedure, you add the mechanism driver, ensure that vlan is among the enabled drivers, and then define the VLAN ranges:
1. Enable sriovnicswitch in the /etc/neutron/plugins/ml2/ml2_conf.ini file. For example, this configuration enables the SR-IOV mechanism driver alongside Open vSwitch.
Note: sriovnicswitch does not support the current interface drivers for DHCP Agent, so openvswitch (or other mechanism driver with VLAN support) is a requirement when using sriovnicswitch.
[ml2]
tenant_network_types = vlan
type_drivers = vlan
mechanism_drivers = openvswitch, sriovnicswitch

[ml2_type_vlan]
network_vlan_ranges = physnet1:15:20
- network_vlan_ranges - In this example, physnet1 is used as the network label, followed by the specified VLAN range of 15-20.
Note: The mechanism driver sriovnicswitch currently supports only the flat and vlan drivers. However, enabling sriovnicswitch does not limit you to only having flat or vlan tenant networks. VXLAN and GRE, among others, can still be used for instances that are not using SR-IOV ports.
2. Optional - The supported vendor_id/product_id pairs are 15b3:1004 and 8086:10ca. Specify your NIC vendor's product ID if it differs from these. In addition, you will need to modify this list if PF passthrough is being used. For example:
[ml2_sriov]
supported_pci_vendor_devs = 15b3:1004,8086:10ca
3. Restart the neutron-server service to apply the configuration:
[root@network ~]# systemctl restart neutron-server.service
22.4. Configure SR-IOV on the Controller Node
1. To allow proper scheduling of SR-IOV devices, the Compute scheduler needs to use FilterScheduler with the PciPassthroughFilter filter. Apply this configuration in the nova.conf file on the Controller node. For example:
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,CoreFilter,PciPassthroughFilter
2. Restart the Compute scheduler to apply the change:
[root@controller ~]# systemctl restart openstack-nova-scheduler.service
22.5. Configure SR-IOV in Compute
On all Compute nodes, associate the available VFs with each physical network:
1. Define the entries in the nova.conf file. This example adds the VF network matching enp5s0f1, and tags physical_network as physnet1, the network label previously configured in network_vlan_ranges.
pci_passthrough_whitelist={"devname": "enp5s0f1", "physical_network":"physnet1"}
This example adds the PF network matching vendor ID 8086, and tags physical_network as physnet1:
pci_passthrough_whitelist = {"vendor_id": "8086", "product_id": "10ac", "physical_network":"physnet1"}
PCI passthrough whitelist entries use the following syntax:
["device_id": "<id>",] ["product_id": "<id>",] ["address": "[[[[<domain>]:]<bus>]:][<slot>][.[<function>]]" | "devname": "Ethernet Interface Name",] "physical_network":"Network label string"
- id - The id setting accepts the * wildcard value, or a valid device/product id. You can use lspci to list the valid device names.
- address - The address value uses the same syntax as displayed by lspci using the -s switch.
- devname - The devname is a valid PCI device name. You can list the available names using ifconfig -a. This entry must correspond to either a PF or VF value that is associated with a vNIC. If the device defined by the address or devname corresponds to a SR-IOV PF, all the VFs under the PF will match the entry. It is possible to associate 0 or more tags with an entry.
- physical_network - When using SR-IOV networking, "physical_network" is used to define the physical network that devices are attached to.
You can configure multiple whitelist entries per host. The fields device_id, product_id, and address or devname will be matched against PCI devices that are returned as a result of querying libvirt.
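For example, a nova.conf on a single host could whitelist one device by devname and another by PCI address (an illustrative sketch; the PCI address wildcard and the physnet2 label are assumptions for this example and must match your own devices and network_vlan_ranges):
pci_passthrough_whitelist = {"devname": "enp5s0f1", "physical_network":"physnet1"}
pci_passthrough_whitelist = {"address": "*:0a:00.*", "physical_network":"physnet2"}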
2. Apply the changes by restarting the nova-compute service:
[root@compute ~]# systemctl restart openstack-nova-compute
22.6. Enable the OpenStack Networking SR-IOV agent
1. Install the sriov-nic-agent package in order to complete the following steps:
[root@compute ~]# yum install openstack-neutron-sriov-nic-agent
2. Enable NoopFirewallDriver in the /etc/neutron/plugins/ml2/openvswitch_agent.ini file:
[root@compute ~]# openstack-config --set /etc/neutron/plugins/ml2/openvswitch_agent.ini securitygroup firewall_driver neutron.agent.firewall.NoopFirewallDriver
3. Add mappings to the /etc/neutron/plugins/ml2/sriov_agent.ini file. In this example, physnet1 is the physical network, and enp4s0f1 is the physical function. Leave exclude_devices blank to allow the agent to manage all associated VFs.
[sriov_nic]
physical_device_mappings = physnet1:enp4s0f1
exclude_devices =
4. Optional - Exclude VFs
To exclude specific VFs from agent configuration, list them in the sriov_nic section. For example:
exclude_devices = eth1:0000:07:00.2; 0000:07:00.3, eth2:0000:05:00.1; 0000:05:00.2
5. Start the OpenStack Networking SR-IOV agent:
[root@compute ~]# systemctl enable neutron-sriov-nic-agent.service
[root@compute ~]# systemctl start neutron-sriov-nic-agent.service
22.7. Configure an instance to use the SR-IOV port
Overview of SR-IOV functions
Using SR-IOV, you can give an instance direct access to a NIC by using Physical Functions (PFs) and Virtual Functions (VFs). PFs use VFs to allow multiple instances to have direct access to the same PCI card. As a result, the PCI card can be thought of as being logically partitioned into VFs for use by multiple instances. SR-IOV is thereby different from PCI passthrough, which only allows one instance to have exclusive access to the PCI device.
An SR-IOV NIC cannot be bound to instances through both its PF and its VFs at the same time. Due to memory address protection, an instance should not have control of the PF if other instances are using VFs. In other words, unless a single instance is using the card by binding directly to the PF (almost equivalent to PCI passthrough), in most cases neutron will pass VFs to instances and let the host control the PF.
Limitations of Virtual Functions
Physical Function passthrough might be more appropriate for certain use cases. For example:
- Residential vCPE
- BNG/BRAS (IPoE or PPPoE)
- PE for VPLS or VLL
- VPN L3 (using multiple VLANs by port)
- Other use cases where specific traffic handling is required, depending on level 2 encapsulation. For example, QinQ encapsulation for MAN networks.
- When using VFs, your server NIC port may end up blocking traffic. This is expected behavior that helps mitigate spoofing attacks from instances sharing VFs from the same NIC. As a result, your NIC vendor might recommend allowing promiscuous unicast, disabling antispoofing, and disabling ingress VLAN filtering; see the sketch after this list.
- A NIC should be configured to use either physical functions or virtual functions; not both at the same time.
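As an illustrative sketch of the vendor recommendations mentioned above (the commands assume the PF is em1 and VF 0 is in use; support for each setting depends on your NIC and driver):
# ip link set dev em1 promisc on
# ip link set dev em1 vf 0 spoofchk off
# ethtool -K em1 rx-vlan-filter off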
Example configuration
In this example, the SR-IOV port is added to the web network.
1. Retrieve the list of available networks:
[root@network ~]# neutron net-list
+--------------------------------------+---------+------------------------------------------------------+
| id                                   | name    | subnets                                              |
+--------------------------------------+---------+------------------------------------------------------+
| 3c97eb09-957d-4ed7-b80e-6f052082b0f9 | corp    | 78328449-796b-49cc-96a8-1daba7a910be 172.24.4.224/28 |
| 721d555e-c2e8-4988-a66f-f7cbe493afdb | web     | 140e936e-0081-4412-a5ef-d05bacf3d1d7 10.0.0.0/24     |
+--------------------------------------+---------+------------------------------------------------------+
The result lists the networks that have been created in OpenStack Networking, and includes subnet details.
2. Create the port inside the web network:
[root@network ~]# neutron port-create web --name sr-iov --binding:vnic-type direct
Created a new port:
+-----------------------+---------------------------------------------------------------------------------+
| Field                 | Value                                                                           |
+-----------------------+---------------------------------------------------------------------------------+
| admin_state_up        | True                                                                            |
| allowed_address_pairs |                                                                                 |
| binding:host_id       |                                                                                 |
| binding:profile       | {}                                                                              |
| binding:vif_details   | {}                                                                              |
| binding:vif_type      | unbound                                                                         |
| binding:vnic_type     | normal                                                                          |
| device_id             |                                                                                 |
| device_owner          |                                                                                 |
| fixed_ips             | {"subnet_id": "140e936e-0081-4412-a5ef-d05bacf3d1d7", "ip_address": "10.0.0.2"} |
| id                    | a2122b4d-c9a9-4a40-9b67-ca514ea10a1b                                            |
| mac_address           | fa:16:3e:b1:53:b3                                                               |
| name                  | sr-iov                                                                          |
| network_id            | 721d555e-c2e8-4988-a66f-f7cbe493afdb                                            |
| security_groups       | 3f06b19d-ec28-427b-8ec7-db2699c63e3d                                            |
| status                | DOWN                                                                            |
| tenant_id             | 7981849293f24ed48ed19f3f30e69690                                                |
+-----------------------+---------------------------------------------------------------------------------+
3. Create an instance using the new port.
Create a new instance named webserver01, and configure it to use the new port, using the port ID from the previous output in the id field:
Note: You can retrieve a list of available images and their UUIDs using the glance image-list command.
[root@compute ~]# nova boot --flavor m1.tiny --image 59a66200-45d2-4b21-982b-d06bc26ff2d0 --nic port-id=a2122b4d-c9a9-4a40-9b67-ca514ea10a1b webserver01
Your new instance webserver01 has been created and configured to use the SR-IOV port.
22.8. Review the allow_unsafe_interrupts setting
Platform support for interrupt remapping is required to fully isolate a guest with assigned devices from the host. Without such support, the host may be vulnerable to interrupt injection attacks from a malicious guest. In an environment where guests are trusted, the admin may opt-in to still allow PCI device assignment using the allow_unsafe_interrupts option. Review whether you need to enable allow_unsafe_interrupts on your host. If the IOMMU on the host supports interrupt remapping, then there is no need to enable this feature.
1. Use dmesg to confirm whether your host supports IOMMU interrupt remapping:
[root@compute ~]# dmesg |grep ecap
If bit 3 of the ecap value is 1, the IOMMU supports interrupt remapping. For example, an ecap value of 0xf020ff ends in binary 1111, so bit 3 is 1.
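As a quick illustrative check of that bit, you can shift and mask the value in the shell (using the example ecap value above; substitute the value reported on your host):
[root@compute ~]# echo $(( (0xf020ff >> 3) & 1 ))
1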
2. Confirm whether IRQ remapping is enabled:
[root@compute ~]# dmesg |grep "Enabled IRQ" [ 0.033413] Enabled IRQ remapping in x2apic mode
Note: "IRQ remapping" can be disabled manually by adding intremap=off to grub.conf.
3. If the host’s IOMMU does not support interrupt remapping, you will need to enable allow_unsafe_assigned_interrupts=1 in the kvm module.
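For example, one way to set this option is through a modprobe configuration file (a sketch; a reboot, or reloading the kvm module, is required for the change to take effect):
[root@compute ~]# echo "options kvm allow_unsafe_assigned_interrupts=1" > /etc/modprobe.d/kvm.conf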
22.9. Add a Physical Function to an Instance
You can configure Compute to expose Physical Functions (PFs) to instances. This use case helps NFV applications by granting instances full control over the physical port, allowing them to use some of the functionality not available to Virtual Functions (VF). These instances can then bypass some of the limitations certain cards impose on VFs, or can exclusively use the full bandwidth of the port.
In the event that none of the child VFs are assigned to instances, Compute can allow the assignment of a free PF to instances. Once a PF is assigned, none of its VFs will be available. The VFs become available again once the instance is shut down and the PF becomes free. This also works in reverse: Compute will prevent a PF from being assigned if one of its VFs is already assigned.
22.9.1. Configure Compute for Physical Functions
Expose the physical functions by adding device_type: type-PF to your nova.conf whitelist. For example:
pci_passthrough_whitelist = {"product_id":"10ed", "vendor_id":"8086", "physical_network":"physnet1", "device_type":"type-PF"}
22.9.2. Configure Physical Functions
You can manage SR-IOV PFs as if they were neutron ports. Neutron supports the vnic_type of direct-physical, with the resulting vNIC then used by nova to select a PF on a host and perform passthrough to a guest (using the new VIF type). As a result, nova will update the neutron port with the MAC address of the selected PF on the host.
For example, to create a PF port on a network called Network1:
$ neutron port-create Network1 --name pf-port --binding:vnic_type direct-physical
You can then attach the resulting port to an instance for PF access.
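For example, you could boot an instance with the PF port in the same way as with a VF port (a sketch; the flavor, image UUID, and instance name are placeholders, and the port ID comes from the output of the previous command):
$ nova boot --flavor m1.large --image <image-uuid> --nic port-id=<pf-port-id> pf-instance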
22.9.3. Expose SR-IOV physical functions VLAN Tags to Guests
Solutions that require access to physical functions may also need to manipulate network settings in the same way as they would with the virtual functions. Although network awareness for the passed-through physical functions has been implemented for some time, the VLAN tags, set on the associated neutron port, were ignored. The latest release of Red Hat OpenStack Platform changes this behavior and sends the VLAN tags information to the instances.
The following example shows the way in which the guest operating system will receive the VLAN related information:
{ "devices": [ { "type": "nic", "bus": "pci", "address": "0000:00:02.0", "mac": "01:22:22:42:22:21", "tags": ["nfvfunc1"] "vlan": 1000 }..., ] }
You can use the provided information to set up the network connection, for example by creating a corresponding configuration in an ifcfg-<name> file.
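As a minimal sketch of such a guest-side configuration, assuming the passed-through device appears as eth0 inside the guest and using the VLAN tag 1000 from the example above, an /etc/sysconfig/network-scripts/ifcfg-eth0.1000 file might look similar to:
DEVICE=eth0.1000
VLAN=yes
ONBOOT=yes
BOOTPROTO=dhcp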
While it is possible to control the traffic going to virtual functions by using the provided VLAN tags to filter out any unwanted packets, there is no such mechanism in Compute or Networking that could provide the same functionality for physical functions, because the operating system user has no control over either the physical interface or the physical connection to the switch.
It is therefore important that administrators manually set up the physical network, mapped to the whitelisted PCI device, to allow only traffic that is intended for the particular user or tenant. For instance, you can make the necessary network separation to secure the setup by configuring the top-of-rack switches to map specific PCI devices to physical networks.
22.10. Additional considerations
- When selecting a vNIC type, note that vnic_type=macvtap is not currently supported.
- VM migration with SR-IOV attached instances is not supported.
- Security Groups cannot currently be used with SR-IOV enabled ports.