16.2. PCI Device Assignment with SR-IOV Devices

A PCI network device (specified in the domain XML by the <source> element) can be directly connected to the guest using direct device assignment (sometimes referred to as passthrough). Due to limitations in standard single-port PCI Ethernet card driver design, only Single Root I/O Virtualization (SR-IOV) virtual function (VF) devices can be assigned in this manner; to assign a standard single-port PCI or PCIe Ethernet card to a guest, use the traditional <hostdev> device definition.

  <devices>
    <interface type='hostdev'>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
      </source>
      <mac address='52:54:00:6d:90:02'/>
      <virtualport type='802.1Qbh'>
        <parameters profileid='finance'/>
      </virtualport>
    </interface>
  </devices>

Figure 16.9. XML example for PCI device assignment

Developed by the PCI-SIG (PCI Special Interest Group), the Single Root I/O Virtualization (SR-IOV) specification is a standard for a type of PCI device assignment that can share a single device with multiple virtual machines. SR-IOV improves device performance for virtual machines.

Figure 16.10. How SR-IOV works

SR-IOV enables a Single Root Function (for example, a single Ethernet port) to appear as multiple, separate physical devices. A physical device with SR-IOV capabilities can be configured to appear in the PCI configuration space as multiple functions. Each device has its own configuration space complete with Base Address Registers (BARs).
SR-IOV uses two PCI functions:
  • Physical Functions (PFs) are full PCIe devices that include the SR-IOV capabilities. Physical Functions are discovered, managed, and configured as normal PCI devices. Physical Functions configure and manage the SR-IOV functionality by assigning Virtual Functions.
  • Virtual Functions (VFs) are simple PCIe functions that only process I/O. Each Virtual Function is derived from a Physical Function. The number of Virtual Functions a device may have is limited by the device hardware. A single Ethernet port, the Physical Device, may map to many Virtual Functions that can be shared with virtual machines.
The hypervisor can assign one or more Virtual Functions to a virtual machine. The Virtual Function's configuration space is then mapped to the configuration space presented to the guest.
Each Virtual Function can only be assigned to a single guest at a time, as Virtual Functions require real hardware resources. A virtual machine can have multiple Virtual Functions. A Virtual Function appears as a network card in the same way as a normal network card would appear to an operating system.
The SR-IOV drivers are implemented in the kernel. The core implementation is contained in the PCI subsystem, but there must also be driver support for both the Physical Function (PF) and Virtual Function (VF) devices. An SR-IOV capable device can allocate VFs from a PF. The VFs appear as PCI devices which are backed on the physical PCI device by resources such as queues and register sets.
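
As a minimal sketch of how the hardware limit on Virtual Functions can be inspected, the following Python snippet reads the sriov_numvfs and sriov_totalvfs attributes that the kernel exposes in sysfs for a Physical Function. The function name read_vf_counts and its sysfs_root parameter are illustrative assumptions, and enp14s0f0 is only the example interface name used later in this section.

```python
from pathlib import Path


def read_vf_counts(pf_netdev, sysfs_root="/sys/class/net"):
    """Return (active_vfs, max_vfs) for a Physical Function network device.

    sriov_numvfs holds the number of VFs currently enabled and
    sriov_totalvfs holds the maximum number the hardware supports.
    """
    device_dir = Path(sysfs_root) / pf_netdev / "device"
    active = int((device_dir / "sriov_numvfs").read_text().strip())
    maximum = int((device_dir / "sriov_totalvfs").read_text().strip())
    return active, maximum


if __name__ == "__main__":
    # enp14s0f0 is a placeholder; substitute the PF name on your host.
    try:
        active, maximum = read_vf_counts("enp14s0f0")
        print(f"{active} of {maximum} Virtual Functions enabled")
    except OSError:
        print("Device not found or it does not expose SR-IOV attributes")
```

A device that does not support SR-IOV has neither attribute, which is why the example treats a missing file as "not SR-IOV capable" rather than as a fatal error.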

16.2.1. Advantages of SR-IOV

SR-IOV devices can share a single physical port with multiple virtual machines.
When an SR-IOV VF is assigned to a virtual machine, it can be configured to (transparently to the virtual machine) place all network traffic leaving the VF onto a particular VLAN. The virtual machine cannot detect that its traffic is being tagged for a VLAN, and will be unable to change or eliminate this tagging.
Virtual Functions have near-native performance and provide better performance than paravirtualized drivers and emulated access. Virtual Functions provide data protection between virtual machines on the same physical server as the data is managed and controlled by the hardware.
These features allow for increased virtual machine density on hosts within a data center.
SR-IOV is better able to utilize the bandwidth of devices with multiple guests.

16.2.2. Using SR-IOV

This section covers the use of PCI passthrough to assign a Virtual Function of an SR-IOV capable multiport network card to a virtual machine as a network device.
SR-IOV Virtual Functions (VFs) can be assigned to virtual machines by adding a device entry in <hostdev> with the virsh edit or virsh attach-device command. However, this can be problematic because, unlike a regular network device, an SR-IOV VF network device does not have a permanent unique MAC address and is assigned a new MAC address each time the host is rebooted. As a result, even if the guest is assigned the same VF after a host reboot, the guest sees its network adapter with a new MAC address, treats it as new hardware, and will usually require its network settings to be re-configured.
libvirt contains the <interface type='hostdev'> interface device. Using this interface device, libvirt will first perform any network-specific hardware/switch initialization indicated (such as setting the MAC address, VLAN tag, or 802.1Qbh virtualport parameters), then perform the PCI device assignment to the guest.
Using the <interface type='hostdev'> interface device requires:
  • an SR-IOV-capable network card,
  • host hardware that supports either the Intel VT-d or the AMD IOMMU extensions, and
  • the PCI address of the VF to be assigned.
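
One rough way to check the IOMMU requirement from the host is to look at the IOMMU groups the kernel publishes under /sys/kernel/iommu_groups. The sketch below is an assumption-laden helper (the name list_iommu_groups and its root parameter are made up for illustration), not a libvirt facility.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    An empty result usually means the IOMMU is disabled in firmware or on
    the kernel command line.
    """
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir()
            )
    return groups


if __name__ == "__main__":
    groups = list_iommu_groups()
    if not groups:
        print("No IOMMU groups found; check the VT-d or AMD IOMMU settings")
    for number in sorted(groups):
        print(number, " ".join(groups[number]))
```

The group numbers printed here correspond to the iommuGroup numbers that virsh nodedev-dumpxml reports for each PCI device.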

Important

Assignment of an SR-IOV device to a virtual machine requires that the host hardware supports the Intel VT-d or the AMD IOMMU specification.
To attach an SR-IOV network device on an Intel or an AMD system, follow this procedure:

Procedure 16.8. Attach an SR-IOV network device on an Intel or AMD system

  1. Enable Intel VT-d or the AMD IOMMU specifications in the BIOS and kernel

    On an Intel system, enable Intel VT-d in the BIOS if it is not enabled already. See Procedure 16.1, “Preparing an Intel system for PCI device assignment” for procedural help on enabling Intel VT-d in the BIOS and kernel.
    Skip this step if Intel VT-d is already enabled and working.
    On an AMD system, enable the AMD IOMMU specifications in the BIOS if they are not enabled already. See Procedure 16.2, “Preparing an AMD system for PCI device assignment” for procedural help on enabling IOMMU in the BIOS.
  2. Verify support

    Verify whether the PCI device with SR-IOV capabilities is detected. This example lists an Intel 82576 network interface card which supports SR-IOV. Use the lspci command to verify whether the device was detected.
    # lspci
    03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    Note that the output has been modified to remove all other devices.
  3. Activate Virtual Functions

    Run the following command:
    # echo ${num_vfs} > /sys/class/net/enp14s0f0/device/sriov_numvfs
  4. Make the Virtual Functions persistent

    To make the Virtual Functions persistent across reboots, use the editor of your choice to create a udev rule similar to the following, where you specify the intended number of VFs (in this example, 2), up to the limit supported by the network interface card. In the following example, replace enp14s0f0 with the PF network device name(s) and adjust the value of ENV{ID_NET_DRIVER} to match the driver in use. Enter the rule on a single line:
    # vim /etc/udev/rules.d/enp14s0f0.rules
    ACTION=="add", SUBSYSTEM=="net", ENV{ID_NET_DRIVER}=="ixgbe", ATTR{device/sriov_numvfs}="2"
    
    This will ensure the feature is enabled at boot-time.
  5. Inspect the new Virtual Functions

    Using the lspci command, list the newly added Virtual Functions attached to the Intel 82576 network device. (Alternatively, use grep to search for Virtual Function to list only the Virtual Functions.)
    # lspci | grep 82576
    0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    0b:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    The identifier for the PCI device is found with the -n parameter of the lspci command. The Physical Functions correspond to 0b:00.0 and 0b:00.1. All Virtual Functions have Virtual Function in the description.
  6. Verify devices exist with virsh

    The libvirt service must recognize the device before adding a device to a virtual machine. libvirt uses a similar notation to the lspci output. All punctuation characters, : and ., in lspci output are changed to underscores (_).
    Use the virsh nodedev-list command and the grep command to filter the Intel 82576 network device from the list of available host devices. 0b is the filter for the Intel 82576 network devices in this example. This may vary for your system and may result in additional devices.
    # virsh nodedev-list | grep 0b
    pci_0000_0b_00_0
    pci_0000_0b_00_1
    pci_0000_0b_10_0
    pci_0000_0b_10_1
    pci_0000_0b_10_2
    pci_0000_0b_10_3
    pci_0000_0b_10_4
    pci_0000_0b_10_5
    pci_0000_0b_10_6
    pci_0000_0b_10_7
    pci_0000_0b_11_0
    pci_0000_0b_11_1
    pci_0000_0b_11_2
    pci_0000_0b_11_3
    pci_0000_0b_11_4
    pci_0000_0b_11_5
    The PCI addresses for the Virtual Functions and Physical Functions should be in the list.
  7. Get device details with virsh

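    The pci_0000_0b_00_0 device is one of the Physical Functions and pci_0000_0b_10_0 is the first corresponding Virtual Function for that Physical Function. Use the virsh nodedev-dumpxml command to get device details for a Physical Function and a Virtual Function. Note that the sample output below was captured on a host where the card is on bus 03, so the device names begin with pci_0000_03.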
    # virsh nodedev-dumpxml pci_0000_03_00_0
    <device>
      <name>pci_0000_03_00_0</name>
      <path>/sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0</path>
      <parent>pci_0000_00_01_0</parent>
      <driver>
        <name>igb</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>3</bus>
        <slot>0</slot>
        <function>0</function>
        <product id='0x10c9'>82576 Gigabit Network Connection</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
        <capability type='virt_functions'>
          <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
          <address domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
          <address domain='0x0000' bus='0x03' slot='0x10' function='0x4'/>
          <address domain='0x0000' bus='0x03' slot='0x10' function='0x6'/>
          <address domain='0x0000' bus='0x03' slot='0x11' function='0x0'/>
          <address domain='0x0000' bus='0x03' slot='0x11' function='0x2'/>
          <address domain='0x0000' bus='0x03' slot='0x11' function='0x4'/>
        </capability>
        <iommuGroup number='14'>
          <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
          <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
        </iommuGroup>
      </capability>
    </device>
    # virsh nodedev-dumpxml pci_0000_03_11_5
    <device>
      <name>pci_0000_03_11_5</name>
      <path>/sys/devices/pci0000:00/0000:00:01.0/0000:03:11.5</path>
      <parent>pci_0000_00_01_0</parent>
      <driver>
        <name>igbvf</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>3</bus>
        <slot>17</slot>
        <function>5</function>
        <product id='0x10ca'>82576 Virtual Function</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
        <capability type='phys_function'>
          <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
        </capability>
        <iommuGroup number='35'>
          <address domain='0x0000' bus='0x03' slot='0x11' function='0x5'/>
        </iommuGroup>
      </capability>
    </device>
    This example adds the Virtual Function pci_0000_03_10_2 to the virtual machine in Step 8. Note the bus, slot and function parameters of the Virtual Function: these are required for adding the device.
    Copy these parameters into a temporary XML file, such as /tmp/new-interface.xml.
       <interface type='hostdev' managed='yes'>
         <source>
           <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
         </source>
       </interface>

    Note

    When the virtual machine starts, it should see a network device of the type provided by the physical adapter, with the configured MAC address. This MAC address will remain unchanged across host and guest reboots.
    The following <interface> example shows the syntax for the optional <mac address>, <virtualport>, and <vlan> elements. In practice, use either the <vlan> or <virtualport> element, not both simultaneously as shown in the example:
    ...
     <devices>
       ...
       <interface type='hostdev' managed='yes'>
         <source>
           <address type='pci' domain='0' bus='11' slot='16' function='0'/>
         </source>
         <mac address='52:54:00:6d:90:02'/>
         <vlan>
            <tag id='42'/>
         </vlan>
         <virtualport type='802.1Qbh'>
           <parameters profileid='finance'/>
         </virtualport>
       </interface>
       ...
     </devices>
    If you do not specify a MAC address, one will be automatically generated. The <virtualport> element is only used when connecting to an 802.1Qbh hardware switch. The <vlan> element will transparently put the guest's device on the VLAN tagged 42.
  8. Add the Virtual Function to the virtual machine

    Add the Virtual Function to the virtual machine using the following command with the temporary file created in the previous step. This attaches the new device immediately and saves it for subsequent guest restarts.
    # virsh attach-device MyGuest /tmp/new-interface.xml --live --config
    
    Specifying the --live option with virsh attach-device attaches the new device to the running guest. Using the --config option ensures the new device is available after future guest restarts.

    Note

    The --live option is only accepted when the guest is running. virsh will return an error if the --live option is used on a non-running guest.
The virtual machine detects a new network interface card. This new card is the Virtual Function of the SR-IOV device.
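
The address handling in steps 6 and 7 can be sketched in Python as follows: an lspci-style address is converted to the libvirt node device name and to an <interface type='hostdev'> definition like the one saved in /tmp/new-interface.xml. The helper names (parse_pci_address, nodedev_name, hostdev_interface_xml) are hypothetical and exist only for this illustration; the XML layout follows the examples in the procedure above.

```python
import re
import xml.etree.ElementTree as ET

# Accepts "03:10.2" or "0000:03:10.2" (domain is optional, as in lspci output).
_PCI_ADDR = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<slot>[0-9a-fA-F]{2})\.(?P<function>[0-7])$"
)


def parse_pci_address(address):
    """Split an lspci-style address such as '03:10.2' into integer fields."""
    match = _PCI_ADDR.match(address)
    if match is None:
        raise ValueError(f"not a PCI address: {address!r}")
    return {
        "domain": int(match.group("domain") or "0", 16),
        "bus": int(match.group("bus"), 16),
        "slot": int(match.group("slot"), 16),
        "function": int(match.group("function"), 16),
    }


def nodedev_name(address):
    """Return the libvirt node device name, with ':' and '.' as underscores."""
    fields = parse_pci_address(address)
    return "pci_{domain:04x}_{bus:02x}_{slot:02x}_{function:x}".format(**fields)


def hostdev_interface_xml(address, mac=None):
    """Build an <interface type='hostdev'> element for the given VF address."""
    fields = parse_pci_address(address)
    iface = ET.Element("interface", {"type": "hostdev", "managed": "yes"})
    source = ET.SubElement(iface, "source")
    ET.SubElement(source, "address", {
        "type": "pci",
        "domain": f"0x{fields['domain']:04x}",
        "bus": f"0x{fields['bus']:02x}",
        "slot": f"0x{fields['slot']:02x}",
        "function": f"0x{fields['function']:x}",
    })
    if mac is not None:
        # Optional fixed MAC address so the guest sees the same NIC each boot.
        ET.SubElement(iface, "mac", {"address": mac})
    return ET.tostring(iface, encoding="unicode")


if __name__ == "__main__":
    print(nodedev_name("03:10.2"))
    print(hostdev_interface_xml("03:10.2", mac="52:54:00:6d:90:02"))
```

The generated XML could be written to a file and passed to virsh attach-device as in step 8; the sketch itself does not call libvirt.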

16.2.3. Configuring PCI Assignment with SR-IOV Devices

SR-IOV network cards provide multiple VFs that can each be individually assigned to guest virtual machines using PCI device assignment. Once assigned, each behaves as a full physical network device. This permits many guest virtual machines to gain the performance advantage of direct PCI device assignment, while only using a single slot on the host physical machine.
These VFs can be assigned to guest virtual machines in the traditional manner using the <hostdev> element. However, SR-IOV VF network devices do not have permanent unique MAC addresses, so the guest virtual machine's network settings need to be re-configured each time the host physical machine is rebooted. To avoid this, the MAC address must be set before the VF is assigned to the guest virtual machine, and this must be repeated every time the guest virtual machine boots. To assign this MAC address, as well as other options, see the following procedure:

Procedure 16.9. Configuring MAC addresses, vLAN, and virtual ports for assigning PCI devices on SR-IOV

The <hostdev> element cannot be used for function-specific items like MAC address assignment, vLAN tag ID assignment, or virtual port assignment, because the <mac>, <vlan>, and <virtualport> elements are not valid children for <hostdev>. Instead, these elements can be used with the hostdev interface type: <interface type='hostdev'>. This device type behaves as a hybrid of an <interface> and <hostdev>. Thus, before assigning the PCI device to the guest virtual machine, libvirt initializes the network-specific hardware/switch that is indicated (such as setting the MAC address, setting a vLAN tag, or associating with an 802.1Qbh switch) in the guest virtual machine's XML configuration file. For information on setting the vLAN tag, see Section 17.16, “Setting vLAN Tags”.
  1. Gather information

    In order to use <interface type='hostdev'>, you must have an SR-IOV-capable network card, host physical machine hardware that supports either the Intel VT-d or AMD IOMMU extensions, and you must know the PCI address of the VF that you wish to assign.
  2. Shut down the guest virtual machine

    Using the virsh shutdown command, shut down the guest virtual machine (here named guestVM).
    # virsh shutdown guestVM
  3. Open the XML file for editing

    # virsh edit guestVM
    Optional: For the XML configuration file that was created by the virsh save command, run:
    # virsh save-image-edit guestVM.xml --running
    The configuration, in this example for the guest guestVM, opens in your default editor. For more information, see Section 20.7.5, “Editing the Guest Virtual Machine Configuration”.
  4. Edit the XML file

    Update the configuration file (guestVM.xml) to have a <devices> entry similar to the following:
    
     <devices>
       ...
       <interface type='hostdev' managed='yes'>
         <source>
           <address type='pci' domain='0x0' bus='0x00' slot='0x07' function='0x0'/> <!--these values can be decimal as well-->
         </source>
         <mac address='52:54:00:6d:90:02'/>                                         <!--sets the mac address-->
         <virtualport type='802.1Qbh'>                                              <!--sets the virtual port for the 802.1Qbh switch-->
           <parameters profileid='finance'/>
         </virtualport>
         <vlan>                                                                     <!--sets the vlan tag-->
          <tag id='42'/>
         </vlan>
       </interface>
       ...
     </devices>
    
    

    Figure 16.11. Sample domain XML for hostdev interface type

    Note

    If you do not provide a MAC address, one will be automatically generated, just as with any other type of interface device. In addition, the <virtualport> element is only used if you are connecting to an 802.1Qbh hardware switch. 802.1Qbg (also known as "VEPA") switches are currently not supported.
  5. Restart the guest virtual machine

    Run the virsh start command to restart the guest virtual machine you shut down in step 2. See Section 20.6, “Starting, Resuming, and Restoring a Virtual Machine” for more information.
     # virsh start guestVM 
    When the guest virtual machine starts, it sees the network device provided to it by the physical host machine's adapter, with the configured MAC address. This MAC address remains unchanged across guest virtual machine and host physical machine reboots.
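
To double-check what was configured, the hostdev interfaces can be read back out of a domain definition. The following is a small sketch that parses domain XML (for example, text obtained from virsh dumpxml) and reports the PCI address, MAC address, vLAN tag, and virtual port type of each <interface type='hostdev'>. The sample XML and the function name hostdev_interfaces are illustrative assumptions modeled on Figure 16.11.

```python
import xml.etree.ElementTree as ET

# Illustrative domain definition; a real one contains many more elements.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>guestVM</name>
  <devices>
    <interface type='hostdev' managed='yes'>
      <source>
        <address type='pci' domain='0x0' bus='0x00' slot='0x07' function='0x0'/>
      </source>
      <mac address='52:54:00:6d:90:02'/>
      <vlan>
        <tag id='42'/>
      </vlan>
    </interface>
  </devices>
</domain>
"""


def hostdev_interfaces(domain_xml):
    """Return one dict per <interface type='hostdev'> in a domain definition."""
    results = []
    root = ET.fromstring(domain_xml)
    for iface in root.findall("./devices/interface[@type='hostdev']"):
        address = iface.find("source/address")
        if address is None:
            continue  # skip incomplete definitions
        mac = iface.find("mac")
        tag = iface.find("vlan/tag")
        port = iface.find("virtualport")
        results.append({
            # Base 0 accepts both hexadecimal (0x07) and decimal (7) values.
            "pci": {key: int(address.get(key), 0)
                    for key in ("domain", "bus", "slot", "function")},
            "mac": mac.get("address") if mac is not None else None,
            "vlan": int(tag.get("id")) if tag is not None else None,
            "virtualport": port.get("type") if port is not None else None,
        })
    return results


if __name__ == "__main__":
    for entry in hostdev_interfaces(SAMPLE_DOMAIN_XML):
        print(entry)
```

An entry whose mac value is None indicates that no MAC address was configured, in which case one is generated automatically as described in the note above.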

16.2.4. Setting PCI device assignment from a pool of SR-IOV virtual functions

Hard coding the PCI addresses of particular Virtual Functions (VFs) into a guest's configuration has two serious limitations:
  • The specified VF must be available any time the guest virtual machine is started. Therefore, the administrator must permanently assign each VF to a single guest virtual machine (or modify the configuration file for every guest virtual machine to specify a currently unused VF's PCI address each time every guest virtual machine is started).
  • If the guest virtual machine is moved to another host physical machine, that host physical machine must have exactly the same hardware in the same location on the PCI bus (or the guest virtual machine configuration must be modified prior to start).
It is possible to avoid both of these problems by creating a libvirt network with a device pool containing all the VFs of an SR-IOV device. Once that is done, configure the guest virtual machine to reference this network. Each time the guest is started, a single VF will be allocated from the pool and assigned to the guest virtual machine. When the guest virtual machine is stopped, the VF will be returned to the pool for use by another guest virtual machine.

Procedure 16.10. Creating a device pool

  1. Shut down the guest virtual machine

    Using the virsh shutdown command, shut down the guest virtual machine, here named guestVM.
    # virsh shutdown guestVM
  2. Create a configuration file

    Using your editor of choice, create an XML file (named passthrough.xml, for example) in the /tmp directory. Make sure to replace myNetDevName in pf dev='myNetDevName' with the netdev name of your own SR-IOV device's Physical Function (PF), for example eth3.
    The following is an example network definition that will make available a pool of all VFs for the SR-IOV adapter whose PF is at the specified netdev on the host physical machine:
          
    <network>
       <name>passthrough</name> <!-- This is the name of the network (here matching the file name) -->
       <forward mode='hostdev' managed='yes'>
         <pf dev='myNetDevName'/>  <!-- Use the netdev name of your SR-IOV device's PF here -->
       </forward>
    </network>
          
    
    

    Figure 16.12. Sample network definition domain XML

  3. Load the new XML file

    Enter the following command, replacing /tmp/passthrough.xml with the name and location of your XML file you created in the previous step:
    # virsh net-define /tmp/passthrough.xml
  4. Start the network

    Run the following commands, replacing passthrough with the name of the network you defined in the previous step:
     # virsh net-autostart passthrough
     # virsh net-start passthrough
  5. Restart the guest virtual machine

    Run the virsh start command to restart the guest virtual machine you shut down in the first step (example uses guestVM as the guest virtual machine's domain name). See Section 20.6, “Starting, Resuming, and Restoring a Virtual Machine” for more information.
     # virsh start guestVM 
  6. Initiating passthrough for devices

    Although only a single device is shown, libvirt will automatically derive the list of all VFs associated with that PF the first time a guest virtual machine is started with an interface definition in its domain XML like the following:
             
    <interface type='network'>
       <source network='passthrough'/>
    </interface>
          
    
    

    Figure 16.13. Sample domain XML for interface network definition

  7. Verification

    You can verify this by running the virsh net-dumpxml passthrough command after starting the first guest that uses the network; you will get output similar to the following:
          
    <network connections='1'>
       <name>passthrough</name>
       <uuid>a6b49429-d353-d7ad-3185-4451cc786437</uuid>
       <forward mode='hostdev' managed='yes'>
         <pf dev='eth3'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x1'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x3'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x5'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x7'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x1'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x3'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x5'/>
       </forward>
    </network>
          
    
    

    Figure 16.14. XML dump file passthrough contents
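
The verification output can also be summarized programmatically. The sketch below parses network XML in the shape shown in Figure 16.14 and lists the PF, the number of active connections, and the VF addresses in the pool. The function name pool_summary and the shortened sample XML are assumptions made for illustration; in practice the input would be the text printed by virsh net-dumpxml passthrough.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the kind of output shown in Figure 16.14.
SAMPLE_NETWORK_XML = """
<network connections='1'>
  <name>passthrough</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth3'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x3'/>
  </forward>
</network>
"""


def pool_summary(network_xml):
    """Summarize a hostdev-mode libvirt network: PF, connections, VF list."""
    root = ET.fromstring(network_xml)
    forward = root.find("forward")
    if forward is None or forward.get("mode") != "hostdev":
        raise ValueError("not a hostdev-mode network definition")
    pf = forward.find("pf")
    vfs = []
    for addr in forward.findall("address"):
        # Values in this output are hexadecimal with a 0x prefix.
        domain, bus, slot, function = (
            int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")
        )
        vfs.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}")
    return {
        "name": root.findtext("name"),
        "connections": int(root.get("connections", "0")),
        "pf": pf.get("dev") if pf is not None else None,
        "vfs": vfs,
    }


if __name__ == "__main__":
    summary = pool_summary(SAMPLE_NETWORK_XML)
    print(summary["name"], "PF:", summary["pf"],
          "connections:", summary["connections"])
    for vf in summary["vfs"]:
        print("  VF", vf)
```

Before the first guest using the network has started, the <address> list is absent, so an empty vfs list is expected at that point.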

16.2.5. SR-IOV Restrictions

SR-IOV is only thoroughly tested with the following devices:
  • Intel® 82576NS Gigabit Ethernet Controller (igb driver)
  • Intel® 82576EB Gigabit Ethernet Controller (igb driver)
  • Intel® 82599ES 10 Gigabit Ethernet Controller (ixgbe driver)
  • Intel® 82599EB 10 Gigabit Ethernet Controller (ixgbe driver)
Other SR-IOV devices may work but have not been tested at the time of release.