Chapter 5. Configuring PCI passthrough

You can use PCI passthrough to attach a physical PCI device, such as a graphics card or a network device, to an instance. If you use PCI passthrough for a device, the instance reserves exclusive access to the device for performing tasks, and the device is not available to the host.

Important

Using PCI passthrough with routed provider networks

The Compute service does not support single networks that span multiple provider networks. When a network contains multiple physical networks, the Compute service only uses the first physical network. Therefore, if you are using routed provider networks, you must use the same physical_network name across all the Compute nodes.

If you use routed provider networks with VLAN or flat networks, you must use the same physical_network name for all segments. You then create multiple segments for the network and map the segments to the appropriate subnets.
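
For example, the following is a sketch of creating an additional segment on an existing routed provider network and mapping it to a new subnet. The network, segment, and subnet names, the VLAN ID, and the subnet range are illustrative:

$ openstack network segment create --network multisegment-net \
   --network-type vlan --physical-network physnet1 \
   --segment 120 segment2
$ openstack subnet create --network multisegment-net \
   --network-segment segment2 --subnet-range 192.0.2.0/24 \
   subnet2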

To enable your cloud users to create instances with PCI devices attached, you must complete the following:

  1. Designate Compute nodes for PCI passthrough.
  2. Configure PCI passthrough on the Compute nodes that have the required PCI devices.
  3. Deploy the overcloud.
  4. Create a flavor for launching instances with PCI devices attached.

Prerequisites

  • The Compute nodes have the required PCI devices.

5.1. Designating Compute nodes for PCI passthrough

To designate Compute nodes for instances with physical PCI devices attached, you must create a new roles data file that configures the PCI passthrough role, and configure a new overcloud flavor and PCI passthrough resource class to tag the Compute nodes for PCI passthrough.

Procedure

  1. Log in to the undercloud as the stack user.
  2. Source the stackrc file:

    [stack@director ~]$ source ~/stackrc
  3. Generate a new roles data file named roles_data_pci_passthrough.yaml that includes the Controller, Compute, and ComputePCI roles:

    (undercloud)$ openstack overcloud roles \
     generate -o /home/stack/templates/roles_data_pci_passthrough.yaml \
     Compute:ComputePCI Compute Controller
  4. Open roles_data_pci_passthrough.yaml and edit or add the following parameters and sections:

    Section/Parameter            Current value                      New value
    --------------------------   -------------------------------    -----------------------------------
    Role comment                 Role: Compute                      Role: ComputePCI
    Role name                    name: Compute                      name: ComputePCI
    description                  Basic Compute Node role            PCI Passthrough Compute Node role
    HostnameFormatDefault        %stackname%-novacompute-%index%    %stackname%-novacomputepci-%index%
    deprecated_nic_config_name   compute.yaml                       compute-pci-passthrough.yaml
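
    After you edit the file, the ComputePCI role entry in roles_data_pci_passthrough.yaml begins as follows. This is a sketch based on the values in the table; all other fields of the copied Compute role stay unchanged:

    # Role: ComputePCI
    - name: ComputePCI
      description: |
        PCI Passthrough Compute Node role
      HostnameFormatDefault: '%stackname%-novacomputepci-%index%'
      deprecated_nic_config_name: compute-pci-passthrough.yaml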

  5. Register the PCI passthrough Compute nodes for the overcloud by adding them to your node definition template, node.json or node.yaml. For more information, see Registering nodes for the overcloud in the Director Installation and Usage guide.
  6. Inspect the node hardware:

    (undercloud)$ openstack overcloud node introspect \
     --all-manageable --provide

    For more information, see Creating an inventory of the bare-metal node hardware in the Director Installation and Usage guide.

  7. Create the compute-pci-passthrough overcloud flavor for PCI passthrough Compute nodes:

    (undercloud)$ openstack flavor create --id auto \
     --ram <ram_size_mb> --disk <disk_size_gb> \
     --vcpus <no_vcpus> compute-pci-passthrough
    • Replace <ram_size_mb> with the RAM of the bare metal node, in MB.
    • Replace <disk_size_gb> with the size of the disk on the bare metal node, in GB.
    • Replace <no_vcpus> with the number of CPUs on the bare metal node.

      Note

      These properties are not used for scheduling instances. However, the Compute scheduler does use the disk size to determine the root partition size.
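
    For example, with illustrative values for a bare metal node that has 4 CPUs, 6144 MB of RAM, and a 40 GB disk:

    (undercloud)$ openstack flavor create --id auto \
     --ram 6144 --disk 40 --vcpus 4 compute-pci-passthrough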

  8. Tag each bare metal node that you want to designate for PCI passthrough with a custom PCI passthrough resource class:

    (undercloud)$ openstack baremetal node set \
     --resource-class baremetal.PCI-PASSTHROUGH <node>

    Replace <node> with the ID of the bare metal node.

  9. Associate the compute-pci-passthrough flavor with the custom PCI passthrough resource class:

    (undercloud)$ openstack flavor set \
     --property resources:CUSTOM_BAREMETAL_PCI_PASSTHROUGH=1 \
      compute-pci-passthrough

    To determine the name of a custom resource class that corresponds to a resource class of a Bare Metal service node, convert the resource class to uppercase, replace all punctuation with an underscore, and prefix with CUSTOM_. For example, the resource class baremetal.PCI-PASSTHROUGH becomes CUSTOM_BAREMETAL_PCI_PASSTHROUGH.

    Note

    A flavor can request only one instance of a bare metal resource class.

  10. Set the following flavor properties to prevent the Compute scheduler from using the bare metal flavor properties to schedule instances:

    (undercloud)$ openstack flavor set \
     --property resources:VCPU=0 --property resources:MEMORY_MB=0 \
     --property resources:DISK_GB=0 compute-pci-passthrough
  11. Add the following parameters to the node-info.yaml file to specify the number of PCI passthrough Compute nodes, and the flavor to use for the PCI passthrough designated Compute nodes:

    parameter_defaults:
      OvercloudComputePCIFlavor: compute-pci-passthrough
      ComputePCICount: 3
  12. To verify that the role was created, enter the following command:

    (undercloud)$ openstack overcloud profiles list

5.2. Configuring a PCI passthrough Compute node

To enable your cloud users to create instances with PCI devices attached, you must configure both the Compute nodes that have the PCI devices and the Controller nodes.

Procedure

  1. Create an environment file to configure the Controller node on the overcloud for PCI passthrough, for example, pci_passthrough_controller.yaml.
  2. Add PciPassthroughFilter to the NovaSchedulerDefaultFilters parameter in pci_passthrough_controller.yaml:

    parameter_defaults:
      NovaSchedulerDefaultFilters: ['AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','NUMATopologyFilter']
  3. To specify the PCI alias for the devices on the Controller node, add the following configuration to pci_passthrough_controller.yaml:

    parameter_defaults:
      ...
      ControllerExtraConfig:
        nova::pci::aliases:
          - name: "a1"
            product_id: "1572"
            vendor_id: "8086"
            device_type: "type-PF"

    For more information about configuring the device_type field, see PCI passthrough device type field.

    Note

    If the nova-api service is running in a role different from the Controller role, replace ControllerExtraConfig with the user role in the format <Role>ExtraConfig.

  4. Optional: To set a default NUMA affinity policy for PCI passthrough devices, add numa_policy to the nova::pci::aliases: configuration from step 3:

    parameter_defaults:
      ...
      ControllerExtraConfig:
        nova::pci::aliases:
          - name: "a1"
            product_id: "1572"
            vendor_id: "8086"
            device_type: "type-PF"
            numa_policy: "preferred"
  5. To configure the Compute node on the overcloud for PCI passthrough, create an environment file, for example, pci_passthrough_compute.yaml.
  6. To specify the PCI devices that are available for passthrough on the Compute node, use the vendor_id and product_id options to add all matching PCI devices to the pool of devices available for passthrough to instances. For example, to add all Intel® Ethernet Controller X710 devices to the pool, add the following configuration to pci_passthrough_compute.yaml:

    parameter_defaults:
      ...
      ComputePCIParameters:
        NovaPCIPassthrough:
          - vendor_id: "8086"
            product_id: "1572"

    For more information about how to configure NovaPCIPassthrough, see Guidelines for configuring NovaPCIPassthrough.

  7. You must create a copy of the PCI alias on the Compute node for instance migration and resize operations. To specify the PCI alias for the devices on the PCI passthrough Compute node, add the following to pci_passthrough_compute.yaml:

    parameter_defaults:
      ...
      ComputePCIExtraConfig:
        nova::pci::aliases:
          - name: "a1"
            product_id: "1572"
            vendor_id: "8086"
            device_type: "type-PF"
    Note

    The Compute node aliases must be identical to the aliases on the Controller node. Therefore, if you added numa_policy to nova::pci::aliases in pci_passthrough_controller.yaml, then you must also add it to nova::pci::aliases in pci_passthrough_compute.yaml.

  8. To support PCI passthrough, enable IOMMU in the server BIOS of the Compute nodes, and add the KernelArgs parameter to pci_passthrough_compute.yaml to enable IOMMU in the kernel. For example, use the following KernelArgs settings to enable an Intel IOMMU:

    parameter_defaults:
      ...
      ComputePCIParameters:
        KernelArgs: "intel_iommu=on iommu=pt"

    To enable an AMD IOMMU, set KernelArgs to "amd_iommu=on iommu=pt".

    Note

    When you first add the KernelArgs parameter to the configuration of a role, the overcloud nodes are automatically rebooted. If required, you can disable the automatic rebooting of nodes and instead perform node reboots manually after each overcloud deployment. For more information, see Configuring manual node reboot to define KernelArgs.
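
    After the nodes reboot, you can confirm on a Compute node that the kernel arguments took effect and that the IOMMU is active. This is a verification sketch and not part of the official procedure; the hostname is illustrative. A populated /sys/kernel/iommu_groups directory indicates that the IOMMU is enabled:

    [root@overcloud-novacomputepci-0 ~]# cat /proc/cmdline
    [root@overcloud-novacomputepci-0 ~]# ls /sys/kernel/iommu_groups/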

  9. Add your custom environment files to the stack with your other environment files and deploy the overcloud:

    (undercloud)$ openstack overcloud deploy --templates \
      -e [your environment files] \
      -e /home/stack/templates/pci_passthrough_controller.yaml \
      -e /home/stack/templates/pci_passthrough_compute.yaml
  10. Create and configure the flavors that your cloud users can use to request the PCI devices. The following example requests two devices, each with a vendor ID of 8086 and a product ID of 1572, using the alias defined in step 7:

    (overcloud)# openstack flavor set \
     --property "pci_passthrough:alias"="a1:2" device_passthrough
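
    This command assumes that a flavor named device_passthrough already exists. If it does not, you can create it first. The resource values in the following sketch are illustrative:

    (overcloud)# openstack flavor create --ram 4096 \
     --disk 20 --vcpus 2 device_passthrough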
  11. Optional: To override the default NUMA affinity policy for PCI passthrough devices, you can add the NUMA affinity policy property key to the flavor or the image:

    • To override the default NUMA affinity policy by using the flavor, add the hw:pci_numa_affinity_policy property key:

      (overcloud)# openstack flavor set \
       --property "hw:pci_numa_affinity_policy"="required" \
       device_passthrough

      For more information about the valid values for hw:pci_numa_affinity_policy, see Flavor metadata.

    • To override the default NUMA affinity policy by using the image, add the hw_pci_numa_affinity_policy property key:

      (overcloud)# openstack image set \
       --property hw_pci_numa_affinity_policy=required \
       device_passthrough_image
      Note

      If you set the NUMA affinity policy on both the image and the flavor then the property values must match. The flavor setting takes precedence over the image and default settings. Therefore, the configuration of the NUMA affinity policy on the image only takes effect if the property is not set on the flavor.

Verification

  1. Create an instance with a PCI passthrough device:

    # openstack server create --flavor device_passthrough \
     --image <image> --wait test-pci
  2. Log in to the instance as a cloud user. For more information, see Connecting to an instance.
  3. To verify that the PCI device is accessible from the instance, enter the following command from the instance:

    $ lspci -nn | grep <device_name>
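
    For example, for the Intel X710 devices configured earlier, you can search for the [vendor_id:product_id] pair that lspci -nn prints for each device:

    $ lspci -nn | grep 8086:1572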

5.3. PCI passthrough device type field

The Compute service categorizes PCI devices into one of three types, depending on the capabilities the devices report. The following list describes the valid values for the device_type field:

type-PF
The device supports SR-IOV and is the parent or root device. Specify this device type to pass through a device that supports SR-IOV in its entirety.
type-VF
The device is a child device of a device that supports SR-IOV.
type-PCI
The device does not support SR-IOV. This is the default device type if the device_type field is not set.
Note

You must configure the Compute and Controller nodes with the same device_type.
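
For example, an alias for the Virtual Functions of the Intel X710 devices used in this chapter sets device_type to type-VF. This is a sketch; the VF product ID 154c applies to the X710 family, so verify the value for your hardware:

nova::pci::aliases:
  - name: "a2"
    product_id: "154c"
    vendor_id: "8086"
    device_type: "type-VF"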

5.4. Guidelines for configuring NovaPCIPassthrough

  • Do not use the devname parameter when configuring PCI passthrough, as the device name of a NIC can change. Instead, use vendor_id and product_id because they are more stable, or use the address of the NIC.
  • To pass through a specific Physical Function (PF), you can use the address parameter because the PCI address is unique to each device. Alternatively, you can use the product_id parameter to pass through a PF, but you must also specify the address of the PF if you have multiple PFs of the same type.
  • To pass through all the Virtual Functions (VFs), specify only the product_id and vendor_id of the VFs that you want to use for PCI passthrough. You must also specify the address of the VF if you are using SR-IOV for NIC partitioning and you are running OVS on a VF.
  • To pass through only the VFs for a PF but not the PF itself, you can use the address parameter to specify the PCI address of the PF and the product_id parameter to specify the product ID of the VF, as shown in the sketch after this list.
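
A sketch of the last case, which combines the PCI address of the PF with the product ID of its VFs. The address 0000:0a:00.0 and the VF product ID 154c are illustrative:

NovaPCIPassthrough:
  - address: "0000:0a:00.0"
    product_id: "154c"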

Configuring the address parameter

The address parameter specifies the PCI address of the device. You can set the value of the address parameter using either a String or a dict mapping.

String format

If you specify the address using a string, you can include wildcards (*), as shown in the following example:

NovaPCIPassthrough:
  - address: "*:0a:00.*"
    physical_network: physnet1
Dictionary format

If you specify the address using the dictionary format, you can include regular expression syntax, as shown in the following example:

NovaPCIPassthrough:
  - address:
      domain: ".*"
      bus: "02"
      slot: "01"
      function: "[0-2]"
    physical_network: net1
Note

The Compute service restricts the configuration of address fields to the following maximum values:

  • domain - 0xFFFF
  • bus - 0xFF
  • slot - 0x1F
  • function - 0x7

The Compute service supports PCI devices with a 16-bit address domain. The Compute service ignores PCI devices with a 32-bit address domain.