13.6. Enabling PCI passthrough for a GPU device

You can use PCI passthrough to attach a physical PCI device, such as a graphics card, to an instance. If you use PCI passthrough for a device, the instance reserves exclusive access to the device for performing tasks, and the device is not available to the host.

Prerequisites

  • The pciutils package is installed on the physical servers that have the PCI cards.
  • The driver for the GPU device must be installed on the instance that the device is passed through to. Therefore, you need to have created a custom instance image that has the required GPU driver installed. For more information about how to create a custom instance image with the GPU driver installed, see Creating a custom GPU instance image.

Procedure

  1. To determine the vendor ID and product ID for each passthrough device type, enter the following command on the physical server that has the PCI cards:

    # lspci -nn | grep -i <gpu_name>

    For example, to determine the vendor and product ID for an NVIDIA GPU, enter the following command:

    # lspci -nn | grep -i nvidia
    3b:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)
    d8:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1db4] (rev a1)
  2. To determine if each PCI device has Single Root I/O Virtualization (SR-IOV) capabilities, enter the following command on the physical server that has the PCI cards:

    # lspci -v -s 3b:00.0
    3b:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
              ...
              Capabilities: [bcc] Single Root I/O Virtualization (SR-IOV)
              ...
  3. To configure the Controller node on the overcloud for PCI passthrough, create an environment file, for example, pci_passthru_controller.yaml.
  4. Add PciPassthroughFilter to the NovaSchedulerDefaultFilters parameter in pci_passthru_controller.yaml:

    parameter_defaults:
      NovaSchedulerDefaultFilters: ['AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','NUMATopologyFilter']
  5. To specify the PCI alias for the devices on the Controller node, add the following configuration to pci_passthru_controller.yaml:

    • If the PCI device has SR-IOV capabilities:

      ControllerExtraConfig:
        nova::pci::aliases:
          - name: "t4"
            product_id: "1eb8"
            vendor_id: "10de"
            device_type: "type-PF"
          - name: "v100"
            product_id: "1db4"
            vendor_id: "10de"
            device_type: "type-PF"
    • If the PCI device does not have SR-IOV capabilities:

      ControllerExtraConfig:
        nova::pci::aliases:
          - name: "t4"
            product_id: "1eb8"
            vendor_id: "10de"
          - name: "v100"
            product_id: "1db4"
            vendor_id: "10de"

      For more information on configuring the device_type field, see PCI passthrough device type field.

      注記

      If the nova-api service is running in a role other than the Controller, then replace ControllerExtraConfig with the user role, in the format <Role>ExtraConfig.

  6. To configure the Compute node on the overcloud for PCI passthrough, create an environment file, for example, pci_passthru_compute.yaml.
  7. To specify the available PCIs for the devices on the Compute node, add the following to pci_passthru_compute.yaml:

    parameter_defaults:
      NovaPCIPassthrough:
        - vendor_id: "10de"
          product_id: "1eb8"
  8. You must create a copy of the PCI alias on the Compute node for instance migration and resize operations. To specify the PCI alias for the devices on the Compute node, add the following to pci_passthru_compute.yaml:

    • If the PCI device has SR-IOV capabilities:

      ComputeExtraConfig:
        nova::pci::aliases:
          - name: "t4"
            product_id: "1eb8"
            vendor_id: "10de"
            device_type: "type-PF"
          - name: "v100"
            product_id: "1db4"
            vendor_id: "10de"
            device_type: "type-PF"
    • If the PCI device does not have SR-IOV capabilities:

      ComputeExtraConfig:
        nova::pci::aliases:
          - name: "t4"
            product_id: "1eb8"
            vendor_id: "10de"
          - name: "v100"
            product_id: "1db4"
            vendor_id: "10de"
      注記

      The Compute node aliases must be identical to the aliases on the Controller node.

  9. To enable IOMMU in the server BIOS of the Compute nodes to support PCI passthrough, add the KernelArgs parameter to pci_passthru_compute.yaml:

    parameter_defaults:
      ...
      ComputeParameters:
        KernelArgs: "intel_iommu=on iommu=pt"
  10. Add your custom environment files to the stack with your other environment files and deploy the overcloud:

    (undercloud)$ openstack overcloud deploy --templates \
      -e [your environment files] \
      -e /home/stack/templates/pci_passthru_controller.yaml \
      -e /home/stack/templates/pci_passthru_compute.yaml
  11. Configure a flavor to request the PCI devices. The following example requests two devices, each with a vendor ID of 10de and a product ID of 13f2:

    # openstack flavor set m1.large \
     --property "pci_passthrough:alias"="t4:2"

Verification

  1. Create an instance with a PCI passthrough device:

    # openstack server create --flavor m1.large \
     --image <custom_gpu> --wait test-pci

    Replace <custom_gpu> with the name of your custom instance image that has the required GPU driver installed.

  2. Log in to the instance as a cloud user.
  3. To verify that the GPU is accessible from the instance, enter the following command from the instance:

    $ lspci -nn | grep <gpu_name>
  4. To check the NVIDIA System Management Interface status, enter the following command from the instance:

    $ nvidia-smi

    Example output:

    -----------------------------------------------------------------------------
    | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
    |---------------------------------------------------------------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===========================================================================|
    |   0  Tesla T4            Off  | 00000000:01:00.0 Off |                    0 |
    | N/A   43C    P0    20W /  70W |      0MiB / 15109MiB |      0%      Default |
    ---------------------------------------------------------------------------
    
    -----------------------------------------------------------------------------
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    -----------------------------------------------------------------------------