11.4. Enabling PCI passthrough for a GPU device

You can use PCI passthrough to attach a physical PCI device, such as a graphics card, to an instance. If you use PCI passthrough for a device, the instance reserves exclusive access to the device for performing tasks, and the device is not available to the host.

Prerequisites

  • The pciutils package is installed on the physical servers that have the PCI cards.
  • The GPU driver is available to install on the GPU instances. For more information, see "Building a custom GPU overcloud image".
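
The first prerequisite can be checked before you start. The following is a minimal sketch, assuming a bash shell on the physical server; `require_cmd` is a hypothetical helper name and is not part of pciutils or Red Hat OpenStack Platform. It only tests whether a command is on the PATH.

```shell
#!/usr/bin/env bash
# require_cmd is a hypothetical helper: it reports whether a command is on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found; install the package that provides it" >&2
    return 1
  fi
}

# lspci is provided by the pciutils package.
require_cmd lspci || true
```

If `lspci` is reported as not found, install the pciutils package before you continue with the procedure.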

Procedure

  1. To determine the vendor ID and product ID for each passthrough device type, run the following command on the physical server that has the PCI cards:

    # lspci -nn | grep -i <gpu_name>

    For example, to determine the vendor and product ID for an NVIDIA GPU, run the following command:

    # lspci -nn | grep -i nvidia
    3b:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)
    d8:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] [10de:1db4] (rev a1)
  2. To configure the Controller node on the overcloud for PCI passthrough, create an environment file, for example, pci_passthru_controller.yaml.
  3. Add PciPassthroughFilter to the NovaSchedulerDefaultFilters parameter in pci_passthru_controller.yaml:

    parameter_defaults:
      NovaSchedulerDefaultFilters: ['RetryFilter','AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','NUMATopologyFilter']
  4. To specify the PCI alias for the devices on the Controller node, add the following configuration under parameter_defaults in pci_passthru_controller.yaml:

    parameter_defaults:
      ...
      ControllerExtraConfig:
        nova::pci::aliases:
          - name: "t4"
            product_id: "1eb8"
            vendor_id: "10de"
          - name: "v100"
            product_id: "1db4"
            vendor_id: "10de"
    Note

    If the nova-api service is running in a role other than the Controller, then replace ControllerExtraConfig with the user role, in the format <Role>ExtraConfig.

  5. To configure the Compute node on the overcloud for PCI passthrough, create an environment file, for example, pci_passthru_compute.yaml.
  6. To specify the available PCIs for the devices on the Compute node, add the following to pci_passthru_compute.yaml:

    parameter_defaults:
      NovaPCIPassthrough:
        - vendor_id: "10de"
          product_id: "1eb8"
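  7. To enable IOMMU in the kernel of the Compute nodes to support PCI passthrough, add the KernelArgs parameter to pci_passthru_compute.yaml. IOMMU support (Intel VT-d or AMD-Vi) must also be enabled in the server BIOS. The following example is for Intel processors:

    parameter_defaults:
      ...
      ComputeParameters:
        KernelArgs: "intel_iommu=on iommu=pt"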
  8. Deploy the overcloud, adding your custom environment files to the stack along with your other environment files:

    (undercloud) $ openstack overcloud deploy --templates \
      -e [your environment files] \
      -e /home/stack/templates/pci_passthru_controller.yaml \
      -e /home/stack/templates/pci_passthru_compute.yaml
  9. Configure a flavor to request the PCI devices. The following example requests two devices through the t4 alias, each with a vendor ID of 10de and a product ID of 1eb8:

    # openstack flavor set m1.large --property "pci_passthrough:alias"="t4:2"
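
The vendor ID and product ID reported in step 1 are the values used in the alias and NovaPCIPassthrough entries. The following is a minimal sketch of one way to pull them out of `lspci -nn` output, assuming bash and a sed that supports `-E`. `extract_pci_ids` is a hypothetical helper name, and the sample line is copied from the example output in step 1 so that the sketch runs without GPU hardware.

```shell
#!/usr/bin/env bash
# extract_pci_ids is a hypothetical helper, not part of pciutils or OpenStack.
# It reads `lspci -nn` style lines on stdin and prints "vendor_id product_id".
extract_pci_ids() {
  # The [xxxx:yyyy] group holds the vendor:product pair. The earlier
  # [0302] group is the device class and contains no colon, so it is skipped.
  sed -nE 's/.*\[([0-9a-f]{4}):([0-9a-f]{4})\].*/\1 \2/p'
}

# Sample input taken from the example output in step 1.
sample='3b:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)'

printf '%s\n' "$sample" | extract_pci_ids
# prints: 10de 1eb8
```

On a physical server, the same helper could be fed real output, for example `lspci -nn | grep -i nvidia | extract_pci_ids`.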

Verification

  1. Create an instance with a PCI passthrough device:

    # openstack server create --flavor m1.large --image rhelgpu --wait test-pci
  2. Log in to the instance as a cloud user.
  3. Install the GPU driver on the instance. For example, run the following script to install an NVIDIA driver:

    $ sh NVIDIA-Linux-x86_64-430.24-grid.run
  4. To verify that the GPU is accessible from the instance, enter the following command from the instance:

    $ lspci -nn | grep -i <gpu_name>
  5. To check the NVIDIA System Management Interface status, run the following command from the instance:

    $ nvidia-smi

    Example output:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla T4            Off  | 00000000:01:00.0 Off |                    0 |
    | N/A   43C    P0    20W /  70W |      0MiB / 15109MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
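
The check in step 4 can also be scripted so that it compares the number of passthrough devices visible in the instance with the number requested by the flavor. The following is a minimal sketch, assuming bash inside the instance; `check_gpu_count` is a hypothetical helper name, and the two sample lines (including their PCI addresses) are illustrative input rather than output captured from a real instance.

```shell
#!/usr/bin/env bash
# check_gpu_count is a hypothetical helper. It reads `lspci -nn` style output
# on stdin and succeeds only if exactly $2 lines contain the ID pair in $1.
check_gpu_count() {
  local ids="$1" expected="$2" found
  # grep -c prints the match count; it exits non-zero when the count is 0,
  # so "|| true" keeps that case from being treated as an error.
  found=$(grep -c -i -F "[$ids]" || true)
  echo "found $found of $expected device(s) with ID $ids"
  [ "$found" -eq "$expected" ]
}

# Illustrative input: two T4 devices, matching a flavor that requests "t4:2".
sample='00:05.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)
00:06.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)'

printf '%s\n' "$sample" | check_gpu_count 10de:1eb8 2
# prints: found 2 of 2 device(s) with ID 10de:1eb8
```

Inside an instance, the helper could be used as `lspci -nn | check_gpu_count 10de:1eb8 2`. A non-zero exit status means the instance sees a different number of devices than the flavor requested.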