Chapter 12. Managing NVIDIA vGPU devices

The vGPU feature makes it possible to divide a physical NVIDIA GPU device into multiple virtual devices referred to as mediated devices. These mediated devices can then be assigned to multiple virtual machines (VMs) as virtual GPUs. As a result, these VMs share the performance of a single physical GPU.

Note, however, that assigning a physical GPU to VMs, with or without using mediated devices, makes it impossible for the host to use the GPU.

12.1. Setting up NVIDIA vGPU devices

To set up the NVIDIA vGPU feature, you need to obtain NVIDIA vGPU drivers for your GPU device, create mediated devices, and assign them to the intended virtual machines. For detailed instructions, see below.

Prerequisites

  • Creating mediated vGPU devices is only possible on a limited set of NVIDIA GPUs. For an up-to-date list of these devices, see the NVIDIA GPU Software Documentation.

    If you do not know which GPU your host is using, install the lshw package and use the lshw -C display command. The following example shows that the system is using an NVIDIA Tesla P4 GPU, which is compatible with vGPU.

    # lshw -C display
    
    *-display
           description: 3D controller
           product: GP104GL [Tesla P4]
           vendor: NVIDIA Corporation
           physical id: 0
           bus info: pci@0000:01:00.0
           version: a1
           width: 64 bits
           clock: 33MHz
           capabilities: pm msi pciexpress cap_list
           configuration: driver=vfio-pci latency=0
           resources: irq:16 memory:f6000000-f6ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff
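
    Alternatively, if lshw is not available, you can list NVIDIA PCI devices with the lspci utility; 10de is the PCI vendor ID of NVIDIA Corporation, and the exact output depends on your hardware.

    # lspci -nn -d 10de: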

Procedure

  1. Obtain the NVIDIA vGPU drivers and install them on your system. For instructions, see the NVIDIA documentation.
  2. If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create a conf file of any name in the /etc/modprobe.d/ directory, and add the following lines to the file:

    blacklist nouveau
    options nouveau modeset=0
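
    For example, the following command creates a file with the required content. The file name nvidia-vgpu.conf is only an example; any file name in the /etc/modprobe.d/ directory works.

    # cat > /etc/modprobe.d/nvidia-vgpu.conf << EOF
    blacklist nouveau
    options nouveau modeset=0
    EOF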
  3. Regenerate the initial ramdisk for the current kernel, then reboot.

    # dracut --force
    # reboot

    If you need to use a prior supported kernel version with mediated devices, regenerate the initial ramdisk for all installed kernel versions.

    # dracut --regenerate-all --force
    # reboot
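
    Optionally, verify that the nouveau driver is no longer loaded after the reboot. If the blacklisting took effect, the following command produces no output:

    # lsmod | grep nouveau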
  4. Check that the nvidia_vgpu_vfio module has been loaded by the kernel and that the nvidia-vgpu-mgr.service service is running.

    # lsmod | grep nvidia_vgpu_vfio
    nvidia_vgpu_vfio 45011 0
    nvidia 14333621 10 nvidia_vgpu_vfio
    mdev 20414 2 vfio_mdev,nvidia_vgpu_vfio
    vfio 32695 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
    # systemctl status nvidia-vgpu-mgr.service
    nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
       Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
       Active: active (running) since Fri 2018-03-16 10:17:36 CET; 5h 8min ago
     Main PID: 1553 (nvidia-vgpu-mgr)
     [...]
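
    If the service is not running, you can usually start it and enable it at boot by using systemctl. The unit name below assumes that the NVIDIA installer created the nvidia-vgpu-mgr.service unit shown above.

    # systemctl enable --now nvidia-vgpu-mgr.service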
  5. Generate a device UUID by using the uuidgen command, and write it to the /sys/class/mdev_bus/pci_dev/mdev_supported_types/type-id/create file to create the mediated device. In the file path, pci_dev is the PCI address of the host GPU, and type-id is an ID of the host GPU type.

    The following example shows how to create a mediated device of the nvidia-63 vGPU type on an NVIDIA Tesla P4 card:

    # uuidgen
    30820a6f-b1a5-4503-91ca-0c10ba58692a
    # echo "30820a6f-b1a5-4503-91ca-0c10ba58692a" > /sys/class/mdev_bus/0000:01:00.0/mdev_supported_types/nvidia-63/create
    Note

    For type-id values for specific GPU devices, see the Virtual GPU software documentation. Note that only Q-series NVIDIA vGPUs, such as GRID P4-2Q, are supported as mediated device GPU types on Linux VMs.
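
    Optionally, verify that the mediated device was created. The device appears in sysfs as a directory named after its UUID:

    # ls /sys/bus/mdev/devices/
    30820a6f-b1a5-4503-91ca-0c10ba58692a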

  6. Add the following lines to the <devices> section in the XML configuration of each guest that you want to share the vGPU resources with. Use the UUID value generated by the uuidgen command in the previous step. Each UUID can be assigned to only one guest at a time.

    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
      <source>
        <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
      </source>
    </hostdev>
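
    To apply this change, you can, for example, open the guest XML configuration in an editor by using the virsh edit command, where testguest is a placeholder VM name:

    # virsh edit testguest

    Alternatively, save the snippet to a file, for example vgpu.xml, and attach it to the guest configuration persistently:

    # virsh attach-device testguest vgpu.xml --config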

Additional resources

  • For the vGPU mediated devices to work properly on the assigned VMs, NVIDIA vGPU guest software licensing needs to be set up for the VMs. For further information and instructions, see the NVIDIA virtual GPU software documentation.

12.2. Removing NVIDIA vGPU devices

To change the configuration of assigned vGPU mediated devices, you must first remove the existing devices from the assigned VMs. For instructions, see below:

Procedure

  • To remove a mediated vGPU device, use the following command when the device is inactive, and replace uuid with the UUID of the device, for example 30820a6f-b1a5-4503-91ca-0c10ba58692a:

    # echo 1 > /sys/bus/mdev/devices/uuid/remove

    Note that attempting to remove a vGPU device that is currently in use by a VM triggers the following error:

    echo: write error: Device or resource busy
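
    For example, assuming the UUID shown above, removing the device and confirming that it is gone could look as follows. After a successful removal, the UUID directory no longer appears under /sys/bus/mdev/devices/.

    # echo 1 > /sys/bus/mdev/devices/30820a6f-b1a5-4503-91ca-0c10ba58692a/remove
    # ls /sys/bus/mdev/devices/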

12.3. Obtaining NVIDIA vGPU information about your system

To evaluate the capabilities of the vGPU features available to you, you can obtain additional information about mediated devices on your system, such as how many mediated devices of a given type can be created.

Procedure

  • Use the virsh nodedev-list --cap mdev_types and virsh nodedev-dumpxml commands.

    For example, the following output shows available vGPU types if you are using a physical Tesla P4 card:

    $ virsh nodedev-list --cap mdev_types
    pci_0000_01_00_0
    $ virsh nodedev-dumpxml pci_0000_01_00_0
    <...>
      <capability type='mdev_types'>
        <type id='nvidia-70'>
          <name>GRID P4-8A</name>
          <deviceAPI>vfio-pci</deviceAPI>
          <availableInstances>1</availableInstances>
        </type>
        <type id='nvidia-69'>
          <name>GRID P4-4A</name>
          <deviceAPI>vfio-pci</deviceAPI>
          <availableInstances>2</availableInstances>
        </type>
        <type id='nvidia-67'>
          <name>GRID P4-1A</name>
          <deviceAPI>vfio-pci</deviceAPI>
          <availableInstances>8</availableInstances>
        </type>
        <type id='nvidia-65'>
          <name>GRID P4-4Q</name>
          <deviceAPI>vfio-pci</deviceAPI>
          <availableInstances>2</availableInstances>
        </type>
        <type id='nvidia-63'>
          <name>GRID P4-1Q</name>
          <deviceAPI>vfio-pci</deviceAPI>
          <availableInstances>8</availableInstances>
        </type>
        <type id='nvidia-71'>
          <name>GRID P4-1B</name>
          <deviceAPI>vfio-pci</deviceAPI>
          <availableInstances>8</availableInstances>
        </type>
        <type id='nvidia-68'>
          <name>GRID P4-2A</name>
          <deviceAPI>vfio-pci</deviceAPI>
          <availableInstances>4</availableInstances>
        </type>
        <type id='nvidia-66'>
          <name>GRID P4-8Q</name>
          <deviceAPI>vfio-pci</deviceAPI>
          <availableInstances>1</availableInstances>
        </type>
        <type id='nvidia-64'>
          <name>GRID P4-2Q</name>
          <deviceAPI>vfio-pci</deviceAPI>
          <availableInstances>4</availableInstances>
        </type>
      </capability>
    </...>
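
    In addition, the mediated device framework exposes similar information in sysfs. For example, the following command shows how many more instances of the nvidia-63 type can currently be created on the card used earlier in this chapter; the value corresponds to the availableInstances field in the virsh output above.

    # cat /sys/class/mdev_bus/0000:01:00.0/mdev_supported_types/nvidia-63/available_instances
    8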

12.4. Remote desktop streaming services for NVIDIA vGPU

The following remote desktop streaming services have been successfully tested for use with the NVIDIA vGPU feature in RHEL 8 hosts:

  • HP-RGS - Note that it is currently not possible to use HP-RGS with RHEL 8 VMs.
  • Mechdyne TGX - Note that it is currently not possible to use Mechdyne TGX with Windows Server 2016 VMs.
  • NICE DCV - When using this streaming service, Red Hat recommends using fixed resolution settings, as using dynamic resolution in some cases results in a black screen. In addition, it is currently not possible to use NICE DCV with RHEL 8 VMs.