Chapter 2. Assigning virtual GPUs

To set up NVIDIA vGPU devices, you need to:

  1. Obtain and install the correct NVIDIA vGPU driver for your GPU device
  2. Create mediated devices
  3. Assign each mediated device to a virtual machine
  4. Install guest drivers on each virtual machine

The following procedures explain this process.

2.1. Setting up NVIDIA vGPU devices on the host

Note

Before installing the NVIDIA vGPU driver on the guest operating system, you need to understand the licensing requirements and obtain the correct license credentials.

Prerequisites

  • Your GPU device supports virtual GPU (vGPU) functionality.
  • Your system is listed as a validated server hardware platform.

For more information about supported GPUs and validated platforms, see NVIDIA vGPU CERTIFIED SERVERS on www.nvidia.com.
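
If you are unsure which NVIDIA GPU is installed in the host, you can query the PCI bus before you begin. The following is a minimal check; the device names, IDs, and bound drivers in the output depend on your hardware:

    # List NVIDIA PCI devices together with the kernel driver currently bound to them.
    lspci -nnk | grep -iA3 nvidia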

Procedure

  1. Download and install the NVIDIA driver for the host. For information on getting the driver, see the Drivers page on the NVIDIA website.
  2. If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create it manually.
  3. Open the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file in a text editor and add the following lines to the end of the file:

    blacklist nouveau
    options nouveau modeset=0
  4. Regenerate the initial ramdisk for the current kernel, then reboot:

    # dracut --force
    # reboot

    Alternatively, if you need to use a prior supported kernel version with mediated devices, regenerate the initial ramdisk for all installed kernel versions:

    # dracut --regenerate-all --force
    # reboot
  5. Check that the kernel loaded the nvidia_vgpu_vfio module:

    # lsmod | grep nvidia_vgpu_vfio
  6. Check that the nvidia-vgpu-mgr.service service is running:

    # systemctl status nvidia-vgpu-mgr.service

    For example:

    # lsmod | grep nvidia_vgpu_vfio
    nvidia_vgpu_vfio 45011 0
    nvidia 14333621 10 nvidia_vgpu_vfio
    mdev 20414 2 vfio_mdev,nvidia_vgpu_vfio
    vfio 32695 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
    # systemctl status nvidia-vgpu-mgr.service
    nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
       Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
       Active: active (running) since Fri 2018-03-16 10:17:36 CET; 5h 8min ago
     Main PID: 1553 (nvidia-vgpu-mgr)
     [...]
  7. Get a list of available mdev types by running the following commands, either directly in a terminal or in a script:

    for device in /sys/class/mdev_bus/*; do
      for mdev_type in "$device"/mdev_supported_types/*; do
        MDEV_TYPE=$(basename "$mdev_type")
        DESCRIPTION=$(cat "$mdev_type/description")
        NAME=$(cat "$mdev_type/name")
        echo "mdev_type: $MDEV_TYPE --- description: $DESCRIPTION --- name: $NAME";
      done;
    done | sort | uniq
    Note

    For type-id values for specific GPU devices, see Virtual GPU Types in the Virtual GPU software documentation. Note that only Q-series NVIDIA vGPUs, such as GRID P4-2Q, are supported as mediated device GPU types on Linux virtual machines.

    The output looks similar to the following:

    mdev_type: nvidia-11 --- description: num_heads=2, frl_config=45, framebuffer=512M, max_resolution=2560x1600, max_instance=16 --- name: GRID M60-0B
    mdev_type: nvidia-12 --- description: num_heads=2, frl_config=60, framebuffer=512M, max_resolution=2560x1600, max_instance=16 --- name: GRID M60-0Q
    ...
    mdev_type: nvidia-22 --- description: num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=4096x2160, max_instance=1 --- name: GRID M60-8Q
  8. In the Administration Portal, click Compute → Virtual Machines. Select a virtual machine and click Edit. The Edit Virtual Machine dialog appears.
  9. Click Custom Properties. If you do not see Custom Properties, click Show Advanced Options first.
  10. In the Custom Properties dialog, click Please select a key and select mdev_type. If you do not see Please select a key, click the + button.
  11. In the text field that appears, enter the GPU type or types that you identified previously. For example: nvidia-12. You can add multiple vGPUs to a virtual machine using a comma-separated list. For example: nvidia-22,nvidia-22. To check how many instances of a given type the host can still provide, see the sketch that follows this procedure.

    Note

    Multiple vGPUs must be the same mdev type. You cannot, for example, use nvidia-22,nvidia-15.
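
In addition to the name and description shown in the previous output, each supported mdev type exposes an available_instances file in sysfs that reports how many more instances of that type the physical GPU can host. The following sketch checks the remaining capacity for one type; nvidia-22 is only an example, so substitute the type you plan to assign:

    # Report how many additional instances of a given mdev type each GPU can still host.
    # Replace nvidia-22 with the mdev type you intend to assign.
    for device in /sys/class/mdev_bus/*; do
      instances="$device/mdev_supported_types/nvidia-22/available_instances"
      if [ -f "$instances" ]; then
        echo "$(basename "$device"): $(cat "$instances") instance(s) of nvidia-22 available"
      fi
    done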

Now that you have finished installing and configuring the GPU on the host, you can proceed to install and configure the vGPU driver on each virtual machine.

2.2. Installing the vGPU driver on the virtual machine

Procedure

  1. Run the virtual machine and connect to it using a console protocol such as SPICE or VNC.
  2. Download the driver to the virtual machine. For information on getting the driver, see the Drivers page on the NVIDIA website.
  3. Install the vGPU driver, following the instructions in Installing the NVIDIA vGPU Software Graphics Driver in the NVIDIA Virtual GPU software documentation.

    Important

    Linux only: When installing the driver on a Linux guest operating system, you are prompted to update xorg.conf. If you do not update xorg.conf during the installation, you need to update it manually. For more information, see Section 1.5, “Updating and Enabling xorg (Linux Virtual Machines)”.

  4. After the driver finishes installing, reboot the machine. For Windows virtual machines, fully power off the guest from the Administration portal or the VM portal, not from within the guest operating system.

    Important

    Windows only: Powering off the virtual machine from within the Windows guest operating system sometimes sends the virtual machine into hibernate mode, which does not completely clear the memory, possibly leading to subsequent problems. Using the Administration portal or the VM portal to power off the virtual machine forces it to fully clean the memory.

  5. Run the virtual machine and connect to it using one of the supported remote desktop protocols, such as Mechdyne TGX, and verify that the vGPU is recognized by opening the NVIDIA Control Panel. On Windows, you can alternatively open the Windows Device Manager, where the vGPU appears under Display adapters. On Linux guests, you can also verify the device from a shell; see the sketch that follows this procedure. For more information, see NVIDIA vGPU Software Graphics Driver in the NVIDIA Virtual GPU software documentation.
  6. Set up NVIDIA vGPU guest software licensing for each vGPU and add the license credentials in the NVIDIA control panel. For more information, see How NVIDIA vGPU Software Licensing Is Enforced in the NVIDIA Virtual GPU Software Documentation.
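
On a Linux guest, you can confirm from a shell that the operating system sees the vGPU and that the guest driver loaded. A minimal sketch, run inside the guest after the driver installation; the exact output depends on the vGPU type and driver version:

    # Confirm that the guest sees the vGPU as a PCI display device
    # and show which kernel driver is bound to it.
    lspci -nnk | grep -iA3 nvidia
    # Query the vGPU through the guest driver's management interface.
    nvidia-smi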

2.3. Removing NVIDIA vGPU devices

To change the configuration of assigned vGPU mediated devices, you must first remove the existing devices from the virtual machines they are assigned to. A host-side check that confirms the device was released is sketched after the following procedure.

Procedure

  1. From the Administration portal, click Compute → Virtual Machines.
  2. Right-click the virtual machine and click Power off.
  3. After the virtual machine is powered off, select the virtual machine and click Edit. The Edit Virtual Machine window opens.
  4. On the Custom Properties tab, next to mdev type, click the minus - button and click OK.
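
To confirm on the host that the mediated device has actually been released, you can list the mediated devices that currently exist. A minimal sketch, assuming the standard mdev sysfs layout:

    # List the mediated devices that currently exist on the host.
    # A removed vGPU should no longer appear here once its virtual machine is
    # powered off and the mdev_type custom property has been cleared.
    ls -l /sys/bus/mdev/devices/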

2.4. Monitoring NVIDIA vGPUs

To get information about the physical GPUs and vGPUs on a host, you can use the NVIDIA System Management Interface by entering the nvidia-smi command on the host. For more information, see NVIDIA System Management Interface nvidia-smi in the NVIDIA Virtual GPU Software Documentation.

For example:

# nvidia-smi
Thu Nov  1 17:40:09 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.62                 Driver Version: 410.62                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:84:00.0 Off |                  Off |
| N/A   40C    P8    24W / 150W |   1034MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 00000000:85:00.0 Off |                  Off |
| N/A   33C    P8    23W / 150W |   8146MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           On   | 00000000:8B:00.0 Off |                  Off |
| N/A   34C    P8    24W / 150W |   8146MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           On   | 00000000:8C:00.0 Off |                  Off |
| N/A   45C    P8    24W / 150W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     34432    C+G   vgpu                                         508MiB |
|    0     34718    C+G   vgpu                                         508MiB |
|    1     35032    C+G   vgpu                                        8128MiB |
|    2     35032    C+G   vgpu                                        8128MiB |
+-----------------------------------------------------------------------------+
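
The host driver installed with the NVIDIA vGPU Manager also provides a vgpu subcommand of nvidia-smi for per-vGPU reporting. Whether the subcommand and its options are available depends on your vGPU software release, so treat the following as a sketch:

    # List the vGPUs currently running on each physical GPU.
    nvidia-smi vgpu
    # Show detailed information about each running vGPU, such as the virtual
    # machine it is attached to and its framebuffer usage.
    nvidia-smi vgpu -q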

2.5. Remote desktop streaming services for NVIDIA vGPU

The following remote desktop streaming services have been successfully tested for use with the NVIDIA vGPU feature in RHEL 8:

  • HP-RGS
  • Mechdyne TGX - It is currently not possible to use Mechdyne TGX with Windows Server 2016 guests.
  • NICE DCV - When using this streaming service, Red Hat recommends using fixed resolution settings, because using dynamic resolution in some cases results in a black screen.
Note

SPICE is not supported.