Chapter 12. Managing NVIDIA vGPU devices
The vGPU feature makes it possible to divide a single physical NVIDIA GPU device into multiple virtual devices, referred to as mediated devices. These mediated devices can then be assigned to multiple virtual machines (VMs) as virtual GPUs. As a result, these VMs can share the performance of a single physical GPU.
Assigning a physical GPU to VMs, with or without using mediated devices, makes it impossible for the host to use the GPU.
12.1. Setting up NVIDIA vGPU devices
To set up the NVIDIA vGPU feature, you need to download NVIDIA vGPU drivers for your GPU device, create mediated devices, and assign them to the intended virtual machines. For detailed instructions, see below.
Prerequisites
The mdevctl package is installed.
# yum install mdevctl
Your GPU supports vGPU mediated devices. For an up-to-date list of NVIDIA GPUs that support creating vGPUs, see the NVIDIA GPU Software Documentation.
If you do not know which GPU your host is using, install the lshw package and use the lshw -C display command. The following example shows that the system is using an NVIDIA Tesla P4 GPU, which is compatible with vGPU.

# lshw -C display

  *-display
       description: 3D controller
       product: GP104GL [Tesla P4]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress cap_list
       configuration: driver=vfio-pci latency=0
       resources: irq:16 memory:f6000000-f6ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff
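If you only need the PCI address of the GPU for the following procedure, a shorter sketch is to query lspci (provided by the pciutils package). The command is standard; the output line below is only an illustration of what a Tesla P4 host might print, not a guaranteed format:

# lspci -D -nn | grep -i nvidia
0000:01:00.0 3D controller [0302]: NVIDIA Corporation GP104GL [Tesla P4] [10de:1bb3] (rev a1)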
Procedure
- Download the NVIDIA vGPU drivers and install them on your system. For instructions, see the NVIDIA documentation.
If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create a .conf file with any name in /etc/modprobe.d/, and add the following lines to the file:

blacklist nouveau
options nouveau modeset=0
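For example, the following sketch creates such a file in one step. The file name disable-nouveau.conf is only an example; any .conf file name in /etc/modprobe.d/ works:

# cat > /etc/modprobe.d/disable-nouveau.conf << 'EOF'
blacklist nouveau
options nouveau modeset=0
EOF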
Regenerate the initial ramdisk for the current kernel, then reboot:

# dracut --force
# reboot
Check that the kernel has loaded the nvidia_vgpu_vfio module and that the nvidia-vgpu-mgr.service service is running:

# lsmod | grep nvidia_vgpu_vfio
nvidia_vgpu_vfio 45011 0
nvidia 14333621 10 nvidia_vgpu_vfio
mdev 20414 2 vfio_mdev,nvidia_vgpu_vfio
vfio 32695 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1

# systemctl status nvidia-vgpu-mgr.service
nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-03-16 10:17:36 CET; 5h 8min ago
 Main PID: 1553 (nvidia-vgpu-mgr)
 [...]
Generate a device UUID:

# uuidgen
30820a6f-b1a5-4503-91ca-0c10ba58692a
Create a mediated device from the GPU hardware that you detected in the prerequisites, and assign the generated UUID to the device.

The following example shows how to create a mediated device of the nvidia-63 vGPU type on an NVIDIA Tesla P4 card that runs on the 0000:01:00.0 PCI bus:

# mdevctl start -u 30820a6f-b1a5-4503-91ca-0c10ba58692a -p 0000:01:00.0 --type nvidia-63

Note: For the vGPU type values for specific GPU devices, see the Virtual GPU software documentation.
Make the mediated device persistent:
# mdevctl define --auto --uuid 30820a6f-b1a5-4503-91ca-0c10ba58692a
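To confirm that the device is now defined and set to start automatically, you can list the defined devices. The output shown here is only a sketch of what to expect for the UUID used in this procedure:

# mdevctl list --defined
30820a6f-b1a5-4503-91ca-0c10ba58692a 0000:01:00.0 nvidia-63 auto (active)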
Attach the mediated device to a VM with which you want to share the vGPU resources. To do so, add the following lines, along with the previously generated UUID, to the <devices/> section in the XML configuration of the VM:

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
  </source>
</hostdev>

Note that each UUID can only be assigned to one VM at a time.
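As an alternative to editing the VM XML by hand, you can attach the device from a snippet file. The following is a minimal sketch that assumes a hypothetical VM named testguest1 and a hypothetical file name vgpu-mdev.xml; the --config option makes the change persistent in the VM configuration:

# cat > vgpu-mdev.xml << 'EOF'
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
  </source>
</hostdev>
EOF
# virsh attach-device testguest1 vgpu-mdev.xml --config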
- For full functionality of the vGPU mediated devices to be available on the assigned VMs, set up NVIDIA vGPU guest software licensing on the VMs. For further information and instructions, see the NVIDIA Virtual GPU Software License Server User Guide.
Verification
List the active mediated devices on your host. If the output displays a defined device with the UUID used in the procedure, NVIDIA vGPU has been configured correctly. For example:
# mdevctl list
85006552-1b4b-45ef-ad62-de05be9171df 0000:01:00.0 nvidia-63
30820a6f-b1a5-4503-91ca-0c10ba58692a 0000:01:00.0 nvidia-63 (defined)
Additional resources
- For more information on using the mdevctl utility, use man mdevctl.
12.2. Removing NVIDIA vGPU devices
To change the configuration of assigned vGPU mediated devices, you need to remove the existing devices from the assigned VMs. For instructions, see below.
Prerequisites
The mdevctl package is installed.
# yum install mdevctl
- The VM from which you want to remove the device is shut down.
Procedure
Obtain the UUID of the mediated device that you want to remove. To do so, use the mdevctl list command:

# mdevctl list
85006552-1b4b-45ef-ad62-de05be9171df 0000:01:00.0 nvidia-63 (defined)
30820a6f-b1a5-4503-91ca-0c10ba58692a 0000:01:00.0 nvidia-63 (defined)
Stop the running instance of the mediated vGPU device. To do so, use the mdevctl stop command with the UUID of the device. For example, to stop the 30820a6f-b1a5-4503-91ca-0c10ba58692a device:

# mdevctl stop -u 30820a6f-b1a5-4503-91ca-0c10ba58692a
Remove the device from the XML configuration of the VM. To do so, use the virsh edit utility to edit the XML configuration of the VM, and remove the mdev's configuration segment. The segment will look similar to the following:

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
  </source>
</hostdev>
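Alternatively, if you kept the segment in a separate file (for example, the hypothetical vgpu-mdev.xml from the setup procedure), a minimal sketch of removing it from the persistent configuration with virsh detach-device, assuming a hypothetical VM named testguest1:

# virsh detach-device testguest1 vgpu-mdev.xml --config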
Note that stopping and detaching the mediated device does not delete it, but rather keeps it as defined. As such, you can restart and attach the device to a different VM.
Optional: To delete the stopped mediated device, remove its definition:
# mdevctl undefine -u 30820a6f-b1a5-4503-91ca-0c10ba58692a
Verification
If you only stopped and detached the device, list the active mediated devices and the defined mediated devices.
# mdevctl list
85006552-1b4b-45ef-ad62-de05be9171df 0000:01:00.0 nvidia-63 (defined)

# mdevctl list --defined
85006552-1b4b-45ef-ad62-de05be9171df 0000:01:00.0 nvidia-63 auto (active)
30820a6f-b1a5-4503-91ca-0c10ba58692a 0000:01:00.0 nvidia-63 manual
If the first command does not display the device but the second command does, the procedure was successful.
If you also deleted the device, the second command should not display the device.
# mdevctl list
85006552-1b4b-45ef-ad62-de05be9171df 0000:01:00.0 nvidia-63 (defined)

# mdevctl list --defined
85006552-1b4b-45ef-ad62-de05be9171df 0000:01:00.0 nvidia-63 auto (active)
Additional resources
- For more information on using the mdevctl utility, use man mdevctl.
12.3. Obtaining NVIDIA vGPU information about your system
To evaluate the capabilities of the vGPU features available to you, you can obtain additional information about the mediated devices on your system, such as:
- How many mediated devices of a given type can be created.
- What mediated devices are already configured on your system.
Prerequisites
The mdevctl package is installed.
# yum install mdevctl
Procedure
To see the available vGPU types on your host, use the mdevctl types command.

For example, the following shows the information for a system that uses a physical Tesla T4 card under the 0000:41:00.0 PCI bus:

# mdevctl types
0000:41:00.0
  nvidia-222
    Available instances: 0
    Device API: vfio-pci
    Name: GRID T4-1B
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
  nvidia-223
    Available instances: 0
    Device API: vfio-pci
    Name: GRID T4-2B
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
  nvidia-224
    Available instances: 0
    Device API: vfio-pci
    Name: GRID T4-2B4
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
  nvidia-225
    Available instances: 0
    Device API: vfio-pci
    Name: GRID T4-1A
    Description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=16
[...]
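The same information is exposed by the kernel in sysfs, which is where mdevctl reads it from. As a sketch, assuming the same Tesla T4 card at the 0000:41:00.0 PCI bus, you can query how many more instances of a given type can still be created:

# cat /sys/class/mdev_bus/0000:41:00.0/mdev_supported_types/nvidia-222/available_instances
0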
To see the active vGPU devices on your host, including their types, UUIDs, and the PCI buses of their parent devices, use the mdevctl list command:

# mdevctl list
85006552-1b4b-45ef-ad62-de05be9171df 0000:41:00.0 nvidia-223
83c32df7-d52e-4ec1-9668-1f3c7e4df107 0000:41:00.0 nvidia-223 (defined)
This example shows that the 85006552-1b4b-45ef-ad62-de05be9171df device is running but not defined, and the 83c32df7-d52e-4ec1-9668-1f3c7e4df107 device is both defined and running.
Additional resources
- For more information on using the mdevctl utility, use man mdevctl.
12.4. Remote desktop streaming services for NVIDIA vGPU
The following remote desktop streaming services have been successfully tested for use with the NVIDIA vGPU feature in RHEL 8 hosts:
- HP-RGS - Note that it is currently not possible to use HP-RGS with RHEL 8 VMs.
- Mechdyne TGX - Note that it is currently not possible to use Mechdyne TGX with Windows Server 2016 VMs.
- NICE DCV - When using this streaming service, Red Hat recommends using fixed resolution settings, as using dynamic resolution in some cases results in a black screen. In addition, it is currently not possible to use NICE DCV with RHEL 8 VMs.