Chapter 2. Assigning virtual GPUs
To set up NVIDIA vGPU devices, you need to:
- Obtain and install the correct NVIDIA vGPU driver for your GPU device
- Create mediated devices
- Assign each mediated device to a virtual machine
- Install guest drivers on each virtual machine
The following procedures explain this process.
2.1. Setting up NVIDIA vGPU devices on the host
Before installing the NVIDIA vGPU driver on the guest operating system, you need to understand the licensing requirements and obtain the correct license credentials.
Prerequisites
- Your GPU device supports virtual GPU (vGPU) functionality.
- Your system is listed as a validated server hardware platform.
For more information about supported GPUs and validated platforms, see NVIDIA vGPU CERTIFIED SERVERS on www.nvidia.com.
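As a quick sanity check before starting the procedure, you can confirm that an NVIDIA GPU is actually visible on the host PCI bus. The sketch below assumes lspci from pciutils is installed; 10de is the NVIDIA PCI vendor ID. It prints the matching device lines, or a notice if none are found:

```shell
# Sketch: check for an NVIDIA GPU on the host PCI bus.
# 10de is the NVIDIA PCI vendor ID; lspci comes from pciutils.
if command -v lspci >/dev/null 2>&1; then
    GPU_INFO=$(lspci -nn | grep '\[10de:' || echo "No NVIDIA PCI device detected")
else
    GPU_INFO="lspci not available; install pciutils to check"
fi
echo "$GPU_INFO"
```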
Procedure
- Download and install the NVIDIA vGPU driver. For information on getting the driver, see the vGPU drivers page on the NVIDIA website. An NVIDIA enterprise account is required to download the drivers; contact your hardware vendor if you do not have one.
- Unzip the downloaded package, copy it to the host, and install the driver.
- If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create it manually: open the file in a text editor and add the following lines to the end of the file:

blacklist nouveau
options nouveau modeset=0
- Regenerate the initial ramdisk for the current kernel, then reboot:

# dracut --force
# reboot
- Alternatively, if you need to use a prior supported kernel version with mediated devices, regenerate the initial ramdisk for all installed kernel versions:

# dracut --regenerate-all --force
# reboot
- Check that the kernel loaded the nvidia_vgpu_vfio module:

# lsmod | grep nvidia_vgpu_vfio
- Check that the nvidia-vgpu-mgr.service service is running:

# systemctl status nvidia-vgpu-mgr.service
For example:

# lsmod | grep nvidia_vgpu_vfio
nvidia_vgpu_vfio 45011 0
nvidia 14333621 10 nvidia_vgpu_vfio
mdev 20414 2 vfio_mdev,nvidia_vgpu_vfio
vfio 32695 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1

# systemctl status nvidia-vgpu-mgr.service
nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-03-16 10:17:36 CET; 5h 8min ago
 Main PID: 1553 (nvidia-vgpu-mgr)
 [...]
- In the Administration Portal, click Compute → Virtual Machines.
- Click the name of the virtual machine to go to the details view.
- Click the Host Devices tab.
- Click Manage vGPU. The Manage vGPU dialog box opens.
- Select a vGPU type and the number of instances that you would like to use with this virtual machine.
Select On for Secondary display adapter for VNC to add a second emulated QXL or VGA graphics adapter, which serves as the primary graphics adapter for the console alongside the vGPU.
Note: On cluster levels 4.5 and later, when a vGPU is used and the Secondary display adapter for VNC is set to On, an additional framebuffer display device is automatically added to the virtual machine. This allows the virtual machine console to be displayed before the vGPU is initialized, instead of a blank screen.
- Click Save.
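Behind the scenes, the Manage vGPU dialog creates a mediated (mdev) device of the chosen vGPU type. The following is a hedged manual sketch of the same operation through the kernel's sysfs mdev interface; the PCI address (0000:84:00.0) and type name (nvidia-22) are placeholders for illustration, and the script is a no-op on hosts without vGPU-capable hardware:

```shell
# Sketch: manually creating a mediated device via sysfs.
# PCI address and vGPU type below are placeholders; RHV normally
# performs this step for you through the Manage vGPU dialog.
PCI=0000:84:00.0     # placeholder: physical GPU PCI address
TYPE=nvidia-22       # placeholder: a type listed under mdev_supported_types
UUID=$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)
SYSFS=/sys/class/mdev_bus/$PCI/mdev_supported_types/$TYPE
if [ -d "$SYSFS" ]; then
    cat "$SYSFS/name"              # human-readable name of the vGPU type
    echo "$UUID" > "$SYSFS/create" # create the mediated device
    RESULT="created mdev $UUID"
else
    RESULT="no mdev support for $PCI/$TYPE on this host"
fi
echo "$RESULT"
```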
2.2. Installing the vGPU driver on the virtual machine
Procedure
- Run the virtual machine and connect to it using the VNC console.
Note: SPICE is not supported on vGPU.
- Download the driver to the virtual machine. For information on getting the driver, see the Drivers page on the NVIDIA website.
- Install the vGPU driver, following the instructions in Installing the NVIDIA vGPU Software Graphics Driver in the NVIDIA Virtual GPU software documentation.
Important: Linux only: When installing the driver on a Linux guest operating system, you are prompted to update xorg.conf. If you do not update xorg.conf during the installation, you need to update it manually.
- After the driver finishes installing, reboot the machine. For Windows virtual machines, fully power off the guest from the Administration portal or the VM portal, not from within the guest operating system.
Important: Windows only: Powering off the virtual machine from within the Windows guest operating system sometimes sends the virtual machine into hibernate mode, which does not completely clear the memory, possibly leading to subsequent problems. Using the Administration portal or the VM portal to power off the virtual machine forces it to fully clean the memory.
- Run the virtual machine and connect to it using one of the supported remote desktop protocols, such as Mechdyne TGX, and verify that the vGPU is recognized by opening the NVIDIA Control Panel. On Windows, you can alternatively open the Windows Device Manager. The vGPU should appear under Display adapters. For more information, see the NVIDIA vGPU Software Graphics Driver in the NVIDIA Virtual GPU software documentation.
- Set up NVIDIA vGPU guest software licensing for each vGPU and add the license credentials in the NVIDIA control panel. For more information, see How NVIDIA vGPU Software Licensing Is Enforced in the NVIDIA Virtual GPU Software Documentation.
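On Linux guests, a minimal command-line verification is also possible after the reboot: nvidia-smi is installed along with the guest driver and should list the vGPU. The sketch below prints a notice instead of failing on a machine where the driver is not present:

```shell
# Sketch: verify the guest driver sees the vGPU (Linux guests).
if command -v nvidia-smi >/dev/null 2>&1; then
    VGPU_CHECK=$(nvidia-smi -L)   # lists GPUs visible inside the guest
else
    VGPU_CHECK="nvidia-smi not found: the guest driver is not installed"
fi
echo "$VGPU_CHECK"
```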
2.3. Removing NVIDIA vGPU devices
To change the configuration of assigned vGPU mediated devices, you must first remove the existing devices from their assigned guests.
Procedure
- In the Administration Portal, click Compute → Virtual Machines.
- Click the name of the virtual machine to go to the details view.
- Click the Host Devices tab.
- Click Manage vGPU. The Manage vGPU dialog box opens.
- Click the x button next to Selected vGPU Type Instances to detach the vGPU from the virtual machine.
- Click Save.
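Under the hood, detaching a vGPU removes the corresponding mediated (mdev) device from the host. A hedged manual sketch of the same operation through the sysfs mdev interface follows; the UUID is a placeholder, and the script is a no-op on hosts where that device does not exist:

```shell
# Sketch: manually removing a mediated device via sysfs.
# The UUID below is a placeholder; RHV normally performs this
# step for you when you detach the vGPU in the dialog.
UUID=aa618089-8b16-4d01-a136-25a0f3c73123   # placeholder mdev UUID
MDEV=/sys/bus/mdev/devices/$UUID
if [ -d "$MDEV" ]; then
    echo 1 > "$MDEV/remove"      # ask the kernel to destroy the device
    RESULT="removed mdev $UUID"
else
    RESULT="mdev $UUID not present on this host"
fi
echo "$RESULT"
```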
2.4. Monitoring NVIDIA vGPUs
For NVIDIA vGPUs, you can obtain information about the physical GPU and vGPUs by using the NVIDIA System Management Interface: enter the nvidia-smi command on the host. For more information, see NVIDIA System Management Interface nvidia-smi in the NVIDIA Virtual GPU Software Documentation.
For example:
# nvidia-smi
Thu Nov  1 17:40:09 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.62                 Driver Version: 410.62                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:84:00.0 Off |                  Off |
| N/A   40C    P8    24W / 150W |   1034MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 00000000:85:00.0 Off |                  Off |
| N/A   33C    P8    23W / 150W |   8146MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           On   | 00000000:8B:00.0 Off |                  Off |
| N/A   34C    P8    24W / 150W |   8146MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           On   | 00000000:8C:00.0 Off |                  Off |
| N/A   45C    P8    24W / 150W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     34432    C+G   vgpu                                       508MiB   |
|    0     34718    C+G   vgpu                                       508MiB   |
|    1     35032    C+G   vgpu                                      8128MiB   |
|    2     35032    C+G   vgpu                                      8128MiB   |
+-----------------------------------------------------------------------------+
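For scripting, nvidia-smi can also emit machine-readable output. The sketch below uses its standard CSV query interface; the fallback heredoc reuses values from the sample output above so the pipeline can be tried on a machine without a GPU:

```shell
# Sketch: machine-readable GPU memory report via nvidia-smi CSV output.
# Falls back to sample data when no GPU/driver is present.
if command -v nvidia-smi >/dev/null 2>&1; then
    DATA=$(nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv,noheader)
else
    DATA='0, Tesla M60, 1034 MiB, 8191 MiB
1, Tesla M60, 8146 MiB, 8191 MiB'
fi
# Summarize memory use per GPU
REPORT=$(echo "$DATA" | awk -F', ' '{ printf "GPU %s (%s): %s of %s used\n", $1, $2, $3, $4 }')
echo "$REPORT"
```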
2.5. Remote desktop streaming services for NVIDIA vGPU
The following remote desktop streaming services have been successfully tested for use with the NVIDIA vGPU feature in RHEL 8:
- HP-RGS
- Mechdyne TGX - It is currently not possible to use Mechdyne TGX with Windows Server 2016 guests.
- NICE DCV - When using this streaming service, use fixed resolution settings, because using dynamic resolution in some cases results in a black screen.