Multiple NVIDIA H100 GPUs Fail with Driver Errors in KubeVirt VMs on Hosted Control Plane NodePools Configured with HostDevices
Issue
When multiple NVIDIA H100 PCIe devices are passed through to KubeVirt Virtual Machines (VMs) provisioned via NodePools in a Hosted Control Plane (HCP) cluster, the VMs report hardware driver errors. The errors occur when more than one GPU is assigned to a VM; single-GPU passthrough works correctly.
Environment
- Red Hat OpenShift Container Platform (OCP) 4.x (with Hosted Control Plane)