Multiple NVIDIA H100 GPUs Fail with Driver Errors in KubeVirt VMs on Hosted Control Plane NodePools Configured with HostDevices

Solution Verified - Updated -

Issue

When multiple NVIDIA H100 PCIe devices are passed through to KubeVirt Virtual Machines (VMs) provisioned via NodePools in a Hosted Control Plane (HCP) cluster, the VMs report GPU driver errors. The issue appears only when more than one GPU is assigned to a VM; single-GPU passthrough works correctly.
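
For orientation, the sketch below builds the NodePool fragment that distinguishes the working single-GPU case from the failing multi-GPU case. It is an illustration under assumptions, not the configuration from the affected cluster: the field layout (spec.platform.kubevirt.hostDevices with deviceName and count) reflects the HyperShift NodePool API as best understood here and should be checked against the installed CRD, and the resource name nvidia.com/H100_PCIE is hypothetical.

```python
import json


def nodepool_hostdevices_fragment(device_name: str, count: int) -> dict:
    """Build the part of a NodePool spec that requests `count` host devices per VM.

    Assumption: the field names follow the HyperShift KubeVirt NodePool
    platform schema; verify them against the CRD in your cluster.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "spec": {
            "platform": {
                "type": "KubeVirt",
                "kubevirt": {
                    "hostDevices": [
                        {"deviceName": device_name, "count": count}
                    ]
                },
            }
        }
    }


# Hypothetical resource name; use the name exposed by your own
# host device configuration on the management cluster.
DEVICE_NAME = "nvidia.com/H100_PCIE"

single_gpu = nodepool_hostdevices_fragment(DEVICE_NAME, 1)  # reported as working
multi_gpu = nodepool_hostdevices_fragment(DEVICE_NAME, 2)   # reported as failing

print(json.dumps(multi_gpu, indent=2))
```

The only difference between the two fragments is the count value, which matches the symptom described above: the errors appear once count is greater than 1.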

Environment

  • Red Hat OpenShift Container Platform (OCP) 4.x (with Hosted Control Plane)
