Issue starting VM with GPU in passthrough

Solution In Progress - Updated -

Issue

  • We have a Red Hat OpenStack Platform 16.1 deployment that includes a Compute node equipped with 4 NVIDIA A100 GPUs and 1 Mellanox InfiniBand adapter.

  • We have configured the Compute node to provide GPUs in passthrough to the VMs following the instructions here and here.

  • We were able to schedule the creation of a VM with the GPU in passthrough; however, on the first attempt, the VM creation failed.

  • Looking into /var/log/containers/nova/nova-compute.log on the GPU Compute node, we found the following error message:

2021-09-08 13:02:00.954 7 ERROR nova.compute.manager [instance: 6c1c1fec-8da7-41cc-809f-069fb3dc49ed] 2021-09-08T11:01:26.416056Z qemu-kvm: -device vfio-pci,host=0000:2f:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio 0000:2f:00.0: group 19 is not viable
2021-09-08 13:02:00.954 7 ERROR nova.compute.manager [instance: 6c1c1fec-8da7-41cc-809f-069fb3dc49ed] Please ensure all devices within the iommu_group are bound to their vfio bus driver.

  • Checking on the Compute host, we found that GPU0 (PCI device 0000:2f:00.0) is in the same IOMMU group (group 19) as the InfiniBand device (PCI device 0000:25:00.0).

  • If we understand correctly, all PCI devices in the same IOMMU group must be assigned together as a unit, either to the host or to a single guest VM.

  • In fact, we were able to successfully create the instance after unbinding the InfiniBand device from its driver on the host:

echo -n "0000:25:00.0" > /sys/bus/pci/drivers/mlx5_core/unbind

  • This workaround was acceptable in this case because the InfiniBand device is not currently needed on the host, but it is not a good general solution.

  • Is there any way to configure PCI passthrough independently for individual PCI devices?
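
The IOMMU group check described above can be reproduced from sysfs. The following is a minimal sketch, not a supported tool: the function name list_iommu_group_peers and the SYSFS_ROOT override are hypothetical names introduced here for illustration, and the sketch assumes the standard Linux sysfs layout (an iommu_group link under each PCI device, and a driver link for each bound device).

```shell
#!/bin/sh
# List every PCI device that shares an IOMMU group with the given device,
# together with the driver each one is currently bound to.
# SYSFS_ROOT is a hypothetical override used only for testing; default is /sys.
list_iommu_group_peers() {
    dev="$1"
    root="${SYSFS_ROOT:-/sys}"
    group_link="$root/bus/pci/devices/$dev/iommu_group"

    # Without an iommu_group link, the device is unknown or the IOMMU is off.
    if [ ! -e "$group_link" ]; then
        echo "no IOMMU group found for $dev" >&2
        return 1
    fi

    for peer in "$group_link"/devices/*; do
        [ -e "$peer" ] || continue
        if [ -e "$peer/driver" ]; then
            # The driver link points at /sys/bus/pci/drivers/<name>.
            drv=$(basename "$(readlink -f "$peer/driver")")
        else
            drv="none"
        fi
        echo "$(basename "$peer") $drv"
    done
}
```

On the Compute node, the function could be called as list_iommu_group_peers 0000:2f:00.0. Any peer that is listed with a driver other than vfio-pci is a candidate cause of the "group 19 is not viable" error.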

Environment

  • Red Hat OpenStack Platform (RHOSP) 16.1
