VMware guest with large memory hangs

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 9
  • AMD CPUs
  • VMware hypervisor where the vIOMMU is enabled

Issue

  • VMware guests with large memory intermittently hang
  • The serial console logs show a large number of errors related to storage devices

Resolution

Red Hat support and engineering continue to look into this matter.

Workaround

  • Boot the guest with the iommu=pt kernel parameter, which mitigates the address size mismatch described under Explanation.

Root Cause

  • The guest kernel sends an address to the hypervisor vIOMMU that is larger than what the vIOMMU is configured to handle.
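
The mismatch can be pictured as a simple range check. The following is a minimal sketch only: the two widths are hypothetical values chosen for illustration, since this article does not state the actual limits of the guest kernel or the VMware vIOMMU.

```python
# Hypothetical address widths, for illustration only. The real limits of the
# guest kernel's AMD IOMMU code and of the VMware vIOMMU are not given here.
GUEST_ADDRESS_BITS = 64
VIOMMU_ADDRESS_BITS = 48


def fits_address_width(address: int, width_bits: int) -> bool:
    """Return True if address is representable in width_bits bits."""
    return 0 <= address < (1 << width_bits)


# The first address that no longer fits in the narrower (vIOMMU) width.
address = 1 << VIOMMU_ADDRESS_BITS

print(fits_address_width(address, GUEST_ADDRESS_BITS))   # True
print(fits_address_width(address, VIOMMU_ADDRESS_BITS))  # False
```

An address that is valid for the wider layer but not for the narrower one is the kind of value the root cause describes.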

Explanation

  • An IOMMU translates between the addresses that physical devices (such as network cards, graphics cards, and USB mice) use to communicate with the system and the memory that the OS and applications use to interact with those devices. This translation also prevents a device from accessing memory it should not access and allows a device to address a large amount of memory.
  • In a virtualized environment, the Guest OS may have its own IOMMU layer, and the hypervisor may also have an IOMMU layer for communicating with hardware devices. To support IOMMU use in the Guest OS, the hypervisor also provides a virtualized IOMMU (vIOMMU) to the Guest OS. The vIOMMU acts as a go-between for the Guest OS's IOMMU layer and the physical IOMMU that the hypervisor uses to communicate with devices.
  • The accessible address range needs to match across the Guest OS IOMMU layer, the vIOMMU, and the physical IOMMU on the host. Otherwise, an address may be too large for one of these layers to handle.
  • The AMD IOMMU implementation in Red Hat Enterprise Linux and the Linux kernel assumes an address space larger than what the VMware vIOMMU is written to handle. The Guest OS can hang when it sends an address larger than the vIOMMU can handle.
  • Setting the iommu=pt kernel boot parameter disables the Guest OS translations and passes those addresses directly through the Guest OS IOMMU layer to the vIOMMU, which mitigates address size mismatches.
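
To confirm whether a running guest was booted with iommu=pt, the kernel command line can be inspected. The following is a minimal sketch, not a Red Hat tool: the helper name has_iommu_passthrough is hypothetical, and the parsing is a simplification that only looks at pt and nopt options inside iommu= parameters, letting the last one seen win.

```python
from pathlib import Path


def has_iommu_passthrough(cmdline: str) -> bool:
    """Return True if an iommu= parameter on the command line selects 'pt'.

    Simplified assumption: options are comma separated, and a later 'nopt'
    cancels an earlier 'pt'. Parameters such as amd_iommu= are ignored.
    """
    passthrough = False
    for token in cmdline.split():
        if not token.startswith("iommu="):
            continue
        for option in token[len("iommu="):].split(","):
            if option == "pt":
                passthrough = True
            elif option == "nopt":
                passthrough = False
    return passthrough


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the currently running kernel.
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        print(has_iommu_passthrough(cmdline_path.read_text()))
```

The check only reports how the running kernel was booted; a parameter added to the boot configuration takes effect after the next reboot.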

