VMware guest with large memory hangs
Environment
- Red Hat Enterprise Linux 9
- AMD CPUs
- VMware hypervisor with the virtual IOMMU (vIOMMU) enabled
Issue
- VMware guests with large memory intermittently hang
- The serial console logs show a large number of errors related to storage devices
Resolution
Red Hat support and engineering continue to look into this matter.
Workaround
- Setting the kernel parameter iommu=pt in the Guest OS appears to mitigate the issue for now.
- For details on modifying kernel parameters in Red Hat Enterprise Linux 9, refer to the following:
- How to modify the kernel command-line in Red Hat Enterprise Linux 9
- Note: Do not set this parameter if PCI device passthrough or SR-IOV is required for the Guest OS.
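After rebooting, you can confirm that the workaround is active by checking the running kernel's command line in /proc/cmdline (a quick manual check is `cat /proc/cmdline`). The following Python sketch automates that check; the function names are hypothetical helpers, not part of any Red Hat tool. It assumes Python 3.9 or later on the guest and does not handle quoted parameter values that contain spaces.

```python
from pathlib import Path
from typing import Dict, Optional


def kernel_params(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}.

    Bare flags such as "ro" map to None. If a name appears more than once,
    this sketch keeps the last occurrence. Quoted values containing spaces
    are not handled.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        # Split on the first "=" only, so "root=UUID=..." keeps its full value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def iommu_passthrough_enabled(cmdline: str) -> bool:
    """Return True if the command line sets iommu=pt."""
    return kernel_params(cmdline).get("iommu") == "pt"


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        enabled = iommu_passthrough_enabled(proc_cmdline.read_text())
        print(f"iommu=pt active: {enabled}")
    else:
        print("/proc/cmdline not found; run this inside the Linux guest.")
```

If the script reports False after a reboot, the parameter was not applied to the kernel that is currently running.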
Root Cause
- The guest kernel sends the hypervisor's vIOMMU an address that is larger than the vIOMMU is configured to handle.
Explanation
- An IOMMU translates between the addresses that physical devices (such as network cards, graphics cards, and USB mice) use to access system memory and the areas of memory that the OS and applications set aside for those devices. This translation also prevents a device from accessing memory it should not access and allows a device to address large amounts of memory, among other things.
- In a virtualized environment, the Guest OS may have its own IOMMU layer, and the hypervisor may have an IOMMU layer to communicate with hardware devices. To support IOMMU use in the Guest OS, the hypervisor also provides a virtualized IOMMU (vIOMMU) to the Guest OS. The vIOMMU acts as a go-between for the Guest OS's IOMMU layer and the physical IOMMU that the hypervisor uses to communicate with devices.
- The addressable range must match across all three layers (the Guest OS IOMMU, the vIOMMU, and the physical IOMMU on the host); otherwise, one of these layers may receive an address that is too large for it to handle.
- The AMD IOMMU implementation in Red Hat Enterprise Linux and the Linux kernel assumes a larger address space than the VMware vIOMMU is written to handle. As a result, the Guest OS can hang when it sends the vIOMMU an address larger than the vIOMMU can handle.
- Setting iommu=pt disables translation in the Guest OS and passes addresses directly through the Guest OS IOMMU layer to the vIOMMU, mitigating the address size mismatch.
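The mismatch described above can be illustrated with a deliberately simplified toy model in Python. Everything in it is hypothetical: the class and function names, the 4 KiB page size, and the address widths used in the usage example are illustrative assumptions, not the real AMD IOMMU, Linux, or VMware vIOMMU values or interfaces. The model shows a vIOMMU that rejects addresses wider than it was built to decode, a translated mapping path that can trip over that limit, and a passthrough path (modeling iommu=pt) in which the guest installs no translation and the device uses the physical address directly.

```python
PAGE_SHIFT = 12                      # assume 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1    # offset bits within a page


class IOMMUFault(Exception):
    """Raised when the toy vIOMMU cannot accept or translate an address."""


class ToyVIOMMU:
    """Hypothetical vIOMMU that only handles addresses narrower than addr_bits."""

    def __init__(self, addr_bits: int) -> None:
        self.addr_bits = addr_bits
        self.table = {}  # device-visible page number -> guest-physical page number

    def map(self, iova: int, phys: int) -> None:
        # Reject any address the vIOMMU was not built to decode.
        if iova >> self.addr_bits:
            raise IOMMUFault(f"IOVA {iova:#x} exceeds {self.addr_bits}-bit range")
        self.table[iova >> PAGE_SHIFT] = phys >> PAGE_SHIFT

    def translate(self, iova: int) -> int:
        # A device access to a page that was never mapped is blocked.
        ppn = self.table.get(iova >> PAGE_SHIFT)
        if ppn is None:
            raise IOMMUFault(f"unmapped IOVA {iova:#x}")
        return (ppn << PAGE_SHIFT) | (iova & PAGE_MASK)


def guest_dma_map(viommu: ToyVIOMMU, phys: int, iova: int, passthrough: bool) -> int:
    """Return the address a device should use for a buffer at guest-physical phys."""
    if passthrough:
        # iommu=pt model: no guest translation; the device uses phys directly.
        return phys
    # Translated model: the guest chose an IOVA (supplied by the caller here)
    # and asks the vIOMMU to map it.
    viommu.map(iova, phys)
    return iova
```

In this sketch, a ToyVIOMMU(addr_bits=39) accepts guest_dma_map(v, 0x7f000, 0x1000, False), and v.translate(0x1234) then returns 0x7f234. An IOVA near the top of a 48-bit space raises IOMMUFault in translated mode, while the same request with passthrough=True simply returns the physical address. The toy raises an exception where the real system hangs, and the 39-bit and 48-bit figures are arbitrary.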
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.