GPU Access RHEL Baremetal Installation

Hi,
on a RHEL 9.3 installation, a problem occurred when accessing one of two GPUs.
Setup: Motherboard Lenovo 1037, Intel Xeon Silver, RTX A2000, RTX A5000, 80 GB RAM

The kernel fails to assign the memory BARs of the device at 0000:65:00.0 (behind the PCI bridge to bus 65) [1].
The issue came up in Ubuntu as well. There, adding "pci=realloc" to the kernel command line in GRUB solved the problem and we were able to address the card.

Procedure on RHEL:
I edited /etc/default/grub as follows, regenerated the GRUB configuration, and rebooted:
GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M pci=realloc intel_iommu=on resume=/dev/mapper/rhel00-swap rd.lvm.lv=rhel00/root rd.lvm.lv=rhel00/swap rhgb quiet"
sudo grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
reboot
Unfortunately, this procedure did not fix the problem. Since the same parameter worked on Ubuntu, I am not sure whether "pci=realloc", which is also suggested in [2], was actually picked up by the system. The option described in [3] was also tried.
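
One way to check whether the parameter actually reached the running kernel is to look at /proc/cmdline after the reboot. The snippet below is only a minimal sketch of that check. The helper name has_kernel_param is hypothetical and not part of any tool.

```shell
#!/bin/sh
# has_kernel_param CMDLINE PARAM
# Hypothetical helper: succeeds if PARAM appears as a whole
# whitespace-separated word in the string CMDLINE.
has_kernel_param() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# /proc/cmdline holds the command line the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    if has_kernel_param "$(cat /proc/cmdline)" "pci=realloc"; then
        echo "pci=realloc is on the running kernel command line"
    else
        echo "pci=realloc is NOT on the running kernel command line"
    fi
fi
```

If the parameter is missing there, the edit to /etc/default/grub did not reach the boot entry that was actually started. On RHEL, "sudo grubby --info=DEFAULT" prints the arguments stored for the default boot entry, which can be compared against GRUB_CMDLINE_LINUX.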

Any help is highly appreciated.

[1]
[ 0.807109] pci 0000:65:00.0: BAR 1: no space for [mem size 0x800000000 64bit pref]
[ 0.807111] pci 0000:65:00.0: BAR 1: failed to assign [mem size 0x800000000 64bit pref]
[ 0.807113] pci 0000:65:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[ 0.807114] pci 0000:65:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[ 0.807116] pci 0000:65:00.0: BAR 0: no space for [mem size 0x01000000]
[ 0.807117] pci 0000:65:00.0: BAR 0: failed to assign [mem size 0x01000000]
[ 0.807119] pci 0000:64:00.0: PCI bridge to [bus 65]
[ 0.807127] pci 0000:64:00.0: bridge window [mem 0xe0900000-0xe0efffff]
[ 0.807134] pci 0000:64:00.0: bridge window [mem 0x4b380000000-0x4bfafffffff 64bit pref]
[ 0.807147] pci_bus 0000:64: Some PCI device resources are unassigned, try booting with pci=realloc

[2] https://forums.developer.nvidia.com/t/ph402-dual-p100-64g-rminitadapter-failed-memory-mapping-issue/173877/2

[3] https://forums.developer.nvidia.com/t/2-cards-issue/70261/3

Responses