Open MPI failed an OFI Libfabric library call (fi_endpoint)
Issue
- Open MPI jobs fail with the following error:
WARNING: There was an error initializing an OpenFabrics device.
  Local host: jei-cpu-r-09
  Local device: mlx5_0
--------------------------------------------------------------------------
sla-cpu-r-21:rank2.gcom_test_abort.exe: Unable to alloc send buffer MR on mlx5_0: Cannot allocate memory
sla-cpu-r-21:rank2.gcom_test_abort.exe: Unable to allocate UD send buffer pool
sla-cpu-r-21:rank1.gcom_test_abort.exe: Unable to alloc send buffer MR on mlx5_0: Cannot allocate memory
sla-cpu-r-21:rank1.gcom_test_abort.exe: Unable to allocate UD send buffer pool
--------------------------------------------------------------------------
Open MPI failed an OFI Libfabric library call (fi_endpoint). This is highly
unusual; your job may behave unpredictably (and/or abort) after this.
  Local host: jei-cpu-r-09
  Location: mtl_ofi_component.c:513
  Error: Invalid argument (22)
Environment
- Red Hat Enterprise Linux (RHEL) 9.y
- openmpi-4.1.1-7.el9.x86_64