Open MPI failed an OFI Libfabric library call (fi_endpoint)

Solution Unverified

Issue

  • Open MPI jobs fail with the following errors:

    WARNING: There was an error initializing an OpenFabrics device.
      Local host:   jei-cpu-r-09
      Local device: mlx5_0
    --------------------------------------------------------------------------
    sla-cpu-r-21:rank2.gcom_test_abort.exe: Unable to alloc send buffer MR on mlx5_0: Cannot allocate memory
    sla-cpu-r-21:rank2.gcom_test_abort.exe: Unable to allocate UD send buffer pool
    sla-cpu-r-21:rank1.gcom_test_abort.exe: Unable to alloc send buffer MR on mlx5_0: Cannot allocate memory
    sla-cpu-r-21:rank1.gcom_test_abort.exe: Unable to allocate UD send buffer pool
    --------------------------------------------------------------------------
    Open MPI failed an OFI Libfabric library call (fi_endpoint).  This is highly
    unusual; your job may behave unpredictably (and/or abort) after this.
      Local host: jei-cpu-r-09
      Location: mtl_ofi_component.c:513
      Error: Invalid argument (22)
    

Environment

  • Red Hat Enterprise Linux (RHEL) 9.y
  • openmpi-4.1.1-7.el9.x86_64
