Chapter 9. Configure InfiniBand and RDMA Networks
9.1. Understanding InfiniBand and RDMA technologies
RDMA communications differ from normal IP communications because they bypass kernel intervention in the communication process, and in the process greatly reduce the CPU overhead normally needed to process network communications. In a typical IP data transfer, application X on machine A sends some data to application Y on machine B. As part of the transfer, the kernel on machine B must first receive the data, decode the packet headers, determine that the data belongs to application Y, wake up application Y, wait for application Y to perform a read syscall into the kernel, and then manually copy the data from the kernel's own internal memory space into the buffer provided by application Y. This process means that most network traffic must be copied across the system's main memory bus at least twice (once when the host adapter uses DMA to put the data into the kernel-provided memory buffer, and again when the kernel moves the data to the application's memory buffer), and it also means the computer must execute a number of context switches to switch between the kernel context and the application Y context. Both of these things impose extremely high CPU loads on the system when network traffic is flowing at very high rates.
RDMA communications avoid this overhead, but they cannot use the standard Berkeley Sockets API that most IP networking applications are built upon. RDMA must therefore provide its own API, the InfiniBand Verbs API, and applications must be ported to this API before they can use RDMA technology directly.
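While porting an application means writing to the Verbs API directly, the Verbs stack on a system can be checked from the command line. A minimal sketch, assuming the libibverbs-utils package (which carries the example Verbs programs on Red Hat Enterprise Linux) is available:

~]# yum install libibverbs-utils
~]$ ibv_devices     # lists the RDMA devices visible through the Verbs API
~]$ ibv_devinfo     # prints port state, link layer, and GUIDs for each device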
Both iWARP and RoCE/IBoE technologies have a normal IP network link layer as their underlying technology, so the majority of their configuration is actually covered in Chapter 2, Configure IP Networking, of this document. For the most part, once their IP networking features are properly configured, their RDMA features are all automatic and will show up as long as the proper drivers for the hardware are installed. The kernel drivers are always included with each kernel Red Hat provides; however, the user-space drivers must be installed manually if the InfiniBand package group was not selected at machine install time.
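For example, the user-space pieces can be installed after the fact, and the automatically loaded kernel drivers inspected, along these lines. The group name below is the one used by Red Hat Enterprise Linux 7; verify it with yum group list if in doubt:

~]# yum -y groupinstall "InfiniBand Support"
~]# lsmod | egrep 'mlx4|mlx5|cxgb|ib_core'    # confirm the hardware drivers and core RDMA modules are loaded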
These are the necessary user-space packages:
Chelsio hardware: libcxgb3 or libcxgb4, depending on the version of the hardware.
Mellanox hardware: libmlx4 or libmlx5, depending on the version of the hardware. Additionally, edit /etc/rdma/mlx4.conf to set the port types properly for RoCE/IBoE usage, and edit /etc/modprobe.d/mlx4.conf to instruct the driver which packet priority is configured for the “no-drop” service on the Ethernet switches the cards are plugged into (see the sketch after this list). To configure Mellanox mlx5 cards, use the mstconfig program from the mstflint package. For more details, see the Configuring Mellanox mlx5 cards in Red Hat Enterprise Linux 7 Knowledge Base article on the Red Hat Customer Portal.
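As an illustration, here is a minimal sketch of the two mlx4 files described above. The PCI address and the priority values are placeholders: /etc/rdma/mlx4.conf takes one line per adapter, naming the PCI device and the type of each port, and pfctx/pfcrx are the mlx4_en module parameters that carry the no-drop priority bitmask (0x08 below assumes priority 3 is the lossless class configured on the switch):

~]# cat /etc/rdma/mlx4.conf
# <pci_device_of_card> <port1_type> [port2_type]   (types: ib, eth, or auto)
0000:05:00.0 eth eth

~]# cat /etc/modprobe.d/mlx4.conf
# bit 3 set => priority 3 is treated as no-drop on transmit and receive
options mlx4_en pfctx=0x08 pfcrx=0x08

~]# mstconfig -d 04:00.0 query    # mlx5 cards: query (or set) firmware settings instead

Changes to either file only take effect when the mlx4 driver is reloaded or the machine is rebooted.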