Red Hat Training

A Red Hat training course is available for Red Hat Enterprise Linux

13.2. Transferring Data Using RoCE

RDMA over Converged Ethernet (RoCE) is a network protocol that enables remote direct memory access (RDMA) over an Ethernet network. There are two RoCE versions, RoCE v1 and RoCE v2, depending on the network adapter used.

RoCE v1
The RoCE v1 protocol is an Ethernet link layer protocol with ethertype 0x8915 that enables communication between any two hosts in the same Ethernet broadcast domain. RoCE v1 is the default version for RDMA Connection Manager (RDMA_CM) when using the ConnectX-3 network adapter.
RoCE v2
The RoCE v2 protocol exists on top of either the UDP over IPv4 or the UDP over IPv6 protocol. The UDP destination port number 4791 has been reserved for RoCE v2. Since Red Hat Enterprise Linux 7.5, RoCE v2 is the default version for RDMA_CM when using the ConnectX-3 Pro, ConnectX-4, ConnectX-4 Lx and ConnectX-5 network adapters. Hardware supports both RoCE v1 and RoCE v2.
RDMA Connection Manager (RDMA_CM) is used to set up a reliable connection between a client and a server for transferring data. RDMA_CM provides an RDMA transport-neutral interface for establishing connections. The communication is over a specific RDMA device, and data transfers are message-based.

Prerequisites

An RDMA_CM session requires one of the following:
  • Both client and server support the same RoCE mode.
  • A client supports RoCE v1 and a server RoCE v2.
Since a client determines the mode of the connection, the following cases are possible:
A successful connection:
If a client is in RoCE v1 or in RoCE v2 mode depending on the network card and the driver used, the corresponding server must have the same version to create a connection. Also, the connection is successful if a client is in RoCE v1 and a server in RoCE v2 mode.
A failed connection:
If a client is in RoCE v2 and the corresponding server is in RoCE v1, no connection can be established. In this case, update the driver or the network adapter of the corresponding server, see Section 13.2, “Transferring Data Using RoCE”

Table 13.1. RoCE Version Defaults Using RDMA_CM

ClientServerDefault setting
RoCE v1RoCE v1Connection
RoCE v1RoCE v2Connection
RoCE v2RoCE v2Connection
RoCE v2RoCE v1No connection
That RoCE v2 on the client and RoCE v1 on the server are not compatible. To resolve this issue, force both the server and client-side environment to communicate over RoCE v1. This means to force hardware that supports RoCE v2 to use RoCE v1:

Procedure 13.1. Changing the Default RoCE Mode When the Hardware Is Already Running in Roce v2

  1. Change into the /sys/kernel/config/rdma_cm directory to et the RoCE mode:
    ~]# cd /sys/kernel/config/rdma_cm
  2. Enter the ibstat command with an Ethernet network device to display the status. For example, for mlx5_0:
    ~]$ ibstat mlx5_0
        CA 'mlx5_0'
            CA type: MT4115
            Number of ports: 1
            Firmware version: 12.17.1010
            Hardware version: 0
            Node GUID: 0x248a0703004bf0a4
            System image GUID: 0x248a0703004bf0a4
            Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe4bf0a4
                Link layer: Ethernet
  3. Create a directory for the mlx5_0 device:
    ~]# mkdir mlx5_0
  4. Display the RoCE mode in the default_roce_mode file in the tree format:
    ~]# cd mlx5_0
    ~]$ tree
    └── ports
        └── 1
            ├── default_roce_mode
            └── default_roce_tos
    ~]$ cat /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
        RoCE v2
  5. Change the default RoCE mode:
    ~]# echo "RoCE v1" > /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
  6. View the changes:
    ~]$ cat /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
        RoCE v1