Chapter 2. Configuring RoCE

This section provides background information about RDMA over Converged Ethernet (RoCE) and describes how to change the default RoCE version and how to configure a software RoCE adapter.

Note that different vendors, such as Mellanox, Broadcom, and QLogic, provide RoCE hardware.

2.1. Overview of RoCE protocol versions

RoCE is a network protocol that enables remote direct memory access (RDMA) over Ethernet.

The following are the different RoCE versions:

RoCE v1

The RoCE version 1 protocol is an Ethernet link layer protocol with ethertype 0x8915 that enables communication between any two hosts in the same Ethernet broadcast domain.

By default, when using a Mellanox ConnectX-3 network adapter, Red Hat Enterprise Linux uses RoCE v1 for the RDMA Connection Manager (RDMA_CM).

RoCE v2

The RoCE version 2 protocol runs on top of either UDP over IPv4 or UDP over IPv6. The UDP destination port number 4791 is reserved for RoCE v2.
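Because RoCE v2 traffic is carried in ordinary UDP datagrams, you can observe it with a standard packet capture tool. The following is a minimal sketch, assuming the `tcpdump` utility is installed and `enp7s0` is a placeholder for the RoCE-capable interface on your host:

```shell
# Capture RoCE v2 traffic, which uses the reserved UDP destination
# port 4791. Replace enp7s0 with the name of your own interface.
tcpdump -i enp7s0 -n 'udp dst port 4791'
```

Note that RoCE v1 frames cannot be matched this way; because RoCE v1 is a link layer protocol, you would filter on its ethertype (0x8915) instead.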

By default, when using a Mellanox ConnectX-3 Pro, ConnectX-4 Lx, or ConnectX-5 network adapter, Red Hat Enterprise Linux uses RoCE v2 for the RDMA_CM, but the hardware supports both RoCE v1 and RoCE v2.

The RDMA_CM sets up a reliable connection between a client and a server for transferring data. RDMA_CM provides an RDMA transport-neutral interface for establishing connections. The communication uses a specific RDMA device, and data transfers are message-based.
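A quick way to exercise the RDMA_CM connection path described above is the `rping` utility. The following is a sketch, assuming the `librdmacm-utils` package is installed on both hosts, an RDMA device is present, and `192.0.2.1` stands in for the server's address:

```shell
# On the server: listen for RDMA_CM connections (-s) and print the
# transferred data (-v).
rping -s -v

# On the client: connect to the server's address (-a), run three
# ping-pong iterations (-C 3), and print the transferred data (-v).
rping -c -a 192.0.2.1 -C 3 -v
```

If the connection succeeds, both sides print the exchanged payload for each iteration, which confirms that RDMA_CM can establish a connection over the configured RoCE version.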

Important

Using RoCE v2 on the client and RoCE v1 on the server is not supported. In this case, configure both the server and client to communicate over RoCE v1.


2.2. Temporarily changing the default RoCE version

Using the RoCE v2 protocol on the client and RoCE v1 on the server is not supported. If the hardware in your server only supports RoCE v1, configure your clients to communicate with the server using RoCE v1. This section describes how to enforce RoCE v1 on a client that uses the mlx5_0 driver for the Mellanox ConnectX-5 InfiniBand device. Note that the changes described in this section apply only until you reboot the host.

Prerequisites

  • The client uses an InfiniBand device that uses, by default, the RoCE v2 protocol.
  • The InfiniBand device in the server only supports RoCE v1.

Procedure

  1. Create the /sys/kernel/config/rdma_cm/mlx5_0/ directory:

    # mkdir /sys/kernel/config/rdma_cm/mlx5_0/
  2. Display the default RoCE mode. For example, to display the mode for port 1:

    # cat /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
    
        RoCE v2
  3. Change the default RoCE mode to version 1:

    # echo "IB/RoCE v1" > /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
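After changing the mode, you can confirm that the setting took effect by reading the file back. This sketch assumes the mlx5_0 device and port 1 from the steps above:

```shell
# Read back the active RoCE mode for port 1; after the change, the
# output reports RoCE v1 instead of the previous "RoCE v2".
cat /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
```

Because this setting lives in configfs, it is lost on reboot; to reapply it automatically, you would need to run these commands from a boot-time script or unit of your own.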

2.3. Configuring Soft-RoCE

Soft-RoCE is a software implementation of remote direct memory access (RDMA) over Ethernet, which is also called RXE. This section describes how to configure Soft-RoCE.

Use Soft-RoCE on hosts that do not have a RoCE host channel adapter (HCA).

Prerequisites

  • An Ethernet adapter is installed in the system.

Procedure

  1. Install the libibverbs, libibverbs-utils, and infiniband-diags packages:

    # yum install libibverbs libibverbs-utils infiniband-diags
  2. Load the rdma_rxe kernel module and display the current configuration:

    # rxe_cfg start
      Name    Link  Driver      Speed  NMTU  IPv4_addr        RDEV  RMTU
      enp7s0  yes   virtio_net         1500
  3. Add a new RXE device. For example, to add the enp7s0 Ethernet device as an RXE device, enter:

    # rxe_cfg add enp7s0
  4. Display the RXE device status:

    # rxe_cfg status
      Name    Link  Driver      Speed  NMTU  IPv4_addr        RDEV  RMTU
      enp7s0  yes   virtio_net         1500                   rxe0  1024

    In the RDEV column, you see that the enp7s0 is mapped to the rxe0 device.

  5. Optional: list the available RDMA devices in the system:

    # ibv_devices
        device          	   node GUID
        ------          	----------------
        rxe0            	505400fffed5e0fb

    Alternatively, use the ibstat utility to display a detailed status:

    # ibstat rxe0
    CA 'rxe0'
    	CA type:
    	Number of ports: 1
    	Firmware version:
    	Hardware version:
    	Node GUID: 0x505400fffed5e0fb
    	System image GUID: 0x0000000000000000
    	Port 1:
    		State: Active
    		Physical state: LinkUp
    		Rate: 100
    		Base lid: 0
    		LMC: 0
    		SM lid: 0
    		Capability mask: 0x00890000
    		Port GUID: 0x505400fffed5e0fb
    		Link layer: Ethernet
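To verify that the Soft-RoCE device actually passes RDMA traffic, you can run a two-host test with the `ibv_rc_pingpong` utility from the libibverbs-utils package installed above. The following is a minimal sketch, where `192.0.2.1` is a placeholder for the server's IP address:

```shell
# On the server: start a reliable-connection ping-pong listener on the
# rxe0 device. The -g 0 option selects GID table entry 0, which is
# required for RoCE because it has no default InfiniBand LID routing.
ibv_rc_pingpong -d rxe0 -g 0

# On the client: connect to the server and exchange the default number
# of messages over the rxe0 device.
ibv_rc_pingpong -d rxe0 -g 0 192.0.2.1
```

On success, both sides print the number of bytes exchanged and the measured throughput, confirming that the RXE device is functional end to end.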