Red Hat Training

A Red Hat training course is available for Red Hat Enterprise Linux

13.5. Configuring the Base RDMA Subsystem

Startup of the rdma service is automatic. When RDMA capable hardware, whether InfiniBand or iWARP or RoCE/IBoE is detected, udev instructs systemd to start the rdma service. 
~]# systemctl status rdma
● rdma.service - Initialize the iWARP/InfiniBand/RDMA stack in the kernel
   Loaded: loaded (/usr/lib/systemd/system/rdma.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: file:/etc/rdma/rdma.conf
Users need not enable the rdma service, but they can if they want to force it on all the time. To do that, enter the following command as root:
~]# systemctl enable rdma

13.5.1. Configuration of the rdma.conf file

The rdma service reads /etc/rdma/rdma.conf to find out which kernel-level and user-level RDMA protocols the administrator wants to be loaded by default. Users should edit this file to turn various drivers on or off.
The various drivers that can be enabled and disabled are:
  • IPoIB — This is an IP network emulation layer that allows IP applications to run over InfiniBand networks.
  • SRP — This is the SCSI Request Protocol. It allows a machine to mount a remote drive or drive array that is exported through the SRP protocol on the machine as though it were a local hard disk.
  • SRPT — This is the target mode, or server mode, of the SRP protocol. This loads the kernel support necessary for exporting a drive or drive array for other machines to mount as though it were local on their machine. Further configuration of the target mode support is required before any devices will actually be exported. See the documentation in the targetd and targetcli packages for further information.
  • ISER — This is a low-level driver for the general iSCSI layer of the Linux kernel that provides transport over InfiniBand networks for iSCSI devices.
  • RDS — This is the Reliable Datagram Service in the Linux kernel. It is not enabled in Red Hat Enterprise Linux 7 kernels and so cannot be loaded.

13.5.2. Usage of 70-persistent-ipoib.rules

The rdma package provides the file /etc/udev.d/rules.d/70-persistent-ipoib.rules. This udev rules file is used to rename IPoIB devices from their default names (such as ib0 and ib1) to more descriptive names. Users must edit this file to change how their devices are named. First, find out the GUID address for the device to be renamed:
~]$ ip link show ib0
8: ib0: >BROADCAST,MULTICAST,UP,LOWER_UP< mtu 65520 qdisc pfifo_fast state UP mode DEFAULT qlen 256
    link/infiniband 80:00:02:00:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:cb:a1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
Immediately after link/infiniband is the 20 byte hardware address for the IPoIB interface. The final 8 bytes of the address, marked in bold above, is all that is required to make a new name. Users can make up whatever naming scheme suits them. For example, use a device_fabric naming convention such as mlx4_ib0 if a mlx4 device is connected to an ib0 subnet fabric. The only thing that is not recommended is to use the standard names, like ib0 or ib1, as these can conflict with the kernel assigned automatic names. Next, add an entry in the rules file. Copy the existing example in the rules file, replace the 8 bytes in the ATTR{address} entry with the highlighted 8 bytes from the device to be renamed, and enter the new name to be used in the NAME field.

13.5.3. Relaxing memlock restrictions for users

RDMA communications require that physical memory in the computer be pinned (meaning that the kernel is not allowed to swap that memory out to a paging file in the event that the overall computer starts running short on available memory). Pinning memory is normally a very privileged operation. In order to allow users other than root to run large RDMA applications, it will likely be necessary to increase the amount of memory that non-root users are allowed to pin in the system. This is done by adding a file in the /etc/security/limits.d/ directory with contents such as the following:
~]$ more /etc/security/limits.d/rdma.conf
# configuration for rdma tuning
*       soft    memlock         unlimited
*       hard    memlock         unlimited
# rdma tuning end

13.5.4. Configuring Mellanox cards for Ethernet operation

Certain hardware from Mellanox is capable of running in either InfiniBand or Ethernet mode. These cards generally default to InfiniBand. Users can set the cards to Ethernet mode. There is currently support for setting the mode only on ConnectX family hardware (which uses either the mlx5 or mlx4 driver).
To configure Mellanox mlx5 cards, use the mstconfig program from the mstflint package. For more details, see the Configuring Mellanox mlx5 cards in Red Hat Enterprise Linux 7 Knowledge Base article on the Red Hat Customer Portal.
To configure Mellanox mlx4 cards, use mstconfig to set the port types on the card as described in the Knowledge Base article. If mstconfig does not support your card, edit the /etc/rdma/mlx4.conf file and follow the instructions in that file to set the port types properly for RoCE/IBoE usage. In this case is also necessary to rebuild the initramfs to make sure the updated port settings are copied into the initramfs.
Once the port type has been set, if one or both ports are set to Ethernet and mstconfig was not used to set the port types, then users might see this message in their logs:
mlx4_core 0000:05:00.0: Requested port type for port 1 is not supported on this HCA
This is normal and does not affect operation. The script responsible for setting the port type has no way of knowing when the driver has finished switching port 2 to the requested type internally, and from the time that the script issues a request for port 2 to switch until that switch is complete, the attempts to set port 1 to a different type get rejected. The script retries until the command succeeds or until a timeout has passed indicating that the port switch never completed.

13.5.5. Connecting to a Remote Linux SRP Target

The SCSI RDMA Protocol (SRP) is a network protocol that enables a system to use RDMA to access SCSI devices that are attached to another system. To allow an SRP initiator to connect an SRP target on the SRP target side, you must add an access control list (ACL) entry for the host channel adapter (HCA) port used in the initiator.
ACL IDs for HCA ports are not unique. The ACL IDs depend on the GID format of the HCAs. HCAs that use the same driver, for example ib_qib, can have different format of GIDs. The ACL ID also depends on how you initiate the connection request.

Connecting to a Remote Linux SRP Target: High-Level Overview

  1. Prepare the target side:
    1. Create storage back end. For example get the /dev/sdc1 partition:
      /> /backstores/block create vol1 /dev/sdc1
    2. Create an SRP target:
      /> /srpt create 0xfe80000000000000001175000077dd7e
    3. Create a LUN based on the back end created in step a:
      /> /srpt/ib.fe80000000000000001175000077dd7e/luns create /backstores/block/vol1
    4. Create a Node ACL for the remote SRP client:
      /> /srpt/ib.fe80000000000000001175000077dd7e/acls create 0x7edd770000751100001175000077d708
      Note that the Node ACL is different for srp_daemon and ibsrpdm.
  2. Initiate an SRP connection with srp_daemon or ibsrpdm for the client side:
    [root@initiator]# srp_daemon -e -n -i qib0 -p 1 -R 60 &
    [root@initiator]# ibsrpdm -c -d /dev/infiniband/umad0 > /sys/class/infiniband_srp/srp-qib0-1/add_target
  3. Optional. It is recommended to verify the SRP connection with different tools, such as lsscsi or dmesg.

Procedure 13.3. Connecting to a Remote Linux SRP Target with srp_daemon or ibsrpdm

  1. Use the ibstat command on the target to determine the State and Port GUID values. The HCA must be in Active state. The ACL ID is based on the Port GUID:
    [root@target]# ibstat
    CA 'qib0'
    	CA type: InfiniPath_QLE7342
    	Number of ports: 1
    	Firmware version:
    	Hardware version: 2
    	Node GUID: 0x001175000077dd7e
    	System image GUID: 0x001175000077dd7e
    	Port 1:
    		State: Active
    		Physical state: LinkUp
    		Rate: 40
    		Base lid: 1
    		LMC: 0
    		SM lid: 1
    		Capability mask: 0x0769086a
    		Port GUID: 0x001175000077dd7e
    		Link layer: InfiniBand
    
  2. Get the SRP target ID, which is based on the GUID of the HCA port. Note that you need a dedicated disk partition as a back end for a SRP target, for example /dev/sdc1. The following command replaces the default prefix of fe80, removes the colon, and adds the new prefix to the remainder of the string:
    [root@target]# ibstatus | grep '<default-gid>' | sed -e 's/<default-gid>://' -e 's/://g' | grep 001175000077dd7e
    fe80000000000000001175000077dd7e
    
  3. Use the targetcli tool to create the LUN vol1 on the block device, create an SRP target, and export the LUN:
    [root@target]# targetcli
    
    
    /> /backstores/block create vol1 /dev/sdc1
    Created block storage object vol1 using /dev/sdc1.
    /> /srpt create 0xfe80000000000000001175000077dd7e
    Created target ib.fe80000000000000001175000077dd7e.
    /> /srpt/ib.fe80000000000000001175000077dd7e/luns create /backstores/block/vol1
    Created LUN 0.
    /> ls /
    o- / ............................................................................. [...]
      o- backstores .................................................................. [...]
      | o- block ...................................................... [Storage Objects: 1]
      | | o- vol1 ............................... [/dev/sdc1 (77.8GiB) write-thru activated]
      | o- fileio ..................................................... [Storage Objects: 0]
      | o- pscsi ...................................................... [Storage Objects: 0]
      | o- ramdisk .................................................... [Storage Objects: 0]
      o- iscsi ................................................................ [Targets: 0]
      o- loopback ............................................................. [Targets: 0]
      o- srpt ................................................................. [Targets: 1]
        o- ib.fe80000000000000001175000077dd7e ............................... [no-gen-acls]
          o- acls ................................................................ [ACLs: 0]
          o- luns ................................................................ [LUNs: 1]
            o- lun0 ............................................... [block/vol1 (/dev/sdc1)]
    />
    
  4. Use the ibstat command on the initiator to check if the state is Active and determine the Port GUID:
    [root@initiator]# ibstat
    CA 'qib0'
    	CA type: InfiniPath_QLE7342
    	Number of ports: 1
    	Firmware version:
    	Hardware version: 2
    	Node GUID: 0x001175000077d708
    	System image GUID: 0x001175000077d708
    	Port 1:
    		State: Active
    		Physical state: LinkUp
    		Rate: 40
    		Base lid: 2
    		LMC: 0
    		SM lid: 1
    		Capability mask: 0x07690868
    		Port GUID: 0x001175000077d708
    		Link layer: InfiniBand
    
  5. Use the following command to scan without connecting to a remote SRP target. The target GUID shows that the initiator had found remote target. The ID string shows that the remote target is a Linux software target (ib_srpt.ko).
    [root@initiator]# srp_daemon -a -o
    IO Unit Info:
        port LID:        0001
        port GID:        fe80000000000000001175000077dd7e
        change ID:       0001
        max controllers: 0x10
    
        controller[  1]
            GUID:      001175000077dd7e
            vendor ID: 000011
            device ID: 007322
            IO class : 0100
            ID:        Linux SRP target
            service entries: 1
                service[  0]: 001175000077dd7e / SRP.T10:001175000077dd7e
    
  6. To verify the SRP connection, use the lsscsi command to list SCSI devices and compare the lsscsi output before and after the initiator connects to target.
    [root@initiator]# lsscsi
    [0:0:10:0]   disk    IBM-ESXS ST9146803SS      B53C  /dev/sda
    
  7. To connect to a remote target without configuring a valid ACL for the initiator port, which is expected to fail, use the following commands for srp_daemon or ibsrpdm:
    [root@initiator]# srp_daemon -e -n -i qib0 -p 1 -R 60 &
    [1] 4184
    
    [root@initiator]# ibsrpdm -c -d /dev/infiniband/umad0 > /sys/class/infiniband_srp/srp-qib0-1/add_target
    
  8. The output of the dmesg shows why the SRP connection operation failed. In a later step, the dmesg command on the target side is used to make the situation clear.
    [root@initiator]# dmesg -c
    [ 1230.059652] scsi host5: ib_srp: REJ received
    [ 1230.059659] scsi host5: ib_srp: SRP LOGIN from fe80:0000:0000:0000:0011:7500:0077:d708 to fe80:0000:0000:0000:0011:7500:0077:dd7e REJECTED, reason 0x00010006
    [ 1230.073792] scsi host5: ib_srp: Connection 0/2 failed
    [ 1230.078848] scsi host5: ib_srp: Sending CM DREQ failed
    
  9. Because of failed LOGIN, the output of the lsscsi command is the same as in the earlier step.
    [root@initiator]# lsscsi
    [0:0:10:0]   disk    IBM-ESXS ST9146803SS      B53C  /dev/sda
    
  10. Using the dmesg on the target side (ib_srpt.ko) provides an explanation of why LOGIN failed. Also, the output contains the valid ACL ID provided by srp_daemon: 0x7edd770000751100001175000077d708.
    [root@target]# dmesg
    [ 1200.303001] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x7edd770000751100:0x1175000077d708, t_port_id 0x1175000077dd7e:0x1175000077dd7e and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x1175000077dd7e)
    [ 1200.322207] ib_srpt Rejected login because no ACL has been configured yet for initiator 0x7edd770000751100001175000077d708.
  11. Use the targetcli tool to add a valid ACL:
    [root@target]# targetcli
    targetcli shell version 2.1.fb41
    Copyright 2011-2013 by Datera, Inc and others.
    For help on commands, type 'help'.
    
    /> /srpt/ib.fe80000000000000001175000077dd7e/acls create 0x7edd770000751100001175000077d708
    Created Node ACL for ib.7edd770000751100001175000077d708
    Created mapped LUN 0.
    
  12. Verify the SRP LOGIN operation:
    1. Wait for 60 seconds to allow srp_daemon to re-try logging in:
      [root@initiator]# sleep 60
    2. Verify the SRP LOGIN operation:
      [root@initiator]# lsscsi
      [0:0:10:0]   disk    IBM-ESXS ST9146803SS      B53C  /dev/sda
      [7:0:0:0] disk LIO-ORG vol1 4.0 /dev/sdb
    3. For a kernel log of SRP target discovery, use:
      [root@initiator]# dmesg -c
      [ 1354.182072] scsi host7: SRP.T10:001175000077DD7E
      [ 1354.187258] scsi 7:0:0:0: Direct-Access     LIO-ORG  vol1             4.0  PQ: 0 ANSI: 5
      [ 1354.208688] scsi 7:0:0:0: alua: supports implicit and explicit TPGS
      [ 1354.215698] scsi 7:0:0:0: alua: port group 00 rel port 01
      [ 1354.221409] scsi 7:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
      [ 1354.229147] scsi 7:0:0:0: alua: Attached
      [ 1354.233402] sd 7:0:0:0: Attached scsi generic sg1 type 0
      [ 1354.233694] sd 7:0:0:0: [sdb] 163258368 512-byte logical blocks: (83.5 GB/77.8 GiB)
      [ 1354.235127] sd 7:0:0:0: [sdb] Write Protect is off
      [ 1354.235128] sd 7:0:0:0: [sdb] Mode Sense: 43 00 00 08
      [ 1354.235550] sd 7:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
      [ 1354.255491] sd 7:0:0:0: [sdb] Attached SCSI disk
      [ 1354.265233] scsi host7: ib_srp: new target: id_ext 001175000077dd7e ioc_guid 001175000077dd7e pkey ffff service_id 001175000077dd7e sgid fe80:0000:0000:0000:0011:7500:0077:d708 dgid fe80:0000:0000:0000:0011:7500:0077:dd7e
      xyx