Chapter 5. Configuring an InfiniBand subnet manager

All InfiniBand networks must have a subnet manager running for the network to function. This is true even if two machines are connected directly with no switch involved.

It is possible to run more than one subnet manager. In that case, one acts as the master, and the others act as slaves that take over if the master subnet manager fails.

Most InfiniBand switches contain an embedded subnet manager. However, if you need a more up-to-date subnet manager or if you require more control, use the OpenSM subnet manager provided by Red Hat Enterprise Linux.

5.1. Installing the OpenSM subnet manager

OpenSM is a subnet manager and administrator that follows the InfiniBand specifications to initialize InfiniBand hardware. At least one instance of the OpenSM service must always run in each InfiniBand subnet.

Procedure

  1. Install the opensm package:

    # yum install opensm

  2. Configure OpenSM if the default installation does not match your environment.

    If the host has only one InfiniBand port, it acts as the master subnet manager, and the default configuration works without any modification.

  3. Enable and start the opensm service:

    # systemctl enable --now opensm
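
To confirm that the service started, you can check the state of the unit. The following is a minimal sketch rather than part of the documented procedure. The check_opensm function name is hypothetical; it relies only on the standard systemctl is-active command:

```shell
# Report whether a systemd unit (default: opensm) is active.
# check_opensm is a hypothetical helper name used for illustration.
check_opensm() {
    unit="${1:-opensm}"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit: active"
    else
        echo "$unit: not active"
        return 1
    fi
}
```

Run check_opensm after step 3. A non-zero exit status indicates that the unit is not active.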

Additional resources

  • opensm(8) man page

5.2. Configuring OpenSM using the simple method

OpenSM is an InfiniBand specification-based subnet manager and administrator that configures the InfiniBand fabric, a network topology to interconnect the InfiniBand nodes.

Prerequisites

  • One or more InfiniBand ports are installed on the server

Procedure

  1. Obtain the GUIDs for the ports using the ibstat utility:

    # ibstat -d mlx4_0
    
    CA 'mlx4_0'
       CA type: MT4099
       Number of ports: 2
       Firmware version: 2.42.5000
       Hardware version: 1
       Node GUID: 0xf4521403007be130
       System image GUID: 0xf4521403007be133
       Port 1:
          State: Active
          Physical state: LinkUp
          Rate: 56
          Base lid: 3
          LMC: 0
          SM lid: 1
          Capability mask: 0x02594868
          Port GUID: 0xf4521403007be131
          Link layer: InfiniBand
       Port 2:
          State: Down
          Physical state: Disabled
          Rate: 10
          Base lid: 0
          LMC: 0
          SM lid: 0
          Capability mask: 0x04010000
          Port GUID: 0xf65214fffe7be132
          Link layer: Ethernet

    Note

    Some InfiniBand adapters use the same GUID for the node, system, and port.

  2. Edit the /etc/sysconfig/opensm file and set the GUIDs in the GUIDS parameter:

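    GUIDS="GUID_1 GUID_2"

  3. Optionally, set the PRIORITY parameter if multiple subnet managers are available in your subnet. Valid values range from 0 to 15, and the subnet manager with the highest priority becomes the master. For example: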

    PRIORITY=15
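
Steps 1 and 2 can be partly scripted. The following sketch reads ibstat output on standard input, keeps the Port GUID of every port whose link layer is InfiniBand, and prints a GUIDS line. It assumes the output layout shown in step 1, where the Port GUID line precedes the Link layer line of the same port. The function names ib_port_guids and guids_line are hypothetical:

```shell
# Print the Port GUID of every port whose link layer is InfiniBand.
# Reads ibstat-style output on stdin. Node and system image GUIDs are
# ignored because only lines containing "Port GUID:" are matched.
ib_port_guids() {
    awk '
        /Port GUID:/  { guid = $3 }
        /Link layer:/ { if ($3 == "InfiniBand" && guid != "") print guid; guid = "" }
    '
}

# Join the GUIDs with spaces and wrap them in a GUIDS="..." line.
guids_line() {
    printf 'GUIDS="%s"\n' "$(ib_port_guids | paste -sd ' ' -)"
}
```

For example, ibstat -d mlx4_0 | guids_line prints GUIDS="0xf4521403007be131" for the output shown in step 1, because port 2 uses the Ethernet link layer. Review the line before copying it into the /etc/sysconfig/opensm file.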

Additional resources

  • /etc/sysconfig/opensm

5.3. Configuring OpenSM by editing the opensm.conf file

OpenSM performance depends on the number of available InfiniBand ports on the device. You can customize OpenSM by editing the /etc/rdma/opensm.conf file.

Prerequisites

  • Only one InfiniBand port is installed on the server.

Procedure

  1. Edit the /etc/rdma/opensm.conf file and customize the settings to match your environment.

    When you update the opensm package, the yum utility stores the new OpenSM configuration file as /etc/rdma/opensm.conf.rpmnew. Compare this file with your customized /etc/rdma/opensm.conf file to identify changes, and incorporate them manually.

  2. Restart the opensm service:

    # systemctl restart opensm
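
The comparison between the customized file and the .rpmnew file described in step 1 can be done with the diff utility. The following is a minimal sketch; the show_conf_changes function name is hypothetical, and the default path is the one used in this section:

```shell
# Show a unified diff between a configuration file and its .rpmnew copy.
# show_conf_changes is a hypothetical helper name used for illustration.
show_conf_changes() {
    conf="${1:-/etc/rdma/opensm.conf}"
    new="$conf.rpmnew"
    if [ ! -f "$new" ]; then
        echo "no $new found; nothing to merge"
        return 0
    fi
    # diff exits with 1 when the files differ; treat that as success
    # and report only real errors (exit status greater than 1).
    diff -u "$conf" "$new" || [ $? -eq 1 ]
}
```

Lines prefixed with + in the output exist only in the .rpmnew file and are candidates for merging into your customized file.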

5.4. Configuring multiple OpenSM instances

OpenSM is an InfiniBand specification-based subnet manager and administrator. To provide high performance, OpenSM uses a switched fabric network topology that interconnects the InfiniBand network nodes.

Prerequisites

  • One or more InfiniBand ports are installed on the server.

Procedure

  1. Copy the /etc/rdma/opensm.conf file to the /etc/rdma/opensm.conf.orig file:

    # cp /etc/rdma/opensm.conf /etc/rdma/opensm.conf.orig

    When you install an updated opensm package, the yum utility overwrites the /etc/rdma/opensm.conf file. With the copy created in this step, you can compare the previous and new files to identify changes and incorporate them manually in the instance-specific opensm.conf files.

  2. Create a copy of the /etc/rdma/opensm.conf file:

    # cp /etc/rdma/opensm.conf /etc/rdma/opensm.conf.1

    For each instance, create a copy of the configuration file and append a unique, consecutive number to the file name.

    After updating the opensm package, the yum utility stores the new OpenSM configuration file as /etc/rdma/opensm.conf.rpmnew. Compare this file with your customized /etc/rdma/opensm.conf.* files, and manually incorporate the changes.

  3. Edit the copy you created in the previous step, and customize the settings for the instance to match your environment. For example, set the guid, subnet_prefix, and logdir parameters.
  4. Optionally, create a partitions.conf file with a unique name specifically for this subnet and reference that file in the partition_config_file parameter in the corresponding copy of the opensm.conf file.
  5. Repeat the previous steps for each instance you want to create.
  6. Start the opensm service:

    # systemctl start opensm

    The opensm service automatically starts a unique instance for each opensm.conf.* file in the /etc/rdma/ directory. If multiple opensm.conf.* files exist, the service ignores settings in the /etc/sysconfig/opensm file as well as in the base /etc/rdma/opensm.conf file.
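
Steps 2 and 3 can be sketched as a small script that copies the base file and sets the per-instance parameters. The sketch assumes that the configuration file uses one "key value" pair per line and that GNU sed is available. The function names and the example GUID and subnet prefix values are hypothetical; the guid and subnet_prefix parameter names are the ones mentioned in step 3:

```shell
# Set "key value" in a configuration file: replace the existing line
# that starts with the key, or append a new line if the key is absent.
# Assumes GNU sed (for the -i option).
set_conf_param() {
    file=$1
    key=$2
    value=$3
    if grep -q "^$key[[:space:]]" "$file"; then
        sed -i "s|^$key[[:space:]].*|$key $value|" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Create the instance file <base>.<n> and set its guid and subnet_prefix.
make_instance() {
    base=$1
    n=$2
    guid=$3
    prefix=$4
    cp "$base" "$base.$n"
    set_conf_param "$base.$n" guid "$guid"
    set_conf_param "$base.$n" subnet_prefix "$prefix"
}
```

For example, make_instance /etc/rdma/opensm.conf 1 0xf4521403007be131 0xfe80000000000001 creates /etc/rdma/opensm.conf.1 with those two values and leaves the base file unchanged. Review the resulting file before starting the service.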

5.5. Creating a partition configuration

Partitions enable administrators to create subnets on InfiniBand similar to Ethernet VLANs.

Important

If you define a partition with a specific speed, such as 40 Gbps, all hosts within this partition must support at least this speed. If a host does not meet the speed requirement, it cannot join the partition. Therefore, set the speed of a partition to the lowest speed supported by any host with permission to join the partition.

Prerequisites

  • One or more InfiniBand ports are installed on the server

Procedure

  1. Edit the /etc/rdma/partitions.conf file to configure the partitions as follows:

    Note

    All fabrics must contain the 0x7fff partition, and all switches and all hosts must belong to that partition.

    Add the following content to the file to create the 0x7fff default partition at a reduced speed of 10 Gbps, and a partition 0x0002 with a speed of 40 Gbps:

    # For reference:
    # IPv4 IANA reserved multicast addresses:
    #   http://www.iana.org/assignments/multicast-addresses/multicast-addresses.txt
    # IPv6 IANA reserved multicast addresses:
    #   http://www.iana.org/assignments/ipv6-multicast-addresses/ipv6-multicast-addresses.xml
    #
    # mtu =
    #   1 = 256
    #   2 = 512
    #   3 = 1024
    #   4 = 2048
    #   5 = 4096
    #
    # rate =
    #   2  = 2.5 GBit/s
    #   3  = 10  GBit/s
    #   4  = 30  GBit/s
    #   5  = 5   GBit/s
    #   6  = 20  GBit/s
    #   7  = 40  GBit/s
    #   8  = 60  GBit/s
    #   9  = 80  GBit/s
    #   10 = 120 GBit/s
    
    Default=0x7fff, rate=3, mtu=4, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
    Default=0x7fff, ipoib, rate=3, mtu=4, scope=2:
        mgid=ff12:401b::ffff:ffff   # IPv4 Broadcast address
        mgid=ff12:401b::1           # IPv4 All Hosts group
        mgid=ff12:401b::2           # IPv4 All Routers group
        mgid=ff12:401b::16          # IPv4 IGMP group
        mgid=ff12:401b::fb          # IPv4 mDNS group
        mgid=ff12:401b::fc          # IPv4 Multicast Link Local Name Resolution group
        mgid=ff12:401b::101         # IPv4 NTP group
        mgid=ff12:401b::202         # IPv4 Sun RPC group
        mgid=ff12:601b::1           # IPv6 All Hosts group
        mgid=ff12:601b::2           # IPv6 All Routers group
        mgid=ff12:601b::16          # IPv6 MLDv2-capable Routers group
        mgid=ff12:601b::fb          # IPv6 mDNS group
        mgid=ff12:601b::101         # IPv6 NTP group
        mgid=ff12:601b::202         # IPv6 Sun RPC group
        mgid=ff12:601b::1:3         # IPv6 Multicast Link Local Name Resolution group
        ALL=full, ALL_SWITCHES=full;
    
    ib0_2=0x0002, rate=7, mtu=4, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
    ib0_2=0x0002, ipoib, rate=7, mtu=4, scope=2:
        mgid=ff12:401b::ffff:ffff   # IPv4 Broadcast address
        mgid=ff12:401b::1           # IPv4 All Hosts group
        mgid=ff12:401b::2           # IPv4 All Routers group
        mgid=ff12:401b::16          # IPv4 IGMP group
        mgid=ff12:401b::fb          # IPv4 mDNS group
        mgid=ff12:401b::fc          # IPv4 Multicast Link Local Name Resolution group
        mgid=ff12:401b::101         # IPv4 NTP group
        mgid=ff12:401b::202         # IPv4 Sun RPC group
        mgid=ff12:601b::1           # IPv6 All Hosts group
        mgid=ff12:601b::2           # IPv6 All Routers group
        mgid=ff12:601b::16          # IPv6 MLDv2-capable Routers group
        mgid=ff12:601b::fb          # IPv6 mDNS group
        mgid=ff12:601b::101         # IPv6 NTP group
        mgid=ff12:601b::202         # IPv6 Sun RPC group
        mgid=ff12:601b::1:3         # IPv6 Multicast Link Local Name Resolution group
        ALL=full, ALL_SWITCHES=full;
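
To choose the rate value for a partition, take the lowest speed supported by any host that may join it and look up the matching code in the rate table from the comments above. The following sketch illustrates that lookup. The function names rate_code and lowest_rate_code are hypothetical, and the speeds are passed in Gbit/s:

```shell
# Map a speed in Gbit/s to the rate code listed in the partitions.conf
# comments. Speeds that are not in that table are rejected.
rate_code() {
    case "$1" in
        2.5) echo 2 ;;
        10)  echo 3 ;;
        30)  echo 4 ;;
        5)   echo 5 ;;
        20)  echo 6 ;;
        40)  echo 7 ;;
        60)  echo 8 ;;
        80)  echo 9 ;;
        120) echo 10 ;;
        *)   echo "unsupported rate: $1" >&2; return 1 ;;
    esac
}

# Print the rate code for the lowest of the given host speeds.
lowest_rate_code() {
    min=$(printf '%s\n' "$@" | sort -n | head -n 1)
    rate_code "$min"
}
```

For example, lowest_rate_code 40 56 10 prints 3, which corresponds to the rate=3 setting used for the default partition in the example file.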