Configuring Mellanox mlx5 cards in Red Hat Enterprise Linux 7

To configure Mellanox mlx5 cards, use the mstconfig program from the mstflint package. Install the package using the yum command:

~]$ sudo yum install mstflint

Use the lspci command to get the PCI address of the device:

~]$ lspci | grep Mellanox
04:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
04:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
83:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
83:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
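
When scripting, the PCI addresses can be pulled out of the lspci output instead of being copied by hand. The following is a minimal sketch that assumes the output format shown above; the helper name mlx_pci_addrs is hypothetical and is not part of the mstflint package.

```shell
# Hypothetical helper: read lspci output on stdin and print the PCI
# address (the first field) of every line that mentions Mellanox.
mlx_pci_addrs() {
    awk '/Mellanox/ { print $1 }'
}

# Usage:
#   lspci | mlx_pci_addrs
```

Each printed address can then be passed to the -d option of the mstconfig command.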

Query the selected card using the mstconfig command:

~]$ mstconfig -d 04:00.0 q

Device #1:
----------

Device type:    ConnectX4LX     
PCI device:     04:00.0         

Configurations:                              Current
         SRIOV_EN                            True(1)         
         NUM_OF_VFS                          8               
         PF_LOG_BAR_SIZE                     5               
         VF_LOG_BAR_SIZE                     0               
         NUM_PF_MSIX                         63              
         NUM_VF_MSIX                         11              
         LOG_DCR_HASH_TABLE_SIZE             14              
         DCR_LIFO_SIZE                       16384           
         ROCE_NEXT_PROTOCOL                  254             
         ROCE_CC_ALGORITHM_P1                ECN(0)          
         ROCE_CC_PRIO_MASK_P1                0               
         ROCE_CC_ALGORITHM_P2                ECN(0)          
         ROCE_CC_PRIO_MASK_P2                0               
         CLAMP_TGT_RATE_P1                   0               
         CLAMP_TGT_RATE_AFTER_TIME_INC_P1    1               
         RPG_TIME_RESET_P1                   600             
         RPG_BYTE_RESET_P1                   32767           
         RPG_THRESHOLD_P1                    5               
         RPG_MAX_RATE_P1                     0               
         RPG_AI_RATE_P1                      5               
         RPG_HAI_RATE_P1                     50              
         RPG_GD_P1                           11              
         RPG_MIN_DEC_FAC_P1                  50              
         RPG_MIN_RATE_P1                     1               
         RATE_TO_SET_ON_FIRST_CNP_P1         100             
         DCE_TCP_G_P1                        4               
         DCE_TCP_RTT_P1                      1               
         RATE_REDUCE_MONITOR_PERIOD_P1       4               
         INITIAL_ALPHA_VALUE_P1              0               
         MIN_TIME_BETWEEN_CNPS_P1            0               
         CNP_DSCP_P1                         7               
         CNP_802P_PRIO_P1                    0               
         CLAMP_TGT_RATE_P2                   0               
         CLAMP_TGT_RATE_AFTER_TIME_INC_P2    1               
         RPG_TIME_RESET_P2                   600             
         RPG_BYTE_RESET_P2                   32767           
         RPG_THRESHOLD_P2                    5               
         RPG_MAX_RATE_P2                     0               
         RPG_AI_RATE_P2                      5               
         RPG_HAI_RATE_P2                     50              
         RPG_GD_P2                           11              
         RPG_MIN_DEC_FAC_P2                  50              
         RPG_MIN_RATE_P2                     1               
         RATE_TO_SET_ON_FIRST_CNP_P2         100             
         DCE_TCP_G_P2                        4               
         DCE_TCP_RTT_P2                      1               
         RATE_REDUCE_MONITOR_PERIOD_P2       4               
         INITIAL_ALPHA_VALUE_P2              0               
         MIN_TIME_BETWEEN_CNPS_P2            0               
         CNP_DSCP_P2                         7               
         CNP_802P_PRIO_P2                    0               
         PORT_OWNER                          True(1)         
         ALLOW_RD_COUNTERS                   True(1)         
         IP_VER                              IPv4(0)         
         NUM_OF_TC_P1                        8_TCS(0)        
         NUM_OF_VL_P1                        4_VLS(3)        
         NUM_OF_TC_P2                        8_TCS(0)        
         NUM_OF_VL_P2                        4_VLS(3)        
         LLDP_NB_RX_MODE_P1                  0               
         LLDP_NB_TX_MODE_P1                  0               
         LLDP_NB_DCBX_P1                     False(0)        
         LLDP_NB_RX_MODE_P2                  0               
         LLDP_NB_TX_MODE_P2                  0               
         LLDP_NB_DCBX_P2                     False(0)        
         DCBX_IEEE_P1                        True(1)         
         DCBX_CEE_P1                         True(1)         
         DCBX_WILLING_P1                     True(1)         
         DCBX_IEEE_P2                        True(1)         
         DCBX_CEE_P2                         True(1)         
         DCBX_WILLING_P2                     True(1)         

Note that the card in the example output is an Ethernet-only card, so the output contains no LINK_TYPE_P1 or LINK_TYPE_P2 port type setting.
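
Because the query output is long, it can help to extract only the parameter of interest. The following is a minimal sketch that assumes the two-column layout of the output above (parameter name, then current value); the helper name mst_param is hypothetical.

```shell
# Hypothetical helper: read mstconfig query output on stdin and print
# the current value of the parameter named in the first argument.
mst_param() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Usage (querying the device typically requires root privileges):
#   mstconfig -d 04:00.0 q | mst_param NUM_OF_VFS
```

The helper prints nothing if the parameter is not present in the output, which is what happens for a port type setting on an Ethernet-only card.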

Query the ConnectX-4 card:

~]$ mstconfig -d 83:00.0 q
Device #1:
----------

Device type:    ConnectX4       
PCI device:     83:00.0         

Configurations:                              Current
         SRIOV_EN                            False(0)        
         NUM_OF_VFS                          0               
         PF_LOG_BAR_SIZE                     5               
         VF_LOG_BAR_SIZE                     1               
         NUM_PF_MSIX                         63              
         NUM_VF_MSIX                         11              
         LINK_TYPE_P1                        IB(1)           
         LINK_TYPE_P2                        IB(1)           
         LOG_DCR_HASH_TABLE_SIZE             14              
         DCR_LIFO_SIZE                       16384           
         ROCE_NEXT_PROTOCOL                  254             
         ROCE_CC_ALGORITHM_P1                ECN(0)          
         ROCE_CC_PRIO_MASK_P1                0               
         ROCE_CC_ALGORITHM_P2                ECN(0)          
         ROCE_CC_PRIO_MASK_P2                0               
         CLAMP_TGT_RATE_P1                   0               
         CLAMP_TGT_RATE_AFTER_TIME_INC_P1    1               
         RPG_TIME_RESET_P1                   600             
         RPG_BYTE_RESET_P1                   32767           
         RPG_THRESHOLD_P1                    5               
         RPG_MAX_RATE_P1                     0               
         RPG_AI_RATE_P1                      5               
         RPG_HAI_RATE_P1                     50              
         RPG_GD_P1                           11              
         RPG_MIN_DEC_FAC_P1                  50              
         RPG_MIN_RATE_P1                     1               
         RATE_TO_SET_ON_FIRST_CNP_P1         100             
         DCE_TCP_G_P1                        4               
         DCE_TCP_RTT_P1                      1               
         RATE_REDUCE_MONITOR_PERIOD_P1       4               
         INITIAL_ALPHA_VALUE_P1              0               
         MIN_TIME_BETWEEN_CNPS_P1            0               
         CNP_DSCP_P1                         7               
         CNP_802P_PRIO_P1                    0               
         CLAMP_TGT_RATE_P2                   0               
         CLAMP_TGT_RATE_AFTER_TIME_INC_P2    1               
         RPG_TIME_RESET_P2                   600             
         RPG_BYTE_RESET_P2                   32767           
         RPG_THRESHOLD_P2                    5               
         RPG_MAX_RATE_P2                     0               
         RPG_AI_RATE_P2                      5               
         RPG_HAI_RATE_P2                     50              
         RPG_GD_P2                           11              
         RPG_MIN_DEC_FAC_P2                  50              
         RPG_MIN_RATE_P2                     1               
         RATE_TO_SET_ON_FIRST_CNP_P2         100             
         DCE_TCP_G_P2                        4               
         DCE_TCP_RTT_P2                      1               
         RATE_REDUCE_MONITOR_PERIOD_P2       4               
         INITIAL_ALPHA_VALUE_P2              0               
         MIN_TIME_BETWEEN_CNPS_P2            0               
         CNP_DSCP_P2                         7               
         CNP_802P_PRIO_P2                    0               
         PORT_OWNER                          True(1)         
         ALLOW_RD_COUNTERS                   True(1)         
         IP_VER                              IPv4(0)         
         NUM_OF_TC_P1                        8_TCS(0)        
         NUM_OF_VL_P1                        4_VLS(3)        
         NUM_OF_TC_P2                        8_TCS(0)        
         NUM_OF_VL_P2                        4_VLS(3)        
         LLDP_NB_RX_MODE_P1                  0               
         LLDP_NB_TX_MODE_P1                  0               
         LLDP_NB_DCBX_P1                     False(0)        
         LLDP_NB_RX_MODE_P2                  0               
         LLDP_NB_TX_MODE_P2                  0               
         LLDP_NB_DCBX_P2                     False(0)        
         DCBX_IEEE_P1                        True(1)         
         DCBX_CEE_P1                         True(1)         
         DCBX_WILLING_P1                     True(1)         
         DCBX_IEEE_P2                        True(1)         
         DCBX_CEE_P2                         True(1)         
         DCBX_WILLING_P2                     True(1)

In this example, the card is a Virtual Protocol Interconnect (VPI) card, and the LINK_TYPE_P1 and LINK_TYPE_P2 options control the mode of port 1 and port 2, respectively. The available values are: 1 - IB, 2 - Ethernet, and 3 - VPI (autodetect). For example, to switch the first port to Ethernet mode:

~]$ mstconfig -d 83:00.0 set LINK_TYPE_P1=2
Device #1:
----------

Device type:    ConnectX4       
PCI device:     83:00.0         

Configurations:                              Current         New
         LINK_TYPE_P1                        IB(1)           ETH(2)          

 Apply new Configuration? ? (y/n) [n] : 
Applying... Done!
-I- Please reboot machine to load new configurations.
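
To avoid memorizing the numeric values, a script can translate a port mode name into the corresponding LINK_TYPE value. The following is a minimal sketch; the helper name link_type_value and the mode names ib, eth, and vpi are hypothetical, and only the numeric mapping comes from the list of available values above.

```shell
# Hypothetical helper: translate a port mode name into the numeric
# LINK_TYPE value (1 - IB, 2 - Ethernet, 3 - VPI).
link_type_value() {
    case "$1" in
        ib)  echo 1 ;;
        eth) echo 2 ;;
        vpi) echo 3 ;;
        *)   echo "unknown port mode: $1" >&2; return 1 ;;
    esac
}

# Usage (sets both ports to Ethernet in a single set command; a reboot
# is still needed to load the new configuration):
#   mstconfig -d 83:00.0 set \
#       LINK_TYPE_P1="$(link_type_value eth)" \
#       LINK_TYPE_P2="$(link_type_value eth)"
```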

For all mlx5 driver-based devices, this is the preferred way to set the port type for each port, to enable or disable SR-IOV, and to set other options on Ethernet ports, such as the boot options (PXE or UEFI, VLAN, IPv4 or IPv6, enabled or disabled). The mstconfig command stores these settings in the firmware, so they persist across system restarts and require no startup files.
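
As an illustration of the SR-IOV case, the following minimal sketch checks whether SR-IOV is enabled in the firmware after a change and a reboot. The helper name sriov_enabled is hypothetical, and it assumes the query output format shown earlier, where an enabled setting is reported as True(1).

```shell
# Hypothetical helper: read mstconfig query output on stdin and succeed
# only if SRIOV_EN is reported as True(1).
sriov_enabled() {
    awk '$1 == "SRIOV_EN" { found = ($2 == "True(1)") } END { exit !found }'
}

# Usage (the set command prompts for confirmation, and the new values
# are loaded only after a reboot):
#   mstconfig -d 04:00.0 set SRIOV_EN=1 NUM_OF_VFS=8
#   ...reboot...
#   mstconfig -d 04:00.0 q | sriov_enabled && echo "SR-IOV is enabled"
```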

See the mstconfig(1) man page for more details on non-volatile configurable options of Mellanox Host Channel Adapters (HCA) provided by this command.

See the Configure InfiniBand and RDMA Networks chapter in the Red Hat Enterprise Linux 7 Networking Guide for configuration scenarios.

1 Comment

It would be helpful to have documentation on how to use DCBX on mlx5 and configure a complete lossless setup. Mellanox has some information that relies on their own tools, but a good description for RHEL is lacking. Information on how to verify that a lossless setup is working would also be useful, that is, how to verify that pause frames are sent for a specific priority (PFC).