SR-IOV NIC drivers can allocate up to 500 MB of memory for ring buffers per PF

Solution In Progress - Updated -

Environment

  • Red Hat OpenShift Container Platform 4.16
    • with Kubernetes v1.31.8
    • Note: OpenShift is not required to experience this issue
  • Red Hat Enterprise Linux 9.4 and above
  • Intel ice, i40e, ixgbe drivers
    • Intel iavf, ixgbevf drivers
  • This issue is not limited to Intel drivers; it affects any driver that allocates ring-buffer memory up front.
  • 16 PFs.

Issue

  • The ice driver consumes 15 GB of memory, with the call trace below, when VFs are created.
  • Every time 1 VF is created on a NIC port, memory usage increases by approximately 500 MB.
  • Removing the VF returns memory usage to normal, so this is not a memory leak but rather an unexpected use of 500 MB per VF, which is not reasonable; a maximum of about 40 MB would be expected.

Resolution

  • If possible, set the MTU to 9000 only on the VFs and leave the PF at 1500.
  • Otherwise, consider decreasing the number of NIC queues on the PF, for example: ethtool -L <PF> combined 8. If possible, also reduce the number of RX descriptors, for example: ethtool -G <PF> rx 512.
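The tuning steps above can be sketched as a short shell session. The interface name ens17f0 is an example; appropriate queue and descriptor counts depend on your workload:

```shell
# Inspect current settings on the PF before changing anything
ethtool -l ens17f0      # channel (queue) counts: current vs. preset maximums
ethtool -g ens17f0      # ring sizes: current vs. preset maximum RX/TX descriptors

# Reduce the number of combined queues on the PF
ethtool -L ens17f0 combined 8

# Reduce the number of RX descriptors per queue on the PF
ethtool -G ens17f0 rx 512
```

Re-run ethtool -l and ethtool -g afterwards to confirm the new values took effect.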

Root Cause

  • This issue is triggered when a VF is provisioned. OpenShift is one common environment where this configuration occurs, but the issue is not limited to OpenShift.
  • The behavior is expected: the SR-IOV operator assigns VFs across multiple PFs and raises the PF MTU to 9000 at the same time. The bulk of the memory increase is due to the MTU 9000 setting on the PF, which requires about 6x the memory of the default MTU of 1500, because 2048 descriptors of 9000 bytes each are allocated for every one of the 64 NIC queues per PF.
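A back-of-envelope calculation using the figures above shows the scale of the increase. Real allocations are page-rounded by the driver, so observed numbers differ somewhat, but the ratio matches:

```shell
# Approximate RX buffer memory per PF, using the figures from the text:
# 64 queues per PF, 2048 RX descriptors per queue, one buffer per descriptor.
queues=64
descs=2048

# MTU 9000: ~9000-byte buffer per descriptor
echo "MTU 9000: $(( queues * descs * 9000 / 1024 / 1024 )) MiB per PF"   # 1125 MiB

# MTU 1500: ~1500-byte buffer per descriptor, about 6x less
echo "MTU 1500: $(( queues * descs * 1500 / 1024 / 1024 )) MiB per PF"   # 187 MiB
```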

Diagnostic Steps

Call trace:
+++
7.00 GB
917056 times, 1834112 pages, allocated by OTHERS :
Page allocated via order 1, mask 0x162a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_COMP|__GFP_MEMALLOC|__GFP_HARDWALL), pid 6992, tgid 6837 (sriov-network-c), ts  ns
 get_page_from_freelist+0x387/0x530
 __alloc_pages+0xf2/0x250
 ice_alloc_rx_bufs+0xcc/0x1c0 [ice]
 ice_vsi_cfg_rxq+0x108/0x290 [ice]
 ice_vsi_cfg_rxqs+0x6b/0xa0 [ice]
 ice_down_up+0x2e/0x60 [ice]
 ice_change_mtu+0xbd/0x140 [ice]
 dev_set_mtu_ext+0xed/0x200
 do_setlink+0x1a6/0xc00
 rtnl_setlink+0xe5/0x180
 rtnetlink_rcv_msg+0x159/0x3d0
 netlink_rcv_skb+0x54/0x100
 netlink_unicast+0x23b/0x360
 netlink_sendmsg+0x24c/0x4c0
 __sys_sendto+0x1dc/0x1f0
 __x64_sys_sendto+0x20/0x30

4.00 GB
524032 times, 1048064 pages, allocated by OTHERS :
Page allocated via order 1, mask 0x162a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_COMP|__GFP_MEMALLOC|__GFP_HARDWALL), pid 6985, tgid 6837 (sriov-network-c), ts  ns
 get_page_from_freelist+0x387/0x530
 __alloc_pages+0xf2/0x250
 ice_alloc_rx_bufs+0xcc/0x1c0 [ice]
 ice_vsi_cfg_rxq+0x108/0x290 [ice]
 ice_vsi_cfg_rxqs+0x6b/0xa0 [ice]
 ice_down_up+0x2e/0x60 [ice]
 ice_change_mtu+0xbd/0x140 [ice]
 dev_set_mtu_ext+0xed/0x200
 do_setlink+0x1a6/0xc00
 rtnl_setlink+0xe5/0x180
 rtnetlink_rcv_msg+0x159/0x3d0
 netlink_rcv_skb+0x54/0x100
 netlink_unicast+0x23b/0x360
 netlink_sendmsg+0x24c/0x4c0
 __sys_sendto+0x1dc/0x1f0
 __x64_sys_sendto+0x20/0x30
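The allocation traces above are in the format produced by the kernel's page_owner facility. A capture along these lines should reproduce them, assuming the kernel was booted with the page_owner=on parameter and debugfs is mounted:

```shell
# Snapshot all tracked page allocations with their stack traces
cat /sys/kernel/debug/page_owner > page_owner.txt

# Summarize and sort the snapshot with the in-tree helper, built from
# tools/vm/page_owner_sort.c in the kernel source (tools/mm/ in newer trees)
./page_owner_sort page_owner.txt sorted.txt
```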
  • smem -twk shows that roughly 500 MB of additional memory is consumed for each VF when the VFs are added to different PFs:
Test result
=============
+++
# 0 VFs
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory         33.0G      11.4G      21.6G
userspace memory              15.7G       5.6G      10.1G
free memory                   13.7G      13.7G          0
----------------------------------------------------------
                              62.5G      30.8G      31.7G

# 1 VF (1 new VF on ens17f0)
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory         33.0G      11.3G      21.6G
userspace memory              15.8G       5.7G      10.1G
free memory                   13.7G      13.7G          0
----------------------------------------------------------
                              62.5G      30.7G      31.7G

# 2 VF (1 new VF on ens17f1)
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory         33.5G      11.4G      22.1G
userspace memory              15.8G       5.7G      10.1G
free memory                   13.2G      13.2G          0
----------------------------------------------------------
                              62.5G      30.2G      32.3G

# 3 VF (1 new VF on ens17f2)
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory         34.0G      11.4G      22.6G
userspace memory              15.8G       5.7G      10.1G
free memory                   12.7G      12.7G          0
----------------------------------------------------------
                              62.5G      29.7G      32.7G
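The test above can be reproduced with the standard sriov_numvfs sysfs interface. The interface names are examples; this requires root on a host with SR-IOV-capable NICs:

```shell
# Create one VF per PF, one PF at a time, and watch kernel dynamic memory grow
for pf in ens17f0 ens17f1 ens17f2; do
    echo 1 > /sys/class/net/${pf}/device/sriov_numvfs
    sleep 2                                   # give the driver time to settle
    smem -twk | grep -E 'kernel dynamic|free memory'
done

# Remove the VFs again; memory usage should return to the baseline
for pf in ens17f0 ens17f1 ens17f2; do
    echo 0 > /sys/class/net/${pf}/device/sriov_numvfs
done
```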

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
