Chapter 31. Tuning the network performance
Tuning network settings is a complex process with many factors to consider, such as the CPU-to-memory architecture and the number of CPU cores. Red Hat Enterprise Linux uses default settings that are optimized for most scenarios. However, in certain cases, it can be necessary to tune network settings to increase throughput, reduce latency, or solve problems, such as packet drops.
31.1. Configuring an operating system to optimize access to network resources
You can configure the operating system to optimize access to network resources across its workloads. Network performance problems are sometimes the result of hardware malfunction or faulty infrastructure. Resolving these issues is beyond the scope of this document.
The TuneD service provides several profiles to improve performance in specific use cases:
- latency-performance
- network-latency
- network-throughput
31.1.1. Tools for monitoring and diagnosing performance issues
Red Hat Enterprise Linux 9 provides the following tools for monitoring system performance and diagnosing performance problems related to the networking subsystem:
- The ss utility prints statistical information about sockets, enabling administrators to assess device performance over time. By default, ss displays open non-listening sockets that have established connections. Using command-line options, administrators can filter the statistics for specific sockets. Red Hat recommends ss over the deprecated netstat utility in Red Hat Enterprise Linux. For commands that combine ss with other tools from this list, see the sketch after this list.
- The ip utility lets administrators manage and monitor routes, devices, routing policies, and tunnels. The ip monitor command can continuously monitor the state of devices, addresses, and routes. Use the -j option to display the output in JSON format, which can be passed to other utilities to automate information processing.
- dropwatch is an interactive tool, provided by the dropwatch package. It monitors and records packets that are dropped by the kernel.
- The ethtool utility enables administrators to view and edit network interface card settings. Use this tool to observe device statistics, such as the number of packets dropped by a device. Using the ethtool -S device-name command, view the counters of the device that you want to monitor.
- The /proc/net/snmp file displays data that the snmp agent uses for IP, ICMP, TCP, and UDP monitoring and management. Examining this file on a regular basis helps administrators identify unusual values and thereby potential performance problems. For example, an increase in UDP input errors (InErrors) in the /proc/net/snmp file can indicate a bottleneck in a socket receive queue.
- The nstat tool monitors kernel SNMP and network interface statistics. This tool reads data from the /proc/net/snmp file and prints the information in a human-readable format.
- SystemTap scripts, provided by the systemtap-client package, are installed by default in the /usr/share/systemtap/examples/network directory:
  - nettop.stp: Every 5 seconds, the script displays a list of processes (process identifier and command) with the number of packets sent and received and the amount of data sent and received by the process during that interval.
  - socket-trace.stp: Instruments each of the functions in the Linux kernel's net/socket.c file, and displays trace data.
  - dropwatch.stp: Every 5 seconds, the script displays the number of socket buffers freed at locations in the kernel. Use the --all-modules option to see symbolic names.
  - latencytap.stp: This script records the effect that different types of latency have on one or more processes. It prints a list of latency types every 30 seconds, sorted in descending order by the total time the process or processes spent waiting. This can be useful for identifying the cause of both storage and network latency. Red Hat recommends using the --all-modules option with this script to better enable the mapping of latency events. By default, this script is installed in the /usr/share/systemtap/examples/profiling directory.
- BPF Compiler Collection (BCC) is a library that facilitates the creation of extended Berkeley Packet Filter (eBPF) programs. The main advantage of eBPF programs is that they analyze operating system and network performance without introducing overhead or security issues.
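The following commands are a minimal sketch of how some of these tools can be combined; the interface name enp1s0 and the port filter are assumptions, so adjust them to your environment. The first command prints a summary of socket counters, the second shows TCP internals for sockets using port 443, the third prints per-device statistics in JSON, and the last one reads the kernel counter for UDP input errors:
# ss -s
# ss -tni sport = :443
# ip -j -s link show enp1s0
# nstat -az UdpInErrors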
Additional resources
- ss(8), ethtool(8), nettop(1), ip(8), dropwatch(1), and systemtap(8) man pages
- /usr/share/systemtap/examples/network directory
- /usr/share/doc/bcc/README.md file
- How to write a NetworkManager dispatcher script to apply ethtool commands? (Red Hat Knowledgebase solution)
- Configuring ethtool offload features
31.1.2. Bottlenecks in packet reception
While the network stack is largely self-optimizing, there are a number of points during network packet processing that can become bottlenecks and reduce performance.
The following issues can cause bottlenecks:
The buffer or ring buffer of the network card
- The hardware buffer can be a bottleneck if the kernel drops a large number of packets. Use the ethtool utility to monitor a system for dropped packets, as shown in the sketch after this list.
The hardware or software interrupt queues
- Interrupts can increase latency and processor contention. For information on how the processor handles interrupts, see Overview of an interrupt request, Balancing interrupts manually, and Setting the smp_affinity mask.
The socket receive queue of the application
- A large number of packets that are not copied, or an increase in the UDP input errors (InErrors) in the /proc/net/snmp file, indicates a bottleneck in an application's receive queue.
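The following commands are a minimal sketch of how to check for these bottlenecks; the interface name enp1s0 is an assumption. The first command filters the NIC counters for drop-related entries, and the second prints the UDP counters from /proc/net/snmp, where the second output line contains the InErrors value:
# ethtool -S enp1s0 | grep -iE 'drop|discard'
# grep Udp: /proc/net/snmp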
If the hardware buffer drops a large number of packets, the following are a few potential solutions:
Slow the input traffic
- Filter the incoming traffic, reduce the number of joined multicast groups, or reduce the amount of broadcast traffic to decrease the rate at which the queue fills.
Resize the hardware buffer queue
- Reduce the number of packets being dropped by increasing the size of the queue so that it does not overflow as easily. You can modify the rx/tx ring parameters of the network device with the ethtool command, as shown in the sketch after this list:
ethtool --set-ring device-name rx value
Change the drain rate of the queue
- Decrease the rate at which the queue fills by filtering incoming traffic or dropping packets before they reach the queue, or by lowering the network interface card's device weight.
The device weight refers to the number of packets a device can receive at one time in a single scheduled processor access. You can increase the rate at which a queue is drained by increasing its device weight, which is controlled by the dev_weight kernel setting. To temporarily alter this parameter, change the contents of the /proc/sys/net/core/dev_weight file, or to permanently alter it, use the sysctl command, which is provided by the procps-ng package.
Increase the length of the application's socket queue
- This is typically the easiest way to improve the drain rate of a socket queue, but it is unlikely to be a long-term solution. If a socket queue receives a limited amount of traffic in bursts, increasing the depth of the socket queue to match the size of the bursts of traffic may prevent packets from being dropped. To increase the depth of a queue, increase the size of the socket receive buffer by making either of the following changes:
- Increase the value of the /proc/sys/net/core/rmem_default parameter: This parameter controls the default size of the receive buffer used by sockets. This value must be smaller than or equal to the value of the /proc/sys/net/core/rmem_max parameter.
- Use the setsockopt system call to configure a larger SO_RCVBUF value: This parameter controls the maximum size, in bytes, of a socket's receive buffer. Use the getsockopt system call to determine the current value of the buffer.
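The following commands are a minimal sketch of the kernel-side adjustments described in this list; the interface name enp1s0 and the numeric values are illustrative assumptions, not recommendations. The first two commands display and enlarge the receive ring of the NIC, the third raises the device weight, and the last two raise the maximum and default socket receive buffer sizes:
# ethtool --show-ring enp1s0
# ethtool --set-ring enp1s0 rx 4096
# sysctl -w net.core.dev_weight=128
# sysctl -w net.core.rmem_max=4194304
# sysctl -w net.core.rmem_default=1048576
Note that sysctl -w changes are not persistent; to keep them across reboots, add the settings to a file in the /etc/sysctl.d/ directory.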
Altering the drain rate of a queue is usually the simplest way to mitigate poor network performance. However, increasing the number of packets that a device can receive at one time uses additional processor time, during which no other processes can be scheduled, so this can cause other performance problems.
Additional resources
- ss(8), socket(7), and ethtool(8) man pages
- /proc/net/snmp file
31.1.3. Busy polling
If analysis reveals high latency, your system may benefit from poll-based rather than interrupt-based packet receipt.
Busy polling helps to reduce latency in the network receive path by allowing socket layer code to poll the receive queue of a network device, and disables network interrupts. This removes delays caused by the interrupt and the resultant context switch. However, it also increases CPU utilization. Busy polling also prevents the CPU from sleeping, which can incur additional power consumption. Busy polling behavior is supported by all device drivers.
31.1.3.1. Enabling busy polling
By default, busy polling is disabled. This procedure describes how to enable it.
Procedure
Verify that the CONFIG_NET_RX_BUSY_POLL compilation option is enabled:
# cat /boot/config-$(uname -r) | grep CONFIG_NET_RX_BUSY_POLL
CONFIG_NET_RX_BUSY_POLL=y
Enable busy polling
To enable busy polling on specific sockets, set the net.core.busy_poll kernel parameter to a value other than 0:
# echo "net.core.busy_poll=50" > /etc/sysctl.d/95-enable-busy-polling-for-sockets.conf
# sysctl -p /etc/sysctl.d/95-enable-busy-polling-for-sockets.conf
This parameter controls the number of microseconds to wait for packets on the socket poll and select syscalls. Red Hat recommends a value of 50.
- Add the SO_BUSY_POLL socket option to the socket.
To enable busy polling globally, set the net.core.busy_read kernel parameter to a value other than 0:
# echo "net.core.busy_read=50" > /etc/sysctl.d/95-enable-busy-polling-globally.conf
# sysctl -p /etc/sysctl.d/95-enable-busy-polling-globally.conf
The net.core.busy_read parameter controls the number of microseconds to wait for packets on the device queue for socket reads. It also sets the default value of the SO_BUSY_POLL option. Red Hat recommends a value of 50 for a small number of sockets, and a value of 100 for large numbers of sockets. For extremely large numbers of sockets, for example more than several hundred, use the epoll system call instead.
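Verification steps
As a quick check, read back both parameters to confirm that the values are applied; the parameter names are the ones set in this procedure:
# sysctl net.core.busy_poll net.core.busy_read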
Additional resources
- ethtool(8), socket(7), sysctl(8), and sysctl.conf(5) man pages
- Configuring ethtool offload features
31.1.4. Receive-Side Scaling
Receive-Side Scaling (RSS), also known as multi-queue receive, distributes network receive processing across several hardware-based receive queues, allowing inbound network traffic to be processed by multiple CPUs. RSS can be used to relieve bottlenecks in receive interrupt processing caused by overloading a single CPU, and to reduce network latency. By default, RSS is enabled.
The number of queues or the CPUs that should process network activity for RSS are configured in the appropriate network device driver:
- For the bnx2x driver, it is configured in the num_queues parameter.
- For the sfc driver, it is configured in the rss_cpus parameter.
Regardless, it is typically configured in the /sys/class/net/device/queues/rx-queue/ directory, where device is the name of the network device (such as enp1s0) and rx-queue is the name of the appropriate receive queue.
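For example, the following command lists the receive queues that the driver created; the interface name enp1s0 and the two-queue output are illustrative assumptions:
# ls -d /sys/class/net/enp1s0/queues/rx-*
/sys/class/net/enp1s0/queues/rx-0
/sys/class/net/enp1s0/queues/rx-1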
The irqbalance daemon can be used in conjunction with RSS to reduce the likelihood of cross-node memory transfers and cache line bouncing. This lowers the latency of processing network packets.
31.1.4.1. Viewing the interrupt request queues
When configuring Receive-Side Scaling (RSS), Red Hat recommends limiting the number of queues to one per physical CPU core. Hyper-threads are often represented as separate cores in analysis tools, but configuring queues for all cores, including logical cores such as hyper-threads, has not proven beneficial to network performance.
When enabled, RSS distributes network processing equally between available CPUs based on the amount of processing each CPU has queued. However, you can use the --show-rxfh-indir and --set-rxfh-indir parameters of the ethtool utility to modify how RHEL distributes network activity, and to weigh certain types of network activity as more important than others.
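As an illustration, the following commands display and modify the RSS indirection table; the interface name enp1s0 and the weightings are assumptions. The first command prints the current table, the second spreads flows evenly across the first two receive queues, and the third weights the first two queues 3:1:
# ethtool --show-rxfh-indir enp1s0
# ethtool --set-rxfh-indir enp1s0 equal 2
# ethtool --set-rxfh-indir enp1s0 weight 3 1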
This procedure describes how to view the interrupt request queues.
Procedure
To determine whether your network interface card supports RSS, check whether multiple interrupt request queues are associated with the interface in /proc/interrupts:
# egrep 'CPU|p1p1' /proc/interrupts
      CPU0    CPU1    CPU2    CPU3    CPU4    CPU5
 89:  40187       0       0       0       0       0   IR-PCI-MSI-edge   p1p1-0
 90:      0     790       0       0       0       0   IR-PCI-MSI-edge   p1p1-1
 91:      0       0     959       0       0       0   IR-PCI-MSI-edge   p1p1-2
 92:      0       0       0    3310       0       0   IR-PCI-MSI-edge   p1p1-3
 93:      0       0       0       0     622       0   IR-PCI-MSI-edge   p1p1-4
 94:      0       0       0       0       0    2475   IR-PCI-MSI-edge   p1p1-5
The output shows that the NIC driver created 6 receive queues for the p1p1 interface (p1p1-0 through p1p1-5). It also shows how many interrupts were processed by each queue, and which CPU serviced the interrupt. In this case, there are 6 queues because by default, this particular NIC driver creates one queue per CPU, and this system has 6 CPUs. This is a fairly common pattern among NIC drivers.
To list the interrupt request queues for a PCI device with the address 0000:01:00.0:
# ls -1 /sys/devices/*/*/0000:01:00.0/msi_irqs
101
102
103
104
105
106
107
108
109
31.1.5. Receive Packet Steering
Receive Packet Steering (RPS) is similar to Receive-Side Scaling (RSS) in that it is used to direct packets to specific CPUs for processing. However, RPS is implemented at the software level, and helps to prevent the hardware queue of a single network interface card from becoming a bottleneck in network traffic. By default, RPS is disabled.
RPS has several advantages over hardware-based RSS:
- RPS can be used with any network interface card.
- It is easy to add software filters to RPS to deal with new protocols.
- RPS does not increase the hardware interrupt rate of the network device. However, it does introduce inter-processor interrupts.
RPS is configured per network device and receive queue, in the /sys/class/net/device/queues/rx-queue/rps_cpus file, where device is the name of the network device, such as enp1s0, and rx-queue is the name of the appropriate receive queue, such as rx-0.
The default value of the rps_cpus file is 0. This disables RPS, so the CPU that handles the network interrupt also processes the packet. To enable RPS, configure the appropriate rps_cpus file with the CPUs that should process packets from the specified network device and receive queue.
The rps_cpus files use comma-delimited CPU bitmaps. Therefore, to allow a CPU to handle interrupts for the receive queue on an interface, set the value of its position in the bitmap to 1. For example, to handle interrupts with CPUs 0, 1, 2, and 3, set the value of rps_cpus to f, which is the hexadecimal value for 15. In binary representation, 15 is 00001111 (1+2+4+8).
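As a minimal sketch, the following commands apply the bitmap above to receive queue rx-0 of a hypothetical enp1s0 interface and read the value back; the interface name is an assumption, and the value that is printed back can be zero-padded depending on the number of CPUs in the system:
# echo f > /sys/class/net/enp1s0/queues/rx-0/rps_cpus
# cat /sys/class/net/enp1s0/queues/rx-0/rps_cpus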
For network devices with single transmit queues, best performance can be achieved by configuring RPS to use CPUs in the same memory domain. On non-NUMA systems, this means that all available CPUs can be used. If the network interrupt rate is extremely high, excluding the CPU that handles network interrupts may also improve performance.
For network devices with multiple queues, there is typically no benefit to configuring both RPS and RSS, because RSS is configured to map a CPU to each receive queue by default. However, RPS can still be beneficial if there are fewer hardware queues than CPUs, and RPS is configured to use CPUs in the same memory domain.
31.1.6. Receive Flow Steering
Receive Flow Steering (RFS) extends Receive Packet Steering (RPS) behavior to increase the CPU cache hit rate and thereby reduce network latency. Where RPS forwards packets based solely on queue length, RFS uses the RPS back end to calculate the most appropriate CPU, then forwards packets based on the location of the application consuming the packet. This increases CPU cache efficiency.
Data received from a single sender is not sent to more than one CPU. If the amount of data received from a single sender is greater than a single CPU can handle, configure a larger frame size to reduce the number of interrupts and therefore the amount of processing work for the CPU. Alternatively, consider NIC offload options or faster CPUs.
Consider using numactl or taskset in conjunction with RFS to pin applications to specific cores, sockets, or NUMA nodes. This can help prevent packets from being processed out of order.
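As an illustration, either of the following commands pins a hypothetical application named my_app; the application name and the CPU selection are assumptions. The first command binds the process and its memory allocations to NUMA node 0, and the second restricts it to CPU cores 0-3:
# numactl --cpunodebind=0 --membind=0 ./my_app
# taskset -c 0-3 ./my_app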
31.1.6.1. Enabling Receive Flow Steering
By default, Receive Flow Steering (RFS) is disabled. This procedure describes how to enable RFS.
Procedure
Set the value of the net.core.rps_sock_flow_entries kernel parameter to the maximum expected number of concurrently active connections:
# echo "net.core.rps_sock_flow_entries=32768" > /etc/sysctl.d/95-enable-rps.conf
Note: Red Hat recommends a value of 32768 for moderate server loads. All values entered are rounded up to the nearest power of 2 in practice.
Persistently set the value of the net.core.rps_sock_flow_entries parameter:
# sysctl -p /etc/sysctl.d/95-enable-rps.conf
Temporarily set the value of the /sys/class/net/device/queues/rx-queue/rps_flow_cnt file to the value of (rps_sock_flow_entries/N), where N is the number of receive queues on the device:
# echo 2048 > /sys/class/net/device/queues/rx-queue/rps_flow_cnt
Replace device with the name of the network device you wish to configure (for example, enp1s0), and rx-queue with the receive queue you wish to configure (for example, rx-0).
Replace N with the number of configured receive queues. For example, if rps_sock_flow_entries is set to 32768 and there are 16 configured receive queues, then rps_flow_cnt = 32768/16 = 2048 (that is, rps_flow_cnt = rps_sock_flow_entries/N).
For single-queue devices, the value of rps_flow_cnt is the same as the value of rps_sock_flow_entries.
To persistently enable RFS on all network devices, create the /etc/udev/rules.d/99-persistent-net.rules file, and add the following content:
SUBSYSTEM=="net", ACTION=="add", RUN{program}+="/bin/bash -c 'for x in /sys/$DEVPATH/queues/rx-*; do echo 2048 > $x/rps_flow_cnt; done'"
Optional: To enable RFS on a specific network device:
SUBSYSTEM=="net", ACTION=="move", NAME="device name" RUN{program}+="/bin/bash -c 'for x in /sys/$DEVPATH/queues/rx-*; do echo 2048 > $x/rps_flow_cnt; done'"
Replace device name with the actual network device name.
Verification steps
Verify that RFS is enabled:
# cat /proc/sys/net/core/rps_sock_flow_entries
32768
# cat /sys/class/net/device/queues/rx-queue/rps_flow_cnt
2048
Additional resources
- sysctl(8) man page
31.1.7. Accelerated RFS
Accelerated RFS boosts the speed of Receive Flow Steering (RFS) by adding hardware assistance. Like RFS, packets are forwarded based on the location of the application consuming the packet.
Unlike traditional RFS, however, packets are sent directly to a CPU that is local to the thread consuming the data:
- either the CPU that is executing the application
- or a CPU local to that CPU in the cache hierarchy
Accelerated RFS is only available if the following conditions are met:
- The NIC must support accelerated RFS. Accelerated RFS is supported by cards that export the ndo_rx_flow_steer() net_device function. Check the NIC's data sheet to ensure that this feature is supported.
- ntuple filtering must be enabled. For information on how to enable these filters, see Enabling the ntuple filters.
Once these conditions are met, CPU-to-queue mapping is deduced automatically from the traditional RFS configuration, that is, from the IRQ affinities configured by the driver for each receive queue. For more information on enabling traditional RFS, see Enabling Receive Flow Steering.
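Because the mapping is derived from the per-queue IRQ affinities, the following commands sketch how to inspect them; the queue name p1p1-0 and IRQ number 89 are assumptions taken from the /proc/interrupts example earlier in this chapter:
# grep p1p1-0 /proc/interrupts
# cat /proc/irq/89/smp_affinity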
31.1.7.1. Enabling the ntuple filters
The ntuple filtering must be enabled. Use the ethtool -K command to enable the ntuple filters.
Procedure
Display the current status of the ntuple filters:
# ethtool -k enp1s0 | grep ntuple-filters
ntuple-filters: off
Enable the ntuple filters:
# ethtool -K enp1s0 ntuple on
If the output is ntuple-filters: off [fixed], then the ntuple filtering is disabled and you cannot configure it:
# ethtool -k enp1s0 | grep ntuple-filters
ntuple-filters: off [fixed]
Verification steps
Verify that the ntuple filters are enabled:
# ethtool -k enp1s0 | grep ntuple-filters
ntuple-filters: on
Additional resources
-
ethtool(8)
man page