8.4. Resolving Common Queuing/Frame Loss Issues

By far the most common cause of frame loss is a queue overrun. The kernel limits the length of each queue, and in some cases a queue fills faster than it drains. If this persists for too long, frames are dropped.
As illustrated in Figure 8.1, “Network receive path diagram”, there are two major queues in the receive path: the NIC hardware buffer and the socket queue. Both queues need to be sized appropriately to protect against queue overruns.

8.4.1. NIC Hardware Buffer

The NIC fills its hardware buffer with frames; the buffer is then drained by the softirq, which is scheduled when the NIC raises an interrupt. To check the status of this queue, use the following command:
ethtool -S ethX
Replace ethX with the NIC's corresponding device name. The output includes counters showing how many frames ethX has dropped. Often, a drop occurs because the queue runs out of buffer space in which to store frames.
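Because ethtool -S prints every statistic the driver exposes, it can help to filter the output down to the non-zero drop counters. The following is a minimal sketch, not a definitive tool: the helper name show_drop_counters and the device name eth0 are hypothetical, and the pattern assumes the driver uses counter names containing drop, discard, fifo, or miss. Counter names are driver-specific, so adjust the pattern for your NIC.

```shell
# Hypothetical helper: read "ethtool -S" output on stdin and print only
# counters whose name suggests frame loss AND whose value is non-zero.
# The name pattern is an assumption; counter names vary by driver.
show_drop_counters() {
    awk -F: 'tolower($1) ~ /drop|discard|fifo|miss/ && $NF + 0 > 0'
}

# Usage (eth0 is a placeholder for your device name):
#   ethtool -S eth0 | show_drop_counters
```

Running the pipeline twice, a few seconds apart, shows whether a counter is still increasing or only reflects past drops.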
There are different ways to address this problem, namely:
Input traffic
You can help prevent queue overruns by slowing down input traffic. This can be achieved by filtering, reducing the number of joined multicast groups, lowering broadcast traffic, and the like.
Queue length
Alternatively, you can also increase the queue length. This involves increasing the number of buffers in a specified queue to whatever maximum the driver will allow. To do so, edit the rx/tx ring parameters of ethX using:
ethtool --set-ring ethX
Append the appropriate rx or tx values to this command. For more information, refer to man ethtool.
Device weight
You can also increase the rate at which a queue is drained. To do this, adjust the NIC's device weight accordingly. This attribute refers to the maximum number of frames that the NIC can receive before the softirq context has to yield the CPU and reschedule itself. It is controlled by the /proc/sys/net/core/dev_weight variable.
Most administrators tend to choose the third option, device weight. Keep in mind, however, that doing so has a cost. Increasing the number of frames that can be received from a NIC in one iteration means the softirq uses extra CPU cycles, during which no applications can be scheduled on that CPU.
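The queue length and device weight options can be sketched in shell as follows. This is an illustration under stated assumptions, not a recommended configuration: the helper names rx_ring_sizes and current_dev_weight are hypothetical, eth0 is a placeholder device, and the values 4096 and 128 are examples only. The parser assumes the usual ethtool -g layout, with "Pre-set maximums" and "Current hardware settings" headings.

```shell
# Hypothetical helper: read "ethtool -g" output on stdin and print the
# driver's maximum and the current RX ring size as "max N" and "cur N".
rx_ring_sizes() {
    awk '/^Pre-set maximums/          { sec = "max" }
         /^Current hardware settings/ { sec = "cur" }
         $1 == "RX:"                  { print sec, $2 }'
}

# Hypothetical helper: print the current device weight. The optional
# argument overrides the path, which is convenient for testing.
current_dev_weight() {
    cat "${1:-/proc/sys/net/core/dev_weight}"
}

# Usage (run as root; device name and values are examples only):
#   ethtool -g eth0 | rx_ring_sizes      # compare current size to the maximum
#   ethtool -G eth0 rx 4096              # -G is the short form of --set-ring
#   current_dev_weight                   # show the current device weight
#   sysctl -w net.core.dev_weight=128    # raise it until the next reboot
```

Checking the current values first makes it clear whether the ring is already at the driver's maximum, in which case only the other options remain.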