8.4. Resolving Common Queuing/Frame Loss Issues
By far, the most common reason for frame loss is a queue overrun. The kernel sets a limit to the length of a queue, and in some cases the queue fills faster than it drains. When this occurs for too long, frames start to get dropped.
As illustrated in Figure 8.1, “Network receive path diagram”, there are two major queues in the receive path: the NIC hardware buffer and the socket queue. Both queues need to be configured accordingly to protect against queue overruns.
8.4.1. NIC Hardware Buffer
The NIC fills its hardware buffer with frames; the buffer is then drained by the
softirq, which the NIC asserts via an interrupt. To interrogate the status of this queue, use the following command:
ethtool -S ethX
ethXwith the NIC's corresponding device name. This will display how many frames have been dropped within
ethX. Often, a drop occurs because the queue runs out of buffer space in which to store frames.
There are different ways to address this problem, namely:
- Input traffic
- You can help prevent queue overruns by slowing down input traffic. This can be achieved by filtering, reducing the number of joined multicast groups, lowering broadcast traffic, and the like.
- Queue length
- Alternatively, you can also increase the queue length. This involves increasing the number of buffers in a specified queue to whatever maximum the driver will allow. To do so, edit the
txring parameters of
ethtool --set-ring ethXAppend the appropriate
txvalues to the aforementioned command. For more information, refer to
- Device weight
- You can also increase the rate at which a queue is drained. To do this, adjust the NIC's device weight accordingly. This attribute refers to the maximum number of frames that the NIC can receive before the
softirqcontext has to yield the CPU and reschedule itself. It is controlled by the
Most administrators have a tendency to choose the third option. However, keep in mind that there are consequences for doing so. Increasing the number of frames that can be received from a NIC in one iteration implies extra CPU cycles, during which no applications can be scheduled on that CPU.