Dropped Packets - Different NICs
Hi,
We have several HPE DL580 G9 servers with RHEL 6.5 installed.
We are seeing dropped packets on these servers, specifically on one of the 10Gb NICs.
Two types of NIC are installed in the servers:
Intel Corporation 82599ES 10-Gigabit SFI/SFP+ (PCI)
Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (FLR)
The Broadcom NIC works fine with no packet drops, but the Intel NIC is dropping packets. The configuration is identical on both, the NICs are connected to the same switch, and jumbo frames are enabled on both.
The buffer size has been increased to the hardware maximum for both RX and TX.
We have some questions about this situation:
1. How can we monitor dropped packets with tcpdump or Wireshark?
2. There are not many dropped packets and there are no CRC errors, so does this have an impact on Oracle RAC performance or on other clustering solutions?
3. Are there any other recommendations for this situation?
Responses
You can monitor packet drops at the NIC with ethtool -S ethX. Each driver names its counters differently. The xsos tool includes a grep that catches the most common ones; just run xsos --net. An easy way to monitor a lot of stuff over time is our monitor.sh script.
How are you determining that you have packet loss? It's not the fdir_miss stat, is it? That's Intel Flow Director working normally and is not packet loss.
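If you want to see which counters are actually moving, one option is to take two snapshots of the ethtool -S output and diff them. The following is only a rough sketch, not how xsos or monitor.sh work: the list of name hints is my own assumption, the sample values are made up, and counter names vary from driver to driver. It deliberately ignores fdir_miss for the reason given above.

```python
import re

# Substrings that commonly mark loss-related counters. This list is an
# assumption; counter names differ between drivers.
DROP_HINTS = ("drop", "discard", "miss", "no_buffer", "fifo")

# Counters that match a hint but are not packet loss (Flow Director stats).
NOT_LOSS = ("fdir_miss",)


def parse_ethtool_stats(text):
    """Parse `ethtool -S ethX` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\S.*?):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def increased_drop_counters(before, after):
    """Return {name: delta} for loss-like counters that grew between snapshots."""
    deltas = {}
    for name, value in after.items():
        lowered = name.lower()
        if lowered in NOT_LOSS:
            continue
        if not any(hint in lowered for hint in DROP_HINTS):
            continue
        delta = value - before.get(name, 0)
        if delta > 0:
            deltas[name] = delta
    return deltas


# Made-up sample snapshots in the layout that `ethtool -S` prints.
SAMPLE_BEFORE = """NIC statistics:
     rx_packets: 1000
     rx_missed_errors: 5
     rx_no_buffer_count: 0
     fdir_miss: 40
"""

SAMPLE_AFTER = """NIC statistics:
     rx_packets: 9000
     rx_missed_errors: 12
     rx_no_buffer_count: 0
     fdir_miss: 95
"""

if __name__ == "__main__":
    before = parse_ethtool_stats(SAMPLE_BEFORE)
    after = parse_ethtool_stats(SAMPLE_AFTER)
    print(increased_drop_counters(before, after))
```

On a real system you would feed it the text from two runs of ethtool -S taken some time apart, instead of the sample strings.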
A packet capture is usually not very useful here. The best you can do is infer packet loss from the presence of TCP retransmissions, but even then the loss may be external: without captures at each point along the transfer, you have no way of knowing where it happened. UDP needs knowledge of the underlying traffic, or at least analysis by the IP Identification field. That's all pretty tedious analysis and not the first place I would go; given the tools above, I would not even consider it a realistic option.
Yes, packet loss can have a performance impact at high enough rates. This really depends on the protocol in use, the rate of transfer and loss, and the application. Packet loss on cluster interconnects usually leads to fencing, as other nodes seem to "disappear" off the network if enough traffic is lost.
We have the Red Hat Enterprise Linux Network Performance Tuning Guide which is a long but comprehensive read. I gave a cut-down version of that as a talk at Linux.Conf.Au a few years ago.
On most systems you can just increase the NIC RX buffer and persist that change, use a tuned performance profile and turn low C-states off. Those solve packet loss in the overwhelming majority of situations.
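To check whether the RX ring is already at its maximum, you can compare the two sections of the ethtool -g output. The sketch below assumes the usual "Pre-set maximums" / "Current hardware settings" layout and uses made-up sample values; the function names are just for illustration. It only builds the ethtool -G command rather than running it, and it does not cover persisting the change, the tuned profile, or C-states.

```python
def parse_ring_params(text):
    """Parse `ethtool -g ethX` output into {"max": {...}, "current": {...}}."""
    rings = {"max": {}, "current": {}}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Pre-set maximums"):
            section = "max"
        elif stripped.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in stripped:
            key, _, value = stripped.partition(":")
            value = value.strip()
            # Skip non-numeric values such as "n/a".
            if value.isdigit():
                rings[section][key.strip()] = int(value)
    return rings


def rx_resize_command(iface, rings):
    """Return the ethtool command that raises RX to the maximum, or None."""
    max_rx = rings["max"].get("RX")
    cur_rx = rings["current"].get("RX")
    if max_rx is None or cur_rx is None or cur_rx >= max_rx:
        return None
    return ["ethtool", "-G", iface, "rx", str(max_rx)]


# Made-up sample in the layout that `ethtool -g` prints.
SAMPLE_RING_OUTPUT = """Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             512
"""

if __name__ == "__main__":
    rings = parse_ring_params(SAMPLE_RING_OUTPUT)
    print(rx_resize_command("eth0", rings))
```

If the function returns None, the RX ring is already as large as the hardware allows, so the remaining items in the list above are the next things to look at.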
Make sure you don't have any silly kernel tunables like TCP Timestamps disabled or TCP SACK disabled. The defaults are usually pretty good.
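A quick way to check those two tunables is to read them from /proc/sys and compare against the kernel defaults (both are 1, meaning enabled). This is just a small sketch; the helper names are mine, and the reader argument exists only so the check can be tried without touching a real system.

```python
import os

# Both options default to 1 (enabled) on Linux.
EXPECTED = {
    "net.ipv4.tcp_timestamps": 1,
    "net.ipv4.tcp_sack": 1,
}


def read_sysctl(name, root="/proc/sys"):
    """Read an integer sysctl by its dotted name from procfs."""
    path = os.path.join(root, *name.split("."))
    with open(path) as handle:
        return int(handle.read().split()[0])


def non_default_tunables(reader=read_sysctl):
    """Return {name: current_value} for tunables that differ from EXPECTED."""
    changed = {}
    for name, expected in EXPECTED.items():
        current = reader(name)
        if current != expected:
            changed[name] = current
    return changed


if __name__ == "__main__":
    # On a Linux host this reads the live values; an empty dict means both
    # tunables are at their defaults.
    print(non_default_tunables())
```

Anything it reports is worth tracing back to /etc/sysctl.conf or whatever else set it.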
