RHEL 6 KVM guest server experiencing intermittent network packet drops
Environment
- Red Hat Enterprise Linux 6.3
Issue
- Ping test shows intermittent packet loss:
[root@kvmhost123 ~]# virsh list --all
 Id    Name                 State
----------------------------------------------------
 7     guest1001            running
 8     guest1002            running
 11    guest1003            running
 12    guest1004            running
 15    guest1005            running
 16    guest1006            running

[root@kvmhost123 ~]#
- Packet drop:
Ping statistics for XX.XX.XX.X:
Packets: Sent = 291, Received = 265, Lost = 26 (8% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 2522ms, Average = 17ms
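The loss above was measured from a Windows client. A roughly equivalent test can be run from any Linux host; the hostname below is taken from the diagnostics further down and is only an example:

# Send 300 echo requests to the guest and summarize the loss
ping -c 300 guest1001.example.com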
Resolution
- Upgrade to the 6.4.z kernel 2.6.32-358.23.1.el6 or later (see the example commands below), or
- Upgrade to a 6.5 kernel.
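A minimal update sketch, assuming the host is subscribed to a channel that provides the 6.4.z (or later) kernel; a reboot is required for the new kernel to take effect:

# Check the currently running kernel
uname -r

# Pull in the latest available kernel package
yum update kernel

# Boot into the new kernel
reboot

# After the reboot, confirm the running kernel is 2.6.32-358.23.1.el6 or later
uname -r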
Root Cause
A previous change in the bridge multicast code allowed the bridge to send general multicast queries in order to achieve faster convergence on startup. To avoid interfering with multicast routers, these packets were sent with a zero (0.0.0.0) source IP address. However, such packets confused certain multicast-aware switches, which then flooded the network with IGMP membership queries carrying a zero source IP address. A series of patches addresses this problem by disabling the sending of multicast queries by default and adding a multicast_querier toggle that re-enables general multicast queries when they are needed.
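On a kernel that carries the fix, the bridge no longer sends general queries by default, and the behavior can be toggled per bridge through sysfs. A minimal sketch, assuming a bridge device named br0 (the bridge name is an assumption; substitute your own):

# 0 = bridge does not send general IGMP queries (the fixed default)
cat /sys/class/net/br0/bridge/multicast_querier

# Re-enable general queries only if this bridge must act as the IGMP querier
echo 1 > /sys/class/net/br0/bridge/multicast_querier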
Diagnostic Steps
Ok, we ran two 30-count ping tests; we'll focus on the second run.
We miss the 12th sequence:
64 bytes from guest1001.example.com (10.XX.XX.XX): icmp_seq=10 ttl=59 time=1.83 ms
64 bytes from guest1001.example.com (10.XX.XX.XX): icmp_seq=11 ttl=59 time=1.61 ms
And then we miss sequences 19 through 24:
64 bytes from guest1001.example.com (10.XX.XX.XX): icmp_seq=18 ttl=59 time=1.82 ms
64 bytes from guest1001.example.com (10.XX.XX.XX): icmp_seq=25 ttl=59 time=1.72 ms
We see all of the ping requests in a tcpdump on the KVM host bridge, but the same requests never appear in captures on the KVM guest or on its vnet interface.
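The comparison was made with captures along the packet path; a sketch of the idea, assuming the host bridge is br0 and the guest's tap device is vnet0 (both names are assumptions; virsh domiflist shows the real ones):

# Map the guest to its vnet (tap) interface on the host
virsh domiflist guest1001

# Capture on the host bridge: every echo request shows up here
tcpdump -ni br0 icmp or igmp

# Capture on the guest's tap device in a second session:
# the dropped echo requests never appear here
tcpdump -ni vnet0 icmp or igmp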
What stands out is that each missed ping sequence has an IGMPv2 query close by:
674 49.982981 10.XX.XX.XX 10.XX.XX.XX ICMP 98 Echo (ping) reply id=0xd26b, seq=11/2816, ttl=64
675 50.020287 0.0.0.0 all-systems.mcast.net IGMPv2 46 Membership Query, general
790 54.705071 0.0.0.0 all-systems.mcast.net IGMPv2 56 Membership Query, general
816 56.991950 10.XX.XX.XX 10.XX.XX.XX ICMP 98 Echo (ping) reply id=0xd26b, seq=18/4608, ttl=64
817 57.122483 10.XX.XX.XX 232.XX.XXX.XX IGMPv2 46 Membership Report group 232.XX.XXX.XX
So it is evident that the IGMPv2 queries with the 0.0.0.0 source address are tied to the drops.
Looking at the latest 6.4.z kernel changelog, we see a solid match for this issue.
Documented in the following BZ:
bridge: sending IGMP membership query with zero source address causes the switch to flood [rhel-6.4.z]
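To verify that an installed kernel includes these patches, the package changelog can be searched directly; a quick check, assuming the fixed kernel package is already installed:

# Look for the bridge IGMP fix in the kernel changelog
rpm -q --changelog kernel-2.6.32-358.23.1.el6 | grep -i 'membership query'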
We will need you to update the kernel to 2.6.32-358.23.1.el6 (6.4.z) or to a 6.5 kernel, which carries the fix as well.
