[RHOSP13][OVN] Intermittent packet drops in OpenvSwitch with Conntrack ACL
Environment
- Red Hat OpenStack Platform release 13.0.4 (Queens)
- Red Hat Enterprise Linux Server release 7.6 (Maipo)
- OVN (Open vSwitch) 2.9.0
Issue
- In an OVN DVR setup, small frames (<60 bytes) are lost in the Open vSwitch layer during east-west L2 communication. Both UDP- and TCP-based protocols are affected.
Resolution
- While debugging, bad conntrack checksum messages were noticed in syslog. The recommended workaround is to disable conntrack checksum verification in the kernel:
sysctl net.netfilter.nf_conntrack_checksum=0
- The fix is available as a kernel 4.14-stable patch. Red Hat is backporting this patch downstream; the work is tracked in BZ:1684518.
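The sysctl command above changes only the running kernel. To keep the workaround across reboots, the setting can be written to a sysctl drop-in file. The following is a minimal sketch: the helper name and the file name 99-conntrack-checksum.conf are hypothetical, and it assumes the nf_conntrack module is already loaded when the file is applied.

```shell
#!/bin/sh
# Hypothetical helper: write the workaround to a sysctl drop-in file.
# The default path is an assumption; any file under /etc/sysctl.d/ would do.
persist_conntrack_checksum_off() {
    conf_file="${1:-/etc/sysctl.d/99-conntrack-checksum.conf}"
    # 0 disables checksum verification of incoming packets in conntrack
    printf '%s\n' 'net.netfilter.nf_conntrack_checksum = 0' > "$conf_file"
}

# Usage (as root), then load the file into the running kernel:
#   persist_conntrack_checksum_off
#   sysctl -p /etc/sysctl.d/99-conntrack-checksum.conf
```

Once a kernel containing the fix is installed, the drop-in file should be removed so that checksum verification is enabled again.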
Root Cause
- Currently, in the OVS conntrack receive path, ovs_ct_execute() pulls the skb to the L3 header but does not trim it to the L3 length before calling nf_conntrack_in(NF_INET_PRE_ROUTING).
- When nf_conntrack_proto_tcp encounters a packet with lower-layer padding, nf_ip_checksum() fails, causing a "nf_ct_tcp: bad TCP checksum" log message.
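As a worked illustration of the padding involved: a TCP ACK without payload has an IPv4 total length of 40 bytes, so it is 54 bytes on the wire before padding, and a NIC that pads to the minimum frame size appends 6 trailing bytes. The sketch below assumes standard Ethernet figures that are not taken from this solution (a 14-byte untagged header and a 60-byte minimum frame without FCS); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: bytes of lower-layer padding in an untagged Ethernet frame.
# $1 = captured frame length, $2 = IPv4 total length field
eth_padding_bytes() {
    frame_len=$1
    ip_total_len=$2
    eth_header_len=14   # assumption: no VLAN tag
    echo $(( frame_len - eth_header_len - ip_total_len ))
}

eth_padding_bytes 60 40   # TCP ACK padded to the 60-byte minimum
eth_padding_bytes 54 40   # same segment, unpadded
```

A non-zero result means the skb is longer than the L3 length, which is the condition under which the checksum validation described above fails.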
Diagnostic Steps
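- The diagnostic steps below refer to RDP (port 3389) testing with both UDP and TCP.
- UDP: some of the <60-byte frames seen on the source tap device are missing on the destination tap device.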
$ tshark -r source_tap_device.pcap -T fields -e frame.len 'udp && udp.port==3389' | sed -e 's/^[6-9][0-9]$/>60/' -e 's/^[0-5][0-9]$/<60/' -e 's/^[0-9][0-9][0-9]$/<1000/' -e 's/^[0-9][0-9][0-9][0-9]$/>1000/' | sort | uniq -c | sort -n -k 2
     17 >1000
    215 <60 !!!
    262 >60
    435 <1000
$ tshark -r destination_tap_device.pcap -T fields -e frame.len 'udp && udp.port==3389' | sed -e 's/^[6-9][0-9]$/>60/' -e 's/^[0-5][0-9]$/<60/' -e 's/^[0-9][0-9][0-9]$/<1000/' -e 's/^[0-9][0-9][0-9][0-9]$/>1000/' | sort | uniq -c | sort -n -k 2
     17 >1000
    186 <60 !!!
    262 >60
    435 <1000
- TCP: in each direction, the <60-byte TCP frames are completely missing from the pcap taken at the other end.
$ for i in source_tap_device.pcap destination_tap_device.pcap; do echo $i; tshark -r $i -T fields -e frame.len 'tcp && tcp.dstport==3389' | sed -e 's/^[6-9][0-9]$/>60/' -e 's/^[0-5][0-9]$/<60/' -e 's/^[0-9][0-9][0-9]$/<1000/' -e 's/^[0-9][0-9][0-9][0-9]$/>1000/' | sort | uniq -c | sort -n -k 2; done
source_tap_device.pcap
     21 >1000
     46 >60
    109 <60 !!!
    623 <1000
destination_tap_device.pcap
     21 >1000
     46 >60
    637 <1000
$ for i in source_tap_device.pcap destination_tap_device.pcap; do echo $i; tshark -r $i -T fields -e frame.len 'tcp && tcp.srcport==3389' | sed -e 's/^[6-9][0-9]$/>60/' -e 's/^[0-5][0-9]$/<60/' -e 's/^[0-9][0-9][0-9]$/<1000/' -e 's/^[0-9][0-9][0-9][0-9]$/>1000/' | sort | uniq -c | sort -n -k 2; done
source_tap_device.pcap
     26 >1000
    144 >60
    305 <1000
destination_tap_device.pcap
     26 >1000
    144 >60
    305 <1000
    357 <60 !!!
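The sed expressions in the commands above sort frame lengths into four size buckets. The same bucketing can be written as a small awk-based function that is easier to reuse across captures. This is only a sketch: the function name is hypothetical, and it keeps the original labels, so 60 to 99 bytes is still reported as ">60".

```shell
#!/bin/sh
# Hypothetical helper: read one frame length per line on stdin and
# print a count per size bucket, mirroring the sed expressions above.
bucket_frame_lengths() {
    awk '
        NF == 0    { next }                 # skip empty lines
        $1 < 60    { n["<60"]++;   next }
        $1 < 100   { n[">60"]++;   next }   # 60-99, labelled ">60" as in the sed chain
        $1 < 1000  { n["<1000"]++; next }
                   { n[">1000"]++ }
        END {
            split("<60 >60 <1000 >1000", order, " ")
            for (i = 1; i <= 4; i++)
                if (order[i] in n) printf "%d %s\n", n[order[i]], order[i]
        }'
}

# Usage, with the capture names from the commands above:
#   tshark -r source_tap_device.pcap -T fields -e frame.len 'udp && udp.port==3389' | bucket_frame_lengths
```

Running it against both the source and destination captures and comparing the "<60" lines shows whether small frames are being lost between the two tap devices.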
- The drop is clearly visible in the OVS datapath with the following command:
watch 'ovs-dpctl dump-flows | grep drop'
- Use the conntrack -L command to verify whether the source and destination are tracked by conntrack.
- Enable Netfilter conntrack logging of invalid packets:
echo "255" > /proc/sys/net/netfilter/nf_conntrack_log_invalid
With logging enabled, bad UDP checksum messages were noticed in syslog (dmesg | tail), showing that the small packets were dropped when padding bytes had been added by the NIC.
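To quantify how often the checksum failure is logged, the kernel log can be filtered for the relevant messages. The sketch below assumes the message text follows the "bad TCP checksum" form quoted under Root Cause, with an analogous "bad UDP checksum" form for UDP; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count conntrack bad-checksum lines read from stdin.
# Note: grep -c prints 0 and returns a non-zero status when nothing matches.
count_bad_checksum() {
    grep -Ec 'bad (TCP|UDP) checksum'
}

# Usage:
#   dmesg | count_bad_checksum
```

A count that grows while the RDP test is running, and stops growing once nf_conntrack_checksum is set to 0, is consistent with the root cause described above.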