[RHOSP13][OVN] Intermittent packet drops in Open vSwitch with conntrack ACLs


Environment

  • Red Hat OpenStack Platform release 13.0.4 (Queens)
  • Red Hat Enterprise Linux Server release 7.6 (Maipo)
  • OVN (Open vSwitch) 2.9.0

Issue

  • In an OVN DVR setup, small frames (<60 bytes) are lost at the Open vSwitch layer during east-west L2 communication, for both UDP- and TCP-based protocols.

Resolution

  • During debugging, bad conntrack checksum messages were noticed in syslog. As a workaround, disable conntrack checksum verification in the kernel:

    sysctl net.netfilter.nf_conntrack_checksum=0
    
  • The fix is available as a kernel 4.14-stable patch, and Red Hat is backporting it downstream; the backport is tracked in BZ:1684518.
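
The sysctl command above only changes the running kernel, so the setting is lost on reboot. A minimal sketch of persisting it follows, assuming a sysctl drop-in directory is in use; the helper name persist_ct_checksum_off and the file name 99-conntrack-checksum.conf are hypothetical, not taken from the original solution. The key is only applied if the nf_conntrack module is already loaded when the sysctl settings are read.

```shell
# Hypothetical helper: write the workaround to a sysctl drop-in file.
# $1 is the target directory (defaults to /etc/sysctl.d).
persist_ct_checksum_off() {
    conf_dir="${1:-/etc/sysctl.d}"
    conf_file="$conf_dir/99-conntrack-checksum.conf"   # hypothetical file name
    printf '%s\n' 'net.netfilter.nf_conntrack_checksum = 0' > "$conf_file"
    echo "$conf_file"                                  # print the path that was written
}

# Usage (as root):
#   persist_ct_checksum_off && sysctl --system
#   sysctl net.netfilter.nf_conntrack_checksum    # verify the running value
```

Once a kernel containing the fix is installed, the drop-in file can be removed to restore checksum verification.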

Root Cause

  • Currently, in the OVS conntrack receive path, ovs_ct_execute() pulls the skb to the L3 header but does not trim it to the L3 length before calling nf_conntrack_in(NF_INET_PRE_ROUTING).
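  • When nf_conntrack_proto_tcp encounters a packet with lower-layer padding, nf_ip_checksum() fails, causing a "nf_ct_tcp: bad TCP checksum" log message. The extra zero bytes do not change the checksum themselves, but the length used in the pseudo-header is derived from skb->len, which still includes the padding and therefore no longer matches the length the sender used.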
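
To illustrate why only small frames are affected, the sketch below computes how much padding is needed to reach the 60-byte minimum Ethernet frame size (excluding the FCS). It assumes untagged Ethernet with a 14-byte header; the function name eth_padding is hypothetical.

```shell
# Bytes of padding needed to reach the 60-byte minimum Ethernet frame
# (FCS excluded). $1 is the IP total length in bytes.
eth_padding() {
    ip_total_len=$1
    frame_len=$((14 + ip_total_len))   # 14-byte Ethernet header, no VLAN tag
    if [ "$frame_len" -lt 60 ]; then
        echo $((60 - frame_len))
    else
        echo 0
    fi
}

eth_padding 40   # bare TCP ACK: 20-byte IPv4 header + 20-byte TCP header, prints 6
eth_padding 52   # same ACK with 12 bytes of TCP options, prints 0
```

Any frame that needs padding carries trailing bytes beyond the L3 length, which is the case the untrimmed skb mishandles.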

Diagnostic Steps

  • The diagnostic steps below use RDP traffic (port 3389) over both UDP and TCP.

  • UDP: the destination capture contains fewer <60-byte frames (186) than the source capture (215).

    $ tshark -r source_tap_device.pcap -T fields -e frame.len 'udp && udp.port==3389' | sed -e 's/^[6-9][0-9]$/>60/' -e 's/^[0-5][0-9]$/<60/' -e 's/^[0-9][0-9][0-9]$/<1000/' -e 's/^[0-9][0-9][0-9][0-9]$/>1000/'| sort | uniq -c |sort -n -k 2
      17 >1000
     215 <60   !!!
     262 >60
     435 <1000
    
    $ tshark -r destination_tap_device.pcap -T fields -e frame.len 'udp && udp.port==3389' | sed -e 's/^[6-9][0-9]$/>60/' -e 's/^[0-5][0-9]$/<60/' -e 's/^[0-9][0-9][0-9]$/<1000/' -e 's/^[0-9][0-9][0-9][0-9]$/>1000/'| sort | uniq -c |sort -n -k 2
      17 >1000
     186 <60   !!!
     262 >60
     435 <1000
    
  • In the TCP test, in each direction, the <60-byte TCP frames are completely missing from the pcap taken at the other end.

    $ for i in source_tap_device.pcap destination_tap_device.pcap;do echo $i; tshark -r $i -T fields -e frame.len 'tcp && tcp.dstport==3389' | sed -e 's/^[6-9][0-9]$/>60/' -e 's/^[0-5][0-9]$/<60/' -e 's/^[0-9][0-9][0-9]$/<1000/' -e 's/^[0-9][0-9][0-9][0-9]$/>1000/'| sort | uniq -c |sort -n -k 2; done
    source_tap_device.pcap
      21 >1000
      46 >60
     109 <60   !!!
     623 <1000
    destination_tap_device.pcap
      21 >1000
      46 >60
     637 <1000
    
    $ for i in source_tap_device.pcap destination_tap_device.pcap;do echo $i; tshark -r $i -T fields -e frame.len 'tcp && tcp.srcport==3389' | sed -e 's/^[6-9][0-9]$/>60/' -e 's/^[0-5][0-9]$/<60/' -e 's/^[0-9][0-9][0-9]$/<1000/' -e 's/^[0-9][0-9][0-9][0-9]$/>1000/'| sort | uniq -c |sort -n -k 2; done
    source_tap_device.pcap
      26 >1000
     144 >60
     305 <1000
    destination_tap_device.pcap
      26 >1000
     144 >60
     305 <1000
     357 <60 !!!
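
The sed bucketing repeated in the commands above can be wrapped in a small helper. This is a sketch only: the function name bucket_frame_len is hypothetical, and it uses awk numeric comparisons instead of the original regular expressions, so lengths of five or more digits also fall into the >1000 bucket.

```shell
# Read one frame length per line from stdin and print "<count> <bucket>"
# using the same four labels as the sed pipeline above.
bucket_frame_len() {
    awk '
        NF == 0   { next }                  # skip blank lines
        $1 < 60   { c["<60"]++;   next }
        $1 < 100  { c[">60"]++;   next }    # 60 to 99 bytes
        $1 < 1000 { c["<1000"]++; next }    # 100 to 999 bytes
                  { c[">1000"]++ }
        END { for (k in c) print c[k], k }
    ' | sort -n
}

# Usage, with the same capture files and filter as above:
#   tshark -r source_tap_device.pcap -T fields -e frame.len 'udp && udp.port==3389' | bucket_frame_len
```

Comparing the <60 count between the source and destination captures shows how many small frames were lost.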
    
  • The drops are clearly visible in the OVS datapath with the following command: watch 'ovs-dpctl dump-flows | grep drop'.

  • Use the conntrack -L command to verify whether the source and destination are tracked by conntrack.
  • Enabling Netfilter conntrack logging of invalid packets reveals bad UDP checksum messages in the kernel log (dmesg | tail): the small packets are dropped when padding bytes have been added by the NIC.

    echo "255"> /proc/sys/net/netfilter/nf_conntrack_log_invalid
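
With invalid-packet logging enabled, the kernel log can be searched for the checksum messages. The sketch below assumes the messages contain the text "bad TCP checksum" or "bad UDP checksum", as quoted in this solution; the function name count_bad_checksum is hypothetical.

```shell
# Count kernel log lines on stdin that report a bad TCP or UDP checksum.
count_bad_checksum() {
    # grep -c prints 0 and exits non-zero when nothing matches, so mask the status
    grep -Ec 'bad (TCP|UDP) checksum' || true
}

# Usage (as root):
#   dmesg | count_bad_checksum
#
# When finished, restore the default to stop logging invalid packets:
#   echo 0 > /proc/sys/net/netfilter/nf_conntrack_log_invalid
```

A count that grows while the small-frame test traffic is running points to the padding issue described under Root Cause.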
    

