Intermittent disconnects and connection errors with Chelsio network adapters

Solution Verified

Environment

  • Red Hat Enterprise Linux
  • Chelsio 100Gb Network Adapters (T62100-SO-CR)

Issue

  • We’re seeing intermittent client disconnects with our application pods
  • Application connections fail or disconnect randomly with errors similar to the following:

    Connection failed: read tcp 10.10.10.10:50000->10.20.20.20:5000: read: connection timed out.
    reconnect|WARN|unix#15762: connection dropped (Broken pipe)
    jsonrpc|WARN|unix#15763: send error: Broken pipe
    
  • Packet captures show many intermittent duplicate packets

Resolution

  • Please engage your respective hardware vendor for further assistance.

Root Cause

  • Packet captures show intermittent duplicate packets that originate outside the Red Hat Enterprise Linux operating system. Further investigation needs to be carried out with the network hardware vendor.

Diagnostic Steps

Because the issue involves intermittent duplicate packets on the network, a packet capture tool is needed. For Red Hat Enterprise Linux, we recommend tcpdump; however, network infrastructure vendors often provide their own packet capture tools, such as Cisco's Embedded Packet Capture (EPC) feature. For assistance with third-party packet capture tooling, please engage the appropriate vendor.

  1. Install tcpdump, or pull a container image that provides tcpdump, following the Red Hat knowledge-base article that best matches your environment.

  2. Start a packet capture with tcpdump on both the client and the remote system while the issue is occurring. If the issue occurs only on connections over a specific interface (e.g. eth0), capture on that interface; otherwise, capture on all interfaces. Note: tcpdump runs in the foreground, so the terminal is unavailable until tcpdump is stopped.

    # tcpdump -nn -i <INTERFACE> -w /tmp/$(hostname)_$(date +%d%m_%Y-%H_%M_%S-%Z).pcap    # replace <INTERFACE> with the interface name
    # tcpdump -nn -i any -w /tmp/$(hostname)_$(date +%d%m_%Y-%H_%M_%S-%Z).pcap    # listen on all interfaces
    
    • Example:

      # tcpdump -i any -w /tmp/$(hostname)_$(date +%d%m_%Y-%H_%M_%S-%Z).pcap
      dropped privs to tcpdump
      tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
      
  3. Generate traffic to the remote system, for example with ping or ssh:

    # ping -c 3 <REMOTE HOST>
    # ssh <USER>@<REMOTE HOST>
    
  4. Stop the packet captures on both systems, either with Ctrl+C or by running pkill tcpdump from another terminal.

  5. Review the packet captures. A display filter reduces the number of unrelated packets shown:

    # tshark -Y 'ip.addr eq <REMOTE IP>' -r <EXAMPLE>.pcap    # show packets sent to or from a specific IP address
    # tshark -Y 'tcp.port eq <PORT NUMBER>' -r <EXAMPLE>.pcap    # show packets sent or received on a specific TCP port
    # tshark -Y 'ip.addr eq <REMOTE IP> and tcp.port eq <PORT NUMBER>' -r <EXAMPLE>.pcap    # combine filters if desired
    
    • Example (where 1.1.1.1 was successfully pinged 3 times):

      # tshark -Y "ip.addr eq 1.1.1.1" -r /tmp/r8_2709_2021-18_41_29-EDT.pcap
         60   5.336831 192.168.1.10 → 1.1.1.1      ICMP 98 Echo (ping) request  id=0xdddd, seq=1/256, ttl=64
         63   5.351545      1.1.1.1 → 192.168.1.10 ICMP 98 Echo (ping) reply    id=0xdddd, seq=1/256, ttl=55 (request in 60)
         67   6.338890 192.168.1.10 → 1.1.1.1      ICMP 98 Echo (ping) request  id=0xdddd, seq=2/512, ttl=64
         68   6.347052      1.1.1.1 → 192.168.1.10 ICMP 98 Echo (ping) reply    id=0xdddd, seq=2/512, ttl=55 (request in 67)
         71   7.340402 192.168.1.10 → 1.1.1.1      ICMP 98 Echo (ping) request  id=0xdddd, seq=3/768, ttl=64
         72   7.348887      1.1.1.1 → 192.168.1.10 ICMP 98 Echo (ping) reply    id=0xdddd, seq=3/768, ttl=55 (request in 71)
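The capture and review steps above can be wrapped in a few shell helpers. This is a minimal sketch rather than a supported tool: the function names pcap_name, start_capture and dup_filter are hypothetical, and the ring-buffer settings (-C 100 -W 10, that is, at most ten files of roughly 100 MB each) are assumed values meant to keep a long capture of an intermittent issue from filling /tmp. The fields tcp.analysis.duplicate_ack and tcp.analysis.retransmission are standard Wireshark display-filter fields that match the [TCP Dup ACK] and [TCP Retransmission] labels tshark prints.

```shell
#!/usr/bin/env bash
# Minimal sketch of helpers for the diagnostic steps above.
# Function names are hypothetical; adjust sizes and paths for your environment.

# Capture file name in the same format as step 2:
#   /tmp/<hostname>_<DDMM_YYYY-HH_MM_SS-TZ>.pcap
pcap_name() {
  printf '/tmp/%s_%s.pcap' "$(hostname)" "$(date +%d%m_%Y-%H_%M_%S-%Z)"
}

# Start tcpdump in the background (run as root) so the terminal stays usable.
# -C 100 -W 10 rotates through at most 10 files of about 100 MB each
# (assumed sizes); with -C, tcpdump appends a file number to the -w name.
start_capture() {
  local iface="${1:-any}"
  tcpdump -nn -i "$iface" -C 100 -W 10 -w "$(pcap_name)" &
  echo "$!"   # PID; stop the capture later with: kill <PID>
}

# Display filter for step 5: segments that Wireshark labels as duplicate ACKs
# or retransmissions, optionally limited to one remote IP address.
dup_filter() {
  local f='tcp.analysis.duplicate_ack || tcp.analysis.retransmission'
  if [ -n "${1:-}" ]; then
    printf '(%s) && ip.addr == %s' "$f" "$1"
  else
    printf '%s' "$f"
  fi
}
```

For example, run start_capture eth0 as root on both systems, reproduce the issue, stop each capture with kill and the printed PID, and then review a capture file with tshark -r <FILE> -Y "$(dup_filter <REMOTE IP>)".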
      

What to look for

This issue produces intermittent duplicate packets in the network fabric. The client sends a single packet, but the server receives duplicate copies of it and responds to each copy.
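In a long capture, this pattern is easier to find mechanically than by eye. One heuristic, sketched here under stated assumptions, flags any TCP segment whose addresses, ports, sequence number, ACK number and payload length match a segment seen a very short time earlier. The function name find_dup_segments and the 1 ms default window are hypothetical choices, not values from this article, and genuine TCP retransmissions or duplicate ACKs can also match, so treat each hit as a candidate to inspect in Wireshark or tshark.

```shell
#!/usr/bin/env bash
# Heuristic duplicate-segment finder (sketch; the name and window are hypothetical).
# Reads tab-separated tshark output on stdin, generated for example with:
#   tshark -r <EXAMPLE>.pcap -Y tcp -T fields -e frame.number -e frame.time_relative \
#     -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport -e tcp.seq -e tcp.ack -e tcp.len
# A frame is reported when the same (addresses, ports, seq, ack, len) tuple
# appeared no more than WINDOW seconds earlier (first argument, default 0.001).
find_dup_segments() {
  awk -F'\t' -v window="${1:-0.001}" '
    {
      key = $3 FS $4 FS $5 FS $6 FS $7 FS $8 FS $9
      if ((key in last_time) && ($2 - last_time[key]) <= window + 0) {
        printf "frame %s duplicates frame %s\n", $1, last_frame[key]
        dups++
      }
      # Remember the most recent frame that carried this tuple.
      last_time[key] = $2
      last_frame[key] = $1
    }
    END { printf "%d duplicate segment(s)\n", dups + 0 }
  '
}
```

Fed the fields of frames 2 to 9 from the NFS trace that follows, this sketch would report frames 3, 5 and 9 as duplicates of frames 2, 4 and 8.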

  • Below is an example of a GETATTR call made by an NFS client for a file on an NFS share:

    $ tshark -r example.pcap | head
        1 2021-08-12 02:36:25.650064   0.000000 10.0.0.10 → 10.0.0.20 NFS 194 V3 GETATTR Call, FH: 0xba90f240 0.000000 128 16:36:25.650064  
    
    Duplicate packets:
    
        2 2021-08-12 02:36:25.650264   0.000200 10.0.0.20 → 10.0.0.10 TCP 78 2049 → 965 [ACK] Seq=3054231008 Ack=1937132059 Win=29126 Len=0 TSval=3841336683 TSecr=4152412905 SLE=1937131931 SRE=1937132059 0.000200  16:36:25.650264  
        3 2021-08-12 02:36:25.650289   0.000225 10.0.0.20 → 10.0.0.10 TCP 78 [TCP Dup ACK 2#1] 2049 → 965 [ACK] Seq=3054231008 Ack=1937132059 Win=29126 Len=0 TSval=3841336683 TSecr=4152412905 SLE=1937131931 SRE=1937132059 0.000025  16:36:25.650289  
    
    Duplicate packets:
    
        4 2021-08-12 02:36:25.650311   0.000247 10.0.0.20 → 10.0.0.10 NFS 182 V3 GETATTR Reply (Call In 1)  Regular File mode: 0664 uid: NNNN gid: NNNN 0.000022 116 16:36:25.650311  
        5 2021-08-12 02:36:25.650330   0.000266 10.0.0.20 → 10.0.0.10 TCP 182 [TCP Retransmission] 2049 → 965 [PSH, ACK] Seq=3054231008 Ack=1937132059 Win=29128 Len=116 TSval=3841336683 TSecr=4152412905 0.000019 116 16:36:25.650330  
    
        6 2021-08-12 02:36:25.650347   0.000283 10.0.0.10 → 10.0.0.20 TCP 78 965 → 2049 [ACK] Seq=1937132059 Ack=3054231124 Win=12284 Len=0 TSval=4152412906 TSecr=3841336683 SLE=3054231008 SRE=3054231124 0.000017  16:36:25.650347  
        7 2021-08-12 02:36:25.650496   0.000432 10.0.0.10 → 10.0.0.20 NFS 302 V3 WRITE Call, FH: 0xba90f240 Offset: 0 Len: 88 FILE_SYNC 0.000149 236 16:36:25.650496  
    
    Duplicate packets:
    
        8 2021-08-12 02:36:25.650697   0.000633 10.0.0.20 → 10.0.0.10 TCP 78 2049 → 965 [ACK] Seq=3054231124 Ack=1937132295 Win=29125 Len=0 TSval=3841336683 TSecr=4152412906 SLE=1937132059 SRE=1937132295 0.000201  16:36:25.650697  
        9 2021-08-12 02:36:25.650708   0.000644 10.0.0.20 → 10.0.0.10 TCP 78 [TCP Dup ACK 8#1] 2049 → 965 [ACK] Seq=3054231124 Ack=1937132295 Win=29125 Len=0 TSval=3841336683 TSecr=4152412906 SLE=1937132059 SRE=1937132295 0.000011  16:36:25.650708  
    
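    • Note: within each duplicate pair, values such as the sequence number, ACK number, and TCP timestamps (TSval/TSecr, where shown) are identical. Wireshark decodes one copy of the GETATTR reply as NFS and flags the other as a TCP retransmission, while the other pairs are identical ACKs flagged as duplicate ACKs. The ACKs also carry a SACK block (SLE/SRE) whose left edge is below the cumulative ACK number, which marks it as a DSACK: for example, frame 6 reports bytes 3054231008 to 3054231124, the 116-byte GETATTR reply that appears twice as frames 4 and 5.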
  • In some instances, TCP connections affected by this issue include duplicate selective acknowledgments (DSACKs, defined in RFC 2883). A DSACK informs the remote side that a segment it sent was received more than once.

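    • The following example comes from two systems communicating over the affected hardware for an extended period. The command counts the segments exchanged between the two systems that carry a DSACK, that is, a SACK block whose left edge (tcp.options.sack_le) is below the cumulative ACK number (tcp.ack):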

      $ tshark -r remote-system.pcap -Y "tcp.options.sack.count > 0 && ip.addr == 10.0.0.10 && ip.addr == 10.0.0.20 && tcp.options.sack_le < tcp.ack" | wc -l
        26635
      
    • In contrast, the following example comes from two virtual machines communicating with each other on the same hypervisor, so their traffic does not cross the affected hardware:

      $ tshark -r between-vms.pcap -Y "tcp.options.sack.count > 0 && ip.addr == 10.0.0.10 && ip.addr == 10.0.0.20 && tcp.options.sack_le < tcp.ack" | wc -l
        0        
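The tshark filter above catches only DSACK blocks that start below the cumulative ACK, and it counts both directions together. As a sketch, assuming the same rule the Linux kernel applies in tcp_check_dsack() (the first SACK block is a DSACK if it starts below the cumulative ACK, or if it lies entirely within the second SACK block), the function below counts DSACK-bearing segments per sending host. Because a host sends a DSACK when it receives a duplicate, the per-host split suggests which side is receiving the duplicates. The function name count_dsacks is hypothetical, and sequence numbers are compared as plain integers, so wraparound is ignored.

```shell
#!/usr/bin/env bash
# Count DSACK-bearing segments per sending host (sketch; function name is hypothetical).
# Reads tab-separated tshark output on stdin, generated for example with:
#   tshark -r remote-system.pcap -Y tcp.options.sack_le -T fields \
#     -e ip.src -e tcp.ack -e tcp.options.sack_le -e tcp.options.sack_re
# Multiple SACK blocks arrive comma-separated (tshark's default aggregator).
# A host that sends DSACKs is reporting duplicate segments it received.
count_dsacks() {
  awk -F'\t' '
    $3 != "" {
      n = split($3, left, ",")
      split($4, right, ",")
      if (left[1] + 0 < $2 + 0) {
        # First SACK block starts below the cumulative ACK.
        dsack[$1]++
      } else if (n > 1 && left[1] + 0 >= left[2] + 0 && right[1] + 0 <= right[2] + 0) {
        # First SACK block lies entirely inside the second one.
        dsack[$1]++
      }
    }
    END { for (host in dsack) printf "%s %d\n", host, dsack[host] }
  ' | sort
}
```

Applied to the three ACKs from the NFS trace above (frames 2, 6 and 8), this sketch would count two DSACKs sent by 10.0.0.20 and one sent by 10.0.0.10.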
      

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
