Connection remains ESTABLISHED after TCP RST,ACK

Hello,

I am seeing strange behaviour on my RHEL 7 server regarding TCP connections.

I have an XMPP server (Openfire) running on port 5222, although the issue lies in the TCP layer. The server has a normal load of about 220k simultaneous connections and runs directly on physical hardware.

After the TCP three-way handshake, the client sends a TCP RST,ACK, to which the server replies with a TCP ACK and then keeps the connection in the ESTABLISHED state, seemingly forever. I know the client's behaviour is not ideal, but the server should close the connection and forget about it. What could be causing this? I have run out of ideas.

Here is a tcpdump capture of this scenario:

17:37:39.928575 IP 192.168.1.2.12818 > 192.168.1.1.5222: Flags [S], seq 0, win 16334, options [mss 1450], length 0
17:37:39.928600 IP 192.168.1.1.5222 > 192.168.1.2.12818: Flags [S.], seq 338496114, ack 1, win 29200, options [mss 1460], length 0
17:37:40.167624 IP 192.168.1.2.12818 > 192.168.1.1.5222: Flags [.], ack 1, win 16334, length 0
17:37:40.402129 IP 192.168.1.2.12818 > 192.168.1.1.5222: Flags [R.], seq 152, ack 1, win 16334, length 0

17:38:40.902470 IP 192.168.1.2.16060 > 192.168.1.1.5222: Flags [S], seq 0, win 16334, options [mss 1450], length 0
17:38:40.902522 IP 192.168.1.1.5222 > 192.168.1.2.16060: Flags [S.], seq 4241295752, ack 1, win 29200, options [mss 1460], length 0
17:38:41.188799 IP 192.168.1.2.16060 > 192.168.1.1.5222: Flags [.], ack 1, win 16334, length 0
17:38:41.461494 IP 192.168.1.2.16060 > 192.168.1.1.5222: Flags [R.], seq 152, ack 1, win 16334, length 0
17:38:41.461541 IP 192.168.1.1.5222 > 192.168.1.2.16060: Flags [.], ack 1, win 29200, length 0

17:39:42.552232 IP 192.168.1.2.19091 > 192.168.1.1.5222: Flags [S], seq 0, win 16334, options [mss 1450], length 0
17:39:42.552257 IP 192.168.1.1.5222 > 192.168.1.2.19091: Flags [S.], seq 3447276853, ack 1, win 29200, options [mss 1460], length 0
17:39:42.783282 IP 192.168.1.2.19091 > 192.168.1.1.5222: Flags [.], ack 1, win 16334, length 0
17:39:42.793409 IP 192.168.1.2.19091 > 192.168.1.1.5222: Flags [R.], seq 152, ack 1, win 16334, length 0
17:39:42.793423 IP 192.168.1.1.5222 > 192.168.1.2.19091: Flags [.], ack 1, win 29200, length 0
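To check a larger capture for the same pattern, a small script can pair each client RST with any bare ACK the server sends back afterwards. This is a minimal sketch, assuming the default one-line tcpdump text output shown above (IPv4 only, numeric `host.port` endpoints); the function name `rsts_answered_by_ack` is hypothetical.

```python
import re

# Matches default one-line tcpdump output such as:
# 17:38:41.461494 IP 192.168.1.2.16060 > 192.168.1.1.5222: Flags [R.], ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]*)\]"
)


def rsts_answered_by_ack(lines):
    """Map each endpoint that sent a RST to True if its peer later replied
    with a bare ACK (flags "."), otherwise False."""
    rst_target = {}  # endpoint that sent the RST -> endpoint it was sent to
    answered = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and anything not in this format
        src, dst, flags = m.group("src"), m.group("dst"), m.group("flags")
        if "R" in flags:
            rst_target[src] = dst
            answered[src] = False
        elif flags == "." and rst_target.get(dst) == src:
            answered[dst] = True  # the peer ACKed after receiving the RST
    return answered
```

For example, `rsts_answered_by_ack(open("capture.txt"))` on text saved from `tcpdump -nn` reports, for the three connections above, no reply for port 12818 and an ACK after the RST for ports 16060 and 19091.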

The real IP addresses are public, but I changed them to a private network, where 192.168.1.1 is the server and 192.168.1.2 is the client. Between them, on the server side, there is a corporate firewall (I don't know which one); on the client side I have no idea which network components are involved. After this capture, the number of TCP connections in the ESTABLISHED state had increased by 3.
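To track that counter precisely, the ESTABLISHED sockets on the XMPP port can be counted straight from the kernel's socket tables. A minimal sketch, assuming the Linux `/proc/net/tcp` and `/proc/net/tcp6` layout (hex `address:port` in the second column, hex state in the fourth, where `01` means ESTABLISHED); the function names are hypothetical. A Java server such as Openfire may listen on a dual-stack IPv6 socket, in which case its connections appear in `tcp6`, so both files are read.

```python
import os

TCP_ESTABLISHED = "01"  # state code used in /proc/net/tcp and /proc/net/tcp6


def count_established(proc_text, port):
    """Count ESTABLISHED entries whose local port equals `port`."""
    port_hex = "%04X" % port  # e.g. 5222 -> "1466"
    count = 0
    for line in proc_text.splitlines()[1:]:  # the first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = fields[1].rsplit(":", 1)[1]
        if local_port.upper() == port_hex and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count


def established_on_port(port, paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Sum the counts over whichever socket tables exist on this host."""
    total = 0
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                total += count_established(f.read(), port)
    return total
```

Calling `established_on_port(5222)` before and after a reproduction gives the delta directly; with 220k sockets, reading these files may take a few seconds.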

Here are the TCP tunables in place on the server side:

net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_allowed_congestion_control = cubic reno
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_autocorking = 1
net.ipv4.tcp_available_congestion_control = cubic reno
net.ipv4.tcp_base_mss = 512
net.ipv4.tcp_challenge_ack_limit = 1000
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_early_retrans = 3
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_fack = 1
net.ipv4.tcp_fastopen = 0
net.ipv4.tcp_fastopen_key = 00000000-00000000-00000000-00000000
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_frto = 2
net.ipv4.tcp_invalid_ratelimit = 500
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_limit_output_bytes = 262144
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_ssthresh = 0
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_mem = 6180528  8240704 12361056
net.ipv4.tcp_min_tso_segs = 2
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_mtu_probing = 0
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.tcp_notsent_lowat = -1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_rmem = 4096    87380   6291456
net.ipv4.tcp_sack = 1
net.ipv4.tcp_slow_start_after_idle = 1
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_syn_retries = 6
net.ipv4.tcp_synack_retries = 3
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_thin_dupack = 0
net.ipv4.tcp_thin_linear_timeouts = 0
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tso_win_divisor = 3
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096    16384   4194304
net.ipv4.tcp_workaround_signed_windows = 0
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
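Two groups of these tunables bound how long a silent connection can linger. A short sketch that parses a `sysctl -a` style dump (`key = value` lines) and derives the keepalive detection time (`tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl`) and the conntrack ESTABLISHED timeout in days. Kernel keepalive only applies to sockets on which the application enables `SO_KEEPALIVE`; whether Openfire does so is not shown here. The helper names are hypothetical.

```python
def parse_sysctl(text):
    """Parse `key = value` lines (as printed by `sysctl -a`) into a dict of strings."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def keepalive_detection_seconds(s):
    """Idle time plus all unanswered probes before the kernel drops a keepalive socket."""
    return (int(s["net.ipv4.tcp_keepalive_time"])
            + int(s["net.ipv4.tcp_keepalive_probes"])
            * int(s["net.ipv4.tcp_keepalive_intvl"]))


def conntrack_established_days(s):
    """Conntrack idle timeout for ESTABLISHED TCP flows, in days."""
    return int(s["net.netfilter.nf_conntrack_tcp_timeout_established"]) / 86400
```

With the values above this gives 7200 + 9 × 75 = 7875 s (about 2 h 11 min) for keepalive, and 432000 s = 5.0 days for the conntrack ESTABLISHED timeout.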

Thank you in advance for your support.

Responses

The client completes the handshake, then sends a Reset with a larger sequence number:

Flags [R.], seq 152

The high SEQ suggests the client tried to transmit data (151 bytes, going by the sequence numbers) that the server never received; the server's packets still show ack 1. The client then gives up on the connection after a very short time. Packet loss from client to server could be the root cause of the bad client behaviour.
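One way to read the server's bare ACK after the RST: Linux validates incoming RSTs as described in RFC 5961 section 3.2 (the `tcp_challenge_ack_limit` tunable in the question belongs to that mechanism). A RST is accepted only if its sequence number exactly equals the next expected one (RCV.NXT); if it merely falls inside the receive window, the receiver answers with a "challenge ACK" and keeps the connection, and outside the window it is dropped silently. The sketch below is a simplified model of that rule, not kernel code, and whether it explains this particular trace is an assumption.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def classify_rst(seg_seq, rcv_nxt, rcv_wnd):
    """Simplified RFC 5961 section 3.2 handling of an incoming RST segment."""
    offset = (seg_seq - rcv_nxt) % SEQ_SPACE  # distance past RCV.NXT
    if offset == 0:
        return "reset"          # exact match: tear the connection down
    if offset < rcv_wnd:
        return "challenge_ack"  # in window but not exact: send ACK, keep state
    return "drop"               # outside the window: discard silently
```

Plugging in the trace values (after the client's SYN with seq 0 the server expects sequence 1, and it advertised a 29200-byte window with no window scaling negotiated), `classify_rst(152, 1, 29200)` returns `"challenge_ack"`, which would match the bare ACKs at 17:38:41.461541 and 17:39:42.793423. A client that never answers the challenge ACK with a correctly sequenced RST would leave the socket ESTABLISHED. The missing ACK after the first RST might, for instance, be the global per-second challenge-ACK limit being hit on a busy server, but that is speculation.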

As for the server socket remaining ESTAB forever, a bug was fixed in which the socket and iptables conntrack states didn't match, resulting in TCP Resets being ignored.

The fixes have been backported to RHEL 6.7 (kernel-2.6.32-573.el6 with Bug 1200541) and RHEL 7.2 (kernel-3.10.0-327.el7 with Bug 1212829) and later. If you're running an earlier kernel than those, try upgrading.
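To check whether a running kernel is at least the fixed build, a simplified comparison of RHEL kernel release strings can help. This sketch assumes the `3.10.0-327.el7` style (upstream version, a dash, a dot-separated numeric build, then the `.elN` tag and optional arch) and compares only the numeric parts; it is not a full `rpmvercmp` implementation, and the function names are hypothetical.

```python
def kernel_key(release):
    """'kernel-3.10.0-862.3.2.el7.x86_64' -> ((3, 10, 0), (862, 3, 2))."""
    if release.startswith("kernel-"):
        release = release[len("kernel-"):]
    version, _, rest = release.partition("-")
    build = rest.split(".el")[0]  # drop the '.el7...' dist tag and the arch
    return (tuple(int(p) for p in version.split(".")),
            tuple(int(p) for p in build.split(".")))


def has_fix(running, fixed):
    """True if `running` is the same build as `fixed` or a later one."""
    return kernel_key(running) >= kernel_key(fixed)
```

On the host itself, `platform.release()` returns the running release string, so `has_fix(platform.release(), "kernel-3.10.0-327.el7")` is a quick check.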

Trying the latest kernel is a good general troubleshooting step.

You're running the firewall with connection tracking, so stopping firewalld/iptables would be a good troubleshooting step to isolate the problem to either the firewall or the network stack.
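Before stopping the firewall, it may help to compare what conntrack believes about port 5222 with the socket count. A minimal sketch, assuming each line carries the `tcp 6 <timeout> <STATE> src=... dport=...` fields as in `/proc/net/nf_conntrack` or `conntrack -L` output; `conntrack_states` is a hypothetical helper. Many ESTABLISHED sockets alongside matching flows shown as CLOSE (or missing) in conntrack would point at the kind of state mismatch described above.

```python
from collections import Counter


def conntrack_states(text, dport):
    """Tally TCP conntrack states for flows whose original dport is `dport`."""
    states = Counter()
    for line in text.splitlines():
        tokens = line.split()
        if "tcp" not in tokens:
            continue  # skip UDP, ICMP and other protocols
        i = tokens.index("tcp")
        if len(tokens) <= i + 3:
            continue
        state = tokens[i + 3]  # fields: 'tcp', protocol number, timeout, state
        # The first dport= belongs to the original (client -> server) direction.
        first_dport = next((t for t in tokens if t.startswith("dport=")), None)
        if first_dport == "dport=%d" % dport:
            states[state] += 1
    return states
```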

If you can make the network otherwise quiet, running dropwatch -lkas while reproducing will clearly identify where the kernel is freeing packets from memory; you can then follow the kernel code to understand why.
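When the network cannot be made fully quiet, aggregating dropwatch output by kernel function makes the dominant drop site easier to spot. This sketch assumes report lines of the form `<count> drops at <symbol>+<hexoffset> (<address>)`, as printed with symbol lookup enabled; `top_drop_sites` is a hypothetical helper.

```python
import re
from collections import Counter

# e.g. "1 drops at tcp_rcv_state_process+1b6 (0xffffffff...)"
DROP_RE = re.compile(r"^\s*(\d+) drops at ([\w.]+)\+([0-9a-fA-F]+)")


def top_drop_sites(lines, n=5):
    """Sum drop counts per 'function+0xoffset' and return the n largest."""
    totals = Counter()
    for line in lines:
        m = DROP_RE.match(line)
        if m:
            count, func, offset = int(m.group(1)), m.group(2), m.group(3)
            totals["%s+0x%s" % (func, offset.lower())] += count
    return totals.most_common(n)
```

The `function+0xoffset` form is what gdb accepts for source lookup, e.g. `list *(tcp_rcv_state_process+0x1b6)` against the matching kernel-debuginfo `vmlinux`.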

You've got a very well-defined issue so it would not be amiss to open a support case. Feel free to mention this discussion thread and the case will make its way to me or one of my network support colleagues to assist further.

First of all, thank you for the quick response.

Regarding the kernel version, we are already using the latest one: 3.10.0-862.3.2.el7.x86_64.

As for the dropped packets, I will analyse this further with the firewall/network team, but dropwatch already shows that most drops occur at tcp_rcv_state_process+1b6. If we can't find anything more relevant, we will follow your suggestion and open a support case.

Thank you once again.

Hi Antonio,

Might it be that the client has keepalive set up?
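On the server side, the kernel can only reap such half-dead connections by itself if TCP keepalive is enabled on the server's sockets; the system-wide `tcp_keepalive_*` values in the question apply only to sockets with `SO_KEEPALIVE` set. Openfire is a Java application, so this Python sketch only illustrates the Linux socket options involved (`SO_KEEPALIVE`, `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`); the values 600/60/5 are hypothetical, not a recommendation.

```python
import socket


def enable_keepalive(sock, idle=600, interval=60, probes=5):
    """Enable TCP keepalive on `sock` with per-socket timers (Linux socket options)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Number of unanswered probes before the connection is dropped.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

Applied to each connection returned by `accept()`, a peer that stays silent would be dropped after roughly 600 + 5 × 60 = 900 seconds, and a peer that answers a probe with a RST would have the connection closed at that point.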

Regards,

Jan Gerrit Kootstra
