Slow TCP over loopback, TCP Window = 1 from the server

Hi there,

Maybe someone has some hints on what the issue could be here (I'd appreciate any comments).

I have a TCP connection over loopback (lo). Everything goes well at the start of the connection: the TCP window grows packet by packet until it reaches its maximum (459520 bytes). Then, after a few minutes (sometimes hours), throughput on the connection becomes extremely slow. I collected packet captures and see that the client starts sending just 256 bytes of payload per TCP segment. Here is what I see:

No.     Time               Source                Destination           Protocol Length Delta          First TCP frame Info
    402 08:59:19.195008    127.0.0.1             127.0.0.1             TCP      322    0.000000       0.000000000     34491 → 27508 [PSH, ACK] Seq=1 Ack=1 Win=1795 Len=256 TSval=64843961 TSecr=64843910 [TCP segment of a reassembled PDU]

Frame 402: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits)
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 34491, Dst Port: 27508, Seq: 1, Ack: 1, Len: 256
    Source Port: 34491
    Destination Port: 27508
    [Stream index: 4]
    [TCP Segment Len: 256]
    Sequence number: 1    (relative sequence number)
    [Next sequence number: 257    (relative sequence number)]
    Acknowledgment number: 1    (relative ack number)
    1000 .... = Header Length: 32 bytes (8)
    Flags: 0x018 (PSH, ACK)
    Window size value: 1795
    [Calculated window size: 1795]
    [Window size scaling factor: -1 (unknown)]
    Checksum: 0xff28 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
    [SEQ/ACK analysis]
    [Timestamps]
    TCP payload (256 bytes)
    TCP segment data (256 bytes)

No.     Time               Source                Destination           Protocol Length Delta          First TCP frame Info
    403 08:59:19.195051    127.0.0.1             127.0.0.1             TCP      66     0.000043       0.000043000     27508 → 34491 [ACK] Seq=1 Ack=257 Win=1 Len=0 TSval=64843961 TSecr=64843961

Frame 403: 66 bytes on wire (528 bits), 66 bytes captured (528 bits)
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 27508, Dst Port: 34491, Seq: 1, Ack: 257, Len: 0
    Source Port: 27508
    Destination Port: 34491
    [Stream index: 4]
    [TCP Segment Len: 0]
    Sequence number: 1    (relative sequence number)
    [Next sequence number: 1    (relative sequence number)]
    Acknowledgment number: 257    (relative ack number)
    1000 .... = Header Length: 32 bytes (8)
    Flags: 0x010 (ACK)
    Window size value: 1
    [Calculated window size: 1]
    [Window size scaling factor: -1 (unknown)]
    Checksum: 0xfe28 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
    [SEQ/ACK analysis]
    [Timestamps]

No.     Time               Source                Destination           Protocol Length Delta          First TCP frame Info
    419 08:59:19.399011    127.0.0.1             127.0.0.1             TCP      322    0.203960       0.204003000     34491 → 27508 [PSH, ACK] Seq=257 Ack=1 Win=1795 Len=256 TSval=64844012 TSecr=64843961 [TCP segment of a reassembled PDU]

Frame 419: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits)
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 34491, Dst Port: 27508, Seq: 257, Ack: 1, Len: 256
    Source Port: 34491
    Destination Port: 27508
    [Stream index: 4]
    [TCP Segment Len: 256]
    Sequence number: 257    (relative sequence number)
    [Next sequence number: 513    (relative sequence number)]
    Acknowledgment number: 1    (relative ack number)
    1000 .... = Header Length: 32 bytes (8)
    Flags: 0x018 (PSH, ACK)
    Window size value: 1795
    [Calculated window size: 1795]
    [Window size scaling factor: -1 (unknown)]
    Checksum: 0xff28 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
    [SEQ/ACK analysis]
    [Timestamps]
    TCP payload (256 bytes)
    TCP segment data (256 bytes)

No.     Time               Source                Destination           Protocol Length Delta          First TCP frame Info
    420 08:59:19.399038    127.0.0.1             127.0.0.1             TCP      66     0.000027       0.204030000     27508 → 34491 [ACK] Seq=1 Ack=513 Win=1 Len=0 TSval=64844012 TSecr=64844012

Frame 420: 66 bytes on wire (528 bits), 66 bytes captured (528 bits)
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 27508, Dst Port: 34491, Seq: 1, Ack: 513, Len: 0
    Source Port: 27508
    Destination Port: 34491
    [Stream index: 4]
    [TCP Segment Len: 0]
    Sequence number: 1    (relative sequence number)
    [Next sequence number: 1    (relative sequence number)]
    Acknowledgment number: 513    (relative ack number)
    1000 .... = Header Length: 32 bytes (8)
    Flags: 0x010 (ACK)
    Window size value: 1
    [Calculated window size: 1]
    [Window size scaling factor: -1 (unknown)]
    Checksum: 0xfe28 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
    [SEQ/ACK analysis]
    [Timestamps]

So the server is advertising TCP Window = 1, and because of this the client sends only 256 bytes of payload per segment. In socket statistics (ss) I see the following for this TCP connection (client and server are both on the same host, localhost):
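As a sanity check on these numbers, here is a minimal sketch of the window arithmetic. It assumes the window-scale shift of 8 that ss reports for this connection (wscale:8,8) applies in both directions; the capture does not include the handshake, which is why Wireshark shows the scaling factor as unknown. The function name is only for illustration.

```python
def effective_window(raw_window: int, shift: int) -> int:
    """Scale the raw 16-bit TCP window field by the negotiated window-scale shift."""
    return raw_window << shift


# Assumption: shift taken from "wscale:8,8" in the ss output, not from the capture.
WSCALE_SHIFT = 8

server_window = effective_window(1, WSCALE_SHIFT)     # Win=1 in frames 403 and 420
client_window = effective_window(1795, WSCALE_SHIFT)  # Win=1795 in frames 402 and 419

print(server_window)  # 256
print(client_window)  # 459520
```

Under that assumption, Win=1 corresponds to exactly the 256 bytes of payload seen per segment, and Win=1795 corresponds to the 459520-byte maximum mentioned above.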

tcp    ESTAB      0      0      127.0.0.1:27508              127.0.0.1:34491               users:(("mongod",pid=21212,fd=39)) timer:(keepalive,1min9sec,0) uid:1238 ino:76354492 sk:d5 <->
     skmem:(r0,rb8388608,t0,tb2626560,f4096,w0,o0,bl0) ts sack cubic wscale:8,8 rto:208 rtt:4.981/9.882 ato:40 mss:65483 cwnd:10 ssthresh:465 bytes_acked:4718 bytes_received:5486977062 segs_out:254560 segs_in:3049050 send 1051.7Mbps lastsnd:17483088 lastrcv:136 lastack:31168 pacing_rate 2103.1Mbps reordering:129 rcv_rtt:4 rcv_space:266611

tcp    ESTAB      0      6247456 ::ffff:127.0.0.1:34491              ::ffff:127.0.0.1:27508               users:(("mms-app",pid=3308,fd=443)) timer:(persist,068ms,0) uid:1238 ino:76349961 sk:104 <->
     skmem:(r0,rb1061808,t0,tb8388608,f4176,w6332336,o0,bl0) ts sack cubic wscale:8,8 rto:204 rtt:0.073/0.013 ato:40 mss:65483 cwnd:172 ssthresh:172 bytes_acked:5486977063 bytes_received:4718 segs_out:3049050 segs_in:254561 send 1234309.7Mbps lastsnd:136 lastrcv:17483088 lastack:136 retrans:0/518 reordering:129 rcv_rtt:4 rcv_space:65495
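To make the skmem values easier to compare between the two sockets, here is a small sketch that splits the field into named counters. The names follow the field order documented in the ss(8) man page; the helper name parse_skmem is hypothetical, and any unknown key is kept as-is.

```python
import re

# Field meanings as listed in the ss(8) man page.
SKMEM_FIELDS = {
    "r": "rmem_alloc",   # memory allocated for receiving
    "rb": "rcv_buf",     # receive buffer limit
    "t": "wmem_alloc",   # memory used for sending (handed to lower layers)
    "tb": "snd_buf",     # send buffer limit
    "f": "fwd_alloc",    # memory cached by the socket for future use
    "w": "wmem_queued",  # memory queued for sending (not yet sent)
    "o": "opt_mem",      # memory used for socket options
    "bl": "back_log",    # memory used for the socket backlog queue
}


def parse_skmem(line: str) -> dict:
    """Extract the skmem:(...) counters from one line of ss output."""
    match = re.search(r"skmem:\(([^)]*)\)", line)
    if match is None:
        raise ValueError("no skmem field found")
    counters = {}
    for item in match.group(1).split(","):
        key, value = re.fullmatch(r"([a-z]+)(\d+)", item).groups()
        counters[SKMEM_FIELDS.get(key, key)] = int(value)
    return counters


client = parse_skmem("skmem:(r0,rb1061808,t0,tb8388608,f4176,w6332336,o0,bl0)")
server = parse_skmem("skmem:(r0,rb8388608,t0,tb2626560,f4096,w0,o0,bl0)")
print(client["wmem_queued"], client["snd_buf"])  # 6332336 8388608
print(server["rmem_alloc"], server["rcv_buf"])   # 0 8388608
```

Read this way, the client has about 6.3 MB queued against an 8 MB send buffer limit, while the server side shows nothing allocated in its receive queue.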

Questions:
A) How can Send-Q = 6247456 (and SK_MEMINFO_WMEM_QUEUED = 6332336) be so large on the client if it only sends 256 bytes of data at a time? Could someone please clarify?
B) What exactly can trigger the server side to send TCP Window = 1? I understand that it might be due to overload, but I've already checked that there are no issues with CPU, memory, or disk I/O utilization. What would you suggest checking on the server side?
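For context on question A, here is a rough back-of-the-envelope sketch of how long the queued data would take to drain at the pace seen in the capture. It assumes the gap between frames 402 and 419 is representative of the whole slow period, which two data segments alone cannot confirm.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"
first = datetime.strptime("08:59:19.195008", FMT)   # frame 402, 256-byte segment
second = datetime.strptime("08:59:19.399011", FMT)  # frame 419, next 256-byte segment
interval = (second - first).total_seconds()

payload_per_segment = 256  # bytes, from Len=256 in the capture
send_q = 6247456           # bytes, Send-Q of the client socket in ss

rate = payload_per_segment / interval  # bytes per second
drain_seconds = send_q / rate

print(f"{interval:.6f} s between segments")
print(f"{rate:.0f} bytes/s")
print(f"{drain_seconds / 60:.0f} minutes to drain Send-Q")
```

At that pace the connection moves roughly 1.25 KB/s, so the data already queued on the client would need on the order of 80 minutes to go out. The interval between segments is also close to the rto:204 value that ss reports for the client.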

Thanks a lot in advance!
Alexey
