Possible TCP stack bug

Issue

We found unexpected behavior in an application that appears to be a bug in the TCP algorithm.
By analyzing tcpdump captures, we found that a second, unnecessary retransmission timeout (RTO) occurs after a first, valid RTO.
Detailed description of the bug:
- The unexpected behavior appears in the server applications when TCP needs to retransmit dropped packets. It occurs in all server applications, at quite a high frequency.
- The bug appears only when the server detects a drop (through an RTO after 200 ms) while it is still waiting for the ACKs of 2 packets. In that case, 200 ms after all packets are sent, the RTO triggers the retransmission of the first packet and the ACK for that packet is received, but the second packet is not retransmitted at that moment. After another 400 ms, a second RTO fires, and only then is the second packet retransmitted and ACKed. In our understanding, this second RTO should not occur: the expected behavior is that the second packet is retransmitted right after the ACK for the first retransmitted packet is received.
- This unexpected second RTO also occurs only when exactly 2 packets are pending at the moment of the first RTO. When there is 1 packet to retransmit, or more than 2, the behavior is as expected: all packets are retransmitted and ACKed after the first RTO, with no second RTO.
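To make the two reported timelines concrete, here is a minimal Python sketch that models the expected and the observed recovery for 2 pending packets. It is a simplified model of the description above, not of the actual kernel code, and it rests on several assumptions: conventional RTO recovery (a congestion window of 1 segment after the timeout, growing by one segment per ACK as in slow start), a hypothetical RTT of 10 ms, and the 400 ms backed-off timer being counted from the first RTO. The names `expected_recovery` and `observed_recovery_two_packets` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time_ms: float
    what: str


def expected_recovery(outstanding: int, rto_ms: float, rtt_ms: float) -> list[Event]:
    """Expected timeline: one RTO, then slow-start retransmission of every
    outstanding packet, without waiting for a second timer expiry."""
    events = [Event(rto_ms, "RTO fires")]
    t, cwnd, seg = rto_ms, 1, 1  # cwnd starts at 1 segment after the timeout
    while seg <= outstanding:
        burst = range(seg, min(seg + cwnd, outstanding + 1))
        events += [Event(t, f"retransmit P{s}") for s in burst]
        t += rtt_ms  # ACKs for this burst arrive one RTT later
        events += [Event(t, f"ACK P{s}") for s in burst]
        seg += len(burst)
        cwnd += len(burst)  # slow start: +1 segment per ACK received
    return events


def observed_recovery_two_packets(rto_ms: float, rtt_ms: float) -> list[Event]:
    """Timeline reported in the captures when exactly 2 packets are pending.
    Assumption: the doubled (backed-off) timer is counted from the first RTO."""
    second_rto = rto_ms + 2 * rto_ms
    return [
        Event(rto_ms, "RTO fires"),
        Event(rto_ms, "retransmit P1"),
        Event(rto_ms + rtt_ms, "ACK P1"),
        Event(second_rto, "second RTO fires"),
        Event(second_rto, "retransmit P2"),
        Event(second_rto + rtt_ms, "ACK P2"),
    ]


if __name__ == "__main__":
    RTO_MS, RTT_MS = 200.0, 10.0  # RTT_MS is a hypothetical value
    for name, events in [
        ("expected", expected_recovery(2, RTO_MS, RTT_MS)),
        ("observed", observed_recovery_two_packets(RTO_MS, RTT_MS)),
    ]:
        print(name)
        for e in events:
            print(f"  {e.time_ms:6.0f} ms  {e.what}")
```

Under these assumptions, the last ACK arrives at 220 ms in the expected case versus 610 ms in the observed one, so each affected transfer stalls for roughly one extra backed-off RTO.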
