TCP SACK PANIC - Kernel vulnerabilities - CVE-2019-11477, CVE-2019-11478 & CVE-2019-11479

Public Date: June 17, 2019, 08:19
Updated: September 3, 2021, 12:07
Status: Resolved
Impact: Important

Executive Summary

Three related flaws were found in the Linux kernel’s handling of TCP networking.  The most severe vulnerability could allow a remote attacker to trigger a kernel panic in systems running the affected software and, as a result, impact the system’s availability.

The issues have been assigned multiple CVEs: CVE-2019-11477 is rated Important, whereas CVE-2019-11478 and CVE-2019-11479 are rated Moderate.

The first two are related to Selective Acknowledgement (SACK) packets combined with the Maximum Segment Size (MSS); the third is related solely to the Maximum Segment Size (MSS).

These issues are corrected by applying either mitigations or kernel patches.  Mitigation details and links to RHSA advisories can be found on the RESOLVE tab of this article.

Issue Details and Background

Three related flaws were found in the Linux kernel’s handling of TCP Selective Acknowledgement (SACK) packets with a low MSS. The extent of impact is understood to be limited to denial of service at this time. No privilege escalation or information leak is currently suspected.

While the mitigations shown in this article are available, they might affect system performance and traffic from legitimate sources that require lower MSS values to transmit correctly. Please evaluate which mitigation is appropriate for the system’s environment before applying it.

What is a selective acknowledgement?

TCP Selective Acknowledgment (SACK) is a mechanism where the data receiver can inform the sender about all the segments that have successfully been accepted. This allows the sender to retransmit only the segments of the stream that are missing from its ‘known good’ set. When TCP SACK is disabled, a much larger set of retransmits is required to deliver a complete stream.

What is MSS?

The maximum segment size (MSS) is a parameter set in the options of a packet’s TCP header that specifies the largest amount of data a host is willing to receive in a single TCP segment.
As packets might become fragmented when transmitted across different routes, a host must set the MSS to the largest IP datagram payload size that it can handle. Very large MSS values might mean that a stream of packets ends up fragmented on its way to the destination, whereas smaller packets reduce fragmentation but carry more header overhead relative to the data they deliver.


Operating systems and transport types can default to specified MSS sizes. Attackers with privileged access can create raw packets with crafted MSS options to carry out this attack.

TCP SACKs:

TCP is a connection-oriented protocol. When two parties wish to communicate over a TCP connection, they establish the connection by exchanging certain information, such as a request to initiate (SYN) a connection, the initial sequence number, the acknowledgement number, the maximum segment size (MSS) to use over this connection, permission to send and process Selective Acknowledgements (SACKs), etc. This connection establishment process is known as the 3-way handshake.

TCP sends and receives user data in units called segments. A TCP segment consists of a TCP header, options and user data.
Figure: TCP Segmentation


Each TCP segment has a Sequence Number (SEQ) and Acknowledgement Number (ACK).

These SEQ & ACK numbers are used to track which segments are successfully received by the receiver. The ACK number indicates the next segment the receiver expects.

Example: user ‘A’ above sends 1 kilobyte of data in 13 segments of 100 bytes each; 13 segments are needed because each one includes a 20-byte TCP header, leaving 80 bytes for data. On the receiving end, user ‘B’ receives segments 1, 2, 4, 6 and 8-13; segments 3, 5 and 7 are lost.

By using ACK numbers, user ‘B’ will indicate that it is expecting segment number 3, which user ‘A’ reads as none of the segments after 2 having been received by user ‘B’, and user ‘A’ will retransmit all the segments from 3 onwards, even though segments 4, 6 and 8-13 were successfully received by user ‘B’. User ‘B’ has no way to indicate that to user ‘A’. This leads to inefficient usage of the network.

Selective Acknowledgement: SACK

To overcome the above problem, the Selective Acknowledgement (SACK) mechanism was devised and defined by RFC-2018. With Selective Acknowledgement (SACK), user ‘B’ above uses its TCP options field to inform user ‘A’ about all the segments (1, 2, 4, 6, 8-13) it has received successfully, so user ‘A’ needs to retransmit only segments 3, 5 and 7, thus considerably saving network bandwidth and avoiding further congestion.
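
The example above can be sketched in a few lines of Python. This is only an illustration of the article's simplified, segment-numbered model (real TCP tracks byte sequence numbers), and the function names are hypothetical:

```python
import math

def segments_needed(total_bytes, segment_size=100, header_size=20):
    """Segments required when every segment carries a fixed-size header."""
    payload_per_segment = segment_size - header_size
    return math.ceil(total_bytes / payload_per_segment)

def retransmit_cumulative_ack(sent, received):
    """Without SACK (as described above): resend from the first gap onwards."""
    missing = [s for s in sent if s not in received]
    if not missing:
        return []
    first_gap = missing[0]
    return [s for s in sent if s >= first_gap]

def retransmit_with_sack(sent, received):
    """With SACK: resend only the segments the receiver did not report."""
    return [s for s in sent if s not in received]

sent = list(range(1, 14))                        # segments 1..13
received = {1, 2, 4, 6, 8, 9, 10, 11, 12, 13}    # 3, 5 and 7 were lost

print(segments_needed(1000))                             # 13
print(len(retransmit_cumulative_ack(sent, received)))    # 11 (segments 3..13)
print(retransmit_with_sack(sent, received))              # [3, 5, 7]
```

In this model SACK reduces the retransmission from eleven segments to three.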

CVE-2019-11477 SACK Panic:
Socket Buffers (SKB):
The Socket Buffer (SKB) is the most central data structure used in the Linux TCP/IP implementation. It is a linked list of buffers that holds network packets. Such a list can act as a transmission queue, receive queue, SACK’d queue, retransmission queue, etc. An SKB can hold packet data in fragments; a Linux SKB can hold up to 17 fragments.

linux/include/linux/skbuff.h
#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)  => 17

Each fragment holds up to 32KB of data on x86 (64KB on PowerPC). When a packet is due to be sent, it is placed on the send queue and its details are kept in a control buffer structure like the following:

    linux/include/net/tcp.h
struct tcp_skb_cb {
    __u32       seq;        /* Starting sequence number */
    __u32       end_seq;    /* SEQ + FIN + SYN + datalen */
    __u32       tcp_tw_isn;
    struct {
        u16 tcp_gso_segs;
        u16 tcp_gso_size;
    };
    __u8        tcp_flags;  /* TCP header flags. (tcp[13]) */
    …
};

Of these, the ‘tcp_gso_segs’ and ‘tcp_gso_size’ fields are used to tell the device driver about segmentation offload.

When segmentation offload is on and the SACK mechanism is also enabled, packet loss and selective retransmission of some packets can leave an SKB holding multiple packets, counted by ‘tcp_gso_segs’. Multiple such SKBs in the list are merged into one to process different SACK blocks efficiently, which involves moving data from one SKB to another in the list. During this movement of data, the SKB structure can reach its maximum limit of 17 fragments and the ‘tcp_gso_segs’ parameter can overflow and hit the BUG_ON() call below, resulting in the kernel panic.

static bool tcp_shifted_skb(struct sock *sk, …, unsigned int pcount, ...)
{
    ...
    tcp_skb_pcount_add(prev, pcount);
    BUG_ON(tcp_skb_pcount(skb) < pcount);   /* <= SACK panic */
    tcp_skb_pcount_add(skb, -pcount);

}

A remote user can trigger this issue by setting the Maximum Segment Size (MSS) of a TCP connection to its lowest limit of 48 bytes and sending a sequence of specially crafted SACK packets. The lowest MSS leaves merely 8 bytes of data per segment, thus increasing the number of TCP segments required to send all the data.
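
The figures quoted above are enough to show why a 16-bit counter is too small. The following Python sketch is plain arithmetic, not kernel code; it assumes 4 KiB pages (so that MAX_SKB_FRAGS is 17), the 32KB-per-fragment figure for x86, and 8 bytes of data per segment. The constant and function names are chosen for illustration only:

```python
MAX_SKB_FRAGS = 65536 // 4096 + 1   # 17, assuming a PAGE_SIZE of 4096
FRAG_BYTES = 32 * 1024              # up to 32KB per fragment on x86 (from the text)
DATA_PER_SEGMENT = 8                # bytes of data per segment at an MSS of 48
U16_MAX = 0xFFFF                    # largest value a u16 field can hold

def max_segments_in_skb(frags=MAX_SKB_FRAGS, frag_bytes=FRAG_BYTES,
                        data_per_segment=DATA_PER_SEGMENT):
    """Upper bound on segments one completely full SKB could account for."""
    return (frags * frag_bytes) // data_per_segment

def as_u16(value):
    """Emulate storing a value in an unsigned 16-bit field."""
    return value & U16_MAX

segs = max_segments_in_skb()
print(segs)             # 69632
print(segs > U16_MAX)   # True: does not fit in 16 bits
print(as_u16(segs))     # 4096: the value after wrapping around
```

A full SKB can therefore represent more segments than ‘tcp_gso_segs’ can count, which is the condition that the BUG_ON() check trips over.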

Acknowledgements

Jonathan Looney (Netflix Information Security)

References


  • RFC-2018 - TCP selective acknowledgments
  • How SKBs work
  • Netflix (reporters) original report


Impacted Products


Red Hat Product Security has rated this update as having a security impact of Important.

The following Red Hat product versions are impacted:
Primarily Impacted Products

  • Red Hat Enterprise Linux 8
  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 5
  • Red Hat Atomic Host
  • Red Hat Enterprise MRG 2
  • Red Hat OpenShift Container Platform 4 (RHEL CoreOS)
  • Red Hat OpenShift Online
  • Red Hat OpenShift Dedicated (and dependent services)
  • OpenShift on Azure (ARO)
  • Red Hat OpenStack Platform (images shipping kernel)
  • Red Hat Virtualization (RHV-H)

Secondarily Impacted Products (underlying platform must be updated)

  • Red Hat OpenStack Platform

While Red Hat's Linux Containers are not directly impacted by kernel vulnerabilities, their security relies upon the integrity of the host kernel environment. Red Hat recommends that you use the most recent versions of your container images. The Container Health Index, part of the Red Hat Container Catalog, can always be used to determine the security status of Red Hat containers. To protect the privacy of the containers in use, you will need to ensure that the Container host (such as Red Hat Enterprise Linux, CoreOS, or Atomic Host) has been updated against these attacks. Red Hat will release an updated Atomic Host for this use case.


Affected Red Hat Products by Incident CVE


Product | CVE-2019-11477 (Important) | CVE-2019-11478 (Moderate) | CVE-2019-11479 (Moderate)
RHEL 8 (kernel, kernel-rt) | Affected - will fix all active streams | Affected - will fix all active streams | Affected - will fix all active streams
RHEL 7 (kernel, kernel-rt) | Affected - will fix all active streams | Affected - will fix all active streams | Affected - will fix all active streams
RHEL 6 | Affected - will fix all active streams | Affected - will fix all active streams | Affected - will fix all active streams
RHEL 5 | Not Affected | Affected (wontfix, out of support scope) | Affected (wontfix, out of support scope)

Issue Details and Background Information

The following three flaws were reported by researchers. Below is a short description of each flaw and its impact.

CVE-2019-11477

The Linux kernel is vulnerable to an integer overflow in the 16-bit TCP_SKB_CB(skb)->tcp_gso_segs field. A remote attacker could exploit this to crash the system, causing a denial of service.


CVE-2019-11478

The Linux kernel is vulnerable to a flaw that allows attackers to send a crafted sequence of SACKs which will fragment the TCP retransmission queue. An attacker might be able to further exploit the fragmented queue to cause an expensive linked-list walk for subsequent SACKs received for that same TCP connection. This could cause the CPU to spend excessive time attempting to reconstruct the list, resulting in a denial of service.


CVE-2019-11479    

The Linux kernel is vulnerable to a flaw that allows attackers to send crafted packets with low MSS values to trigger excessive resource consumption. An attacker can force the Linux kernel to segment its responses into multiple TCP segments, each of which contains only 8 bytes of data. This drastically increases the bandwidth required to deliver the same amount of data. Further, it consumes additional resources (CPU and NIC processing power). This attack requires continued effort from the attacker, and the impact will end shortly after the attacker stops sending traffic. While the attack is ongoing, the system will work at reduced capacity, resulting in a denial of service for some users.
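
To get a rough feel for the bandwidth cost, the sketch below compares delivering the same payload in 8-byte and 1460-byte pieces. It is a back-of-the-envelope model under stated assumptions: each segment is charged 40 bytes of headers (a 20-byte IPv4 header plus a 20-byte TCP header, with no options and no link-layer framing), and 1460 bytes is used as a typical Ethernet value. Neither figure comes from the advisory, and the function name is hypothetical:

```python
import math

HEADER_BYTES = 40  # assumption: 20-byte IPv4 header + 20-byte TCP header, no options

def wire_cost(payload_bytes, data_per_segment, header_bytes=HEADER_BYTES):
    """Return (segments, total bytes on the wire) for a payload."""
    segments = math.ceil(payload_bytes / data_per_segment)
    return segments, payload_bytes + segments * header_bytes

normal = wire_cost(14600, 1460)   # assumed typical data per segment
tiny = wire_cost(14600, 8)        # 8 bytes of data per segment, as in the attack

print(normal)              # (10, 15000)
print(tiny)                # (1825, 87600)
print(tiny[1] / normal[1]) # 5.84
```

Under these assumptions the same 14600 bytes need 1825 segments instead of 10 and nearly six times the bandwidth; TCP options would make the real overhead larger.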


Diagnose your vulnerability

Use the detection script to determine if your system is currently vulnerable to this flaw. To verify the legitimacy of the script, you can download the detached GPG signature as well.

Determine if your system is vulnerable

Current Version: 1.0

Take Action

Red Hat customers running affected versions of these Red Hat products are strongly recommended to update them as soon as errata are available. Customers are urged to apply the available updates immediately and enable the mitigations as they feel appropriate.   

A kpatch for customers running supported versions of Red Hat Enterprise Linux 7 or greater will be available. Please open a support case to gain access to the kpatch.

For more details about what a kpatch is: Is live kernel patching (kpatch) supported in RHEL 7 and beyond?

Updates for Affected Products

A regression introduced by the TCP SACK PANIC fixes was found. See [6] for further information.

Product | Package | Advisory/Update
Red Hat Enterprise Linux 8 (z-stream) | kernel | RHSA-2019:1479 [6]
Red Hat Enterprise Linux 8 | kernel-rt | RHSA-2019:1480 [6]
Red Hat Enterprise Linux 7 (z-stream) | kernel | RHSA-2019:1481 [6]
Red Hat Enterprise Linux 7 | kernel-rt | RHSA-2019:1486 [6]
Red Hat Enterprise Linux 7.5 Extended Update Support [1] | kernel | RHSA-2019:1482 [6]
Red Hat Enterprise Linux 7.4 Extended Update Support [1] | kernel | RHSA-2019:1483 [6]
Red Hat Enterprise Linux 7.3 Update Services for SAP Solutions, & Advanced Update Support [2], [3] | kernel | RHSA-2019:1484 [6]
Red Hat Enterprise Linux 7.2 Update Services for SAP Solutions, & Advanced Update Support [2], [3] | kernel | RHSA-2019:1485 [6]
Red Hat Enterprise Linux 6 (z-stream) | kernel | RHSA-2019:1488 [6]
Red Hat Enterprise Linux 6.6 Advanced Update Support [2] | kernel | RHSA-2019:1489 [6]
Red Hat Enterprise Linux 6.5 Advanced Update Support [2] | kernel | RHSA-2019:1490 [6]
Red Hat Enterprise Linux 5 Extended Lifecycle Support [5] | kernel | see below
RHEL Atomic Host [4] | kernel | Released
Red Hat Enterprise MRG 2 | kernel-rt | RHSA-2019:1487 [6]
Red Hat Virtualization 4 | virtualization host | RHSA-2019:1594 [6]
OpenShift Container Platform 4.1 | kernel | RHBA-2019:1589 [6]

[1] An active EUS subscription is required for access to this patch.  Please contact Red Hat sales or your specific sales representative for more information if your account does not have an active EUS subscription.

What is the Red Hat Enterprise Linux Extended Update Support Subscription?

[2] An active AUS subscription is required for access to this patch in RHEL AUS.

What is Advanced mission critical Update Support (AUS)?

[3] An active Update Services for SAP Solutions Add-on or TUS subscription is required for access to this patch in RHEL E4S / TUS.

[4] For details on how to update Red Hat Enterprise Atomic Host, please see Deploying a specific version of Red Hat Enterprise Atomic Host.

[5] At this time, based on the severity of these issues and where Red Hat Enterprise Linux 5 is in its support lifecycle, RHEL 5 will not be addressed.  Please contact Red Hat Support for available upgrade paths and options.

FAQ: Red Hat Enterprise Linux 5 Extended Life Cycle Support (ELS) Add-On

[6] A regression introduced by the TCP SACK PANIC fixes was found. For further information refer to TCP performance issues and stalls.

Mitigation

To mitigate CVE-2019-11477 and CVE-2019-11478, either disable the vulnerable component [Option #1] or use iptables to drop connections whose MSS is low enough to exploit the vulnerability [Option #2].


Option #1
Disable selective acknowledgments system-wide for all newly established TCP connections.


# echo 0 > /proc/sys/net/ipv4/tcp_sack

or

# sysctl -w net.ipv4.tcp_sack=0

This option will disable selective acknowledgements but will likely increase the bandwidth required to correctly complete streams when errors occur.
To make this option persist across reboots, create a file in /etc/sysctl.d/ such as /etc/sysctl.d/99-tcpsack.conf - with content:

# CVE-2019-11477 & CVE-2019-11478
net.ipv4.tcp_sack=0
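
As a quick sanity check of such a file, the following Python sketch parses sysctl-style `key=value` text and reports whether it disables SACK. It is a minimal illustration with hypothetical helper names; it inspects file contents only and does not tell you what the running kernel is using (`sysctl net.ipv4.tcp_sack` shows the live value):

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into a dict, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

def sack_disabled(settings):
    """True if the parsed settings turn selective acknowledgments off."""
    return settings.get("net.ipv4.tcp_sack") == "0"

example = """# CVE-2019-11477 & CVE-2019-11478
net.ipv4.tcp_sack=0
"""

print(parse_sysctl_conf(example))                 # {'net.ipv4.tcp_sack': '0'}
print(sack_disabled(parse_sysctl_conf(example)))  # True
```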

For Red Hat OpenShift Container Platform (OCP) 4.x, node tuning operators can be used to persist sysctl settings across nodes. On a default installation of OCP 4.1, this setting can be applied to all nodes by adding the sysctl to the ‘openshift’ default tuned profile:

$ oc edit tuned/default -n openshift-cluster-node-tuning-operator
...
spec:
  profile:
  - data: |
      [main]
      summary=Optimize systems running OpenShift (parent profile)
      ...
      [sysctl]
      net.ipv4.tcp_sack=0
      ...
    name: openshift


 See the product documentation for more information.

Option #2
Mitigates CVE-2019-11477, CVE-2019-11478 and CVE-2019-11479 by preventing new connections made with low MSS sizes.

The default firewall configuration on Red Hat Enterprise Linux 7 and 8 is firewalld.  To prevent new connections with low MSS sizes using firewalld, use the following commands:

# firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP
# firewall-cmd --permanent --direct --add-rule ipv6 filter INPUT 0 -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP
# firewall-cmd --reload
# firewall-cmd --permanent --direct --get-all-rules

These firewall-cmd rules persist across system reboots.
If you use the traditional iptables firewalling method on any version of Red Hat Enterprise Linux, the equivalent iptables commands are:

# iptables -I INPUT -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP
# ip6tables -I INPUT -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP

# iptables -nL -v
# ip6tables -nL -v

This option will drop connection attempts with an MSS size between 1 and 500. Please note that it might also deny some connections that may be considered valid. This mitigation works as long as net.ipv4.tcp_mtu_probing is disabled.
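
The effect of the rule can be modelled in a few lines of Python. This sketch only mirrors the `--mss 1:500` range and the tcp_mtu_probing condition stated above; it assumes the range is inclusive at both ends, does not model SYN packets that carry no MSS option, and uses hypothetical function names:

```python
def syn_dropped(mss, low=1, high=500):
    """True if a SYN advertising this MSS falls inside the low:high range."""
    return low <= mss <= high

def mitigation_applies(tcp_mtu_probing):
    """Per the text, the rule only helps while net.ipv4.tcp_mtu_probing is 0."""
    return tcp_mtu_probing == 0

for mss in (48, 500, 501, 1460):
    print(mss, syn_dropped(mss))
# 48 True
# 500 True
# 501 False
# 1460 False

print(mitigation_applies(0))   # True
print(mitigation_applies(1))   # False
```

Peers that legitimately advertise an MSS of 500 or less would be refused as well, which is why the mitigation should be evaluated against the expected traffic first.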

To ensure the iptables command persists after a system reboot see this article outlining the procedure.

Ansible playbook

Additionally, an Ansible playbook, disable_tcpsack_mitigate.yml, is provided below. This playbook will disable selective acknowledgments and make the change permanent. To use the playbook, specify the hosts on which you'd like to disable selective acknowledgments with the HOSTS extra var:

ansible-playbook -e HOSTS=web,mail,ldap04 disable_tcpsack_mitigate.yml

To verify the legitimacy of the playbook, you can download the detached GPG signature.

Automate the mitigation

Current version: 1.0
