kernel: Possible SYN flooding on port #. Sending cookies.

Environment

  • Red Hat Enterprise Linux (RHEL)
  • TCP network connections

Issue

  • One of the following messages is logged:

    kernel: possible SYN flooding on port X.
    kernel: possible SYN flooding on port X. Sending cookies.
    kernel: Possible SYN flooding on port X. Check SNMP counters.
    kernel: Possible SYN flooding on port X. Sending cookies. Check SNMP counters.
    kernel: TCPv6: Possible SYN flooding on port X.
    
  • Our system is sending SYN cookies.

  • Client application has high load with many rapid TCP connections, which appears to SYN flood the server.
  • What tunables in the kernel can help guard against or make a system resistant to SYN-FLOOD attacks?
  • In netstat -s output, the counters "times the listen queue of a socket overflowed" or "SYNs to LISTEN sockets dropped" are growing
  • The ListenOverflows or ListenDrops value of /proc/net/netstat is increasing
  • Kernel dropping TCP connections due to LISTEN sockets buffer full in Red Hat Enterprise Linux
  • During peak periods, RHEL server would drop TCP SYN packets due to the kernel's buffer of LISTEN sockets being full and overflowing
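The ListenOverflows and ListenDrops counters mentioned above can also be sampled programmatically. The following is a minimal Python sketch, assuming the usual /proc/net/netstat layout of paired header and value lines. The function name parse_netstat and the sample text are illustrative only and are not part of any Red Hat tooling.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {prefix: {counter: value}}."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    result = {}
    # Lines come in pairs: a header line of names, then a line of values.
    for names, values in zip(lines[0::2], lines[1::2]):
        prefix = names[0].rstrip(":")
        result[prefix] = dict(zip(names[1:], map(int, values[1:])))
    return result


# Illustrative sample only; a real file contains many more counters.
SAMPLE = (
    "TcpExt: SyncookiesSent SyncookiesRecv ListenOverflows ListenDrops\n"
    "TcpExt: 12 3 40 52\n"
)

if __name__ == "__main__":
    try:
        with open("/proc/net/netstat") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on systems without /proc
    tcpext = parse_netstat(text).get("TcpExt", {})
    for name in ("ListenOverflows", "ListenDrops"):
        print(name, tcpext.get(name))
```

Running the script twice, some seconds apart, and comparing the values shows whether the counters are still increasing.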

Resolution

If required, refer to the below Root Cause section to obtain an understanding of TCP SYN, TCP handshake, listening sockets, SYN flood, and SYN cookies. An understanding of these terms is recommended before investigating or implementing any changes.

Determine whether the traffic is valid or malicious

Use application debugging, network monitoring tools, or work with your network team or service provider.

This requires an understanding of:

  • the application's expected workload
  • expected client IP addresses
  • and expected client behaviour

Use the netstat or ss commands to inspect TCP socket states as follows, where X is the port number reported in the Possible SYN flooding on port X message:

netstat -nta | egrep "State|X"
ss -nta '( sport = :X )'

Having many sockets in the SYN-RECV state could mean a malicious "SYN flood" attack, though this is not the only type of malicious attack. You may also wish to inspect the source IP addresses of traffic to the port in question to confirm if client IPs are expected or unexpected.
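As a rough alternative to reading netstat or ss output by eye, socket states for one local port can be tallied from /proc/net/tcp. This is a minimal Python sketch, assuming the standard /proc/net/tcp column layout (hexadecimal address:port pairs and a hexadecimal state code) and assuming the running kernel lists half-open connections there. The function name count_states and the sample data are hypothetical.

```python
from collections import Counter

# State codes as used in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, port):
    """Count sockets per TCP state whose local port equals `port`."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


# Illustrative sample: port 0x2329 is 9001, port 0x0016 is 22.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:2329 00000000:0000 0A 00000000:00000000
   1: 0100007F:2329 0100007F:D431 03 00000000:00000000
   2: 0100007F:2329 0100007F:D432 03 00000000:00000000
   3: 0100007F:0016 0100007F:D433 01 00000000:00000000
"""

if __name__ == "__main__":
    print(dict(count_states(SAMPLE, 9001)))
```

A SYN_RECV count that stays high relative to the listen backlog is the pattern described above. It does not by itself prove the traffic is malicious.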

The SystemTap script at Where are TCP SYNs coming from? can be used to monitor valid incoming SYNs to sockets in LISTEN state, even SYNs which are later rejected as SYN Flood or with SYN Cookies.

Use the tcpdump command to capture network traffic. Use the Packet Capture Syntax Generator to generate meaningful command options. Also refer to How to capture network packets with tcpdump? or the manual page man tcpdump.

A packet capture which displays many SYNs, which the server responds to with SYN+ACK, but the client never replies with the final ACK could mean a traditional "SYN flood" attack, though this is not the only type of malicious attack.

It is up to you to determine whether incoming traffic is valid or not. Red Hat have no knowledge of your application traffic or environment or expected client addresses. Red Hat can help to use these commands to extract meaningful results, but the decision whether traffic is valid or malicious is up to you and your business to make.

If the traffic is malicious

Work with your network team or service provider to block the traffic before it reaches the listening system or your network.

You may also use the iptables firewall to block traffic using the limit or hashlimit or connlimit match extensions. For full syntax and examples see:

  • RHEL 7: man iptables-extensions
  • RHEL 5 and 6: man iptables

Note that Red Hat are able to assist with usage of the iptables commands but are not able to write firewall rules to resolve malicious attacks against customers. Such an action would be development and implementation of security policy which is outside the Production Support Scope of Coverage.

If the traffic is valid

Confirm the application is accepting new connections

Confirm the application is actually making the accept() system call to move new connections out of the socket backlog.

Use the strace command as described at How do I use strace to trace system calls made by a command? or use application-specific debugging.

If the application is not calling accept() at all, or is calling it slower than expected, then debug the application to determine why it is not accepting new connections fast enough.

If the application is accepting new connections

If you confirm that the application is accepting new connections and the rate of valid traffic is too high for the application, then two changes must be made to allow this listening application to cope with the workload.

These changes are:

  • the kernel's socket backlog limit
  • the application's socket listen backlog

Both of these must be changed. There is no point changing one but not the other.

Increase kernel socket backlog limit

The kernel's socket backlog limit is controlled by the net.core.somaxconn kernel tunable.

View the current value of the tunable with the command:

# sysctl net.core.somaxconn
net.core.somaxconn = 128

Increase the value with a command such as:

# sysctl -w net.core.somaxconn=2048
net.core.somaxconn = 2048

Confirm the change by viewing again:

# sysctl net.core.somaxconn
net.core.somaxconn = 2048

Persist this change across reboots by entering the corresponding line into /etc/sysctl.conf:

# echo "net.core.somaxconn = 2048" >> /etc/sysctl.conf

Note that 2048 is only an example. The appropriate value for your system may be smaller or larger.

After changing this tunable, restart the application for the changes to take effect at the next listen() call.

Note: On some systems, it may also be necessary to change the limit of currently-handshaking (SYN-RECV and waiting for ACK) connections on a socket with the kernel tunable:

# sysctl net.ipv4.tcp_max_syn_backlog
net.ipv4.tcp_max_syn_backlog = 512

Increase with sysctl -w and persist across reboots with /etc/sysctl.conf as per the examples above.

After changing this tunable, restart the application for the changes to take effect at the next listen() call.
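Both tunables can also be read directly from /proc/sys, which is where sysctl obtains them. The sketch below is a minimal Python illustration, assuming the simple mapping of dots in the tunable name to directory separators (this does not hold for names that embed an interface name containing dots). The helper names sysctl_path and read_sysctl are hypothetical.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted tunable name to its file under /proc/sys."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a tunable as a stripped string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()


if __name__ == "__main__":
    for name in ("net.core.somaxconn", "net.ipv4.tcp_max_syn_backlog"):
        try:
            print(name, "=", read_sysctl(name))
        except OSError as exc:
            print(name, "unavailable:", exc)
```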

Increase application socket listen backlog

The application's socket listen backlog is applied when the application makes the listen() system call against its socket.

This example in the C language shows the change from a small listen backlog to a larger backlog:

-   rc = listen(sockfd, 128);    /* old line */
+   rc = listen(sockfd, 2048);   /* new line */
    if (rc < 0)
    {
        perror("listen() failed");
        close(sockfd);
        exit(-1);
    }

The manual page (man listen) shows the syntax in C:

int listen(int sockfd, int backlog);

Other programming languages may implement the listen backlog with a different syntax.

An application may even make the listen backlog a configurable value.
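One way an application might expose the backlog as a setting is sketched below in Python. This is an illustration only: the LISTEN_BACKLOG environment variable and the function names backlog_from_env and open_listener are hypothetical, and a real application would more likely read the value from its own configuration file.

```python
import socket


def backlog_from_env(environ, default=128):
    """Return the backlog from a hypothetical LISTEN_BACKLOG variable."""
    raw = environ.get("LISTEN_BACKLOG", "")
    try:
        value = int(raw)
    except ValueError:
        return default  # unset or not a number
    return value if value > 0 else default


def open_listener(host, port, backlog):
    """Create a TCP listening socket with the requested backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # The kernel caps this value at net.core.somaxconn.
    sock.listen(backlog)
    return sock


if __name__ == "__main__":
    import os
    listener = open_listener("127.0.0.1", 0, backlog_from_env(os.environ))
    print("listening on", listener.getsockname())
    listener.close()
```

With this approach only a restart is needed after changing the value, not a rebuild.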

After changing the application listen backlog, recompile (if written in a compiled language) or restart (if written in an interpreted language or if configuration is changed) the application for the change to apply.

In the event an application has a hard-coded listen backlog which cannot be changed, an unsupported method to override the listen() function is described at How can I increase the TCP listen backlog value of a socket when the application has a hardcoded value?.

Some known programming language constructs and configuration options are listed below.

Java

In the Java ServerSocket object the syntax is:

ServerSocket(int port, int backlog)

Python

The Python 2 and Python 3 socket libraries both implement the .listen() method on a socket object:

socket.listen(backlog)

Apache

In the Apache Web Server, the listen backlog can be configured in the httpd.conf configuration file:

ListenBacklog 512

nginx

In nginx, the listen backlog can be configured as part of the listen directive:

listen 80 backlog=512;

named

In named, the listen backlog can be configured using the tcp-listen-queue directive, which is 10 by default:

tcp-listen-queue 512;

Squid

In the Squid proxy, the listen backlog can be configured in squid.conf with:

max_filedescriptors 512

If squid is compiled with USE_SELECT, the maximum value for this option is 1024. If the value is not compatible, Squid will log the error WARNING: 'max_filedescriptors X' does not work with select() when the service is started.

OpenSSH (sshd)

The OpenSSH server listen backlog is hard-coded to 128 and cannot be changed.

If you believe you have a SSH accept performance issue, please open a support case with Red Hat for investigation.

Samba (smbd)

The Samba listen backlog is hard-coded to 50 and cannot be changed.

If you believe you have a Samba accept performance issue, please open a support case with Red Hat for investigation.

Confirm the change in application behaviour

This can be done multiple ways.

1) Run the application as normal. Once the application has started, view the backlog value in the Send-Q column of the ss -ntlp output.

The following example shows the listen backlog in use is 10:

# ss -ntlp | more
State     Recv-Q  Send-Q  Local Address:Port  Peer Address:Port              
LISTEN    0       10                  *:9001             *:*       users:(("nc",pid=1234,fd=3))
                  ^^
                  value is 10
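Reading the Send-Q column can be automated when many listeners need checking. The following is a small Python sketch, assuming the column order shown in the example above (State, Recv-Q, Send-Q, Local Address:Port). The function name listen_backlogs is hypothetical.

```python
def listen_backlogs(ss_output):
    """Map 'local address:port' to the Send-Q value of each LISTEN line."""
    backlogs = {}
    for line in ss_output.splitlines():
        fields = line.split()
        # fields: State, Recv-Q, Send-Q, Local Address:Port, ...
        if len(fields) >= 4 and fields[0] == "LISTEN":
            backlogs[fields[3]] = int(fields[2])
    return backlogs


# Sample copied from the ss -ntlp example in this article.
SAMPLE = """\
State     Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN    0       10                  *:9001             *:*       users:(("nc",pid=1234,fd=3))
"""

if __name__ == "__main__":
    import subprocess
    try:
        out = subprocess.run(["ss", "-ntl"], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        out = SAMPLE  # ss not available; use the sample instead
    for address, backlog in sorted(listen_backlogs(out).items()):
        print(address, backlog)
```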

2) Run the application under strace system call tracer and observe the values passed to the listen() system call.

The following example shows the listen backlog in use is 10:

# strace -fvttTyyx -s 4096 -e socket,bind,listen nc -n4l 9001
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3<TCP:[502295]>
bind(3<TCP:[502295]>, {sa_family=AF_INET, sin_port=htons(9001), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3<TCP:[502295]>, 10) = 0
                        ^^
                        value is 10

The strace method is normally only useful when you can capture the application startup. Applications usually open their listening socket when the application starts, so it is often not useful to attach strace to an already-running process.

Root Cause

Overview

This section describes how TCP connections work and what happens in the lead up to a Possible SYN flooding message being logged.

The TCP State Diagram may be useful to understand this information.

Diagram provided by Wikimedia Commons under Creative Commons, Attribution, Share Alike license.

What is a TCP SYN and TCP Handshake?

These items are covered on the knowledgebase at:

How do listening sockets work?

When an application opens a socket, it can connect out to another system (sending a SYN), or it can listen for new connections coming in to this system (sending SYN+ACK when a SYN comes in).

When an application listens, it must accept new connections as they appear. Once a connection is accepted, a new socket is open and data can move back and forth.

When an application chooses to listen, it must provide a backlog value, which determines how many un-accepted connections can sit waiting for the application to accept them.

A connection in the socket backlog will still perform a TCP handshake.
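The point that the handshake completes before accept() is called can be observed on the loopback interface. This is a minimal Python sketch under that assumption: the server below listens but never calls accept(), yet the client's connect still succeeds because the kernel completes the handshake and queues the connection. The function name connect_without_accept is hypothetical.

```python
import socket


def connect_without_accept(backlog=5):
    """Return True if a client connects to a listener that never accepts."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a port
    server.listen(backlog)
    port = server.getsockname()[1]
    try:
        # No accept() is ever called on `server`.
        client = socket.create_connection(("127.0.0.1", port), timeout=2)
        connected = client.getpeername() == ("127.0.0.1", port)
        client.close()
        return connected
    finally:
        server.close()


if __name__ == "__main__":
    print("connected without accept():", connect_without_accept())
```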

What is a SYN Cookie?

SYN cookies are a method by which TCP connections can continue to be established when a socket's listen backlog fills up.

SYN cookies allow connections to continue establishing at times when a socket faces a temporary SYN flood, or when the application does not accept new connections fast enough or at all.

If the system's valid workload is such that SYN cookies are being logged regularly, the system and application should be tuned to avoid them.

A SYN cookie is created by crafting a special SYN+ACK where the TCP Sequence Number is a function of the time, the Maximum Segment Size, and the client and server's IP address and port numbers.
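The idea of encoding connection state into the sequence number can be illustrated with a toy model. The Python sketch below is not the Linux kernel's algorithm: the bit layout (5 bits of coarse time, 3 bits of MSS index, 24 bits of keyed hash), the MSS table, the 64-second period, and the use of SHA-256 are all assumptions chosen for readability.

```python
import hashlib

MSS_TABLE = (536, 1300, 1440, 1460)  # hypothetical encodable MSS values
PERIOD = 64                          # seconds per time tick (assumption)


def _hash24(secret, conn, mss_index, tick):
    """24-bit keyed hash over the connection 4-tuple, MSS index and tick."""
    data = secret + repr((conn, mss_index, tick)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:3], "big")


def make_cookie(secret, conn, mss, now):
    """Build a 32-bit initial sequence number for the SYN+ACK."""
    tick = now // PERIOD
    mss_index = 0
    for i, value in enumerate(MSS_TABLE):
        if value <= mss:
            mss_index = i  # largest table entry not exceeding the client MSS
    return ((tick % 32) << 27) | (mss_index << 24) | _hash24(
        secret, conn, mss_index, tick)


def check_cookie(secret, conn, cookie, now):
    """Return the encoded MSS if the cookie is valid and recent, else None."""
    mss_index = (cookie >> 24) & 0x7
    if mss_index >= len(MSS_TABLE):
        return None
    for age in (0, 1):  # accept the current tick and the previous one
        tick = now // PERIOD - age
        if (tick % 32 == cookie >> 27 and
                _hash24(secret, conn, mss_index, tick) == cookie & 0xFFFFFF):
            return MSS_TABLE[mss_index]
    return None


if __name__ == "__main__":
    conn = ("192.0.2.10", 40000, "198.51.100.5", 80)
    cookie = make_cookie(b"secret", conn, 1460, 1_000_000)
    print(hex(cookie), check_cookie(b"secret", conn, cookie, 1_000_000))
```

The server keeps no per-connection state after sending the SYN+ACK. When the final ACK arrives, its acknowledgment number is the cookie plus one, so the server can recompute the hash and recover the MSS from the packet alone.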

SYN cookies are not part of any RFC, though they do conform to the TCP standard. A full description of the calculation to create a cookie is given at some external sources:

SYN cookies are sent because the functionality is compiled into the RHEL kernel, and enabled by default. SYN cookies are controlled by the kernel tunable:

# sysctl net.ipv4.tcp_syncookies
net.ipv4.tcp_syncookies = 1

Linux kernel SYN Cookies support a limited number of TCP Options. Only the Timestamp, Window Scale, SACK, and ECN options are supported. Other TCP options will not be negotiated.

  • Note: If this tunable is set to disable the sending of SYN cookies, the kernel must drop the SYN instead. Disabling SYN cookies will not improve system performance, nor reduce the amount of logging. The log message will change from Sending cookies to Dropping request.

Diagnostic Steps

The following description of the kernel tunables mentioned in the Resolution and Root Cause sections is provided in the kernel-doc package file Documentation/networking/ip-sysctl.txt:

tcp_syncookies - BOOLEAN
        Only valid when the kernel was compiled with CONFIG_SYNCOOKIES
        Send out syncookies when the syn backlog queue of a socket
        overflows. This is to prevent against the common 'SYN flood attack'
        Default: FALSE

        Note, that syncookies is fallback facility.
        It MUST NOT be used to help highly loaded servers to stand
        against legal connection rate. If you see SYN flood warnings
        in your logs, but investigation shows that they occur
        because of overload with legal connections, you should tune 
        another parameters until this warning disappear.
        See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.

        syncookies seriously violate TCP protocol, do not allow
        to use TCP extensions, can result in serious degradation
        of some services (f.e. SMTP relaying), visible not by you, 
        but your clients and relays, contacting you. While you see
        SYN flood warnings in logs not being really flooded, your server
        is seriously misconfigured.

somaxconn - INTEGER
    Limit of socket listen() backlog, known in userspace as SOMAXCONN.
    Defaults to 128.  See also tcp_max_syn_backlog for additional tuning
    for TCP sockets.

tcp_max_syn_backlog - INTEGER
    Maximal number of remembered connection requests, which have not
    received an acknowledgment from connecting client.
    The minimal value is 128 for low memory machines, and it will
    increase in proportion to the memory of machine.
    If server suffers from overload, try increasing this number.

The description of the listen() system call is given in man 2 listen and man 3p listen:

LISTEN(2)                  Linux Programmer’s Manual                 LISTEN(2)

NAME
       listen - listen for connections on a socket

SYNOPSIS
       #include <sys/types.h>          /* See NOTES */
       #include <sys/socket.h>

       int listen(int sockfd, int backlog);

DESCRIPTION
       listen() marks the socket referred to by sockfd as a passive socket,
       that is, as a socket that will be used to accept incoming connection
       requests using accept(2).

       The sockfd argument is a file descriptor that refers to a socket of
       type SOCK_STREAM or SOCK_SEQPACKET.

       The backlog argument defines the maximum length to which the queue of
       pending connections for sockfd may grow. If a connection request
       arrives when the queue is full, the client  may receive an error with
       an indication of ECONNREFUSED or, if the underlying protocol supports
       retransmission, the request may be ignored so that a later reattempt at
       connection succeeds.

RETURN VALUE
       On success, zero is returned.  On error, -1 is returned, and errno is
       set appropriately.

Specific line numbers here are from RHEL 6.5 kernel 2.6.32-431.1.1.el6.

The message reporting SYN cookies are being sent is generated at:

net/ipv4/tcp_ipv4.c
 790 #ifdef CONFIG_SYN_COOKIES
 791 static void syn_flood_warning(struct sk_buff *skb)
 792 {
 793     static unsigned long warntime;
 794 
 795     if (time_after(jiffies, (warntime + HZ * 60))) {
 796         warntime = jiffies;
 797         printk(KERN_INFO
 798                "possible SYN flooding on port %d. Sending cookies.\n",
 799                ntohs(tcp_hdr(skb)->dest));
 800     }
 801 }
 802 #endif

The time_after block just prints the message if it has not already printed within the last 60 seconds.
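The once-per-minute behaviour can be modelled outside the kernel. The following Python sketch is a simplified illustration and not a translation of the kernel code: it uses seconds instead of jiffies, and treats the first call as always allowed, which is an assumption. The class name RateLimitedWarning is hypothetical.

```python
class RateLimitedWarning:
    """Allow a message at most once per `interval` seconds."""

    def __init__(self, interval=60.0):
        self.interval = interval
        self.last = None  # time of the last message that was logged

    def should_log(self, now):
        # Mirrors time_after(): strictly later than last + interval.
        if self.last is None or now > self.last + self.interval:
            self.last = now
            return True
        return False


if __name__ == "__main__":
    limiter = RateLimitedWarning()
    for t in (0, 10, 59, 61, 100, 122):
        print(t, limiter.should_log(t))
```

One consequence is that a single log line may stand for any number of SYN cookies sent during that minute, so the log frequency says little about the size of the overflow.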

The block calling syn_flood_warning is:

net/ipv4/tcp_ipv4.c
1213 int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
1214 {
1215     struct inet_request_sock *ireq;
1216     struct tcp_options_received tmp_opt;
1217     struct request_sock *req;
1218     __be32 saddr = ip_hdr(skb)->saddr;
1219     __be32 daddr = ip_hdr(skb)->daddr;
1220     __u32 isn = TCP_SKB_CB(skb)->when;
1221     struct dst_entry *dst = NULL;
1222 #ifdef CONFIG_SYN_COOKIES
1223     int want_cookie = 0;
1224 #else
1225 #define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
1226 #endif
1227 
1228     /* Never answer to SYNs send to broadcast or multicast */
1229     if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
1230         goto drop;
1231 
1232     /* TW buckets are converted to open requests without
1233      * limitations, they conserve resources and peer is
1234      * evidently real one.
1235      */
1236     if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
1237 #ifdef CONFIG_SYN_COOKIES
1238         if (sysctl_tcp_syncookies) {
1239             want_cookie = 1;
1240         } else
1241 #endif
1242         goto drop;
1243     }
...
1286     if (want_cookie) {
1287 #ifdef CONFIG_SYN_COOKIES
1288         syn_flood_warning(skb);
1289         req->cookie_ts = tmp_opt.tstamp_ok;
1290 #endif
1291         isn = cookie_v4_init_sequence(sk, skb, &req->mss);

This means that if SYN cookies are enabled (both compiled in and turned on), the kernel logs the fact that it is sending them.

The check for whether or not to send cookies is inet_csk_reqsk_queue_is_full which can be traced as follows:

include/net/inet_connection_sock.h
290 static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk)
291 {
292     return reqsk_queue_is_full(&inet_csk(sk)->icsk_accept_queue);
293 }

A queue is considered full when an arithmetic right shift of the socket's queue length by max_qlen_log gives a non-zero result, that is, when the queue length has reached 2 to the power of max_qlen_log:

include/net/request_sock.h
228 static inline int reqsk_queue_is_full(const struct request_sock_queue *queue)
229 {                        
230     return queue->listen_opt->qlen >> queue->listen_opt->max_qlen_log;
231 }        
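The shift-based test can be tried with plain integers. This Python sketch mirrors the expression in the excerpt above; the sample values are arbitrary and chosen only to show where the result changes from zero to non-zero.

```python
def reqsk_queue_is_full(qlen, max_qlen_log):
    """Non-zero once qlen has reached 2 ** max_qlen_log."""
    return qlen >> max_qlen_log


if __name__ == "__main__":
    for qlen in (0, 255, 256, 300):
        print(qlen, bool(reqsk_queue_is_full(qlen, 8)))
```

With max_qlen_log equal to 8, the queue is reported as full from 256 entries onward.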

max_qlen_log is given an upper bound by sysctl_max_syn_backlog (the net.ipv4.tcp_max_syn_backlog kernel tunable):

net/core/request_sock.c
 38 int reqsk_queue_alloc(struct request_sock_queue *queue,
 39               unsigned int nr_table_entries)
 40 {                          
...
 44     nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
 45     nr_table_entries = max_t(u32, nr_table_entries, 8);
 46     nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
...
 57     for (lopt->max_qlen_log = 3;
 58          (1 << lopt->max_qlen_log) < nr_table_entries;
 59          lopt->max_qlen_log++);
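The arithmetic in reqsk_queue_alloc can be followed with concrete numbers. The Python sketch below re-implements only the lines quoted above, as an aid to reading them; the function names are borrowed from the excerpt and the example inputs are arbitrary.

```python
def roundup_pow_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def max_qlen_log(backlog, max_syn_backlog):
    """Follow lines 44-46 and 57-59 of the quoted reqsk_queue_alloc()."""
    entries = min(backlog, max_syn_backlog)
    entries = max(entries, 8)
    entries = roundup_pow_of_two(entries + 1)
    log = 3
    while (1 << log) < entries:
        log += 1
    return log


if __name__ == "__main__":
    for backlog, syn_backlog in ((128, 512), (2048, 512), (2048, 4096)):
        log = max_qlen_log(backlog, syn_backlog)
        print(backlog, syn_backlog, "-> queue full at", 1 << log)
```

In this model a backlog of 2048 with tcp_max_syn_backlog left at 512 yields the same limit as a backlog of 512, which is why the earlier note suggests raising tcp_max_syn_backlog as well on some systems.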

We create this limit when we start a listening socket:

net/ipv4/inet_connection_sock.c
683 int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
...
687     int rc = reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries);

Where the limit comes from the backlog parameter:

net/ipv4/af_inet.c
 188 /*
 189  *  Move a socket into listening state.
 190  */
 191 int inet_listen(struct socket *sock, int backlog)
...
 211         err = inet_csk_listen_start(sk, backlog);

Which is passed from userspace using the listen() system call:

net/socket.c
1464 /*
1465  *  Perform a listen. Basically, we allow the protocol to do anything
1466  *  necessary for a listen, and if that works, we mark the socket as
1467  *  ready for listening.
1468  */
1469 
1470 SYSCALL_DEFINE2(listen, int, fd, int, backlog)
1471 {
1472     struct socket *sock;
1473     int err, fput_needed;
1474     int somaxconn;
1475 
1476     sock = sockfd_lookup_light(fd, &err, &fput_needed);
1477     if (sock) {
1478         somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
1479         if ((unsigned)backlog > somaxconn)
1480             backlog = somaxconn;
1481 
1482         err = security_socket_listen(sock, backlog);
1483         if (!err)
1484             err = sock->ops->listen(sock, backlog);
1485 
1486         fput_light(sock->file, fput_needed);
1487     }
1488     return err;
1489 }

The backlog is given a max bound by sysctl_somaxconn (the net.core.somaxconn kernel tunable).
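The clamp in the listen() system call, including the effect of the unsigned cast, can be sketched in Python as follows. This is an illustration of the two quoted lines only; the function name effective_backlog is hypothetical and a 32-bit int is assumed.

```python
def effective_backlog(backlog, somaxconn):
    """Mimic: if ((unsigned)backlog > somaxconn) backlog = somaxconn;"""
    as_unsigned = backlog & 0xFFFFFFFF  # C cast of a 32-bit int to unsigned
    return somaxconn if as_unsigned > somaxconn else backlog


if __name__ == "__main__":
    for requested in (10, 128, 2048, -1):
        print(requested, "->", effective_backlog(requested, 128))
```

Because of the cast, a negative backlog is treated as a very large number and is therefore clamped to somaxconn as well. This is also why raising the application backlog alone has no effect while somaxconn remains at 128.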

So if a socket's listen queue is full, and more SYNs arrive for that socket, then we either send SYN cookies, or if SYN cookies are disabled then we drop the incoming traffic.
