Chapter 28. Linux traffic control

Linux offers tools for managing and manipulating the transmission of packets. The Linux Traffic Control (TC) subsystem helps in policing, classifying, shaping, and scheduling network traffic. TC also mangles the packet content during classification by using filters and actions. The TC subsystem achieves this by using queuing disciplines (qdisc), a fundamental element of the TC architecture.

The scheduling mechanism arranges or rearranges the packets before they enter or exit different queues. The most common scheduler is the First-In-First-Out (FIFO) scheduler. You can do the qdiscs operations temporarily using the tc utility or permanently using NetworkManager.

In Red Hat Enterprise Linux, you can configure default queueing disciplines in various ways to manage the traffic on a network interface.

28.1. Overview of queuing disciplines

Queuing disciplines (qdiscs) help with queuing up and, later, scheduling of traffic transmission by a network interface. A qdisc has two operations;

  • enqueue requests so that a packet can be queued up for later transmission and
  • dequeue requests so that one of the queued-up packets can be chosen for immediate transmission.

Every qdisc has a 16-bit hexadecimal identification number called a handle, with an attached colon, such as 1: or abcd:. This number is called the qdisc major number. If a qdisc has classes, then the identifiers are formed as a pair of two numbers with the major number before the minor, <major>:<minor>, for example abcd:1. The numbering scheme for the minor numbers depends on the qdisc type. Sometimes the numbering is systematic, where the first-class has the ID <major>:1, the second one <major>:2, and so on. Some qdiscs allow the user to set class minor numbers arbitrarily when creating the class.

Classful qdiscs

Different types of qdiscs exist and help in the transfer of packets to and from a networking interface. You can configure qdiscs with root, parent, or child classes. The point where children can be attached are called classes. Classes in qdisc are flexible and can always contain either multiple children classes or a single child, qdisc. There is no prohibition against a class containing a classful qdisc itself, this facilitates complex traffic control scenarios.

Classful qdiscs do not store any packets themselves. Instead, they enqueue and dequeue requests down to one of their children according to criteria specific to the qdisc. Eventually, this recursive packet passing ends up where the packets are stored (or picked up from in the case of dequeuing).

Classless qdiscs
Some qdiscs contain no child classes and they are called classless qdiscs. Classless qdiscs require less customization compared to classful qdiscs. It is usually enough to attach them to an interface.

Additional resources

  • tc(8) man page
  • tc-actions(8) man page

28.2. Introduction to connection tracking

At a firewall, the Netfilter framework filters packets from an external network. After a packet arrives, Netfilter assigns a connection tracking entry. Connection tracking is a Linux kernel networking feature for logical networks that tracks connections and identifies packet flow in those connections. This feature filters and analyzes every packet, sets up the connection tracking table to store connection status, and updates the connection status based on identified packets. For example, in the case of FTP connection, Netfilter assigns a connection tracking entry to ensure all packets of FTP connection work in the same manner. The connection tracking entry stores a Netfilter mark and tracks the connection state information in the memory table in which a new packet tuple maps with an existing entry. If the packet tuple does not map with an existing entry, the packet adds a new connection tracking entry that groups packets of the same connection.

You can control and analyze traffic on the network interface. The tc traffic controller utility uses the qdisc discipline to configure the packet scheduler in the network. The qdisc kernel-configured queuing discipline enqueues packets to the interface. By using qdisc, Kernel catches all the traffic before a network interface transmits it. Also, to limit the bandwidth rate of packets belonging to the same connection, use the tc qdisc command.

To retrieve data from connection tracking marks into various fields, use the tc utility with the ctinfo module and the connmark functionality. For storing packet mark information, the ctinfo module copies the Netfilter mark and the connection state information into a socket buffer (skb) mark metadata field.

Transmitting a packet over a physical medium removes all the metadata of a packet. Before the packet loses its metadata, the ctinfo module maps and copies the Netfilter mark value to a specific value of the Diffserv code point (DSCP) in the packet’s IP field.

Additional resources

  • tc(8) and tc-ctinfo(8) man pages

28.3. Inspecting qdiscs of a network interface using the tc utility

By default, Red Hat Enterprise Linux systems use fq_codel qdisc. You can inspect the qdisc counters using the tc utility.

Procedure

  1. Optional: View your current qdisc:

    # tc qdisc show dev enp0s1
  2. Inspect the current qdisc counters:

    # tc -s qdisc show dev enp0s1
    qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
    Sent 1008193 bytes 5559 pkt (dropped 233, overlimits 55 requeues 77)
    backlog 0b 0p requeues 0
    • dropped - the number of times a packet is dropped because all queues are full
    • overlimits - the number of times the configured link capacity is filled
    • sent - the number of dequeues

28.4. Updating the default qdisc

If you observe networking packet losses with the current qdisc, you can change the qdisc based on your network-requirements.

Procedure

  1. View the current default qdisc:

    # sysctl -a | grep qdisc
    net.core.default_qdisc = fq_codel
  2. View the qdisc of current Ethernet connection:

    # tc -s qdisc show dev enp0s1
    qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
    new_flows_len 0 old_flows_len 0
  3. Update the existing qdisc:

    # sysctl -w net.core.default_qdisc=pfifo_fast
  4. To apply the changes, reload the network driver:

    # modprobe -r NETWORKDRIVERNAME
    # modprobe NETWORKDRIVERNAME
  5. Start the network interface:

    # ip link set enp0s1 up

Verification

  • View the qdisc of the Ethernet connection:

    # tc -s qdisc show dev enp0s1
    qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
     Sent 373186 bytes 5333 pkt (dropped 0, overlimits 0 requeues 0)
     backlog 0b 0p requeues 0
    ....

28.5. Temporarily setting the current qdisc of a network interface using the tc utility

You can update the current qdisc without changing the default one.

Procedure

  1. Optional: View the current qdisc:

    # tc -s qdisc show dev enp0s1
  2. Update the current qdisc:

    # tc qdisc replace dev enp0s1 root htb

Verification

  • View the updated current qdisc:

    # tc -s qdisc show dev enp0s1
    qdisc htb 8001: root refcnt 2 r2q 10 default 0 direct_packets_stat 0 direct_qlen 1000
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

28.6. Permanently setting the current qdisc of a network interface using NetworkManager

You can update the current qdisc value of a NetworkManager connection.

Procedure

  1. Optional: View the current qdisc:

    # tc qdisc show dev enp0s1
      qdisc fq_codel 0: root refcnt 2
  2. Update the current qdisc:

    # nmcli connection modify enp0s1 tc.qdiscs ‘root pfifo_fast’
  3. Optional: To add another qdisc over the existing qdisc, use the +tc.qdisc option:

    # nmcli connection modify enp0s1 +tc.qdisc ‘ingress handle ffff:’
  4. Activate the changes:

    # nmcli connection up enp0s1

Verification

  • View current qdisc the network interface:

    # tc qdisc show dev enp0s1
    qdisc pfifo_fast 8001: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    qdisc ingress ffff: parent ffff:fff1 ----------------

Additional resources

  • nm-settings(5) man page

28.7. Configuring the rate limiting of packets by using the tc-ctinfo utility

You can limit network traffic and prevent the exhaustion of resources in the network by using rate limiting. With rate limiting, you can also reduce the load on servers by limiting repetitive packet requests in a specific time frame. In addition, you can manage bandwidth rate by configuring traffic control in the kernel with the tc-ctinfo utility.

The connection tracking entry stores the Netfilter mark and connection information. When a router forwards a packet from the firewall, the router either removes or modifies the connection tracking entry from the packet. The connection tracking information (ctinfo) module retrieves data from connection tracking marks into various fields. This kernel module preserves the Netfilter mark by copying it into socket buffer (skb) mark metadata field.

Prerequisites

  • The iperf3 utility is installed on a server and a client.

Procedure

  1. Perform the following steps on the server:

    1. Add a virtual link to the network interface:

      # ip link add name ifb4eth0 numtxqueues 48 numrxqueues 48 type ifb

      This command has the following parameters:

      name ifb4eth0
      Sets new virtual device interface.
      numtxqueues 48
      Sets the number of transmit queues.
      numrxqueues 48
      Sets the number of receive queues.
      type ifb
      Sets the type of the new device.
    2. Change the state of the interface:

      # ip link set dev ifb4eth0 up
    3. Add the qdisc attribute on the physical network interface and apply it to the incoming traffic:

      # tc qdisc add dev enp1s0 handle ffff: ingress

      In the handle ffff: option, the handle parameter assigns the major number ffff: as a default value to a classful qdisc on the enp1s0 physical network interface, where qdisc is a queueing discipline parameter to analyze traffic control.

    4. Add a filter on the physical interface of the ip protocol to classify packets:

      # tc filter add dev enp1s0 parent ffff: protocol ip u32 match u32 0 0 action ctinfo cpmark 100 action mirred egress redirect dev ifb4eth0

      This command has the following attributes:

      parent ffff:
      Sets major number ffff: for the parent qdisc.
      u32 match u32 0 0
      Sets the u32 filter to match the IP headers of u32 pattern. The first 0 represents the second byte of IP header while the other 0 is for the mask match telling the filter which bits to match.
      action ctinfo
      Sets action to retrieve data from the connection tracking mark into various fields.
      cpmark 100
      Copies the connection tracking mark (connmark) 100 into the packet IP header field.
      action mirred egress redirect dev ifb4eth0
      Sets action mirred to redirect the received packets to the ifb4eth0 destination interface.
    5. Add a classful qdisc to the interface:

      # tc qdisc add dev ifb4eth0 root handle 1: htb default 1000

      This command sets the major number 1 to root qdisc and uses the htb hierarchy token bucket with classful qdisc of minor-id 1000.

    6. Limit the traffic on the interface to 1 Mbit/s with an upper limit of 2 Mbit/s:

      # tc class add dev ifb4eth0 parent 1:1 classid 1:100 htb ceil 2mbit rate 1mbit prio 100

      This command has the following parameters:

      parent 1:1
      Sets parent with classid as 1 and root as 1.
      classid 1:100
      Sets classid as 1:100 where 1 is the number of parent qdisc and 100 is the number of classes of the parent qdisc.
      htb ceil 2mbit
      The htb classful qdisc allows upper limit bandwidth of 2 Mbit/s as the ceil rate limit.
    7. Apply the Stochastic Fairness Queuing (sfq) of classless qdisc to interface with a time interval of 60 seconds to reduce queue algorithm perturbation:

      # tc qdisc add dev ifb4eth0 parent 1:100 sfq perturb 60
    8. Add the firewall mark (fw) filter to the interface:

      # tc filter add dev ifb4eth0 parent 1:0 protocol ip prio 100 handle 100 fw classid 1:100
    9. Restore the packet meta mark from the connection mark (CONNMARK):

      # nft add rule ip mangle PREROUTING counter meta mark set ct mark

      In this command, the nft utility has a mangle table with the PREROUTING chain rule specification that alters incoming packets before routing to replace the packet mark with CONNMARK.

    10. If no nft table and chain exist, create a table and add a chain rule:

      # nft add table ip mangle
      # nft add chain ip mangle PREROUTING {type filter hook prerouting priority mangle \;}
    11. Set the meta mark on tcp packets that are received on the specified destination address 192.0.2.3:

      # nft add rule ip mangle PREROUTING ip daddr 192.0.2.3 counter meta mark set 0x64
    12. Save the packet mark into the connection mark:

      # nft add rule ip mangle PREROUTING counter ct mark set mark
    13. Run the iperf3 utility as the server on a system by using the -s parameter and the server then waits for the response of the client connection:

      # iperf3 -s
  2. On the client, run iperf3 as a client and connect to the server that listens on IP address 192.0.2.3 for periodic HTTP request-response timestamp:

    # iperf3 -c 192.0.2.3 -t TCP_STREAM | tee rate

    192.0.2.3 is the IP address of the server while 192.0.2.4 is the IP address of the client.

  3. Terminate the iperf3 utility on the server by pressing Ctrl+C:

    Accepted connection from 192.0.2.4, port 52128
    [5]  local 192.0.2.3 port 5201 connected to 192.0.2.4 port 52130
    [ID] Interval       	Transfer 	Bitrate
    [5]   0.00-1.00   sec   119 KBytes   973 Kbits/sec
    [5]   1.00-2.00   sec   116 KBytes   950 Kbits/sec
    ...
    [ID] Interval       	Transfer 	Bitrate
    [5]   0.00-14.81  sec  1.51 MBytes   853 Kbits/sec  receiver
    
    iperf3: interrupt - the server has terminated
  4. Terminate the iperf3 utility on the client by pressing Ctrl+C:

    Connecting to host 192.0.2.3, port 5201
    [5] local 192.0.2.4 port 52130 connected to 192.0.2.3 port 5201
    [ID] Interval       	Transfer 	Bitrate     	Retr  Cwnd
    [5]   0.00-1.00   sec   481 KBytes  3.94 Mbits/sec	0   76.4 KBytes
    [5]   1.00-2.00   sec   223 KBytes  1.83 Mbits/sec	0   82.0 KBytes
    ...
    [ID] Interval       	Transfer 	Bitrate     	Retr
    [5]   0.00-14.00  sec  3.92 MBytes  2.35 Mbits/sec   32     sender
    [5]   0.00-14.00  sec  0.00 Bytes  0.00 bits/sec            receiver
    
    iperf3: error - the server has terminated

Verification

  1. Display the statistics about packet counts of the htb and sfq classes on the interface:

    # tc -s qdisc show dev ifb4eth0
    
    qdisc htb 1: root
    ...
     Sent 26611455 bytes 3054 pkt (dropped 76, overlimits 4887 requeues 0)
    ...
    qdisc sfq 8001: parent
    ...
     Sent 26535030 bytes 2296 pkt (dropped 76, overlimits 0 requeues 0)
    ...
  2. Display the statistics of packet counts for the mirred and ctinfo actions:

    # tc -s filter show dev enp1s0 ingress
    filter parent ffff: protocol ip pref 49152 u32 chain 0
    filter parent ffff: protocol ip pref 49152 u32 chain 0 fh 800: ht divisor 1
    filter parent ffff: protocol ip pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid not_in_hw (rule hit 8075 success 8075)
      match 00000000/00000000 at 0 (success 8075 )
        action order 1: ctinfo zone 0 pipe
          index 1 ref 1 bind 1 cpmark 0x00000064 installed 3105 sec firstused 3105 sec DSCP set 0 error 0
          CPMARK set 7712
        Action statistics:
        Sent 25891504 bytes 3137 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
    
        action order 2: mirred (Egress Redirect to device ifb4eth0) stolen
           index 1 ref 1 bind 1 installed 3105 sec firstused 3105 sec
        Action statistics:
        Sent 25891504 bytes 3137 pkt (dropped 0, overlimits 61 requeues 0)
        backlog 0b 0p requeues 0
  3. Display the statistics of the htb rate-limiter and its configuration:

    # tc -s class show dev ifb4eth0
    class htb 1:100 root leaf 8001: prio 7 rate 1Mbit ceil 2Mbit burst 1600b cburst 1600b
     Sent 26541716 bytes 2373 pkt (dropped 61, overlimits 4887 requeues 0)
     backlog 0b 0p requeues 0
     lended: 7248 borrowed: 0 giants: 0
     tokens: 187250 ctokens: 93625

Additional resources

  • tc(8) and tc-ctinfo(8) man page
  • nft(8) man page

28.8. Available qdiscs in RHEL

Each qdisc addresses unique networking-related issues. The following is the list of qdiscs available in RHEL. You can use any of the following qdisc to shape network traffic based on your networking requirements.

Table 28.1. Available schedulers in RHEL

qdisc nameIncluded inOffload support

Asynchronous Transfer Mode (ATM)

kernel-modules-extra

 

Class-Based Queueing

kernel-modules-extra

 

Credit-Based Shaper

kernel-modules-extra

Yes

CHOose and Keep for responsive flows, CHOose and Kill for unresponsive flows (CHOKE)

kernel-modules-extra

 

Controlled Delay (CoDel)

kernel-core

 

Deficit Round Robin (DRR)

kernel-modules-extra

 

Differentiated Services marker (DSMARK)

kernel-modules-extra

 

Enhanced Transmission Selection (ETS)

kernel-modules-extra

Yes

Fair Queue (FQ)

kernel-core

 

Fair Queuing Controlled Delay (FQ_CODel)

kernel-core

 

Generalized Random Early Detection (GRED)

kernel-modules-extra

 

Hierarchical Fair Service Curve (HSFC)

kernel-core

 

Heavy-Hitter Filter (HHF)

kernel-core

 

Hierarchy Token Bucket (HTB)

kernel-core

 

INGRESS

kernel-core

Yes

Multi Queue Priority (MQPRIO)

kernel-modules-extra

Yes

Multiqueue (MULTIQ)

kernel-modules-extra

Yes

Network Emulator (NETEM)

kernel-modules-extra

 

Proportional Integral-controller Enhanced (PIE)

kernel-core

 

PLUG

kernel-core

 

Quick Fair Queueing (QFQ)

kernel-modules-extra

 

Random Early Detection (RED)

kernel-modules-extra

Yes

Stochastic Fair Blue (SFB)

kernel-modules-extra

 

Stochastic Fairness Queueing (SFQ)

kernel-core

 

Token Bucket Filter (TBF)

kernel-core

Yes

Trivial Link Equalizer (TEQL)

kernel-modules-extra

 
Important

The qdisc offload requires hardware and driver support on NIC.

Additional resources

  • tc(8) man page