Chapter 2. Working with sysctl and kernel tunables

2.1. What is a kernel tunable?

Kernel tunables customize the behavior of Red Hat Enterprise Linux at boot, or on demand while the system is running. Some hardware parameters can be specified only at boot time and cannot be altered once the system is running. Most, however, can be altered as required and made permanent for the next boot.

2.2. How to work with kernel tunables

There are three ways to modify kernel tunables.

  1. Using the sysctl command
  2. By manually modifying configuration files in the /etc/sysctl.d/ directory
  3. Through a shell, interacting with the virtual file system mounted at /proc/sys

Note

Not all boot-time parameters are under the control of the sysctl subsystem. Some hardware-specific options must be set on the kernel command line; the Kernel Parameters section of this guide addresses those options.
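
The third method has no subsection of its own in this chapter, so the following is a minimal sketch of it. It assumes that a tunable name maps to a file under /proc/sys by replacing each dot with a slash; the helper name tunable_path is hypothetical and is not part of any Red Hat tool. This simple mapping does not handle interface names that themselves contain dots, such as VLAN interfaces.

```shell
# Hypothetical helper: map a dotted tunable name to its file under /proc/sys.
tunable_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Reading needs no special privileges for most tunables.
path=$(tunable_path kernel.ostype)
if [ -r "$path" ]; then
    cat "$path"
fi

# Writing requires root and lasts only until the next reboot, for example:
#   echo 1 > "$(tunable_path net.ipv4.ip_forward)"
```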

2.2.1. Using the sysctl command

The sysctl command is used to list, read, and set kernel tunables. It can filter tunables when listing or reading and set tunables temporarily or permanently.

  1. Listing variables

    # sysctl -a
  2. Reading variables

    # sysctl kernel.version
    kernel.version = #1 SMP Fri Jan 19 13:19:54 UTC 2018
  3. Writing variables temporarily

    # sysctl <tunable class>.<tunable>=<value>
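  4. Writing variables permanently

    # sysctl -w <tunable class>.<tunable>=<value> >> /etc/sysctl.conf

    The -w option on its own changes only the running kernel. Appending the line that sysctl prints to /etc/sysctl.conf makes the value persist across reboots.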
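
As a sketch of how a value can be made to survive a reboot, the hypothetical helper below appends a setting to a configuration file in the same "name = value" form that sysctl prints. The function name persist_tunable and its arguments are assumptions made for illustration. The file path is passed in so that the helper can be tried against a scratch file before it is pointed at /etc/sysctl.conf.

```shell
# Hypothetical helper: record "<name> = <value>" in a sysctl configuration file.
# Usage: persist_tunable <name> <value> <file>
persist_tunable() {
    name=$1
    value=$2
    file=$3
    # Skip the append when the exact line is already present.
    if [ -f "$file" ] && grep -qxF "$name = $value" "$file"; then
        return 0
    fi
    printf '%s = %s\n' "$name" "$value" >> "$file"
}

# Example (as root): change the running kernel, then persist the same value.
#   sysctl -w net.ipv4.ip_forward=1
#   persist_tunable net.ipv4.ip_forward 1 /etc/sysctl.conf
```

The helper only edits the file; it does not change the running kernel, which is why the example pairs it with a sysctl -w call.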

2.2.2. Modifying files in /etc/sysctl.d

To override a default at boot, you can also manually populate files in /etc/sysctl.d.

  1. Create a new file in /etc/sysctl.d

    # vim /etc/sysctl.d/99-custom.conf
  2. Include the variables you wish to set, one per line, in the following form:

    <tunable class>.<tunable> = <value>
    <tunable class>.<tunable> = <value>
  3. Save the file.
  4. Either reboot the machine for the changes to take effect, or run sysctl -p /etc/sysctl.d/99-custom.conf to apply the changes without rebooting.
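
A minimal sketch of such a drop-in file follows. The two settings are illustrations only (both tunables are described later in this chapter). The function name write_custom_conf and its directory argument are hypothetical, chosen so that the file can first be generated in a scratch directory.

```shell
# Hypothetical helper: write an example drop-in file into the given directory.
# Usage: write_custom_conf <directory>
write_custom_conf() {
    cat > "$1/99-custom.conf" <<'EOF'
# Illustrative values only; adjust to your own requirements.
net.ipv4.conf.all.log_martians = 1
kernel.dmesg_restrict = 1
EOF
}

# Example (as root):
#   write_custom_conf /etc/sysctl.d
#   sysctl -p /etc/sysctl.d/99-custom.conf
```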

2.3. What tunables can be controlled?

Tunables are divided into groups by kernel subsystem. A Red Hat Enterprise Linux system has the following classes of tunables:

Table 2.1. Table of sysctl interfaces

Class     Subsystem
abi       Execution domains and personalities
crypto    Cryptographic interfaces
debug     Kernel debugging interfaces
dev       Device specific information
fs        Global and specific filesystem tunables
kernel    Global kernel tunables
net       Network tunables
sunrpc    Sun Remote Procedure Call (NFS)
user      User Namespace limits
vm        Tuning and management of memory, buffer, and cache

2.3.1. Network interface tunables

System administrators are able to adjust the network configuration on a running system through the networking tunables.

Networking tunables are included in the /proc/sys/net directory, which contains multiple subdirectories for various networking topics. To adjust the network configuration, system administrators need to modify the files within such subdirectories.

The most frequently used directories are:

  1. /proc/sys/net/core/
  2. /proc/sys/net/ipv4/

The /proc/sys/net/core/ directory contains a variety of settings that control the interaction between the kernel and networking layers. By adjusting some of those tunables, you can improve the performance of a system, for example by increasing the size of a receive queue, the maximum number of connections, or the memory dedicated to network interfaces. Note that the factors that limit performance differ from system to system, so the appropriate settings depend on the individual issue.
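
To see the current values before changing anything, a short loop over a few of the files in this directory is enough. The sketch below assumes the tunables rmem_max, netdev_max_backlog, and somaxconn, which exist on typical kernels but are not named in the text above. The function name show_core_tunables and its optional base-directory argument are hypothetical.

```shell
# Hypothetical helper: print selected net.core tunables that are readable.
# Usage: show_core_tunables [directory]
show_core_tunables() {
    base=${1:-/proc/sys/net/core}
    for name in rmem_max netdev_max_backlog somaxconn; do
        if [ -r "$base/$name" ]; then
            printf 'net.core.%s = %s\n' "$name" "$(cat "$base/$name")"
        fi
    done
}

show_core_tunables
```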

The /proc/sys/net/ipv4/ directory contains additional networking settings, which are useful when preventing attacks on the system or when using the system to act as a router. The directory contains both IP and TCP variables. For a detailed explanation of those variables, see /usr/share/doc/kernel-doc-<version>/Documentation/networking/ip-sysctl.txt.

Other directories within the /proc/sys/net/ipv4/ directory cover different aspects of the network stack:

  1. /proc/sys/net/ipv4/conf/ - allows you to configure each system interface in different ways, including the use of default settings for unconfigured devices and settings that override all special configurations
  2. /proc/sys/net/ipv4/neigh/ - contains settings for communicating with a host directly connected to the system and also contains different settings for systems more than one step away
  3. /proc/sys/net/ipv4/route/ - contains specifications that apply to routing with any interfaces on the system

The following network tunables are relevant to IPv4 interfaces and are accessible from the /proc/sys/net/ipv4/conf/{all,<interface_name>}/ directories.

Descriptions of the following parameters have been adopted from the kernel documentation sites.[1]

log_martians

Log packets with impossible addresses to kernel log.

Type      Default
Boolean   0

Enabled if one or more of conf/{all,interface}/log_martians is set to TRUE.

accept_redirects

Accept ICMP redirect messages.

Type      Default
Boolean   1

accept_redirects for the interface is enabled under the following conditions:

  • Both conf/{all,interface}/accept_redirects are TRUE (when forwarding for the interface is enabled)
  • At least one of conf/{all,interface}/accept_redirects is TRUE (forwarding for the interface is disabled)

For more information refer to How to enable or disable ICMP redirects
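
The two conditions above can be sketched as a small shell function. This is an illustration that only inspects the control files; the function name redirects_accepted and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: print 1 if ICMP redirects are effectively accepted on
# an interface, 0 otherwise, following the two conditions listed above.
# Usage: redirects_accepted <interface> [conf-directory]
redirects_accepted() {
    conf=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$conf/all/accept_redirects")
    own=$(cat "$conf/$1/accept_redirects")
    fwd=$(cat "$conf/$1/forwarding")
    if [ "$fwd" -ne 0 ]; then
        # Forwarding enabled: both settings must be TRUE.
        if [ "$all" -ne 0 ] && [ "$own" -ne 0 ]; then echo 1; else echo 0; fi
    else
        # Forwarding disabled: one TRUE setting is enough.
        if [ "$all" -ne 0 ] || [ "$own" -ne 0 ]; then echo 1; else echo 0; fi
    fi
}

# Example: redirects_accepted lo
```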

forwarding

Enable IP forwarding on an interface.

Type      Default
Boolean   0

mc_forwarding

Do multicast routing.

Type      Default
Boolean   0

  • Read only value.
  • A multicast routing daemon is required.
  • conf/all/mc_forwarding must also be set to TRUE to enable multicast routing for the interface.

medium_id

Arbitrary value used to differentiate the devices by the medium they are attached to.

Type      Default
Integer   0

Notes

  • Two devices on the same medium can have different id values when the broadcast packets are received only on one of them.
  • The default value 0 means that the device is the only interface to its medium.
  • A value of -1 means that the medium is not known.
  • Currently, it is used to change the proxy_arp behavior: the proxy_arp feature is enabled for packets forwarded between two devices attached to different media.

For examples, see Using the "medium_id" feature in Linux 2.2 and 2.4.

proxy_arp

Do proxy arp.

Type      Default
Boolean   0

proxy_arp for the interface is enabled if at least one of conf/{all,interface}/proxy_arp is set to TRUE; otherwise it is disabled.

proxy_arp_pvlan

Private VLAN proxy arp.

Type      Default
Boolean   0

Allow proxy arp replies back to the same interface, to support features like RFC 3069.

shared_media

Send (router) or accept (host) RFC1620 shared media redirects.

Type      Default
Boolean   1

Notes

  • Overrides secure_redirects.
  • shared_media for the interface is enabled if at least one of conf/{all,interface}/shared_media is set to TRUE.

secure_redirects

Accept ICMP redirect messages only to gateways listed in the interface’s current gateway list.

Type      Default
Boolean   1

Notes

  • Even if disabled, RFC1122 redirect rules still apply.
  • Overridden by shared_media.
  • secure_redirects for the interface is enabled if at least one of conf/{all,interface}/secure_redirects is set to TRUE.

send_redirects

Send redirects, if router.

Type      Default
Boolean   1

Notes

  • send_redirects for the interface is enabled if at least one of conf/{all,interface}/send_redirects is set to TRUE.

bootp_relay

Accept packets with source address 0.b.c.d destined not to this host as local ones.

Type      Default
Boolean   0

Notes

  • A BOOTP daemon must be enabled to manage these packets.
  • conf/all/bootp_relay must also be set to TRUE to enable BOOTP relay for the interface.
  • Not implemented, see DHCP Relay Agent in the Red Hat Enterprise Linux Networking Guide.

accept_source_route

Accept packets with SRR option.

Type      Default
Boolean   1

Notes

  • conf/all/accept_source_route must also be set to TRUE to accept packets with SRR option on the interface.

accept_local

Accept packets with local source addresses.

Type      Default
Boolean   0

Notes

  • In combination with suitable routing, this can be used to direct packets between two local interfaces over the wire and have them accepted properly.
  • rp_filter must be set to a non-zero value in order for accept_local to have an effect.

route_localnet

Do not consider loopback addresses as martian source or destination while routing.

Type      Default
Boolean   0

Notes

  • This enables the use of 127/8 for local routing purposes.

rp_filter

Enable source validation.

Type      Default
Integer   0

Value   Effect
0       No source validation.
1       Strict mode as defined in RFC3704 Strict Reverse Path.
2       Loose mode as defined in RFC3704 Loose Reverse Path.

Notes

  • Current recommended practice in RFC3704 is to enable strict mode to prevent IP spoofing from DDoS attacks.
  • If using asymmetric routing or other complicated routing, then loose mode is recommended.
  • The highest value from conf/{all,interface}/rp_filter is used when doing source validation on the {interface}.
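
The "highest value wins" rule can be sketched as follows. The function name effective_rp_filter and its optional directory argument are hypothetical. The function only compares the two control files and does not query the kernel in any other way.

```shell
# Hypothetical helper: print the rp_filter value that applies to an interface,
# that is, the higher of the "all" and per-interface settings.
# Usage: effective_rp_filter <interface> [conf-directory]
effective_rp_filter() {
    conf=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$conf/all/rp_filter")
    own=$(cat "$conf/$1/rp_filter")
    if [ "$all" -gt "$own" ]; then echo "$all"; else echo "$own"; fi
}

# Example: effective_rp_filter lo
```
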

arp_filter

Type      Default
Boolean   0

Value   Effect
0       (default) The kernel can respond to ARP requests with addresses from other interfaces. It usually makes sense, because it increases the chance of successful communication.
1       Allows you to have multiple network interfaces on the same subnet, and have the ARPs for each interface be answered based on whether or not the kernel would route a packet from the ARP’d IP out that interface (therefore you must use source based routing for this to work). In other words, it allows control of which cards (usually 1) respond to an ARP request.

Notes

  • IP addresses are owned by the complete host on Linux, not by particular interfaces. This behavior causes problems only for more complex setups like load-balancing.
  • arp_filter for the interface is enabled if at least one of conf/{all,interface}/arp_filter is set to TRUE.

arp_announce

Define different restriction levels for announcing the local source IP address from IP packets in ARP requests sent on the interface.

Type      Default
Integer   0

Value   Effect
0       (default) Use any local address, configured on any interface.
1       Try to avoid local addresses that are not in the target’s subnet for this interface. This mode is useful when target hosts reachable via this interface require the source IP address in ARP requests to be part of their logical network configured on the receiving interface. When the request is generated, all local subnets that include the target IP are checked, and the source address is preserved if it is from such a subnet. If there is no such subnet, the source address is selected according to the rules for level 2.
2       Always use the best local address for this target. In this mode, the source address in the IP packet is ignored and the kernel tries to select the local address preferred for talking with the target host. Such a local address is selected by looking for primary IP addresses on all subnets on the outgoing interface that include the target IP address. If no suitable local address is found, the first local address on the outgoing interface or on all other interfaces is selected, in the hope of receiving a reply to the request, sometimes regardless of the announced source IP address.

Notes

  • The highest value from conf/{all,interface}/arp_announce is used.
  • Increasing the restriction level gives a better chance of receiving an answer from the resolved target, while decreasing the level announces more valid sender information.

arp_ignore

Define different modes for sending replies in response to received ARP requests that resolve local target IP addresses.

Type      Default
Integer   0

Value   Effect
0       (default) Reply for any local target IP address, configured on any interface.
1       Reply only if the target IP address is a local address configured on the incoming interface.
2       Reply only if the target IP address is a local address configured on the incoming interface and both it and the sender’s IP address are part of the same subnet on this interface.
3       Do not reply for local addresses configured with scope host; only resolutions for global and link addresses are replied to.
4-7     Reserved.
8       Do not reply for any local addresses.

Notes

  • The highest value from conf/{all,interface}/arp_ignore is used when an ARP request is received on the {interface}.

arp_notify

Define mode for notification of address and device changes.

Type      Default
Boolean   0

Value   Effect
0       Do nothing.
1       Generate gratuitous ARP requests when the device is brought up or the hardware address changes.

arp_accept

Define behavior for gratuitous ARP frames whose IP address is not already present in the ARP table.

Type      Default
Boolean   0

Value   Effect
0       Do not create new entries in the ARP table.
1       Create new entries in the ARP table.

Notes

  • Both reply-type and request-type gratuitous ARP frames trigger an update of the ARP table if this setting is on.
  • If the ARP table already contains the IP address of the gratuitous ARP frame, the ARP table is updated regardless of whether this setting is on or off.

app_solicit

The maximum number of probes to send to the user space ARP daemon via netlink before dropping back to multicast probes (see mcast_solicit).

Type      Default
Integer   0

Notes

  • See mcast_solicit.

disable_policy

Disable IPSEC policy (SPD) for this interface.

Type      Default
Boolean   0

disable_xfrm

Disable IPSEC encryption on this interface, whatever the policy.

Type      Default
Boolean   0

igmpv2_unsolicited_report_interval

The interval in milliseconds in which the next unsolicited IGMPv1 or IGMPv2 report retransmit takes place.

Type      Default
Integer   10000 (milliseconds)

igmpv3_unsolicited_report_interval

The interval in milliseconds in which the next unsolicited IGMPv3 report retransmit takes place.

Type      Default
Integer   1000 (milliseconds)

tag

Allows you to write a number, which can be used as required.

Type      Default
Integer   0

xfrm4_gc_thresh

The threshold at which we start garbage collecting for IPv4 destination cache entries.

Type      Default
Integer   1

Notes

  • At twice this value the system refuses new allocations.

2.3.2. Global kernel tunables

System administrators are able to configure and monitor general settings on a running system through the global kernel tunables.

Global kernel tunables are included in the /proc/sys/kernel/ directory either directly as named control files or grouped in further subdirectories for various configuration topics. To adjust the global kernel tunables, system administrators need to modify the control files.

Descriptions of the following parameters have been adopted from the kernel documentation sites.[2]

dmesg_restrict

Indicates whether unprivileged users are prevented from using the dmesg command to view messages from the kernel’s log buffer.

For further information, see Kernel sysctl documentation.

core_pattern

Specifies a core dumpfile pattern name.

Max length       Default
128 characters   "core"

For further information, see Kernel sysctl documentation.

hardlockup_panic

Controls the kernel panic when a hard lockup is detected.

Type      Value   Effect
Integer   0       Kernel does not panic on hard lockup.
Integer   1       Kernel panics on hard lockup.

In order to panic, the system needs to detect a hard lockup first. The detection is controlled by the nmi_watchdog parameter.
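
The dependency described above can be checked with a short function. This is a sketch that only reads the two control files; the function name hardlockup_will_panic and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: report whether a hard lockup would end in a panic.
# Both the detector (nmi_watchdog) and hardlockup_panic must be set to 1.
# Usage: hardlockup_will_panic [kernel-directory]
hardlockup_will_panic() {
    dir=${1:-/proc/sys/kernel}
    watchdog=$(cat "$dir/nmi_watchdog")
    panic=$(cat "$dir/hardlockup_panic")
    if [ "$watchdog" -eq 1 ] && [ "$panic" -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

# Example: hardlockup_will_panic
```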

softlockup_panic

Controls the kernel panic when a soft lockup is detected.

Type      Value   Effect
Integer   0       Kernel does not panic on soft lockup.
Integer   1       Kernel panics on soft lockup.

For more information about softlockup_panic, see kernel_parameters.

kptr_restrict

Indicates whether restrictions are placed on exposing kernel addresses via /proc and other interfaces.

Type      Default
Integer   0

Value   Effect
0       Hashes the kernel address before printing.
1       Replaces printed kernel pointers with 0’s under certain conditions.
2       Replaces printed kernel pointers with 0’s unconditionally.

To learn more, see Kernel sysctl documentation.

nmi_watchdog

Controls the hard lockup detector on x86 systems.

Type      Default
Integer   0

Value   Effect
0       Disables the hard lockup detector.
1       Enables the hard lockup detector.

The hard lockup detector monitors each CPU for its ability to respond to interrupts.

For more details, see Kernel sysctl documentation.

watchdog_thresh

Controls the frequency of hrtimer and NMI events and the soft and hard lockup thresholds.

Default threshold   Soft lockup threshold
10 seconds          2 * watchdog_thresh

Setting this tunable to zero disables lockup detection altogether.
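
As a worked example of the relationship in the table, the sketch below computes the soft lockup threshold for a given watchdog_thresh value. The function name soft_lockup_threshold is hypothetical.

```shell
# Hypothetical helper: print the soft lockup threshold, in seconds, for a
# given watchdog_thresh value (2 * watchdog_thresh, as stated above).
soft_lockup_threshold() {
    echo $(( 2 * $1 ))
}

soft_lockup_threshold 10   # prints 20
```

With the default watchdog_thresh of 10 seconds, the soft lockup threshold is therefore 20 seconds.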

For more info, consult Kernel sysctl documentation.

panic, panic_on_oops, panic_on_stackoverflow, panic_on_unrecovered_nmi, panic_on_warn, panic_on_rcu_stall, hung_task_panic

These tunables specify under what circumstances the kernel should panic.

To see more details about a group of panic parameters, see Kernel sysctl documentation.

printk, printk_delay, printk_ratelimit, printk_ratelimit_burst, printk_devkmsg

These tunables control logging or printing of kernel error messages.

For more details about a group of printk parameters, see Kernel sysctl documentation.

shmall, shmmax, shm_rmid_forced

These tunables control limits for shared memory.

For more information about a group of shm parameters, see Kernel sysctl documentation.

threads-max

Controls the maximum number of threads created by the fork() system call.

Min value   Max value
20          Given by FUTEX_TID_MASK (0x3fffffff)

The threads-max value is checked against the available RAM pages. If the thread structures occupy too much of the available RAM pages, threads-max is reduced accordingly.

For more details, see Kernel sysctl documentation.

pid_max

PID allocation wrap value.

To see more information, refer to Kernel sysctl documentation.

numa_balancing

This parameter enables or disables automatic NUMA memory balancing. On NUMA machines, there is a performance penalty if remote memory is accessed by a CPU.

For more details, see Kernel sysctl documentation.

numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb

These tunables detect whether pages are properly placed or whether the data should be migrated to a memory node local to where the task is running.

For more details about a group of numa_balancing_scan parameters, see Kernel sysctl documentation.