E.3.9. /proc/sys/

The /proc/sys/ directory is different from others in /proc/ because it not only provides information about the system but also allows the system administrator to immediately enable and disable kernel features.

Warning

Use caution when changing settings on a production system using the various files in the /proc/sys/ directory. Changing the wrong setting may render the kernel unstable, requiring a system reboot.
For this reason, be sure the options are valid for that file before attempting to change any value in /proc/sys/.
A good way to determine whether a particular file can be configured, or whether it only provides information, is to list it with the ls -l command at the shell prompt. If the file is writable, it can be used to configure the kernel. For example, a partial listing of /proc/sys/fs looks like the following:
-r--r--r--    1 root     root            0 May 10 16:14 dentry-state
-rw-r--r--    1 root     root            0 May 10 16:14 dir-notify-enable
-rw-r--r--    1 root     root            0 May 10 16:14 file-max
-r--r--r--    1 root     root            0 May 10 16:14 file-nr
In this listing, the files dir-notify-enable and file-max can be written to and, therefore, can be used to configure the kernel. The other files only provide feedback on current settings.
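The check described above can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name classify_listing is hypothetical, and it assumes ls -l style output on standard input. It labels each regular file according to the owner write bit in the permission string.

```shell
# Hypothetical helper: reads `ls -l` output on standard input and labels each
# regular file by its owner write bit (third character of the mode string).
# Lines that do not describe a regular file (for example "total 0" or
# directories) are skipped.
classify_listing() {
    while read -r mode rest; do
        case "$mode" in
            -?w*) printf '%s tunable\n' "${rest##* }" ;;    # owner may write
            -*)   printf '%s read-only\n' "${rest##* }" ;;  # informational only
        esac
    done
}
```

A typical invocation would be ls -l /proc/sys/fs | classify_listing, which prints one line per file.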
Changing a value within a /proc/sys/ file is done by echoing the new value into the file. For example, to enable the System Request Key on a running kernel, type the command:
echo 1 > /proc/sys/kernel/sysrq
This changes the value for sysrq from 0 (off) to 1 (on).
A few /proc/sys/ configuration files contain more than one value. To correctly send new values to them, place a space character between each value passed with the echo command, such as is done in this example:
echo 4 2 45 > /proc/sys/kernel/acct
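When changing values from a script, it can help to confirm that the write took effect. The following is a minimal sketch under stated assumptions: the function name set_proc_value is hypothetical, and it takes the target file followed by one or more values, which it joins with spaces exactly as the echo examples above do.

```shell
# Hypothetical helper: write one or more values to a file, then print what the
# file contains afterwards so the change can be confirmed.
set_proc_value() {
    target="$1"
    shift
    if [ "$#" -lt 1 ]; then
        echo "usage: set_proc_value FILE VALUE..." >&2
        return 2
    fi
    if [ ! -w "$target" ]; then
        echo "cannot write to $target" >&2
        return 1
    fi
    # "$*" joins the remaining arguments with single spaces.
    echo "$*" > "$target" || return 1
    cat "$target"
}
```

For example, set_proc_value /proc/sys/kernel/acct 4 2 45, run as root, writes the three values and prints the file's new contents. The kernel may display the values separated by tabs rather than spaces.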

Note

Any configuration changes made using the echo command disappear when the system is restarted. To make configuration changes take effect after the system is rebooted, see Section E.4, “Using the sysctl Command”.
The /proc/sys/ directory contains several subdirectories controlling different aspects of a running kernel.

E.3.9.1. /proc/sys/dev/

This directory provides parameters for particular devices on the system. Most systems have at least two directories, cdrom/ and raid/. Customized kernels can have other directories, such as parport/, which provides the ability to share one parallel port between multiple device drivers.
The cdrom/ directory contains a file called info, which reveals a number of important CD-ROM parameters:
CD-ROM information, Id: cdrom.c 3.20 2003/12/17
drive name:             hdc
drive speed:            48
drive # of slots:       1
Can close tray:         1
Can open tray:          1
Can lock tray:          1
Can change speed:       1
Can select disk:        0
Can read multisession:  1
Can read MCN:           1
Reports media changed:  1
Can play audio:         1
Can write CD-R:         0
Can write CD-RW:        0
Can read DVD:           0
Can write DVD-R:        0
Can write DVD-RAM:      0
Can read MRW:           0
Can write MRW:          0
Can write RAM:          0
This file can be quickly scanned to discover the qualities of an unknown CD-ROM. If multiple CD-ROMs are available on a system, each device is given its own column of information.
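To pick a single capability out of the info file from a script, the line can be matched by its label. The following is a minimal sketch: the function name cdrom_field is hypothetical, and it assumes the label: value layout shown above, read from standard input.

```shell
# Hypothetical helper: print the value(s) for one exact label from the
# cdrom info file. With several drives, all columns on that line are printed.
cdrom_field() {
    awk -F':' -v key="$1" '
        $1 == key {
            sub(/^[ \t]+/, "", $2)   # drop the padding after the colon
            print $2
        }'
}
```

For example, cdrom_field "Can write CD-R" < /proc/sys/dev/cdrom/info prints 0 for the drive listed above. Because the label is compared exactly, "Can write CD-R" does not also match "Can write CD-RW".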
Various files in /proc/sys/dev/cdrom, such as autoclose and checkmedia, can be used to control the system's CD-ROM. Use the echo command to enable or disable these features.
If RAID support is compiled into the kernel, a /proc/sys/dev/raid/ directory becomes available with at least two files in it: speed_limit_min and speed_limit_max. These settings determine the acceleration of RAID devices for I/O intensive tasks, such as resyncing the disks.

E.3.9.2. /proc/sys/fs/

This directory contains an array of options and information concerning various aspects of the file system, including quota, file handle, inode, and dentry information.
The binfmt_misc/ directory is used to provide kernel support for miscellaneous binary formats.
The important files in /proc/sys/fs/ include:
  • dentry-state — Provides the status of the directory cache. The file looks similar to the following:
    57411	52939	45	0	0	0
    
    The first number reveals the total number of directory cache entries, while the second number displays the number of unused entries. The third number tells the number of seconds between when a directory has been freed and when it can be reclaimed, and the fourth measures the pages currently requested by the system. The last two numbers are not used and display only zeros.
  • file-max — Lists the maximum number of file handles that the kernel allocates. Raising the value in this file can resolve errors caused by a lack of available file handles.
  • file-nr — Lists the number of allocated file handles, the number of allocated but unused file handles, and the maximum number of file handles.
  • overflowgid and overflowuid — Defines the fixed group ID and user ID, respectively, for use with file systems that only support 16-bit group and user IDs.
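The counters in dentry-state and file-nr are easier to interpret once reduced to a single figure. The following is a minimal sketch: both function names are hypothetical, and each reads the file contents on standard input. The first subtracts the unused entries from the total in dentry-state. The second expresses the allocated file handles (first field of file-nr) as a whole-number percentage of the maximum (third field).

```shell
# Hypothetical helpers; both read the file contents on standard input.

# dentry-state: total entries minus unused entries gives entries in use.
dentry_in_use() {
    read -r total unused rest
    echo $(( total - unused ))
}

# file-nr: allocated handles as a whole-number percentage of the maximum.
# Only the first and third fields are used.
file_handle_pct() {
    read -r allocated second maximum
    echo $(( allocated * 100 / maximum ))
}
```

For example, dentry_in_use < /proc/sys/fs/dentry-state prints 4472 for the sample values shown above (57411 minus 52939).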

E.3.9.3. /proc/sys/kernel/

This directory contains a variety of different configuration files that directly affect the operation of the kernel. Some of the most important files include:
  • acct — Controls the suspension of process accounting based on the percentage of free space available on the file system containing the log. By default, the file looks like the following:
    4	2	30
    
    The first value dictates the percentage of free space required for logging to resume, while the second value sets the threshold percentage of free space when logging is suspended. The third value sets the interval, in seconds, that the kernel polls the file system to see if logging should be suspended or resumed.
  • ctrl-alt-del — Controls whether Ctrl+Alt+Delete gracefully restarts the computer using init (0) or forces an immediate reboot without syncing the dirty buffers to disk (1).
  • domainname — Configures the system domain name, such as example.com.
  • exec-shield — Configures the Exec Shield feature of the kernel. Exec Shield provides protection against certain types of buffer overflow attacks.
    There are two possible values for this virtual file:
    • 0 — Disables Exec Shield.
    • 1 — Enables Exec Shield. This is the default value.

    Important

    If a system is running security-sensitive applications that were started while Exec Shield was disabled, these applications must be restarted when Exec Shield is enabled in order for Exec Shield to take effect.
  • hostname — Configures the system host name, such as www.example.com.
  • hotplug — Configures the utility to be used when a configuration change is detected by the system. This is primarily used with USB and Cardbus PCI. The default value of /sbin/hotplug should not be changed unless testing a new program to fulfill this role.
  • modprobe — Sets the location of the program used to load kernel modules. The default value is /sbin/modprobe, which means that kmod calls it to load a module whenever a kernel thread calls kmod.
  • msgmax — Sets the maximum size of any message sent from one process to another and is set to 8192 bytes by default. Be careful when raising this value, as queued messages between processes are stored in non-swappable kernel memory. Any increase in msgmax would increase RAM requirements for the system.
  • msgmnb — Sets the maximum number of bytes in a single message queue. The default is 16384.
  • msgmni — Sets the maximum number of message queue identifiers. The default is 4008.
  • osrelease — Lists the Linux kernel release number. This file can only be altered by changing the kernel source and recompiling.
  • ostype — Displays the type of operating system. By default, this file is set to Linux, and this value can only be changed by changing the kernel source and recompiling.
  • overflowgid and overflowuid — Defines the fixed group ID and user ID, respectively, for use with system calls on architectures that only support 16-bit group and user IDs.
  • panic — Defines the number of seconds the kernel postpones rebooting when the system experiences a kernel panic. By default, the value is set to 0, which disables automatic rebooting after a panic.
  • printk — This file controls a variety of settings related to printing or logging error messages. Each error message reported by the kernel has a loglevel associated with it that defines the importance of the message. The loglevel values break down in this order:
    • 0 — Kernel emergency. The system is unusable.
    • 1 — Kernel alert. Action must be taken immediately.
    • 2 — Condition of the kernel is considered critical.
    • 3 — General kernel error condition.
    • 4 — General kernel warning condition.
    • 5 — Kernel notice of a normal but significant condition.
    • 6 — Kernel informational message.
    • 7 — Kernel debug-level messages.
    Four values are found in the printk file:
    6     4     1     7
    
    Each of these values defines a different rule for dealing with error messages. The first value, called the console loglevel, defines the lowest priority of messages printed to the console. (Note that the lower the priority, the higher the loglevel number.) The second value sets the default loglevel for messages without an explicit loglevel attached to them. The third value sets the lowest value to which the console loglevel can be set. The last value sets the default value for the console loglevel.
  • random/ directory — Lists a number of values related to generating random numbers for the kernel.
  • sem — Configures semaphore settings within the kernel. A semaphore is a System V IPC object that is used to control access by processes to a shared resource.
  • shmall — Sets the total amount of shared memory that can be used at one time on the system, in pages. By default, this value is 2097152.
  • shmmax — Sets the largest shared memory segment size allowed by the kernel. By default, this value is 33554432. However, the kernel supports much larger values than this.
  • shmmni — Sets the maximum number of shared memory segments for the whole system. By default, this value is 4096.
  • sysrq — Activates the System Request Key if this value is set to anything other than zero (0), which is the default.
    The System Request Key allows immediate input to the kernel through simple key combinations. For example, the System Request Key can be used to immediately shut down or restart a system, sync all mounted file systems, or dump important information to the console. To initiate a System Request Key, type Alt+SysRq+system request code. Replace system request code with one of the following system request codes:
    • r — Disables raw mode for the keyboard and sets it to XLATE (a limited keyboard mode which does not recognize modifiers such as Alt, Ctrl, or Shift for all keys).
    • k — Kills all processes active in a virtual console. Also called Secure Access Key (SAK), it is often used to verify that the login prompt is spawned from init and not a trojan copy designed to capture user names and passwords.
    • b — Reboots the kernel without first unmounting file systems or syncing disks attached to the system.
    • c — Crashes the system without first unmounting file systems or syncing disks attached to the system.
    • o — Shuts off the system.
    • s — Attempts to sync disks attached to the system.
    • u — Attempts to unmount and remount all file systems as read-only.
    • p — Outputs all flags and registers to the console.
    • t — Outputs a list of processes to the console.
    • m — Outputs memory statistics to the console.
    • 0 through 9 — Sets the log level for the console.
    • e — Kills all processes except init using SIGTERM.
    • i — Kills all processes except init using SIGKILL.
    • l — Kills all processes using SIGKILL (including init). The system is unusable after issuing this System Request Key code.
    • h — Displays help text.
    This feature is most beneficial when using a development kernel or when experiencing system freezes.

    Warning

    The System Request Key feature is considered a security risk because an unattended console provides an attacker with access to the system. For this reason, it is turned off by default.
    See /usr/share/doc/kernel-doc-kernel_version/Documentation/sysrq.txt for more information about the System Request Key.
  • tainted — Indicates whether a non-GPL module is loaded.
    • 0 — No non-GPL modules are loaded.
    • 1 — At least one module without a GPL license (including modules with no license) is loaded.
    • 2 — At least one module was force-loaded with the command insmod -f.
  • threads-max — Sets the maximum number of threads to be used by the kernel, with a default value of 2048.
  • version — Displays the date and time the kernel was last compiled. The first field in this file, such as #3, relates to the number of times a kernel was built from the source base.
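Of the files listed above, printk is one whose four bare numbers are hard to read at a glance. The following is a minimal sketch that labels them in the order described in the printk entry: the function name describe_printk and the label wording are hypothetical, and the values are read from standard input.

```shell
# Hypothetical helper: label the four fields of the printk file in the order
# given in the printk entry above.
describe_printk() {
    read -r console_level message_default console_minimum console_default
    printf 'console loglevel: %s\n' "$console_level"
    printf 'default message loglevel: %s\n' "$message_default"
    printf 'minimum console loglevel: %s\n' "$console_minimum"
    printf 'default console loglevel: %s\n' "$console_default"
}
```

A typical invocation would be describe_printk < /proc/sys/kernel/printk.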

E.3.9.4. /proc/sys/net/

This directory contains subdirectories concerning various networking topics. Various configurations at the time of kernel compilation make different directories available here, such as ethernet/, ipv4/, ipx/, and ipv6/. By altering the files within these directories, system administrators are able to adjust the network configuration on a running system.
Given the wide variety of possible networking options available with Linux, only the most common /proc/sys/net/ directories are discussed.
The /proc/sys/net/core/ directory contains a variety of settings that control the interaction between the kernel and networking layers. The most important of these files are:
  • message_burst — Sets the maximum number of new warning messages to be written to the kernel log in the time interval defined by message_cost. The default value of this file is 10.
    In combination with message_cost, this setting is used to enforce a rate limit on warning messages written to the kernel log from the networking code and mitigate Denial of Service (DoS) attacks. The idea of a DoS attack is to bombard the targeted system with requests that generate errors and either fill up disk partitions with log files or require all of the system's resources to handle the error logging.
    The settings in message_burst and message_cost are designed to be modified based on the system's acceptable risk versus the need for comprehensive logging. For example, by setting message_burst to 10 and message_cost to 5, you allow the system to write a maximum of 10 messages every 5 seconds.
  • message_cost — Sets a cost on every warning message by defining a time interval for message_burst. The higher the value is, the more likely the warning message is ignored. The default value of this file is 5.
  • netdev_max_backlog — Sets the maximum number of packets allowed to queue when a particular interface receives packets faster than the kernel can process them. The default value for this file is 1000.
  • optmem_max — Configures the maximum ancillary buffer size allowed per socket.
  • rmem_default — Sets the receive socket buffer default size in bytes.
  • rmem_max — Sets the receive socket buffer maximum size in bytes.
  • wmem_default — Sets the send socket buffer default size in bytes.
  • wmem_max — Sets the send socket buffer maximum size in bytes.
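Before changing any of these files, it can be useful to record their current values. The following is a minimal sketch: the function name dump_settings is hypothetical, and it takes an optional directory argument that defaults to /proc/sys/net/core. It prints one name = value line for each readable regular file.

```shell
# Hypothetical helper: print "name = value" for every readable regular file
# in a directory (default: /proc/sys/net/core). Subdirectories are skipped.
dump_settings() {
    dir="${1:-/proc/sys/net/core}"
    for f in "$dir"/*; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s = %s\n' "${f##*/}" "$(cat "$f" 2>/dev/null)"
    done
}
```

Redirecting the output to a file, for example dump_settings > core-settings.txt, gives a simple record to compare against after tuning. The output file name is only an illustration.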
The /proc/sys/net/ipv4/ directory contains additional networking settings. Many of these settings, used in conjunction with one another, are useful in preventing attacks on the system or when using the system to act as a router.

Warning

An erroneous change to these files may affect remote connectivity to the system.
The following is a list of some of the more important files within the /proc/sys/net/ipv4/ directory:
  • icmp_echo_ignore_all and icmp_echo_ignore_broadcasts — Allows the kernel to ignore ICMP ECHO packets from every host or only those originating from broadcast and multicast addresses, respectively. A value of 0 allows the kernel to respond, while a value of 1 ignores the packets.
  • ip_default_ttl — Sets the default Time To Live (TTL), which limits the number of hops a packet may make before reaching its destination. Increasing this value can diminish system performance.
  • ip_forward — Permits interfaces on the system to forward packets. By default, this file is set to 0. Setting this file to 1 enables network packet forwarding.
  • ip_local_port_range — Specifies the range of ports to be used by TCP or UDP when a local port is needed. The first number is the lowest port to be used and the second number specifies the highest port. Any systems that expect to require more ports than the default 1024 to 4999 should use a range from 32768 to 61000.
  • tcp_syn_retries — Provides a limit on the number of times the system re-transmits a SYN packet when attempting to make a connection.
  • tcp_retries1 — Sets the number of permitted re-transmissions attempting to answer an incoming connection. The default value is 3.
  • tcp_retries2 — Sets the number of permitted re-transmissions of TCP packets. The default value is 15.
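The difference between the two port ranges mentioned for ip_local_port_range is clearer when expressed as a port count. The following is a minimal sketch: the function name port_range_size is hypothetical, and it reads the low and high port numbers from standard input, counting both endpoints.

```shell
# Hypothetical helper: count the ports in an inclusive "low high" range.
port_range_size() {
    read -r low high
    echo $(( high - low + 1 ))
}
```

With the default range of 1024 to 4999 this prints 3976, and with 32768 to 61000 it prints 28233. A typical invocation would be port_range_size < /proc/sys/net/ipv4/ip_local_port_range.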
The /usr/share/doc/kernel-doc-kernel_version/Documentation/networking/ip-sysctl.txt file contains a list of files and options available in the /proc/sys/net/ipv4/ and /proc/sys/net/ipv6/ directories. Use the sysctl -a command to list the parameters in the sysctl key format.
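The sysctl key format replaces the /proc/sys/ prefix and the directory separators with dots. The following is a minimal sketch of that conversion: the function name proc_path_to_sysctl_key is hypothetical, and the sketch does not handle names that themselves contain a dot.

```shell
# Hypothetical helper: convert a /proc/sys/ path into a sysctl key by
# stripping the prefix and replacing "/" with ".".
proc_path_to_sysctl_key() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}
```

For example, proc_path_to_sysctl_key /proc/sys/net/ipv4/ip_forward prints net.ipv4.ip_forward.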
A number of other directories exist within the /proc/sys/net/ipv4/ directory and each covers a different aspect of the network stack. The /proc/sys/net/ipv4/conf/ directory allows each system interface to be configured in different ways, including the use of default settings for unconfigured devices (in the /proc/sys/net/ipv4/conf/default/ subdirectory) and settings that override all special configurations (in the /proc/sys/net/ipv4/conf/all/ subdirectory).

Important

Red Hat Enterprise Linux 6 defaults to strict reverse path forwarding. Before changing the setting in the rp_filter file, see the entry on Reverse Path Forwarding in the Red Hat Enterprise Linux 6 Security Guide and the Red Hat Knowledgebase article about rp_filter.
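Because the conf/ directory holds one subdirectory per interface in addition to all/ and default/, checking a setting such as rp_filter means reading the same file name in each of them. The following is a minimal sketch: the function name show_conf_setting is hypothetical, and it takes the file name plus an optional base directory that defaults to /proc/sys/net/ipv4/conf.

```shell
# Hypothetical helper: print one setting for every subdirectory of conf/
# (all, default, and each interface) that provides it.
show_conf_setting() {
    name="$1"
    base="${2:-/proc/sys/net/ipv4/conf}"
    for d in "$base"/*/; do
        [ -r "$d$name" ] || continue
        iface="${d%/}"                 # drop the trailing slash
        printf '%s: %s\n' "${iface##*/}" "$(cat "$d$name")"
    done
}
```

For example, show_conf_setting rp_filter prints one line per subdirectory, which makes it easy to see where an interface differs from the all/ and default/ values.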
The /proc/sys/net/ipv4/neigh/ directory contains settings for communicating with a host directly connected to the system (called a network neighbor) and also contains different settings for systems more than one hop away.
Routing over IPv4 also has its own directory, /proc/sys/net/ipv4/route/. Unlike conf/ and neigh/, the /proc/sys/net/ipv4/route/ directory contains specifications that apply to routing with any interfaces on the system. Many of these settings, such as max_size, max_delay, and min_delay, relate to controlling the size of the routing cache. To clear the routing cache, write any value to the flush file.
Additional information about these directories and the possible values for their configuration files can be found in:
/usr/share/doc/kernel-doc-kernel_version/Documentation/filesystems/proc.txt

E.3.9.5. /proc/sys/vm/

This directory facilitates the configuration of the Linux kernel's virtual memory (VM) subsystem. The kernel makes extensive and intelligent use of virtual memory, which is backed in part by the disk space commonly referred to as swap space.
The following files are commonly found in the /proc/sys/vm/ directory:
  • block_dump — Configures block I/O debugging when enabled. All read/write and block dirtying operations done to files are logged accordingly. This can be useful when diagnosing disk spin-ups and spin-downs for laptop battery conservation. All output produced while block_dump is enabled can be retrieved via dmesg. The default value is 0.

    Note

    If block_dump is enabled at the same time as kernel debugging, it is prudent to stop the klogd daemon, as it generates erroneous disk activity caused by block_dump.
  • dirty_background_ratio — Starts background writeback of dirty data at this percentage of total memory, via a pdflush daemon. The default value is 10.
  • dirty_expire_centisecs — Defines when dirty in-memory data is old enough to be eligible for writeout. Data which has been dirty in-memory for longer than this interval is written out next time a pdflush daemon wakes up. The default value is 3000, expressed in hundredths of a second.
  • dirty_ratio — Starts active writeback of dirty data at this percentage of total memory for the generator of dirty data, via pdflush. The default value is 20.
  • dirty_writeback_centisecs — Defines the interval between pdflush daemon wakeups, which periodically writes dirty in-memory data out to disk. The default value is 500, expressed in hundredths of a second.
  • laptop_mode — Minimizes the number of times that a hard disk needs to spin up by keeping the disk spun down for as long as possible, therefore conserving battery power on laptops. This increases efficiency by combining all future I/O processes together, reducing the frequency of spin-ups. The default value is 0, but laptop mode is enabled automatically when a laptop runs on battery power.
    This value is controlled automatically by the acpid daemon once the user is notified that the system has switched to battery power. No user modifications or interactions are necessary if the laptop supports the ACPI (Advanced Configuration and Power Interface) specification.
    For more information, see the following installed documentation:
    /usr/share/doc/kernel-doc-kernel_version/Documentation/laptop-mode.txt
  • max_map_count — Configures the maximum number of memory map areas a process may have. In most cases, the default value of 65536 is appropriate.
  • min_free_kbytes — Forces the Linux VM (virtual memory manager) to keep a minimum number of kilobytes free. The VM uses this number to compute a pages_min value for each lowmem zone in the system. The default value is calculated from the total memory on the machine.
  • nr_hugepages — Indicates the current number of configured hugetlb pages in the kernel.
    For more information, see the following installed documentation:
    /usr/share/doc/kernel-doc-kernel_version/Documentation/vm/hugetlbpage.txt
  • nr_pdflush_threads — Indicates the number of pdflush daemons that are currently running. This file is read-only, and should not be changed by the user. Under heavy I/O loads, the default value of two is increased by the kernel.
  • overcommit_memory — Configures the conditions under which a large memory request is accepted or denied. The following three modes are available:
    • 0 — The kernel performs heuristic memory overcommit handling by estimating the amount of memory available and failing requests that are blatantly invalid. Unfortunately, since memory is allocated using a heuristic rather than a precise algorithm, this setting can sometimes allow available memory on the system to be overloaded. This is the default setting.
    • 1 — The kernel performs no memory overcommit handling. Under this setting, the potential for memory overload is increased, but so is performance for memory-intensive tasks (such as those executed by some scientific software).
    • 2 — The kernel fails any request for memory that would cause the total address space to exceed the sum of the allocated swap space and the percentage of physical RAM specified in /proc/sys/vm/overcommit_ratio. This setting is best for those who desire less risk of memory overcommitment.

      Note

      This setting is only recommended for systems with swap areas larger than physical memory.
  • overcommit_ratio — Specifies the percentage of physical RAM considered when /proc/sys/vm/overcommit_memory is set to 2. The default value is 50.
  • page-cluster — Sets the number of pages read in a single attempt, expressed as a power of two. The default value of 3, which corresponds to 8 pages (2 to the power of 3), is appropriate for most systems.
  • swappiness — Determines how much a machine should swap. The higher the value, the more swapping occurs. The default value, as a percentage, is set to 60.
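The limit that applies when overcommit_memory is set to 2 can be estimated from the description above: the allocated swap space plus the overcommit_ratio percentage of physical RAM. The following is a minimal sketch of that arithmetic only: the function name commit_limit_kb is hypothetical, all sizes are assumed to be in kilobytes, and the kernel's own calculation may apply further adjustments that are not modeled here.

```shell
# Hypothetical helper: estimate the mode 2 commit limit in kilobytes.
# Arguments: physical RAM (KB), swap space (KB), overcommit_ratio (percent).
commit_limit_kb() {
    ram_kb="$1"
    swap_kb="$2"
    ratio="$3"
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}
```

For example, commit_limit_kb 8000000 2000000 50 prints 6000000: the 2000000 KB of swap plus half of the 8000000 KB of RAM.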
All kernel documentation, including additional information about these files, is installed locally in the following directory:
/usr/share/doc/kernel-doc-kernel_version/Documentation/