Deployment Guide

Red Hat Enterprise Linux 5

Deployment, configuration and administration of Red Hat Enterprise Linux 5

Edition 11

Abstract

The Deployment Guide documents relevant information regarding the deployment, configuration, and administration of Red Hat Enterprise Linux 5.

Introduction

Welcome to the Red Hat Enterprise Linux Deployment Guide.
The Red Hat Enterprise Linux Deployment Guide contains information on how to customize your Red Hat Enterprise Linux system to fit your needs. If you are looking for a comprehensive, task-oriented guide for configuring and customizing your system, this is the manual for you.
This manual discusses many intermediate topics such as the following:
  • Setting up a network interface card (NIC)
  • Configuring a Virtual Private Network (VPN)
  • Configuring Samba shares
  • Managing your software with RPM
  • Determining information about your system
  • Upgrading your kernel
This manual is divided into the following main categories:
  • File systems
  • Package management
  • Network-related configuration
  • System configuration
  • System monitoring
  • Kernel and Driver Configuration
  • Security and Authentication
  • Red Hat Training and Certification
This guide assumes you have a basic understanding of your Red Hat Enterprise Linux system. If you need help installing Red Hat Enterprise Linux, refer to the Red Hat Enterprise Linux Installation Guide.

1. Document Conventions

In this manual, certain words are represented in different fonts, typefaces, sizes, and weights. This highlighting is systematic; different words are represented in the same style to indicate their inclusion in a specific category. The types of words that are represented this way include the following:
command
Linux commands (and other operating system commands, when used) are represented this way. This style should indicate to you that you can type the word or phrase on the command line and press Enter to invoke a command. Sometimes a command contains words that would be displayed in a different style on their own (such as file names). In these cases, they are considered to be part of the command, so the entire phrase is displayed as a command. For example:
Use the cat testfile command to view the contents of a file, named testfile, in the current working directory.
file name
File names, directory names, paths, and RPM package names are represented this way. This style indicates that a particular file or directory exists with that name on your system. Examples:
The .bashrc file in your home directory contains bash shell definitions and aliases for your own use.
The /etc/fstab file contains information about different system devices and file systems.
Install the webalizer RPM if you want to use a Web server log file analysis program.
application
This style indicates that the program is an end-user application (as opposed to system software). For example:
Use Mozilla to browse the Web.
key
A key on the keyboard is shown in this style. For example:
To use Tab completion to list particular files in a directory, type ls, then a character, and finally the Tab key. Your terminal displays the list of files in the working directory that begin with that character.
key+combination
A combination of keystrokes is represented in this way. For example:
The Ctrl+Alt+Backspace key combination exits your graphical session and returns you to the graphical login screen or the console.
text found on a GUI interface
A title, word, or phrase found on a GUI interface screen or window is shown in this style. Text shown in this style indicates a particular GUI screen or an element on a GUI screen (such as text associated with a checkbox or field). Example:
Select the Require Password checkbox if you would like your screensaver to require a password before stopping.
top level of a menu on a GUI screen or window
A word in this style indicates that the word is the top level of a pulldown menu. If you click on the word on the GUI screen, the rest of the menu should appear. For example:
Under File on a GNOME terminal, the New Tab option allows you to open multiple shell prompts in the same window.
Instructions to type in a sequence of commands from a GUI menu look like the following example:
Go to Applications (the main menu on the panel) > Programming > Emacs Text Editor to start the Emacs text editor.
button on a GUI screen or window
This style indicates that the text can be found on a clickable button on a GUI screen. For example:
Click on the Back button to return to the webpage you last viewed.
computer output
Text in this style indicates text displayed to a shell prompt such as error messages and responses to commands. For example:
The ls command displays the contents of a directory. For example:
Desktop    about.html    logs     paulwesterberg.png
Mail    backupfiles    mail     reports
The output returned in response to the command (in this case, the contents of the directory) is shown in this style.
prompt
A prompt, which is a computer's way of signifying that it is ready for you to input something, is shown in this style. Examples:
$
#
[stephen@maturin stephen]$
leopard login:
user input
Text that the user types, either on the command line or into a text box on a GUI screen, is displayed in this style. In the following example, text is displayed in this style:
To boot your system into the text based installation program, you must type in the text command at the boot: prompt.
<replaceable>
Text used in examples that is meant to be replaced with data provided by the user is displayed in this style. In the following example, <version-number> is displayed in this style:
The directory for the kernel source is /usr/src/kernels/<version-number>/, where <version-number> is the version and type of kernel installed on this system.
Additionally, we use several different strategies to draw your attention to certain pieces of information. In order of urgency, these items are marked as a note, tip, important, caution, or warning. For example:

Note

Remember that Linux is case sensitive. In other words, a rose is not a ROSE is not a rOsE.

Note

The directory /usr/share/doc/ contains additional documentation for packages installed on your system.

Important

If you modify the DHCP configuration file, the changes do not take effect until you restart the DHCP daemon.

Warning

Do not perform routine tasks as root — use a regular user account unless you need to use the root account for system administration tasks.

Warning

Be careful to remove only the necessary partitions. Removing other partitions could result in data loss or a corrupted system environment.

2. Send in Your Feedback

If you find an error in the Red Hat Enterprise Linux Deployment Guide, or if you have thought of a way to make this manual better, we would like to hear from you! Submit a report in Bugzilla (http://bugzilla.redhat.com/bugzilla/) against the component Deployment_Guide.
If you have a suggestion for improving the documentation, try to be as specific as possible. If you have found an error, include the section number and some of the surrounding text so we can find it easily.

Part I. File Systems

File system refers to the files and directories stored on a computer. A file system can have different formats called file system types. These formats determine how the information is stored as files and directories. Some file system types store redundant copies of the data, while some file system types make hard drive access faster. This part discusses the ext3, swap, RAID, and LVM file system types. It also discusses the parted utility to manage partitions and access control lists (ACLs) to customize file permissions.

Chapter 1. File System Structure

1.1. Why Share a Common Structure?

The file system structure is the most basic level of organization in an operating system. Almost all of the ways an operating system interacts with its users, applications, and security model are dependent upon the way it organizes files on storage devices. Providing a common file system structure ensures users and programs are able to access and write files.
File systems break files down into two logical categories:
  • Shareable vs. unshareable files
  • Variable vs. static files
Shareable files are those that can be accessed locally and by remote hosts; unshareable files are only available locally. Variable files, such as documents, can be changed at any time; static files, such as binaries, do not change without an action from the system administrator.
The reason for looking at files in this manner is to help correlate the function of the file with the permissions assigned to the directories which hold them. The way in which the operating system and its users interact with a given file determines the directory in which it is placed, whether that directory is mounted with read-only or read/write permissions, and the level of access each user has to that file. The top level of this organization is crucial. If the organization does not adhere to a rigid structure from the top level down, access to the underlying directories can be restricted, or security problems can manifest themselves.

1.2. Overview of File System Hierarchy Standard (FHS)

Red Hat Enterprise Linux uses the Filesystem Hierarchy Standard (FHS) file system structure, which defines the names, locations, and permissions for many file types and directories.
The FHS document is the authoritative reference to any FHS-compliant file system, but the standard leaves many areas undefined or extensible. This section is an overview of the standard and a description of the parts of the file system not covered by the standard.
Compliance with the standard means many things, but the two most important are compatibility with other compliant systems and the ability to mount a /usr/ partition as read-only. This second point is important because the directory contains common executables and should not be changed by users. Also, since the /usr/ directory is mounted as read-only, it can be mounted from the CD-ROM or from another machine via a read-only NFS mount.

1.2.1. FHS Organization

The directories and files noted here are a small subset of those specified by the FHS document. Refer to the latest FHS document for the most complete information.
The complete standard is available online at http://www.pathname.com/fhs/.

1.2.1.1. The /boot/ Directory

The /boot/ directory contains static files required to boot the system, such as the Linux kernel. These files are essential for the system to boot properly.

Warning

Do not remove the /boot/ directory. Doing so renders the system unbootable.

1.2.1.2. The /dev/ Directory

The /dev/ directory contains device nodes that either represent devices that are attached to the system or virtual devices that are provided by the kernel. These device nodes are essential for the system to function properly. The udev daemon takes care of creating and removing all these device nodes in /dev/.
Devices in the /dev/ directory and its subdirectories are either character devices (providing only a serial stream of input and output) or block devices (accessible randomly). Character devices include the mouse, keyboard, and modem, while block devices include hard disks and floppy drives. If you have GNOME or KDE installed on your system, devices such as external drives or CDs are automatically detected when connected (for example, via USB) or inserted (for example, into a CD or DVD drive), and a pop-up window displaying the contents appears.

Table 1.1. Examples of common files in the /dev/ directory

File         Description
/dev/hda     The master device on the primary IDE channel.
/dev/hdb     The slave device on the primary IDE channel.
/dev/tty0    The first virtual console.
/dev/tty1    The second virtual console.
/dev/sda     The first device on the primary SCSI or SATA channel.
/dev/lp0     The first parallel port.
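
The distinction between character and block devices described above can be checked from a shell prompt with the standard -c and -b file tests. The following is only a minimal sketch: the function name device_kind is a hypothetical helper introduced for illustration, and whether /dev/sda exists depends on the hardware of the system.

```shell
# Classify a path as a character device, a block device, or something else.
# "device_kind" is a hypothetical helper name, not part of any package.
device_kind() {
    if [ -c "$1" ]; then
        echo "character"
    elif [ -b "$1" ]; then
        echo "block"
    else
        echo "other"
    fi
}

# /dev/null is a character device on Linux systems.
device_kind /dev/null
# Reports "block" only if a first SCSI or SATA disk is present.
device_kind /dev/sda
```

The same tests can be used in scripts that need to confirm a device node exists before trying to mount or format it.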

1.2.1.3. The /etc/ Directory

The /etc/ directory is reserved for configuration files that are local to the machine. No binaries are to be placed in /etc/. Any binaries that were once located in /etc/ should be placed into /sbin/ or /bin/.
Examples of directories in /etc/ are X11/ and skel/:
/etc
   |- X11/
   |- skel/
The /etc/X11/ directory is for X Window System configuration files, such as xorg.conf. The /etc/skel/ directory is for "skeleton" user files, which are used to populate a home directory when a user is first created. Applications also store their configuration files in this directory and may reference them when they are executed.

1.2.1.4. The /lib/ Directory

The /lib/ directory should contain only those libraries needed to execute the binaries in /bin/ and /sbin/. These shared library images are particularly important for booting the system and executing commands within the root file system.

1.2.1.5. The /media/ Directory

The /media/ directory contains subdirectories used as mount points for removable media such as USB storage media, DVDs, CD-ROMs, and Zip disks.

1.2.1.6. The /mnt/ Directory

The /mnt/ directory is reserved for temporarily mounted file systems, such as NFS file system mounts. For all removable media, please use the /media/ directory. Automatically detected removable media will be mounted in the /media directory.

Note

The /mnt directory must not be used by installation programs.

1.2.1.7. The /opt/ Directory

The /opt/ directory provides storage for most application software packages.
A package placing files in the /opt/ directory creates a directory bearing the same name as the package. This directory, in turn, holds files that otherwise would be scattered throughout the file system, giving the system administrator an easy way to determine the role of each file within a particular package.
For example, if sample is the name of a particular software package located within the /opt/ directory, then all of its files are placed in directories inside the /opt/sample/ directory, such as /opt/sample/bin/ for binaries and /opt/sample/man/ for manual pages.
Packages that encompass many different sub-packages, data files, extra fonts, clip art, and so on are also located in the /opt/ directory, giving that large package a way to organize itself. In this way, our sample package may have different tools that each go in their own sub-directories, such as /opt/sample/tool1/ and /opt/sample/tool2/, each of which can have its own bin/, man/, and other similar directories.

1.2.1.8. The /proc/ Directory

The /proc/ directory contains special files that either extract information from or send information to the kernel. Examples include system memory, CPU information, and hardware configuration.
Due to the great variety of data available within /proc/ and the many ways this directory can be used to communicate with the kernel, an entire chapter has been devoted to the subject. For more information, refer to Chapter 5, The proc File System.
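
As a small illustration of extracting information from the kernel through /proc/, the sketch below reads one field from /proc/meminfo. The function name meminfo_field is a hypothetical helper introduced here; the optional second argument (an alternative file in the same format) is also an assumption made so that the function can be tried on sample data.

```shell
# Print the numeric value of one field from a meminfo-style file.
# "meminfo_field" is a hypothetical helper; the file defaults to /proc/meminfo.
meminfo_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Show the total amount of memory (in kB) if /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    meminfo_field MemTotal
fi
```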

1.2.1.9. The /sbin/ Directory

The /sbin/ directory stores executables used by the root user. The executables in /sbin/ are used at boot time, for system administration and to perform system recovery operations. Of this directory, the FHS says:
/sbin contains binaries essential for booting, restoring, recovering, and/or repairing the system in addition to the binaries in /bin. Programs executed after /usr/ is known to be mounted (when there are no problems) are generally placed into /usr/sbin. Locally-installed system administration programs should be placed into /usr/local/sbin.
At a minimum, the following programs should be in /sbin/:
arp, clock,
halt, init,
fsck.*, grub,
ifconfig, mingetty,
mkfs.*, mkswap,
reboot, route,
shutdown, swapoff,
swapon

1.2.1.10. The /srv/ Directory

The /srv/ directory contains site-specific data served by your system running Red Hat Enterprise Linux. This directory gives users the location of data files for a particular service, such as FTP, WWW, or CVS. Data that only pertains to a specific user should go in the /home/ directory.

1.2.1.11. The /sys/ Directory

The /sys/ directory utilizes the new sysfs virtual file system specific to the 2.6 kernel. With the increased support for hot plug hardware devices in the 2.6 kernel, the /sys/ directory contains information similar to that held in /proc/, but displays a hierarchical view of specific device information with regard to hot plug devices.

1.2.1.12. The /usr/ Directory

The /usr/ directory is for files that can be shared across multiple machines. The /usr/ directory is often on its own partition and is mounted read-only. At a minimum, the following directories should be subdirectories of /usr/:
/usr
   |- bin/
   |- etc/
   |- games/
   |- include/
   |- kerberos/
   |- lib/
   |- libexec/
   |- local/
   |- sbin/
   |- share/
   |- src/
   |- tmp -> ../var/tmp/
Under the /usr/ directory, the bin/ subdirectory contains executables, etc/ contains system-wide configuration files, games/ is for games, include/ contains C header files, kerberos/ contains binaries and other Kerberos-related files, and lib/ contains object files and libraries that are not designed to be directly utilized by users or shell scripts. The libexec/ directory contains small helper programs called by other programs, sbin/ is for system administration binaries (those that do not belong in the /sbin/ directory), share/ contains files that are not architecture-specific, and src/ is for source code.

1.2.1.13. The /usr/local/ Directory

The FHS says:
The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. It may be used for programs and data that are shareable among a group of hosts, but not found in /usr.
The /usr/local/ directory is similar in structure to the /usr/ directory. It has the following subdirectories, which are similar in purpose to those in the /usr/ directory:
/usr/local
	|- bin/
	|- etc/
	|- games/
	|- include/
	|- lib/
	|- libexec/
	|- sbin/
	|- share/
	|- src/
In Red Hat Enterprise Linux, the intended use for the /usr/local/ directory is slightly different from that specified by the FHS. The FHS says that /usr/local/ should be where software that is to remain safe from system software upgrades is stored. Since software upgrades can be performed safely with RPM Package Manager (RPM), it is not necessary to protect files by putting them in /usr/local/. Instead, the /usr/local/ directory is used for software that is local to the machine.
For instance, if the /usr/ directory is mounted as a read-only NFS share from a remote host, it is still possible to install a package or program under the /usr/local/ directory.

1.2.1.14. The /var/ Directory

Since the FHS requires that /usr/ can be mounted as read-only, any programs that write log files or need spool/ or lock/ directories should write them to the /var/ directory. The FHS states /var/ is for:
...variable data files. This includes spool directories and files, administrative and logging data, and transient and temporary files.
Below are some of the directories found within the /var/ directory:
/var
   |- account/
   |- arpwatch/
   |- cache/
   |- crash/
   |- db/
   |- empty/
   |- ftp/
   |- gdm/
   |- kerberos/
   |- lib/
   |- local/
   |- lock/
   |- log/
   |- mail -> spool/mail/
   |- mailman/
   |- named/
   |- nis/
   |- opt/
   |- preserve/
   |- run/
   +- spool/
       |- at/
       |- clientmqueue/
       |- cron/
       |- cups/
       |- exim/
       |- lpd/
       |- mail/
       |- mailman/
       |- mqueue/
       |- news/
       |- postfix/
       |- repackage/
       |- rwho/
       |- samba/
       |- squid/
       |- squirrelmail/
       |- up2date/
       |- uucp/
       |- uucppublic/
       |- vbox/
   |- tmp/
   |- tux/
   |- www/
   |- yp/
System log files, such as messages and lastlog, go in the /var/log/ directory. The /var/lib/rpm/ directory contains RPM system databases. Lock files go in the /var/lock/ directory, usually in directories for the program using the file. The /var/spool/ directory has subdirectories for programs in which data files are stored.

1.3. Special File Locations Under Red Hat Enterprise Linux

Red Hat Enterprise Linux extends the FHS structure slightly to accommodate special files.
Most files pertaining to RPM are kept in the /var/lib/rpm/ directory. For more information on RPM, refer to Chapter 12, Package Management with RPM.
The /var/cache/yum/ directory contains files used by the Package Updater, including RPM header information for the system. This location may also be used to temporarily store RPMs downloaded while updating the system. For more information about Red Hat Network, refer to Chapter 15, Registering a System and Managing Subscriptions.
Another location specific to Red Hat Enterprise Linux is the /etc/sysconfig/ directory. This directory stores a variety of configuration information. Many scripts that run at boot time use the files in this directory. Refer to Chapter 32, The sysconfig Directory for more information about what is within this directory and the role these files play in the boot process.

Chapter 2. Using the mount Command

On Linux, UNIX, and similar operating systems, file systems on different partitions and removable devices like CDs, DVDs, or USB flash drives can be attached to a certain point (that is, the mount point) in the directory tree, and detached again. To attach or detach a file system, you can use the mount or umount command respectively. This chapter describes the basic usage of these commands, and covers some advanced topics such as moving a mount point or creating shared subtrees.

2.1. Listing Currently Mounted File Systems

To display all currently attached file systems, run the mount command with no additional arguments:
mount
This command displays the list of known mount points. Each line provides important information about the device name, the file system type, the directory in which it is mounted, and relevant mount options in the following form:
device on directory type type (options)
By default, the output includes various virtual file systems such as sysfs, tmpfs, and others. To display only the devices with a certain file system type, supply the -t option on the command line:
mount -t type
For a list of common file system types, refer to Table 2.1, “Common File System Types”. For an example of how to use the mount command to list the mounted file systems, see Example 2.1, “Listing Currently Mounted ext3 File Systems”.

Example 2.1. Listing Currently Mounted ext3 File Systems

Usually, both / and /boot partitions are formatted to use ext3. To display only the mount points that use this file system, type the following at a shell prompt:
~]$ mount -t ext3
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
/dev/vda1 on /boot type ext3 (rw)
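
Because each line of the mount output follows the "device on directory type type (options)" form, it can be processed with standard text tools. The sketch below prints only the mount points of a given type. The function name mount_points_of_type is a hypothetical helper, and the sketch assumes that device names and directories contain no spaces.

```shell
# Print the mount points of one file system type, reading lines in the
# "device on directory type type (options)" form from standard input.
# "mount_points_of_type" is a hypothetical helper name.
# Assumption: device names and directories contain no spaces.
mount_points_of_type() {
    awk -v fstype="$1" '$2 == "on" && $4 == "type" && $5 == fstype { print $3 }'
}

# Usage: mount | mount_points_of_type ext3
```

Unlike mount -t, this only filters text that has already been produced, so it can also be applied to saved output from another machine.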

2.2. Mounting a File System

To attach a certain file system, use the mount command in the following form:
mount [option] device directory
When the mount command is run, it reads the content of the /etc/fstab configuration file to see if the given file system is listed. This file contains a list of device names and the directory in which the selected file systems should be mounted, as well as the file system type and mount options. Because of this, when you are mounting a file system that is specified in this file, you can use one of the following variants of the command:
mount [option] directory
mount [option] device
Note that unless you are logged in as root, you must have permissions to mount the file system (see Section 2.2.2, “Specifying the Mount Options”).
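
To see what mount would find for a given directory, you can look the entry up in /etc/fstab yourself. The following is a minimal sketch rather than a description of how mount is implemented: the function name fstab_entry is a hypothetical helper, the optional second argument (an alternative file in fstab format) is an assumption added for experimentation, and only the first matching line is reported.

```shell
# Print "device type options" for a mount point listed in an fstab-style file.
# "fstab_entry" is a hypothetical helper; the file defaults to /etc/fstab.
fstab_entry() {
    awk -v dir="$1" '
        $1 ~ /^#/ { next }                               # skip comment lines
        NF >= 4 && $2 == dir { print $1, $3, $4; exit }  # first match only
    ' "${2:-/etc/fstab}"
}

# Usage: fstab_entry /boot
```

If the function prints nothing, the directory is not listed, and both the device and the directory must be supplied to mount.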

2.2.1. Specifying the File System Type

In most cases, mount detects the file system automatically. However, there are certain file systems, such as NFS (Network File System) or CIFS (Common Internet File System), that are not recognized, and need to be specified manually. To specify the file system type, use the mount command in the following form:
mount -t type device directory
Table 2.1, “Common File System Types” provides a list of common file system types that can be used with the mount command. For a complete list of all available file system types, consult the relevant manual page as referred to in Section 2.4.1, “Installed Documentation”.

Table 2.1. Common File System Types

Type       Description
ext2       The ext2 file system.
ext3       The ext3 file system.
ext4       The ext4 file system.
iso9660    The ISO 9660 file system. It is commonly used by optical media, typically CDs.
jfs        The JFS file system created by IBM.
nfs        The NFS file system. It is commonly used to access files over the network.
nfs4       The NFSv4 file system. It is commonly used to access files over the network.
ntfs       The NTFS file system. It is commonly used on machines that are running the Windows operating system.
udf        The UDF file system. It is commonly used by optical media, typically DVDs.
vfat       The FAT file system. It is commonly used on machines that are running the Windows operating system, and on certain digital media such as USB flash drives or floppy disks.

Example 2.2. Mounting a USB Flash Drive

Older USB flash drives often use the FAT file system. Assuming that such a drive uses the /dev/sdc1 device and that the /media/flashdisk/ directory exists, you can mount it to this directory by typing the following at a shell prompt as root:
~]# mount -t vfat /dev/sdc1 /media/flashdisk

2.2.2. Specifying the Mount Options

To specify additional mount options, use the command in the following form:
mount -o options
When supplying multiple options, do not insert a space after a comma, or mount will incorrectly interpret the values following spaces as additional parameters.
Table 2.2, “Common Mount Options” provides a list of common mount options. For a complete list of all available options, consult the relevant manual page as referred to in Section 2.4.1, “Installed Documentation”.

Table 2.2. Common Mount Options

Option      Description
async       Allows asynchronous input/output operations on the file system.
auto        Allows the file system to be mounted automatically using the mount -a command.
defaults    Provides an alias for async,auto,dev,exec,nouser,rw,suid.
exec        Allows the execution of binary files on the particular file system.
loop        Mounts an image as a loop device.
noauto      Prevents the file system from being mounted automatically using the mount -a command.
noexec      Disallows the execution of binary files on the particular file system.
nouser      Prevents an ordinary user (that is, other than root) from mounting and unmounting the file system.
remount     Remounts the file system in case it is already mounted.
ro          Mounts the file system for reading only.
rw          Mounts the file system for both reading and writing.
user        Allows an ordinary user (that is, other than root) to mount and unmount the file system.
See Example 2.3, “Mounting an ISO Image” for an example usage.

Example 2.3. Mounting an ISO Image

An ISO image (or a disk image in general) can be mounted by using the loop device. Assuming that the ISO image of the Fedora 14 installation disc is present in the current working directory and that the /media/cdrom/ directory exists, you can mount the image to this directory by running the following command as root:
~]# mount -o ro,loop Fedora-14-x86_64-Live-Desktop.iso /media/cdrom
Note that ISO 9660 is by design a read-only file system.
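
When a script assembles the option list for mount -o, the rule about not inserting spaces after commas is easy to break by accident. One way to avoid this, shown here only as a sketch, is to build the list from separate words. The function name join_mount_options is a hypothetical helper introduced for illustration.

```shell
# Join the arguments with commas and no spaces, as mount -o expects.
# "join_mount_options" is a hypothetical helper name.
join_mount_options() {
    # Inside the subshell, "$*" joins the arguments with the first
    # character of IFS, which is set to a comma here.
    ( IFS=,; printf '%s\n' "$*" )
}

join_mount_options ro loop   # prints "ro,loop"

# Usage (as root):
#   mount -o "$(join_mount_options ro loop)" image.iso /media/cdrom
```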

2.2.3. Sharing Mounts

Occasionally, certain system administration tasks require access to the same file system from more than one place in the directory tree (for example, when preparing a chroot environment). To address such requirements, the mount command implements the --bind option that provides a means for duplicating certain mounts. Its usage is as follows:
mount --bind old_directory new_directory
Although the above command allows a user to access the file system from both places, it does not apply to the file systems that are mounted within the original directory. To include these mounts as well, type:
mount --rbind old_directory new_directory
Additionally, to provide as much flexibility as possible, Red Hat Enterprise Linux 5.10 implements the functionality known as shared subtrees. This feature allows you to use the following four mount types:
Shared Mount
A shared mount allows you to create an exact replica of a given mount point. When a shared mount is created, any mount within the original mount point is reflected in it, and vice versa. To create a shared mount, type the following at a shell prompt:
mount --make-shared mount_point
Alternatively, you can change the mount type for the selected mount point and all mount points under it:
mount --make-rshared mount_point

Example 2.4. Creating a Shared Mount Point

There are two places where other file systems are commonly mounted: the /media directory for removable media, and the /mnt directory for temporarily mounted file systems. By using a shared mount, you can make these two directories share the same content. To do so, as root, mark the /media directory as shared:
~]# mount --bind /media /media
~]# mount --make-shared /media
Then create its duplicate in /mnt by using the following command:
~]# mount --bind /media /mnt
You can now verify that a mount within /media also appears in /mnt. For example, if you have non-empty media in your CD-ROM drive and the /media/cdrom/ directory exists, run the following commands:
~]# mount /dev/cdrom /media/cdrom
~]# ls /media/cdrom
EFI  GPL  isolinux  LiveOS
~]# ls /mnt/cdrom
EFI  GPL  isolinux  LiveOS
Similarly, you can verify that any file system mounted in the /mnt directory is reflected in /media. For instance, if you have a non-empty USB flash drive that uses the /dev/sdc1 device plugged in and the /mnt/flashdisk/ directory is present, type:
~]# mount /dev/sdc1 /mnt/flashdisk
~]# ls /media/flashdisk
en-US  publican.cfg
~]# ls /mnt/flashdisk
en-US  publican.cfg
Slave Mount
A slave mount allows you to create a limited duplicate of a given mount point. When a slave mount is created, any mount within the original mount point is reflected in it, but no mount within a slave mount is reflected in its original. To create a slave mount, type the following at a shell prompt:
mount --make-slave mount_point
Alternatively, you can change the mount type for the selected mount point and all mount points under it:
mount --make-rslave mount_point

Example 2.5. Creating a Slave Mount Point

Imagine you want the content of the /media directory to appear in /mnt as well, but you do not want any mounts in the /mnt directory to be reflected in /media. To do so, as root, first mark the /media directory as shared:
~]# mount --bind /media /media
~]# mount --make-shared /media
Then create its duplicate in /mnt, but mark it as slave:
~]# mount --bind /media /mnt
~]# mount --make-slave /mnt
You can now verify that a mount within /media also appears in /mnt. For example, if you have non-empty media in your CD-ROM drive and the /media/cdrom/ directory exists, run the following commands:
~]# mount /dev/cdrom /media/cdrom
~]# ls /media/cdrom
EFI  GPL  isolinux  LiveOS
~]# ls /mnt/cdrom
EFI  GPL  isolinux  LiveOS
You can also verify that file systems mounted in the /mnt directory are not reflected in /media. For instance, if you have a non-empty USB flash drive that uses the /dev/sdc1 device plugged in and the /mnt/flashdisk/ directory is present, type:
~]# mount /dev/sdc1 /mnt/flashdisk
~]# ls /media/flashdisk
~]# ls /mnt/flashdisk
en-US  publican.cfg
Private Mount
A private mount allows you to create an ordinary mount. When a private mount is created, no subsequent mounts within the original mount point are reflected in it, and no mount within a private mount is reflected in its original. To create a private mount, type the following at a shell prompt:
mount --make-private mount_point
Alternatively, you can change the mount type for the selected mount point and all mount points under it:
mount --make-rprivate mount_point

Example 2.6. Creating a Private Mount Point

Taking into account the scenario in Example 2.4, “Creating a Shared Mount Point”, assume that you have previously created a shared mount point by using the following commands as root:
~]# mount --bind /media /media
~]# mount --make-shared /media
~]# mount --bind /media /mnt
To mark the /mnt directory as private, type:
~]# mount --make-private /mnt
You can now verify that none of the mounts within /media appears in /mnt. For example, if you have non-empty media in your CD-ROM drive and the /media/cdrom/ directory exists, run the following commands:
~]# mount /dev/cdrom /media/cdrom
~]# ls /media/cdrom
EFI  GPL  isolinux  LiveOS
~]# ls /mnt/cdrom
~]#
You can also verify that file systems mounted in the /mnt directory are not reflected in /media. For instance, if you have a non-empty USB flash drive that uses the /dev/sdc1 device plugged in and the /mnt/flashdisk/ directory is present, type:
~]# mount /dev/sdc1 /mnt/flashdisk
~]# ls /media/flashdisk
~]# ls /mnt/flashdisk
en-US  publican.cfg
Unbindable Mount
An unbindable mount allows you to prevent a given mount point from being duplicated in any way. To create an unbindable mount, type the following at a shell prompt:
mount --make-unbindable mount_point
Alternatively, you can change the mount type for the selected mount point and all mount points under it:
mount --make-runbindable mount_point

Example 2.7. Creating an Unbindable Mount Point

To prevent the /media directory from being shared, as root, type the following at a shell prompt:
~]# mount --bind /media /media
~]# mount --make-unbindable /media
This way, any subsequent attempt to make a duplicate of this mount will fail with an error:
~]# mount --bind /media /mnt
mount: wrong fs type, bad option, bad superblock on /media/,
       missing code page or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

2.2.4. Moving a Mount Point

To change the directory in which a file system is mounted, use the following command:
mount --move old_directory new_directory

Example 2.8. Moving an Existing NFS Mount Point

Imagine that you have an NFS storage that contains user directories. Assuming that this storage is already mounted in /mnt/userdirs/, as root, you can move this mount point to /home by using the following command:
~]# mount --move /mnt/userdirs /home
To verify the mount point has been moved, list the content of both directories:
~]# ls /mnt/userdirs
~]# ls /home
jill  joe

2.3. Unmounting a File System

To detach a previously mounted file system, use either of the following variants of the umount command:
umount directory
umount device
Note that unless you are logged in as root, you must have permissions to unmount the file system (see Section 2.2.2, “Specifying the Mount Options”). See Example 2.9, “Unmounting a CD” for an example usage.

Important

When a file system is in use (for example, when a process is reading a file on this file system), running the umount command will fail with an error. To determine which processes are accessing the file system, use the fuser command in the following form:
fuser -m directory
For example, to list the processes that are accessing a file system mounted to the /media/cdrom/ directory, type:
~]$ fuser -m /media/cdrom
/media/cdrom:         1793  2013  2022  2435 10532c 10672c
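The letters appended to some PIDs in the sample output mark the type of access (for example, c for a current directory). The following is a minimal sketch, not part of the fuser tool, that reduces one captured line in the layout shown above to bare PIDs. The function name fuser_pids is hypothetical, and the sketch assumes the "name: pid pid ..." layout of the sample; fuser normally writes only the PIDs to standard output and everything else to standard error, so real pipelines may see different input.

```shell
# Reduce one line of captured "fuser -m" output to bare PIDs, one per line.
# Assumes the "directory: pid pid ..." layout of the sample above.
fuser_pids() {
    # Drop the leading "directory:" field, split on whitespace,
    # remove empty lines, then strip trailing access-type letters.
    sed -e 's/^[^:]*://' |
        tr -s ' \t' '\n' |
        sed -e '/^$/d' -e 's/[^0-9]*$//'
}
```

For example, piping the sample line through fuser_pids prints the six process IDs without the trailing c markers.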

Example 2.9. Unmounting a CD

To unmount a CD that was previously mounted to the /media/cdrom/ directory, type the following at a shell prompt:
~]$ umount /media/cdrom

2.4. Additional Resources

The following resources provide in-depth documentation on the subject.

2.4.1. Installed Documentation

  • man 8 mount — The manual page for the mount command that provides full documentation on its usage.
  • man 8 umount — The manual page for the umount command that provides full documentation on its usage.
  • man 5 fstab — The manual page providing a thorough description of the /etc/fstab file format.

2.4.2. Useful Websites

  • Shared subtrees — An LWN article covering the concept of shared subtrees.
  • sharedsubtree.txt — Extensive documentation that is shipped with the shared subtrees patches.

Chapter 3. The ext3 File System

The default file system is the journaling ext3 file system.

3.1. Features of ext3

The ext3 file system is essentially an enhanced version of the ext2 file system. These improvements provide the following advantages:
Availability
After an unexpected power failure or system crash (also called an unclean system shutdown), each mounted ext2 file system on the machine must be checked for consistency by the e2fsck program. This is a time-consuming process that can delay system boot time significantly, especially with large volumes containing a large number of files. During this time, any data on the volumes is unreachable.
The journaling provided by the ext3 file system means that this sort of file system check is no longer necessary after an unclean system shutdown. The only time a consistency check occurs using ext3 is in certain rare hardware failure cases, such as hard drive failures. The time to recover an ext3 file system after an unclean system shutdown does not depend on the size of the file system or the number of files; rather, it depends on the size of the journal used to maintain consistency. The default journal size takes about a second to recover, depending on the speed of the hardware.
Data Integrity
The ext3 file system prevents loss of data integrity in the event that an unclean system shutdown occurs. The ext3 file system allows you to choose the type and level of protection that your data receives. By default, the ext3 volumes are configured to keep a high level of data consistency with regard to the state of the file system.
Speed
Despite writing some data more than once, ext3 has a higher throughput in most cases than ext2 because ext3's journaling optimizes hard drive head motion. You can choose from three journaling modes to optimize speed, but doing so means trade-offs with regard to data integrity if the system were to fail.
Easy Transition
It is easy to migrate from ext2 to ext3 and gain the benefits of a robust journaling file system without reformatting. Refer to Section 3.3, “Converting to an ext3 File System” for more on how to perform this task.
The following sections walk you through the steps for creating and tuning ext3 partitions. For ext2 partitions, skip the partitioning and formatting sections below and go directly to Section 3.3, “Converting to an ext3 File System”.

3.2. Creating an ext3 File System

After installation, it is sometimes necessary to create a new ext3 file system. For example, if you add a new disk drive to the system, you may want to partition the drive and use the ext3 file system.
The steps for creating an ext3 file system are as follows:
  1. Format the partition with the ext3 file system using mkfs.
  2. Label the partition using e2label.
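The two steps above can be sketched as a small dry-run script. This is an illustration only: the function names run and create_ext3, the DRY_RUN variable, and the device and label used in the usage note are hypothetical. By default the sketch only prints the commands; the real mkfs and e2label commands destroy existing data on the target partition and must be run as root.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# create_ext3 DEVICE LABEL
create_ext3() {
    run mkfs -t ext3 "$1"     # step 1: format the partition as ext3
    run e2label "$1" "$2"     # step 2: assign a label to the partition
}
```

Calling create_ext3 /dev/hdb1 /data with the default settings prints the two commands so that they can be reviewed before being executed with DRY_RUN=0.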

3.3. Converting to an ext3 File System

The tune2fs utility allows you to convert an ext2 filesystem to ext3.

Note

Always use the e2fsck utility to check your filesystem before and after using tune2fs. A default installation of Red Hat Enterprise Linux uses ext3 for all file systems.
To convert an ext2 filesystem to ext3, log in as root and type the following command in a terminal:
tune2fs -j <block_device>
where <block_device> contains the ext2 filesystem you wish to convert.
A valid block device could be one of two types of entries:
  • A mapped device — A logical volume in a volume group, for example, /dev/mapper/VolGroup00-LogVol02.
  • A static device — A traditional storage volume, for example, /dev/hdbX, where hdb is a storage device name and X is the partition number.
Issue the df command to display mounted file systems.
For the remainder of this section, the sample commands use the following value for the block device:
/dev/mapper/VolGroup00-LogVol02
You must recreate the initrd image so that it will contain the ext3 kernel module. To create this, run the mkinitrd program. For information on using the mkinitrd command, type man mkinitrd. Also, make sure your GRUB configuration loads the initrd.
If you fail to make this change, the system still boots, but the file system is mounted as ext2 instead of ext3.
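One way to confirm the result of a conversion is to look for the has_journal feature in the output of tune2fs -l. The following is a minimal sketch under the assumption that the device holds either an ext2 or an ext3 file system (other journaled file systems in this family also report has_journal); the function name fs_kind is hypothetical.

```shell
# Read "tune2fs -l" output on standard input and report whether the
# has_journal feature is listed. Assumes the file system is ext2 or ext3.
fs_kind() {
    if grep '^Filesystem features:' | grep -qw 'has_journal'; then
        echo ext3
    else
        echo ext2
    fi
}
```

As root, a possible invocation is tune2fs -l /dev/mapper/VolGroup00-LogVol02 | fs_kind.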

3.4. Reverting to an ext2 File System

If you wish to revert a partition from ext3 to ext2 for any reason, you must first unmount the partition by logging in as root and typing:
umount /dev/mapper/VolGroup00-LogVol02
Next, change the file system type to ext2 by typing the following command as root:
tune2fs -O ^has_journal /dev/mapper/VolGroup00-LogVol02
Check the partition for errors by typing the following command as root:
e2fsck -y /dev/mapper/VolGroup00-LogVol02
Then mount the partition again as an ext2 file system by typing:
mount -t ext2 /dev/mapper/VolGroup00-LogVol02 /mount/point
In the above command, replace /mount/point with the mount point of the partition.
Next, remove the .journal file at the root level of the partition by changing to the directory where it is mounted and typing:
rm -f .journal
You now have an ext2 partition.
If you want to permanently change the partition to ext2, remember to update the /etc/fstab file.

Chapter 4. The ext4 File System

4.1. Features of ext4

The ext4 file system is a scalable extension of the ext3 file system, which is the default file system of Red Hat Enterprise Linux 5. The ext4 file system can support files and file systems of up to 16 terabytes in size. It also supports an unlimited number of sub-directories (the ext3 file system only supports up to 32,000), though once the link count exceeds 65,000 it resets to 1 and is no longer increased. The following are the most important features of ext4:
Main Features
The ext4 file system uses extents (as opposed to the traditional block mapping scheme used by ext2 and ext3), which improves performance when using large files and reduces metadata overhead for large files. In addition, ext4 also labels unallocated block groups and inode table sections accordingly, which allows them to be skipped during a file system check. This makes for quicker file system checks, which becomes more beneficial as the file system grows in size.
Allocation Features
The ext4 file system features the following allocation schemes:
  • Persistent pre-allocation
  • Delayed allocation
  • Multi-block allocation
  • Stripe-aware allocation
Because of delayed allocation and other performance optimizations, ext4's behavior of writing files to disk is different from ext3. In ext4, a program's writes to the file system are not guaranteed to be on-disk unless the program issues an fsync() call afterwards.
By default, ext3 automatically forces newly created files to disk almost immediately even without fsync(). This behavior hid bugs in programs that did not use fsync() to ensure that written data was on-disk. The ext4 file system, on the other hand, often waits several seconds to write out changes to disk, allowing it to combine and reorder writes for better disk performance than ext3.

Warning

Unlike ext3, the ext4 file system does not force data to disk on transaction commit. As such, it takes longer for buffered writes to be flushed to disk. As with any file system, use data integrity calls such as fsync() to ensure that data is written to permanent storage.
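From a shell script, one way to request this behavior is the conv=fsync conversion of dd, which asks dd to call fsync() on the output file before it exits. The following is a minimal sketch; the function name write_durably is hypothetical, and the availability of conv=fsync in the installed dd is an assumption that should be checked against the dd manual page.

```shell
# write_durably SRC DST
# Copy SRC to DST and have dd fsync() the output file before exiting.
# conv=fsync is a GNU dd conversion; other dd builds may not support it.
write_durably() {
    dd if="$1" of="$2" conv=fsync 2>/dev/null
}
```

The function returns the exit status of dd, so a caller can treat a non-zero status as a sign that the data may not have reached permanent storage.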
Other ext4 Features
The ext4 file system also supports the following:
  • Extended attributes (xattr), which allows the system to associate several additional name/value pairs per file.
  • Quota journaling, which avoids the need for lengthy quota consistency checks after a crash.

    Note

    The only supported journaling mode in ext4 is data=ordered (default).
  • Subsecond timestamps, which allow inode timestamp fields to be specified with nanosecond resolution.

4.2. Managing an ext4 File System

In order to manage ext4 file systems on Red Hat Enterprise Linux 5, it is necessary to install the e4fsprogs package. You can use the Yum utility to install the package:
~]# yum install e4fsprogs
The e4fsprogs package contains renamed static binaries from the equivalent upstream e2fsprogs release. This has been done to ensure stability of the e2fsprogs core utilities with all the changes for ext4 included. The most important of these utilities are:
  • mke4fs — A utility used to create an ext4 file system.
  • mkfs.ext4 — Another command used to create an ext4 file system.
  • e4fsck — A utility used to repair inconsistencies of an ext4 file system.
  • tune4fs — A utility used to modify ext4 file system attributes.
  • resize4fs — A utility used to resize an ext4 file system.
  • e4label — A utility used to display or modify the label of the ext4 file system.
  • dumpe4fs — A utility used to display the super block and blocks group information for the ext4 file system.
  • debuge4fs — An interactive file system debugger, used to examine ext4 file systems, manually repair corrupted file systems and create test cases for e4fsck.
The following sections walk you through the steps for creating and tuning ext4 partitions.

4.3. Creating an ext4 File System

After installation, it is sometimes necessary to create a new ext4 file system. For example, if you add a new disk drive to the system, you may want to partition the drive and use the ext4 file system.
The default options are optimal for most usage scenarios, but if you need to set up your ext4 file system in a specific way, see the manual pages for the mke4fs and mkfs.ext4 commands for available options. Also, you may want to examine and modify the configuration file of mke4fs, /etc/mke4fs.conf, if you plan to create ext4 file systems more often.
The steps for creating an ext4 file system are as follows:
  1. Format the partition with the ext4 file system using the mkfs.ext4 or mke4fs command:
    ~]# mkfs.ext4 block_device
    ~]# mke4fs -t ext4 block_device
    where block_device is a partition which will contain the ext4 filesystem you wish to create.
  2. Label the partition using the e4label command.
    ~]# e4label block_device new-label
  3. Create a mount point and mount the new file system to that mount point:
    ~]# mkdir /mount/point
    ~]# mount block_device /mount/point
A valid block device could be one of two types of entries:
  • A mapped device — A logical volume in a volume group, for example, /dev/mapper/VolGroup00-LogVol02.
  • A static device — A traditional storage volume, for example, /dev/hdbX, where hdb is a storage device name and X is the partition number.
For striped block devices (for example RAID5 arrays), the stripe geometry can be specified at the time of file system creation. Using proper stripe geometry greatly enhances performance of an ext4 file system.
When creating file systems on lvm or md volumes, mkfs.ext4 chooses an optimal geometry. This may also be true on some hardware RAIDs which export geometry information to the operating system.
To specify stripe geometry, use the -E option of mkfs.ext4 (that is, extended file system options) with the following sub-options:
stride=value
Specifies the RAID chunk size.
stripe-width=value
Specifies the number of data disks in a RAID device, or the number of stripe units in the stripe.
For both sub-options, value must be specified in file system block units. For example, to create a file system with a 64k stride (that is, 16 x 4096) on a 4k-block file system, use the following command:
~]# mkfs.ext4 -E stride=16,stripe-width=64 block_device
For more information about creating file systems, refer to man mkfs.ext4.
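The arithmetic behind the two sub-options can be sketched as follows. The sketch assumes that stride is the RAID chunk size divided by the file system block size and that stripe-width is stride multiplied by the number of data disks; the function name ext4_stripe_opts is hypothetical, and the count of four data disks is inferred from the stripe-width=64 value in the example above rather than stated in it.

```shell
# ext4_stripe_opts CHUNK_KIB BLOCK_KIB DATA_DISKS
# Print the -E sub-options for the given RAID geometry.
ext4_stripe_opts() {
    chunk_kib=$1
    block_kib=$2
    data_disks=$3
    stride=$((chunk_kib / block_kib))         # chunk size in file system blocks
    stripe_width=$((stride * data_disks))     # blocks in one full stripe
    echo "stride=${stride},stripe-width=${stripe_width}"
}
```

With a 64k chunk, 4k blocks, and four data disks, ext4_stripe_opts 64 4 4 prints stride=16,stripe-width=64, which matches the value passed to the -E option above.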

4.4. Mounting an ext4 File System

An ext4 file system can be mounted with no extra options, same as any other file system:
~]# mount block_device /mount/point
The default mount options are optimal for most users. Options, such as acl, noacl, data, quota, noquota, user_xattr, nouser_xattr, and many others that were already used with the ext2 and ext3 file systems, are backward compatible and have the same usage and functionality. Also, with the ext4 file system, several new ext4-specific mount options have been added, for example:
barrier / nobarrier
By default, ext4 uses write barriers to ensure file system integrity even when power is lost to a device with write caches enabled. For devices without write caches, or with battery-backed write caches, you can disable barriers using the nobarrier option:
~]# mount -o nobarrier block_device /mount/point
stripe=value
This option allows you to specify the number of file system blocks allocated for a single file operation. For RAID5 this number should be equal to the RAID chunk size multiplied by the number of disks.
journal_ioprio=value
This option allows you to set the priority of I/O operations submitted during a commit operation. The option can have a value from 7 to 0 (0 is the highest priority), and is set to 3 by default, which is a slightly higher priority than the default I/O priority.
Default mount options can also be set in the file system superblock using the tune4fs utility. For example, the following command sets the file system on the /dev/mapper/VolGroup00-LogVol02 device to be mounted by default with debugging disabled and user-specified extended attributes and POSIX access control lists enabled:
~]# tune4fs -o ^debug,user_xattr,acl /dev/mapper/VolGroup00-LogVol02
For more information on this topic, refer to the tune4fs(8) manual page.
An ext3 file system can also be mounted as ext4 without changing the format, allowing it to be mounted as ext3 again in the future. To do so, run the following command on a block device that contains an ext3 file system:
~]# mount -t ext4 block_device /mount/point
Doing so will only allow the ext3 file system to use ext4-specific features that do not require a file format conversion. These features include delayed allocation and multi-block allocation, and exclude features such as extent mapping.

Warning

Using the ext4 driver to mount an ext3 file system has not been fully tested on Red Hat Enterprise Linux 5. Therefore, this action is not supported because Red Hat cannot guarantee consistent performance and predictable behavior for ext3 file systems in this way.
For more information on mount options for the ext4 file system, see Section 2.2.2, “Specifying the Mount Options” and the mount(8) manual page.

Note

If you want to enable persistent mounting of the file system, remember to update the /etc/fstab file accordingly. For example:
/dev/mapper/VolGroup00-LogVol02    /test    ext4    defaults    0 0

4.5. Resizing an ext4 File System

Before growing an ext4 file system, ensure that the underlying block device is of an appropriate size to hold the file system later. Use the appropriate resizing methods for the affected block device.
An ext4 file system can be grown while it is mounted, but it must be unmounted before it is shrunk. You can resize an ext4 file system using the resize4fs command:
~]# resize4fs block_device new_size
When resizing an ext4 file system, the resize4fs utility reads the size in units of file system block size, unless a suffix indicating a specific unit is used. The following suffixes indicate specific units:
  • s — 512 byte sectors
  • K — kilobytes
  • M — megabytes
  • G — gigabytes
The size parameter is optional (and often redundant) when expanding. The resize4fs utility automatically expands the file system to fill all available space of the container, usually a logical volume or partition. For more information about resizing an ext4 file system, refer to the resize4fs(8) manual page.
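To illustrate how the size argument is interpreted, the following sketch converts a value with one of the suffixes listed above into bytes. It is not part of resize4fs: the function name size_to_bytes is hypothetical, the K, M, and G suffixes are assumed to be powers of 1024, and the 4096-byte default block size is an assumption for values given without a suffix.

```shell
# size_to_bytes SIZE [BLOCK_SIZE]
# Convert a resize-style size argument to bytes.
size_to_bytes() {
    size=$1
    block_size=${2:-4096}    # assumed file system block size
    case "$size" in
        *s) echo $(( ${size%s} * 512 )) ;;                   # 512-byte sectors
        *K) echo $(( ${size%K} * 1024 )) ;;                  # kilobytes
        *M) echo $(( ${size%M} * 1024 * 1024 )) ;;           # megabytes
        *G) echo $(( ${size%G} * 1024 * 1024 * 1024 )) ;;    # gigabytes
        *)  echo $(( size * block_size )) ;;                 # file system blocks
    esac
}
```

For example, size_to_bytes 2048s and size_to_bytes 1M both print 1048576.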

Chapter 5. The proc File System

The Linux kernel has two primary functions: to control access to physical devices on the computer and to schedule when and how processes interact with these devices. The /proc/ directory, also called the proc file system, contains a hierarchy of special files which represent the current state of the kernel, allowing applications and users to peer into the kernel's view of the system.
Within the /proc/ directory, one can find a wealth of information detailing the system hardware and any processes currently running. In addition, some of the files within the /proc/ directory tree can be manipulated by users and applications to communicate configuration changes to the kernel.

5.1. A Virtual File System

Under Linux, all data are stored as files. Most users are familiar with the two primary types of files: text and binary. But the /proc/ directory contains another type of file called a virtual file. It is for this reason that /proc/ is often referred to as a virtual file system.
These virtual files have unique qualities. Most of them are listed as zero bytes in size and yet when one is viewed, it can contain a large amount of information. In addition, most of the time and date settings on virtual files reflect the current time and date, indicative of the fact they are constantly updated.
Virtual files such as /proc/interrupts, /proc/meminfo, /proc/mounts, and /proc/partitions provide an up-to-the-moment glimpse of the system's hardware. Others, like the /proc/filesystems file and the /proc/sys/ directory provide system configuration information and interfaces.
For organizational purposes, files containing information on a similar topic are grouped into virtual directories and sub-directories. For instance, /proc/ide/ contains information for all physical IDE devices. Likewise, process directories contain information about each running process on the system.

5.1.1. Viewing Virtual Files

By using the cat, more, or less commands on files within the /proc/ directory, users can immediately access enormous amounts of information about the system. For example, to display the type of CPU a computer has, type cat /proc/cpuinfo to receive output similar to the following:
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 5
model		: 9
model name	: AMD-K6(tm) 3D+ Processor
stepping	: 1
cpu MHz		: 400.919
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips	: 799.53
When viewing different virtual files in the /proc/ file system, some of the information is easily understandable while some is not human-readable. This is in part why utilities exist to pull data from virtual files and display it in a useful way. Examples of these utilities include lspci, apm, free, and top.

Note

Some of the virtual files in the /proc/ directory are readable only by the root user.

5.1.2. Changing Virtual Files

As a general rule, most virtual files within the /proc/ directory are read-only. However, some can be used to adjust settings in the kernel. This is especially true for files in the /proc/sys/ subdirectory.
To change the value of a virtual file, use the echo command and a greater than symbol (>) to redirect the new value to the file. For example, to change the hostname on the fly, type:
echo www.example.com > /proc/sys/kernel/hostname 
Other files act as binary or Boolean switches. Typing cat /proc/sys/net/ipv4/ip_forward returns either a 0 or a 1. A 0 indicates that the kernel is not forwarding network packets. Using the echo command to change the value of the ip_forward file to 1 immediately turns packet forwarding on.
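Reading such a switch from a script can be sketched as follows. The function name forwarding_state is hypothetical, and the optional file argument exists only so that the sketch can be tried against a copy of the file.

```shell
# forwarding_state [FILE]
# Report the state of the ip_forward switch (1 means forwarding is on).
forwarding_state() {
    file=${1:-/proc/sys/net/ipv4/ip_forward}
    if [ "$(cat "$file")" = "1" ]; then
        echo "forwarding enabled"
    else
        echo "forwarding disabled"
    fi
}
```

Changing the value still requires root privileges, for example with echo 1 > /proc/sys/net/ipv4/ip_forward.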

Note

Another command used to alter settings in the /proc/sys/ subdirectory is /sbin/sysctl. For more information on this command, refer to Section 5.4, “Using the sysctl Command”.
For a listing of some of the kernel configuration files available in the /proc/sys/ subdirectory, refer to Section 5.3.9, “ /proc/sys/ ”.

5.1.3. Restricting Access to Process Directories

On multi-user systems, it is often useful to secure the process directories stored in /proc/ so that they can be viewed only by the root user. You can restrict the access to these directories with the use of the hidepid option.
To change the file system parameters, you can use the mount command with the -o remount option. As root, type:
mount -o remount,hidepid=value /proc
Here, value passed to hidepid is one of:
  • 0 (default) — every user can read all world-readable files stored in a process directory.
  • 1 — users can access only their own process directories. This protects the sensitive files like cmdline, sched, or status from access by non-root users. This setting does not affect the actual file permissions.
  • 2 — process files are invisible to non-root users. The existence of a process can be learned by other means, but its effective UID and GID is hidden. Hiding these IDs complicates an intruder's task of gathering information about running processes.

Example 5.1. Restricting access to process directories

To make process files accessible only to the root user, type:
~]# mount -o remount,hidepid=1 /proc
With hidepid=1, a non-root user cannot access the contents of process directories. An attempt to do so fails with the following message:
~]$ ls /proc/1/       
ls: /proc/1/: Operation not permitted
With hidepid=2 enabled, process directories are made invisible to non-root users:
~]$ ls /proc/1/       
ls: /proc/1/: No such file or directory
Also, you can specify a user group that will have access to process files even when hidepid is set to 1 or 2. To do this, use the gid option. As root, type:
mount -o remount,hidepid=value,gid=gid /proc
Replace gid with the specific group ID. For members of the selected group, the process files will act as if hidepid was set to 0. However, users who are not supposed to monitor the tasks in the whole system should not be added to the group. For more information on managing users and groups see Chapter 37, Users and Groups.

5.2. Top-level Files within the proc File System

Below is a list of some of the more useful virtual files in the top-level of the /proc/ directory.

Note

In most cases, the content of the files listed in this section is not the same as the content of those files on your machine. This is because much of the information is specific to the hardware on which Red Hat Enterprise Linux was running for this documentation effort.

5.2.1.  /proc/apm

This file provides information about the state of the Advanced Power Management (APM) system and is used by the apm command. If a system with no battery is connected to an AC power source, this virtual file would look similar to the following:
1.16 1.2 0x07 0x01 0xff 0x80 -1% -1 ?
Running the apm -v command on such a system results in output similar to the following:
APM BIOS 1.2 (kernel driver 1.16ac) AC on-line, no system battery
For systems which do not use a battery as a power source, apm is able to do little more than put the machine in standby mode. The apm command is much more useful on laptops. For example, the following output is from the command cat /proc/apm on a laptop while plugged into a power outlet:
1.16 1.2 0x03 0x01 0x03 0x09 100% -1 ?
When the same laptop is unplugged from its power source for a few minutes, the content of the apm file changes to something like the following:
1.16 1.2 0x03 0x00 0x00 0x01 99% 1792 min
The apm -v command now yields more useful data, such as the following:
APM BIOS 1.2 (kernel driver 1.16) AC off-line, battery status high: 99% (1 day, 5:52)

5.2.2.  /proc/buddyinfo

This file is used primarily for diagnosing memory fragmentation issues. Using the buddy algorithm, each column represents the number of pages of a certain order (a certain size) that are available at any given time. For example, for zone DMA (direct memory access), there are 90 chunks of 2^0 * PAGE_SIZE memory. Similarly, there are 6 chunks of 2^1 * PAGE_SIZE, and 2 chunks of 2^2 * PAGE_SIZE memory available.
The DMA row references the first 16 MB on a system, the HighMem row references all memory greater than 4 GB on a system, and the Normal row references all memory in between.
The following is an example of the output typical of /proc/buddyinfo:
Node 0, zone      DMA     90      6      2      1      1      ...
Node 0, zone   Normal   1650    310      5      0      0      ...
Node 0, zone  HighMem      2      0      0      1      1      ...
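The per-zone totals implied by these columns can be computed by multiplying each count by 2 raised to its order. The following is a minimal sketch; the function name buddy_free is hypothetical, and the 4 KiB page size is an assumption that does not hold on every architecture.

```shell
# buddy_free [FILE] [PAGE_KIB]
# For each zone, print the zone name, the number of free pages, and the
# free memory in KiB. Column N (counting from 0) holds chunks of 2^N pages.
buddy_free() {
    awk -v page_kib="${2:-4}" '
        $3 == "zone" {
            pages = 0
            for (i = 5; i <= NF; i++)
                pages += $i * 2 ^ (i - 5)
            printf "%s %d %d\n", $4, pages, pages * page_kib
        }' "${1:-/proc/buddyinfo}"
}
```

For the five DMA columns shown above, the sum is 90 + 12 + 8 + 8 + 16 = 134 pages, or 536 KiB with 4 KiB pages.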

5.2.3.  /proc/cmdline

This file shows the parameters passed to the kernel at the time it is started. A sample /proc/cmdline file looks like the following:
ro root=/dev/VolGroup00/LogVol00 rhgb quiet 3
This output tells us the following:
ro
The root device is mounted read-only at boot time. The presence of ro on the kernel boot line overrides any instances of rw.
root=/dev/VolGroup00/LogVol00
This tells us on which disk device or, in this case, on which logical volume, the root filesystem image is located. With our sample /proc/cmdline output, the root filesystem image is located on the first logical volume (LogVol00) of the first LVM volume group (VolGroup00). On a system not using Logical Volume Management, the root file system might be located on /dev/sda1 or /dev/sda2, meaning on either the first or second partition of the first SCSI or SATA disk drive, depending on whether we have a separate (preceding) boot or swap partition on that drive.
For more information on LVM used in Red Hat Enterprise Linux, refer to http://www.tldp.org/HOWTO/LVM-HOWTO/index.html.
rhgb
A short lowercase acronym that stands for Red Hat Graphical Boot. Providing "rhgb" on the kernel command line signals that graphical booting is supported, assuming that /etc/inittab shows that the default runlevel is set to 5 with a line like this:
id:5:initdefault:
quiet
Indicates that all verbose kernel messages except those which are extremely serious should be suppressed at boot time.
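A script can pick an individual name=value parameter out of this file as sketched below. The function name cmdline_param is hypothetical, and the sketch assumes that parameters are separated by single spaces and contain no quoted whitespace.

```shell
# cmdline_param NAME [FILE]
# Print the value of a NAME=value kernel parameter, if present.
cmdline_param() {
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" |
        awk -v key="$1" '
            # Match only words that start with "NAME=".
            index($0, key "=") == 1 { print substr($0, length(key) + 2) }'
}
```

With the sample command line above, cmdline_param root prints /dev/VolGroup00/LogVol00.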

5.2.4.  /proc/cpuinfo

This virtual file identifies the type of processor used by your system. The following is an example of the output typical of /proc/cpuinfo:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 2.40GHz
stepping	: 7
cpu MHz		: 2392.371
cache size	: 512 KB
physical id	: 0
siblings	: 2
runqueue	: 0
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca  cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips	: 4771.02
  • processor — Provides each processor with an identifying number. On systems that have one processor, only a 0 is present.
  • cpu family — Authoritatively identifies the type of processor in the system. For an Intel-based system, place the number in front of "86" to determine the value. This is particularly helpful for those attempting to identify the architecture of an older system such as a 586, 486, or 386. Because some RPM packages are compiled for each of these particular architectures, this value also helps users determine which packages to install.
  • model name — Displays the common name of the processor, including its project name.
  • cpu MHz — Shows the precise speed in megahertz for the processor to the thousandths decimal place.
  • cache size — Displays the amount of level 2 memory cache available to the processor.
  • siblings — Displays the number of sibling CPUs on the same physical CPU for architectures which use hyper-threading.
  • flags — Defines a number of different qualities about the processor, such as the presence of a floating point unit (FPU) and the ability to process MMX instructions.
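Individual fields such as those described above can be extracted with a short awk filter. The following is a minimal sketch; the function name cpuinfo_field is hypothetical. On systems with several processors, it prints one line per processor entry.

```shell
# cpuinfo_field NAME [FILE]
# Print the value of every "NAME : value" line in the file.
cpuinfo_field() {
    awk -v key="$1" '
        {
            i = index($0, ":")              # split at the first colon only
            if (i == 0) next
            k = substr($0, 1, i - 1)
            v = substr($0, i + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)  # trim padding around the name
            gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim padding around the value
            if (k == key) print v
        }' "${2:-/proc/cpuinfo}"
}
```

For example, cpuinfo_field "model name" prints the common name of each processor.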

5.2.5.  /proc/crypto

This file lists all installed cryptographic ciphers used by the Linux kernel, including additional details for each. A sample /proc/crypto file looks like the following:
name         : sha1
module       : kernel
type         : digest
blocksize    : 64
digestsize   : 20
name         : md5
module       : md5
type         : digest
blocksize    : 64
digestsize   : 16

5.2.6.  /proc/devices

This file displays the various character and block devices currently configured (not including devices whose modules are not loaded). Below is a sample output from this file:
Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  4 ttyS
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  7 vcs
  10 misc
  13 input
  29 fb
  36 netlink
  128 ptm
  136 pts
  180 usb

Block devices:
  1 ramdisk
  3 ide0
  9 md
  22 ide1
  253 device-mapper
  254 mdp
The output from /proc/devices includes the major number and name of the device, and is broken into two major sections: Character devices and Block devices.
Character devices are similar to block devices, except for two basic differences:
  1. Character devices do not require buffering. Block devices have a buffer available, allowing them to order requests before addressing them. This is important for devices designed to store information — such as hard drives — because the ability to order the information before writing it to the device allows it to be placed in a more efficient order.
  2. Character devices send data with no preconfigured size. Block devices can send and receive information in blocks of a size configured per device.
For more information about devices refer to the following installed documentation:
/usr/share/doc/kernel-doc-<version>/Documentation/devices.txt
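
The two-section layout described above lends itself to simple parsing. The following is a rough sketch, assuming the section headings and "major name" lines shown in the sample; parse_devices is a hypothetical helper name.

```python
def parse_devices(text):
    """Map each section ("Character", "Block") to {major: [names]}."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith("devices:"):
            # "Character devices:" or "Block devices:" starts a new section.
            current = sections.setdefault(line.split()[0], {})
            continue
        if current is None:
            continue
        major, name = line.split(None, 1)
        # Several names can share one major number (for example 4 above).
        current.setdefault(int(major), []).append(name)
    return sections


# A shortened copy of the sample output above.
sample = """Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  4 ttyS

Block devices:
  3 ide0
253 device-mapper
"""
devices = parse_devices(sample)
```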

5.2.7.  /proc/dma

This file contains a list of the registered ISA DMA channels in use. A sample /proc/dma file looks like the following:
4: cascade

5.2.8.  /proc/execdomains

This file lists the execution domains currently supported by the Linux kernel, along with the range of personalities they support.
0-0   Linux           [kernel]
Think of execution domains as the "personality" for an operating system. Because other binary formats, such as Solaris, UnixWare, and FreeBSD, can be used with Linux, programmers can change the way the operating system treats system calls from these binaries by changing the personality of the task. Except for the PER_LINUX execution domain, different personalities can be implemented as dynamically loadable modules.

5.2.9.  /proc/fb

This file contains a list of frame buffer devices, with the frame buffer device number and the driver that controls it. Typical output of /proc/fb for systems which contain frame buffer devices looks similar to the following:
0 VESA VGA

5.2.10.  /proc/filesystems

This file displays a list of the file system types currently supported by the kernel. Sample output from a generic /proc/filesystems file looks similar to the following:
nodev   sysfs
nodev   rootfs
nodev   bdev
nodev   proc
nodev   sockfs
nodev   binfmt_misc
nodev   usbfs
nodev   usbdevfs
nodev   futexfs
nodev   tmpfs
nodev   pipefs
nodev   eventpollfs
nodev   devpts
	ext2
nodev   ramfs
nodev   hugetlbfs
	iso9660
nodev   mqueue
	ext3
nodev   rpc_pipefs
nodev   autofs
The first column signifies whether the file system is mounted on a block device. Those beginning with nodev are not mounted on a device. The second column lists the names of the file systems supported.
The mount command cycles through the file systems listed here when one is not specified as an argument.
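As an illustration of the two-column format, the following sketch separates file systems that need a block device from those marked nodev. It assumes only the layout shown in the sample above; parse_filesystems is a hypothetical name.

```python
def parse_filesystems(text):
    """Return (name, needs_block_device) pairs from /proc/filesystems text."""
    result = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Lines with two fields start with "nodev"; lines with one field
        # name a file system that is mounted on a block device.
        nodev = len(fields) > 1 and fields[0] == "nodev"
        result.append((fields[-1], not nodev))
    return result


# A shortened copy of the sample output above.
sample = "nodev   sysfs\nnodev   proc\n\text2\n\tiso9660\nnodev   tmpfs\n\text3\n"
disk_based = [name for name, on_device in parse_filesystems(sample) if on_device]
```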

5.2.11.  /proc/interrupts

This file records the number of interrupts per IRQ on the x86 architecture. A standard /proc/interrupts looks similar to the following:
  CPU0
  0:   80448940          XT-PIC  timer
  1:     174412          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
 10:     410964          XT-PIC  eth0
 12:      60330          XT-PIC  PS/2 Mouse
 14:    1314121          XT-PIC  ide0
 15:    5195422          XT-PIC  ide1
NMI:          0
ERR:          0
For a multi-processor machine, this file may look slightly different:
	   CPU0       CPU1
  0: 1366814704          0          XT-PIC  timer
  1:        128        340    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 12:       5323       5793    IO-APIC-edge  PS/2 Mouse
 13:          1          0          XT-PIC  fpu
 16:   11184294   15940594   IO-APIC-level  Intel EtherExpress Pro 10/100 Ethernet
 20:    8450043   11120093   IO-APIC-level  megaraid
 30:      10432      10722   IO-APIC-level  aic7xxx
 31:         23         22   IO-APIC-level  aic7xxx
NMI:          0
ERR:          0
The first column refers to the IRQ number. Each CPU in the system has its own column and its own number of interrupts per IRQ. The next column reports the type of interrupt, and the last column contains the name of the device that is located at that IRQ.
Each of the types of interrupts seen in this file, which are architecture-specific, mean something different. For x86 machines, the following values are common:
  • XT-PIC — These are the old AT computer interrupts.
  • IO-APIC-edge — The voltage signal on this interrupt transitions from low to high, creating an edge, where the interrupt occurs and is only signaled once. This kind of interrupt, like the IO-APIC-level interrupt, is only seen on systems with processors from the 586 family and higher.
  • IO-APIC-level — Generates interrupts when its voltage signal is high until the signal is low again.
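
The column layout described above can be turned into per-IRQ totals. The following is a minimal sketch, assuming the header row and column order of the multi-processor sample; parse_interrupts is a hypothetical helper, and summary rows such as NMI are assumed to carry counts only.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into a dict keyed by IRQ label."""
    lines = [line for line in text.splitlines() if line.strip()]
    cpus = lines[0].split()  # header row, for example ["CPU0", "CPU1"]
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Leading numeric fields are per-CPU counts; a summary row may
        # list fewer counts than there are CPUs.
        counts = []
        while fields and len(counts) < len(cpus) and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        table[label.strip()] = {
            "counts": dict(zip(cpus, counts)),
            "type": fields[0] if fields else None,
            "device": " ".join(fields[1:]) or None,
        }
    return table


# Lines taken from the multi-processor sample above.
sample = """\t   CPU0       CPU1
  1:        128        340    IO-APIC-edge  keyboard
 16:   11184294   15940594   IO-APIC-level  Intel EtherExpress Pro 10/100 Ethernet
NMI:          0
"""
irqs = parse_interrupts(sample)
keyboard_total = sum(irqs["1"]["counts"].values())
```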

5.2.12.  /proc/iomem

This file shows you the current map of the system's memory for each physical device:
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-07ffffff : System RAM
00100000-00291ba8 : Kernel code
00291ba9-002e09cb : Kernel data
e0000000-e3ffffff : VIA Technologies, Inc. VT82C597 [Apollo VP3]
e4000000-e7ffffff : PCI Bus #01
e4000000-e4003fff : Matrox Graphics, Inc. MGA G200 AGP
e5000000-e57fffff : Matrox Graphics, Inc. MGA G200 AGP
e8000000-e8ffffff : PCI Bus #01
e8000000-e8ffffff : Matrox Graphics, Inc. MGA G200 AGP
ea000000-ea00007f : Digital Equipment Corporation DECchip 21140 [FasterNet]
ea000000-ea00007f : tulip
ffff0000-ffffffff : reserved
The first column displays the memory registers used by each of the different types of memory. The second column lists the kind of memory located within those registers and displays which memory registers are used by the kernel within the system RAM or, if the network interface card has multiple Ethernet ports, the memory registers assigned for each port.

5.2.13.  /proc/ioports

The output of /proc/ioports provides a list of currently registered port regions used for input or output communication with a device. This file can be quite long. The following is a partial listing:
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0cf8-0cff : PCI conf1
d000-dfff : PCI Bus #01
e000-e00f : VIA Technologies, Inc. Bus Master IDE
e000-e007 : ide0
e008-e00f : ide1
e800-e87f : Digital Equipment Corporation DECchip 21140 [FasterNet]
e800-e87f : tulip
The first column gives the I/O port address range reserved for the device listed in the second column.
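Because each range is written as two hexadecimal addresses, the number of ports in a region can be computed directly. The following sketch assumes the "start-end : name" layout shown above; parse_ranges is a hypothetical helper name.

```python
def parse_ranges(text):
    """Return (first, last, size, owner) tuples from /proc/ioports text."""
    ranges = []
    for line in text.splitlines():
        span, sep, owner = line.partition(" : ")
        if not sep:
            continue
        start, _, end = span.strip().partition("-")
        first, last = int(start, 16), int(end, 16)
        # Both ends of the range are inclusive.
        ranges.append((first, last, last - first + 1, owner.strip()))
    return ranges


# Lines taken from the partial listing above.
sample = "0170-0177 : ide1\n03f8-03ff : serial(auto)\ne800-e87f : tulip\n"
sizes = [size for _, _, size, _ in parse_ranges(sample)]
```

The same "start-end : name" layout appears in /proc/iomem, so the sketch should apply there as well, with sizes in bytes rather than ports.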

5.2.14.  /proc/kcore

This file represents the physical memory of the system and is stored in the core file format. Unlike most /proc/ files, kcore displays a size. This value is given in bytes and is equal to the size of the physical memory (RAM) used plus 4 KB.
The contents of this file are designed to be examined by a debugger, such as gdb, and are not human readable.

Warning

Do not view the /proc/kcore virtual file. The contents of the file scramble text output on the terminal. If this file is accidentally viewed, press Ctrl+C to stop the process and then type reset to bring back the command line prompt.

5.2.15.  /proc/kmsg

This file is used to hold messages generated by the kernel. These messages are then picked up by other programs, such as /sbin/klogd or /bin/dmesg.

5.2.16.  /proc/loadavg

This file provides a look at the load average in regard to both the CPU and IO over time, as well as additional data used by uptime and other commands. A sample /proc/loadavg file looks similar to the following:
0.20 0.18 0.12 1/80 11206
The first three columns measure CPU and IO utilization of the last one, five, and 15 minute periods. The fourth column shows the number of currently running processes and the total number of processes. The last column displays the last process ID used.
In addition, load average also refers to the number of processes ready to run (that is, in the run queue, waiting for a CPU share).
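The five fields can be unpacked as follows. This is a minimal sketch based on the sample line above; parse_loadavg and the dictionary key names are hypothetical.

```python
def parse_loadavg(text):
    """Unpack the five whitespace-separated fields of /proc/loadavg."""
    one, five, fifteen, procs, last_pid = text.split()
    # The fourth field has the form running/total.
    running, total = procs.split("/")
    return {
        "load_1": float(one),
        "load_5": float(five),
        "load_15": float(fifteen),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }


# The sample line shown above.
load = parse_loadavg("0.20 0.18 0.12 1/80 11206")
```

When only the three averages are needed, the Python standard library function os.getloadavg() returns them without reading the file directly.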

5.2.17.  /proc/locks

This file displays the files currently locked by the kernel. The contents of this file contain internal kernel debugging data and can vary tremendously, depending on the use of the system. A sample /proc/locks file for a lightly loaded system looks similar to the following:
1: POSIX  ADVISORY  WRITE 3568 fd:00:2531452 0 EOF
2: FLOCK  ADVISORY  WRITE 3517 fd:00:2531448 0 EOF
3: POSIX  ADVISORY  WRITE 3452 fd:00:2531442 0 EOF
4: POSIX  ADVISORY  WRITE 3443 fd:00:2531440 0 EOF
5: POSIX  ADVISORY  WRITE 3326 fd:00:2531430 0 EOF
6: POSIX  ADVISORY  WRITE 3175 fd:00:2531425 0 EOF
7: POSIX  ADVISORY  WRITE 3056 fd:00:2548663 0 EOF
Each lock has its own line which starts with a unique number. The second column refers to the class of lock used, with FLOCK signifying the older-style UNIX file locks from a flock system call and POSIX representing the newer POSIX locks from the lockf system call.
The third column can have two values: ADVISORY or MANDATORY. ADVISORY means that the lock does not prevent other people from accessing the data; it only prevents other attempts to lock it. MANDATORY means that no other access to the data is permitted while the lock is held. The fourth column reveals whether the lock is allowing the holder READ or WRITE access to the file. The fifth column shows the ID of the process holding the lock. The sixth column shows the ID of the file being locked, in the format of MAJOR-DEVICE:MINOR-DEVICE:INODE-NUMBER. The seventh and eighth columns show the start and end of the file's locked region.
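The eight columns described above can be split into named fields. The following is a rough sketch; parse_lock is a hypothetical name, and it assumes that the major and minor device numbers are written in hexadecimal (so that fd corresponds to major number 253, which is device-mapper in the /proc/devices sample).

```python
def parse_lock(line):
    """Split one /proc/locks line into its eight columns."""
    fields = line.split()
    major, minor, inode = fields[5].split(":")
    return {
        "id": int(fields[0].rstrip(":")),
        "class": fields[1],   # POSIX or FLOCK
        "mode": fields[2],    # ADVISORY or MANDATORY
        "access": fields[3],  # READ or WRITE
        "pid": int(fields[4]),
        # Assumption: device numbers are hexadecimal, the inode is decimal.
        "major": int(major, 16),
        "minor": int(minor, 16),
        "inode": int(inode),
        "start": int(fields[6]),
        # EOF means the lock extends to the end of the file.
        "end": None if fields[7] == "EOF" else int(fields[7]),
    }


# The second line of the sample output above.
lock = parse_lock("2: FLOCK  ADVISORY  WRITE 3517 fd:00:2531448 0 EOF")
```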

5.2.18.  /proc/mdstat

This file contains the current information for multiple-disk, RAID configurations. If the system does not contain such a configuration, then /proc/mdstat looks similar to the following:
Personalities :
read_ahead not set
unused devices: <none>
This file remains in the same state as seen above unless a software RAID or md device is present. In that case, view /proc/mdstat to find the current status of mdX RAID devices.
The /proc/mdstat file below shows a system with its md0 configured as a RAID 1 device, while it is currently re-syncing the disks:
Personalities : [linear] [raid1] read_ahead 1024 sectors
md0: active raid1 sda2[1] sdb2[0] 9940 blocks [2/2] [UU] resync=1% finish=12.3min algorithm 2 [3/3] [UUU]
unused devices: <none>

5.2.19.  /proc/meminfo

This is one of the more commonly used files in the /proc/ directory, as it reports a large amount of valuable information about the system's RAM usage.
The following sample /proc/meminfo virtual file is from a system with 256 MB of RAM and 512 MB of swap space:
MemTotal:       255908 kB
MemFree:         69936 kB
Buffers:         15812 kB
Cached:         115124 kB
SwapCached:          0 kB
Active:          92700 kB
Inactive:        63792 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       255908 kB
LowFree:         69936 kB
SwapTotal:      524280 kB
SwapFree:       524280 kB
Dirty:               4 kB
Writeback:           0 kB
Mapped:          42236 kB
Slab:            25912 kB
Committed_AS:   118680 kB
PageTables:       1236 kB
VmallocTotal:  3874808 kB
VmallocUsed:      1416 kB
VmallocChunk:  3872908 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB
Much of the information here is used by the free, top, and ps commands. In fact, the output of the free command is similar in appearance to the contents and structure of /proc/meminfo. But by looking directly at /proc/meminfo, more details are revealed:
  • MemTotal — Total amount of physical RAM, in kilobytes.
  • MemFree — The amount of physical RAM, in kilobytes, left unused by the system.
  • Buffers — The amount of physical RAM, in kilobytes, used for file buffers.
  • Cached — The amount of physical RAM, in kilobytes, used as cache memory.
  • SwapCached — The amount of swap, in kilobytes, used as cache memory.
  • Active — The total amount of buffer or page cache memory, in kilobytes, that is in active use. This is memory that has been recently used and is usually not reclaimed for other purposes.
  • Inactive — The total amount of buffer or page cache memory, in kilobytes, that is free and available. This is memory that has not been recently used and can be reclaimed for other purposes.
  • HighTotal and HighFree — The total and free amount of memory, in kilobytes, that is not directly mapped into kernel space. The HighTotal value can vary based on the type of kernel used.
  • LowTotal and LowFree — The total and free amount of memory, in kilobytes, that is directly mapped into kernel space. The LowTotal value can vary based on the type of kernel used.
  • SwapTotal — The total amount of swap available, in kilobytes.
  • SwapFree — The total amount of swap free, in kilobytes.
  • Dirty — The total amount of memory, in kilobytes, waiting to be written back to the disk.
  • Writeback — The total amount of memory, in kilobytes, actively being written back to the disk.
  • Mapped — The total amount of memory, in kilobytes, that has been used to map devices, files, or libraries using the mmap system call.
  • Slab — The total amount of memory, in kilobytes, used by the kernel to cache data structures for its own use.
  • Committed_AS — The total amount of memory, in kilobytes, estimated to complete the workload. This value represents the worst case scenario value, and also includes swap memory.
  • PageTables — The total amount of memory, in kilobytes, dedicated to the lowest page table level.
  • VmallocTotal — The total amount of memory, in kilobytes, of total allocated virtual address space.
  • VmallocUsed — The total amount of memory, in kilobytes, of used virtual address space.
  • VmallocChunk — The largest contiguous block of memory, in kilobytes, of available virtual address space.
  • HugePages_Total — The total number of hugepages for the system. The number is derived by dividing the megabytes set aside for hugepages, as specified in /proc/sys/vm/hugetlb_pool, by Hugepagesize. This statistic only appears on the x86, Itanium, and AMD64 architectures.
  • HugePages_Free — The total number of hugepages available for the system. This statistic only appears on the x86, Itanium, and AMD64 architectures.
  • Hugepagesize — The size for each hugepages unit in kilobytes. By default, the value is 4096 KB on uniprocessor kernels for 32 bit architectures. For SMP, hugemem kernels, and AMD64, the default is 2048 KB. For Itanium architectures, the default is 262144 KB. This statistic only appears on the x86, Itanium, and AMD64 architectures.
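
The fields listed above can be loaded into a dictionary for simple calculations. The following is a minimal sketch, assuming the "Name: value kB" layout of the sample; parse_meminfo is a hypothetical name, and the sum of MemFree, Buffers, and Cached is only a rough estimate chosen for this example, not a value defined by the kernel.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            # Most values are in kilobytes; HugePages_* are plain counts.
            info[key.strip()] = int(fields[0])
    return info


# Lines taken from the sample output above.
sample = """MemTotal:       255908 kB
MemFree:         69936 kB
Buffers:         15812 kB
Cached:         115124 kB
SwapTotal:      524280 kB
SwapFree:       524280 kB
HugePages_Total:     0
"""
info = parse_meminfo(sample)
# Rough estimate of memory that is free or used only for buffers and cache.
roughly_available = info["MemFree"] + info["Buffers"] + info["Cached"]
swap_used = info["SwapTotal"] - info["SwapFree"]
```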

5.2.20.  /proc/misc

This file lists miscellaneous drivers registered on the miscellaneous major device, which is device number 10:
63 device-mapper
175 agpgart
135 rtc
134 apm_bios
The first column is the minor number of each device, while the second column shows the driver in use.

5.2.21.  /proc/modules

This file displays a list of all modules loaded into the kernel. Its contents vary based on the configuration and use of your system, but it should be organized in a similar manner to this sample /proc/modules file output:

Note

This example has been reformatted into a readable format. Most of this information can also be viewed via the /sbin/lsmod command.
nfs      170109  0 -          Live 0x129b0000
lockd    51593   1 nfs,       Live 0x128b0000
nls_utf8 1729    0 -          Live 0x12830000
vfat     12097   0 -          Live 0x12823000
fat      38881   1 vfat,      Live 0x1287b000
autofs4  20293   2 -          Live 0x1284f000
sunrpc   140453  3 nfs,lockd, Live 0x12954000
3c59x    33257   0 -          Live 0x12871000
uhci_hcd 28377   0 -          Live 0x12869000
md5      3777    1 -          Live 0x1282c000
ipv6     211845 16 -          Live 0x128de000
ext3     92585   2 -          Live 0x12886000
jbd      65625   1 ext3,      Live 0x12857000
dm_mod   46677   3 -          Live 0x12833000
The first column contains the name of the module.
The second column refers to the memory size of the module, in bytes.
The third column lists how many instances of the module are currently in use. A value of zero means that the module is loaded but nothing is currently using it.
The fourth column lists the other loaded modules that depend on this module in order to function, or - if there are none.
The fifth column lists what load state the module is in: Live, Loading, or Unloading are the only possible values.
The sixth column lists the current kernel memory offset for the loaded module. This information can be useful for debugging purposes, or for profiling tools such as oprofile.
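The six columns can be split into named fields as sketched below. The function name parse_modules is hypothetical; the fourth column is stored under the key used_by, following the "Used by" heading that lsmod prints for the same information.

```python
def parse_modules(text):
    """Split /proc/modules text into one dict per loaded module."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        name, size, count, others, state, address = fields[:6]
        modules.append({
            "name": name,
            "size": int(size),    # memory size in bytes
            "count": int(count),  # third column
            # The fourth column is "-" or a comma-terminated list.
            "used_by": [m for m in others.split(",") if m and m != "-"],
            "state": state,       # Live, Loading, or Unloading
            "address": int(address, 16),
        })
    return modules


# Lines taken from the sample output above.
sample = """nfs      170109  0 -          Live 0x129b0000
sunrpc   140453  3 nfs,lockd, Live 0x12954000
"""
mods = {m["name"]: m for m in parse_modules(sample)}
```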

5.2.22.  /proc/mounts

This file provides a list of all mounts in use by the system:
rootfs / rootfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
none /dev ramfs rw 0 0
/dev/mapper/VolGroup00-LogVol00 / ext3 rw 0 0
none /dev ramfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
/sys /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
usbdevfs /proc/bus/usb usbdevfs rw 0 0
/dev/hda1 /boot ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
The output found here is similar to the contents of /etc/mtab, except that /proc/mounts is more up-to-date.
The first column specifies the device that is mounted, the second column reveals the mount point, the third column tells the file system type, and the fourth column tells you if it is mounted read-only (ro) or read-write (rw). The fifth and sixth columns are dummy values designed to match the format used in /etc/mtab.
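The six columns can be read as follows. This is a minimal sketch based on the sample output; parse_mounts is a hypothetical name, and lines that do not have exactly six fields are skipped.

```python
def parse_mounts(text):
    """Split /proc/mounts text into one dict per mount."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        # The last two fields are the dummy values described above.
        device, mount_point, fstype, options, _, _ = fields
        mounts.append({
            "device": device,
            "mount_point": mount_point,
            "fstype": fstype,
            "options": options.split(","),
        })
    return mounts


# Lines taken from the sample output above.
sample = """/proc /proc proc rw,nodiratime 0 0
/dev/hda1 /boot ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
"""
mounts = parse_mounts(sample)
read_only = [m["mount_point"] for m in mounts if "ro" in m["options"]]
```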

5.2.23.  /proc/mtrr

This file refers to the current Memory Type Range Registers (MTRRs) in use with the system. If the system architecture supports MTRRs, then the /proc/mtrr file may look similar to the following:
reg00: base=0x00000000 (   0MB), size= 256MB: write-back, count=1
reg01: base=0xe8000000 (3712MB), size=  32MB: write-combining, count=1
MTRRs are used with the Intel P6 family of processors (Pentium II and higher) and control processor access to memory ranges. When using a video card on a PCI or AGP bus, a properly configured /proc/mtrr file can increase performance more than 150%.
Most of the time, this value is properly configured by default. More information on manually configuring this file can be found locally at the following location:
/usr/share/doc/kernel-doc-<version>/Documentation/mtrr.txt

5.2.24.  /proc/partitions

This file contains partition block allocation information. A sampling of this file from a basic system looks similar to the following:
major minor  #blocks  name
  3     0   19531250 hda
  3     1     104391 hda1
  3     2   19422585 hda2
253     0   22708224 dm-0
253     1     524288 dm-1
Most of the information here is of little importance to the user, except for the following columns:
  • major — The major number of the device with this partition. The major number in this /proc/partitions example (3) corresponds with the block device ide0 in /proc/devices.
  • minor — The minor number of the device with this partition. This serves to separate the partitions into different physical devices and relates to the number at the end of the name of the partition.
  • #blocks — Lists the number of physical disk blocks contained in a particular partition.
  • name — The name of the partition.
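
The four columns can be read into a list as sketched below. The function name parse_partitions is hypothetical; the header row and blank lines are skipped by checking that the first field is numeric.

```python
def parse_partitions(text):
    """Split /proc/partitions text into one dict per device or partition."""
    parts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # header row or blank line
        major, minor, blocks, name = fields
        parts.append({
            "major": int(major),
            "minor": int(minor),
            "blocks": int(blocks),
            "name": name,
        })
    return parts


# The sample output shown above.
sample = """major minor  #blocks  name
  3     0   19531250 hda
  3     1     104391 hda1
  3     2   19422585 hda2
253     0   22708224 dm-0
253     1     524288 dm-1
"""
ide_names = [p["name"] for p in parse_partitions(sample) if p["major"] == 3]
```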

5.2.25.  /proc/pci

This file contains a full listing of every PCI device on the system. Depending on the number of PCI devices, /proc/pci can be rather long. A sampling of this file from a basic system looks similar to the following:
Bus  0, device 0, function 0: Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 3). Master Capable. Latency=64. Prefetchable 32 bit memory at 0xe4000000 [0xe7ffffff].
Bus  0, device 1, function 0: PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 3).   Master Capable. Latency=64. Min Gnt=128.
Bus  0, device 4, function 0: ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 2).
Bus  0, device 4, function 1: IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 1). Master Capable. Latency=32. I/O at 0xd800 [0xd80f].
Bus  0, device 4, function 2: USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 1). IRQ 5. Master Capable. Latency=32. I/O at 0xd400 [0xd41f].
Bus  0, device 4, function 3: Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 2). IRQ 9.
Bus  0, device 9, function 0: Ethernet controller: Lite-On Communications Inc LNE100TX (rev 33). IRQ 5. Master Capable. Latency=32. I/O at 0xd000 [0xd0ff].
Bus  0, device 12, function  0: VGA compatible controller: S3 Inc. ViRGE/DX or /GX (rev 1). IRQ 11. Master Capable. Latency=32. Min Gnt=4.Max Lat=255.
This output shows a list of all PCI devices, sorted in the order of bus, device, and function. Beyond providing the name and version of the device, this list also gives detailed IRQ information so an administrator can quickly look for conflicts.

Note

To get a more readable version of this information, type:
lspci -vb

5.2.26.  /proc/slabinfo

This file gives full information about memory usage on the slab level. Linux kernels greater than version 2.2 use slab pools to manage memory above the page level. Commonly used objects have their own slab pools.
Instead of parsing the highly verbose /proc/slabinfo file manually, the /usr/bin/slabtop program displays kernel slab cache information in real time. This program allows for custom configurations, including column sorting and screen refreshing.
A sample screen shot of /usr/bin/slabtop usually looks like the following example:
Active / Total Objects (% used)    : 133629 / 147300 (90.7%)
Active / Total Slabs (% used)      : 11492 / 11493 (100.0%)
Active / Total Caches (% used)     : 77 / 121 (63.6%)
Active / Total Size (% used)       : 41739.83K / 44081.89K (94.7%)
Minimum / Average / Maximum Object : 0.01K / 0.30K / 128.00K
OBJS   ACTIVE USE      OBJ   SIZE     SLABS OBJ/SLAB CACHE SIZE NAME
44814  43159  96%    0.62K   7469      6     29876K ext3_inode_cache
36900  34614  93%    0.05K    492     75      1968K buffer_head
35213  33124  94%    0.16K   1531     23      6124K dentry_cache
7364   6463  87%    0.27K    526      14      2104K radix_tree_node
2585   1781  68%    0.08K     55      47       220K vm_area_struct
2263   2116  93%    0.12K     73      31       292K size-128
1904   1125  59%    0.03K     16      119        64K size-32
1666    768  46%    0.03K     14      119        56K anon_vma
1512   1482  98%    0.44K    168       9       672K inode_cache
1464   1040  71%    0.06K     24      61        96K size-64
1320    820  62%    0.19K     66      20       264K filp
678    587  86%    0.02K      3      226        12K dm_io
678    587  86%    0.02K      3      226        12K dm_tio
576    574  99%    0.47K     72        8       288K proc_inode_cache
528    514  97%    0.50K     66        8       264K size-512
492    372  75%    0.09K     12       41        48K bio
465    314  67%    0.25K     31       15       124K size-256
452    331  73%    0.02K      2      226         8K biovec-1
420    420 100%    0.19K     21       20        84K skbuff_head_cache
305    256  83%    0.06K      5       61        20K biovec-4
290      4   1%    0.01K      1      290         4K revoke_table
264    264 100%    4.00K    264        1      1056K size-4096
260    256  98%    0.19K     13       20        52K biovec-16
260    256  98%    0.75K     52        5       208K biovec-64
Some of the more commonly used statistics in /proc/slabinfo that are included in /usr/bin/slabtop include:
  • OBJS — The total number of objects (memory blocks), including those in use (allocated), and some spares not in use.
  • ACTIVE — The number of objects (memory blocks) that are in use (allocated).
  • USE — Percentage of total objects that are active, calculated as (ACTIVE / OBJS) × 100.
  • OBJ SIZE — The size of the objects.
  • SLABS — The total number of slabs.
  • OBJ/SLAB — The number of objects that fit into a slab.
  • CACHE SIZE — The cache size of the slab.
  • NAME — The name of the slab.
For more information on the /usr/bin/slabtop program, refer to the slabtop man page.

5.2.27.  /proc/stat

This file keeps track of a variety of different statistics about the system since it was last restarted. The contents of /proc/stat, which can be quite long, usually begin like the following example:
cpu  259246 7001 60190 34250993 137517 772 0
cpu0 259246 7001 60190 34250993 137517 772 0
intr 354133732 347209999 2272 0 4 4 0 0 3 1 1249247 0 0 80143 0 422626 5169433
ctxt 12547729
btime 1093631447
processes 130523
procs_running 1
procs_blocked 0
preempt 5651840
cpu  209841 1554 21720 118519346 72939 154 27168
cpu0 42536 798 4841 14790880 14778 124 3117
cpu1 24184 569 3875 14794524 30209 29 3130
cpu2 28616 11 2182 14818198 4020 1 3493
cpu3 35350 6 2942 14811519 3045 0 3659
cpu4 18209 135 2263 14820076 12465 0 3373
cpu5 20795 35 1866 14825701 4508 0 3615
cpu6 21607 0 2201 14827053 2325 0 3334
cpu7 18544 0 1550 14831395 1589 0 3447
intr 15239682 14857833 6 0 6 6 0 5 0 1 0 0 0 29 0 2 0 0 0 0 0 0 0 94982 0 286812
ctxt 4209609
btime 1078711415
processes 21905
procs_running 1
procs_blocked 0
Some of the more commonly used statistics include:
  • cpu — Measures the number of jiffies (1/100 of a second for x86 systems) that the system has spent in user mode, user mode with low priority (nice), system mode, idle task, I/O wait, IRQ (hardirq), and softirq, respectively. The IRQ (hardirq) is the direct response to a hardware event; it does minimal work and queues the "heavy" work for the softirq to execute. The softirq runs at a lower priority than the IRQ and therefore may be interrupted more frequently. The total for all CPUs is given at the top, while each individual CPU is listed below with its own statistics. The second block of sample output above, which lists cpu0 through cpu7, is from a 4-way Intel Pentium Xeon configuration with multi-threading enabled, and therefore shows four physical processors and four virtual processors, totaling eight processors.
  • page — The number of memory pages the system has written in and out to disk.
  • swap — The number of swap pages the system has brought in and out.
  • intr — The number of interrupts the system has experienced.
  • btime — The boot time, measured in the number of seconds since January 1, 1970, otherwise known as the epoch.
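
The cpu and btime entries can be combined into simple figures such as the idle share and the boot time. The following is a rough sketch, assuming the seven-column cpu lines shown in the sample; parse_stat and CPU_FIELDS are hypothetical names.

```python
from datetime import datetime, timezone

# Column order of the cpu lines, as described above.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")
SINGLE_VALUE_KEYS = ("ctxt", "btime", "processes", "procs_running", "procs_blocked")


def parse_stat(text):
    """Collect per-CPU jiffy counters and single-value entries."""
    stats = {"cpus": {}}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key, values = fields[0], fields[1:]
        if key.startswith("cpu"):
            stats["cpus"][key] = dict(zip(CPU_FIELDS, map(int, values)))
        elif key in SINGLE_VALUE_KEYS:
            stats[key] = int(values[0])
    return stats


# Lines taken from the first sample above.
sample = """cpu  259246 7001 60190 34250993 137517 772 0
cpu0 259246 7001 60190 34250993 137517 772 0
ctxt 12547729
btime 1093631447
"""
stats = parse_stat(sample)
total_jiffies = sum(stats["cpus"]["cpu"].values())
idle_share = stats["cpus"]["cpu"]["idle"] / total_jiffies
# btime is seconds since the epoch, interpreted here as UTC.
boot_time = datetime.fromtimestamp(stats["btime"], tz=timezone.utc)
```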

5.2.28.  /proc/swaps

This file measures swap space and its utilization. For a system with only one swap partition, the output of /proc/swaps may look similar to the following:
Filename                          Type        Size     Used    Priority
/dev/mapper/VolGroup00-LogVol01   partition   524280   0       -1
While some of this information can be found in other files in the /proc/ directory, /proc/swaps provides a snapshot of every swap file name, the type of swap space, the total size, and the amount of space in use (in kilobytes). The priority column is useful when multiple swap files are in use. The higher the priority value, the more likely the swap file is to be used.

5.2.29.  /proc/sysrq-trigger

Using the echo command to write to this file, a remote root user can execute most System Request Key commands as if at the local terminal. To echo values to this file, /proc/sys/kernel/sysrq must be set to a value other than 0. For more information about the System Request Key, refer to Section 5.3.9.3, “/proc/sys/kernel/”.
Although it is possible to write to this file, it cannot be read, even by the root user.

5.2.30.  /proc/uptime

This file contains information detailing how long the system has been on since its last restart. The output of /proc/uptime is quite minimal:
350735.47 234388.90
The first number is the total number of seconds the system has been up. The second number is how much of that time the machine has spent idle, in seconds.
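The two numbers can be converted into more familiar units. The following is a minimal sketch using the sample values above; parse_uptime is a hypothetical name.

```python
from datetime import timedelta


def parse_uptime(text):
    """Return (uptime_seconds, idle_seconds) from /proc/uptime text."""
    up, idle = (float(value) for value in text.split())
    return up, idle


# The sample line shown above.
up, idle = parse_uptime("350735.47 234388.90")
uptime = timedelta(seconds=up)  # days, hours, minutes since the last restart
idle_share = idle / up          # fraction of the uptime spent idle
```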

5.2.31.  /proc/version

This file specifies the version of the Linux kernel and gcc in use, as well as the version of Red Hat Enterprise Linux installed on the system:
Linux version 2.6.8-1.523 (user@foo.redhat.com) (gcc version 3.4.1 20040714 \  (Red Hat Enterprise Linux 3.4.1-7)) #1 Mon Aug 16 13:27:03 EDT 2004
This information is used for a variety of purposes, including the version data presented when a user logs in.

5.3. Directories within /proc/

Common groups of information concerning the kernel are grouped into directories and subdirectories within the /proc/ directory.

5.3.1. Process Directories

The /proc/ directory contains a number of directories with numerical names. A listing of them may be similar to the following:
dr-xr-xr-x    3 root     root            0 Feb 13 01:28 1
dr-xr-xr-x    3 root     root            0 Feb 13 01:28 1010
dr-xr-xr-x    3 xfs      xfs             0 Feb 13 01:28 1087
dr-xr-xr-x    3 daemon   daemon          0 Feb 13 01:28 1123
dr-xr-xr-x    3 root     root            0 Feb 13 01:28 11307
dr-xr-xr-x    3 apache   apache          0 Feb 13 01:28 13660
dr-xr-xr-x    3 rpc      rpc             0 Feb 13 01:28 637
dr-xr-xr-x    3 rpcuser  rpcuser         0 Feb 13 01:28 666
These directories are called process directories, as they are named after a program's process ID and contain information specific to that process. The owner and group of each process directory is set to the user running the process. When the process is terminated, its /proc/ process directory vanishes.
Each process directory contains the following files:
  • cmdline — Contains the command issued when starting the process.
  • cwd — A symbolic link to the current working directory for the process.
  • environ — A list of the environment variables for the process. The environment variable is given in all upper-case characters, and the value is in lower-case characters.
  • exe — A symbolic link to the executable of this process.
  • fd — A directory containing all of the file descriptors for a particular process. These are given in numbered links:
    total 0
    lrwx------    1 root     root           64 May  8 11:31 0 -> /dev/null
    lrwx------    1 root     root           64 May  8 11:31 1 -> /dev/null
    lrwx------    1 root     root           64 May  8 11:31 2 -> /dev/null
    lrwx------    1 root     root           64 May  8 11:31 3 -> /dev/ptmx
    lrwx------    1 root     root           64 May  8 11:31 4 -> socket:[7774817]
    lrwx------    1 root     root           64 May  8 11:31 5 -> /dev/ptmx
    lrwx------    1 root     root           64 May  8 11:31 6 -> socket:[7774829]
    lrwx------    1 root     root           64 May  8 11:31 7 -> /dev/ptmx
  • maps — A list of memory maps to the various executables and library files associated with this process. This file can be rather long, depending upon the complexity of the process, but sample output from the sshd process begins like the following:
    08048000-08086000 r-xp 00000000 03:03 391479     /usr/sbin/sshd
    08086000-08088000 rw-p 0003e000 03:03 391479	/usr/sbin/sshd
    08088000-08095000 rwxp 00000000 00:00 0
    40000000-40013000 r-xp 00000000 03:03 293205	/lib/ld-2.2.5.so
    40013000-40014000 rw-p 00013000 03:03 293205	/lib/ld-2.2.5.so
    40031000-40038000 r-xp 00000000 03:03 293282	/lib/libpam.so.0.75
    40038000-40039000 rw-p 00006000 03:03 293282	/lib/libpam.so.0.75
    40039000-4003a000 rw-p 00000000 00:00 0
    4003a000-4003c000 r-xp 00000000 03:03 293218	/lib/libdl-2.2.5.so
    4003c000-4003d000 rw-p 00001000 03:03 293218	/lib/libdl-2.2.5.so
  • mem — The memory held by the process. This file cannot be read by the user.
  • root — A link to the root directory of the process.
  • stat — The status of the process.
  • statm — The status of the memory in use by the process. Below is a sample /proc/statm file:
    263 210 210 5 0 205 0
    The seven columns relate to different memory statistics for the process. From left to right, they report the following aspects of the memory used:
    1. Total program size, in pages.
    2. Size of memory portions, in pages.
    3. Number of pages that are shared.
    4. Number of pages that are code.
    5. Number of pages of data/stack.
    6. Number of library pages.
    7. Number of dirty pages.
  • status — The status of the process in a more readable form than stat or statm. Sample output for sshd looks similar to the following:
    Name:	sshd
    State:	S (sleeping)
    Tgid:	797
    Pid:	797
    PPid:	1
    TracerPid:	0
    Uid:	0	0	0	0
    Gid:	0	0	0	0
    FDSize:	32
    Groups:
    VmSize:	    3072 kB
    VmLck:	       0 kB
    VmRSS:	     840 kB
    VmData:	     104 kB
    VmStk:	      12 kB
    VmExe:	     300 kB
    VmLib:	    2528 kB
    SigPnd:	0000000000000000
    SigBlk:	0000000000000000
    SigIgn:	8000000000001000
    SigCgt:	0000000000014005
    CapInh:	0000000000000000
    CapPrm:	00000000fffffeff
    CapEff:	00000000fffffeff
    The information in this output includes the process name and ID, the state (such as S (sleeping) or R (running)), user/group ID running the process, and detailed data regarding memory usage.
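Because every line of a status file has the form of a field name, a colon, and a value, a single field can be pulled out with a short helper. The following is a minimal sketch, not part of the original guide: the function name status_field is hypothetical, and the file defaults to /proc/self/status when no path is given.

```shell
# status_field FIELD [FILE]
# Print the value of one "Field:<tab>value" line from a status-style file.
# The function name is illustrative; FILE defaults to /proc/self/status.
status_field() {
    awk -v key="$1:" '
        $1 == key {
            sub(/^[^:]*:[ \t]*/, "")   # drop the "Field:" prefix and its padding
            print
            exit
        }
    ' "${2:-/proc/self/status}"
}
```

For example, status_field VmRSS /proc/797/status would print 840 kB for the sshd sample shown above.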

5.3.1.1.  /proc/self/

The /proc/self/ directory is a link to the currently running process. This allows a process to look at itself without having to know its process ID.
Within a shell environment, a listing of the /proc/self/ directory produces the same contents as listing the process directory for that process.

5.3.2.  /proc/bus/

This directory contains information specific to the various buses available on the system. For example, on a standard system containing PCI and USB buses, current data on each of these buses is available within a subdirectory within /proc/bus/ by the same name, such as /proc/bus/pci/.
The subdirectories and files available within /proc/bus/ vary depending on the devices connected to the system. However, each bus type has at least one directory. Within each bus directory there is normally at least one subdirectory with a numerical name, such as 001, which contains binary files.
For example, the /proc/bus/usb/ subdirectory contains files that track the various devices on any USB buses, as well as the drivers required for them. The following is a sample listing of a /proc/bus/usb/ directory:
total 0
dr-xr-xr-x    1 root     root            0 May  3 16:25 001
-r--r--r--    1 root     root            0 May  3 16:25 devices
-r--r--r--    1 root     root            0 May  3 16:25 drivers
The /proc/bus/usb/001/ directory contains all devices on the first USB bus and the devices file identifies the USB root hub on the motherboard.
The following is an example of a /proc/bus/usb/devices file:
T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=12  MxCh= 2
B:  Alloc=  0/900 us ( 0%), #Int=  0, #Iso=  0
D:  Ver= 1.00 Cls=09(hub  ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=0000 ProdID=0000 Rev= 0.00
S:  Product=USB UHCI Root Hub
S:  SerialNumber=d400
C:* #Ifs= 1 Cfg#= 1 Atr=40 MxPwr=  0mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   8 Ivl=255ms
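The devices file is line-oriented: each record starts with a T: line carrying the bus and device numbers, and the S: lines carry strings such as the product name. As a rough sketch (the function name usb_products is hypothetical, and only the fields visible in the sample above are assumed), the product names can be listed like this:

```shell
# usb_products [FILE]
# Print "Bus NN Dev N: product name" for each device that reports a
# Product string. FILE defaults to /proc/bus/usb/devices.
usb_products() {
    awk '
        /^T:/ {
            # Remember the bus and device number of the current record.
            bus = $0; sub(/.*Bus=/, "", bus);           sub(/ .*/, "", bus)
            dev = $0; sub(/.*Dev[#]=[ \t]*/, "", dev);  sub(/ .*/, "", dev)
        }
        /^S:[ \t]*Product=/ {
            name = $0; sub(/^S:[ \t]*Product=/, "", name)
            print "Bus " bus " Dev " dev ": " name
        }
    ' "${1:-/proc/bus/usb/devices}"
}
```

Run against the sample file above, this would print a single line for the UHCI root hub.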

5.3.3.  /proc/driver/

This directory contains information for specific drivers in use by the kernel.
A common file found here is rtc which provides output from the driver for the system's Real Time Clock (RTC), the device that keeps the time while the system is switched off. Sample output from /proc/driver/rtc looks like the following:
rtc_time        : 16:21:00
rtc_date        : 2004-08-31
rtc_epoch       : 1900
alarm           : 21:16:27
DST_enable      : no
BCD             : yes
24hr            : yes
square_wave     : no
alarm_IRQ       : no
update_IRQ      : no
periodic_IRQ    : no
periodic_freq   : 1024
batt_status     : okay
For more information about the RTC, refer to the following installed documentation:
/usr/share/doc/kernel-doc-<version>/Documentation/rtc.txt

5.3.4.  /proc/fs/

This directory shows which file systems are exported. If running an NFS server, typing cat /proc/fs/nfsd/exports displays the file systems being shared and the permissions granted for those file systems. For more on file system sharing with NFS, refer to Chapter 21, Network File System (NFS).

5.3.5.  /proc/ide/

This directory contains information about IDE devices on the system. Each IDE channel is represented as a separate directory, such as /proc/ide/ide0 and /proc/ide/ide1. In addition, a drivers file is available, providing the version number of the various drivers used on the IDE channels:
ide-floppy version 0.99.newide
ide-cdrom version 4.61
ide-disk version 1.18
Many chipsets also provide a file in this directory with additional data concerning the drives connected through the channels. For example, a generic Intel PIIX4 Ultra 33 chipset produces the /proc/ide/piix file which reveals whether DMA or UDMA is enabled for the devices on the IDE channels:
Intel PIIX4 Ultra 33 Chipset.
------------- Primary Channel ---------------- Secondary Channel -------------
		enabled                          enabled

------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------
DMA enabled:    yes              no              yes               no
UDMA enabled:   yes              no              no                no
UDMA enabled:   2                X               X                 X
UDMA DMA PIO
Navigating into the directory for an IDE channel, such as ide0, provides additional information. The channel file provides the channel number, while the model file identifies the bus type for the channel (such as pci).

5.3.5.1. Device Directories

Within each IDE channel directory is a device directory. The name of the device directory corresponds to the drive letter in the /dev/ directory. For instance, the first IDE drive on ide0 would be hda.

Note

There is a symbolic link to each of these device directories in the /proc/ide/ directory.
Each device directory contains a collection of information and statistics. The contents of these directories vary according to the type of device connected. Some of the more useful files common to many devices include:
  • cache — The device cache.
  • capacity — The capacity of the device, in 512 byte blocks.
  • driver — The driver and version used to control the device.
  • geometry — The physical and logical geometry of the device.
  • media — The type of device, such as a disk.
  • model — The model name or number of the device.
  • settings — A collection of current device parameters. This file usually contains quite a bit of useful, technical information. A sample settings file for a standard IDE hard disk looks similar to the following:
    name                value          min          max          mode
    ----                -----          ---          ---          ----
    acoustic            0              0            254          rw
    address             0              0            2            rw
    bios_cyl            38752          0            65535        rw
    bios_head           16             0            255          rw
    bios_sect           63             0            63           rw
    bswap               0              0            1            r
    current_speed       68             0            70           rw
    failures            0              0            65535        rw
    init_speed          68             0            70           rw
    io_32bit            0              0            3            rw
    keepsettings        0              0            1            rw
    lun                 0              0            7            rw
    max_failures        1              0            65535        rw
    multcount           16             0            16           rw
    nice1               1              0            1            rw
    nowerr              0              0            1            rw
    number              0              0            3            rw
    pio_mode            write-only     0            255          w
    unmaskirq           0              0            1            rw
    using_dma           1              0            1            rw
    wcache              1              0            1            rw
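Since the settings file is a simple whitespace-separated table, the current value of one parameter can be read without scanning the whole listing. The helper below is an illustrative sketch; the name ide_setting is hypothetical and not something the guide or the kernel provides.

```shell
# ide_setting NAME FILE
# Print the "value" column for one parameter of an IDE settings file,
# for example /proc/ide/hda/settings.
ide_setting() {
    awk -v name="$1" '$1 == name { print $2; exit }' "$2"
}
```

With the sample above, ide_setting using_dma /proc/ide/hda/settings would print 1, indicating that DMA is in use for that drive.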

5.3.6.  /proc/irq/

This directory is used to set IRQ-to-CPU affinity, which allows the system to connect a particular IRQ to only one CPU. Alternatively, it can exclude a CPU from handling any IRQs.
Each IRQ has its own directory, allowing for the individual configuration of each IRQ. The /proc/irq/prof_cpu_mask file is a bitmask that contains the default values for the smp_affinity file in the IRQ directory. The values in smp_affinity specify which CPUs handle that particular IRQ.
For more information about the /proc/irq/ directory, refer to the following installed documentation:
/usr/share/doc/kernel-doc-<version>/Documentation/filesystems/proc.txt
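An affinity bitmask has one bit per CPU, with bit 0 standing for the first CPU. The following sketch shows the arithmetic for building such a mask in hexadecimal from a list of CPU numbers. The function name cpu_mask is hypothetical, and the IRQ number in the usage comment is only a placeholder.

```shell
# cpu_mask CPU...
# Print a hexadecimal bitmask with one bit set for each CPU number given
# (CPU 0 is bit 0, CPU 1 is bit 1, and so on).
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Possible use, as root, with a placeholder IRQ number:
#   echo "$(cpu_mask 0 2)" > /proc/irq/30/smp_affinity
```

For CPUs 0 and 2 the mask is binary 101, so cpu_mask 0 2 prints 5.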

5.3.7.  /proc/net/

This directory provides a comprehensive look at various networking parameters and statistics. Each directory and virtual file within this directory describes aspects of the system's network configuration. Below is a partial list of the /proc/net/ directory:
  • arp — Lists the kernel's ARP table. This file is particularly useful for connecting a hardware address to an IP address on a system.
  • atm/ directory — The files within this directory contain Asynchronous Transfer Mode (ATM) settings and statistics. This directory is primarily used with ATM networking and ADSL cards.
  • dev — Lists the various network devices configured on the system, complete with transmit and receive statistics. This file displays the number of bytes each interface has sent and received, the number of packets inbound and outbound, the number of errors seen, the number of packets dropped, and more.
  • dev_mcast — Lists Layer2 multicast groups on which each device is listening.
  • igmp — Lists the IP multicast addresses which this system joined.
  • ip_conntrack — Lists tracked network connections for machines that are forwarding IP connections.
  • ip_tables_names — Lists the types of iptables in use. This file is only present if iptables is active on the system and contains one or more of the following values: filter, mangle, or nat.
  • ip_mr_cache — Lists the multicast routing cache.
  • ip_mr_vif — Lists multicast virtual interfaces.
  • netstat — Contains a broad yet detailed collection of networking statistics, including TCP timeouts, SYN cookies sent and received, and much more.
  • psched — Lists global packet scheduler parameters.
  • raw — Lists raw device statistics.
  • route — Lists the kernel's routing table.
  • rt_cache — Contains the current routing cache.
  • snmp — Lists Simple Network Management Protocol (SNMP) data for various networking protocols in use.
  • sockstat — Provides socket statistics.
  • tcp — Contains detailed TCP socket information.
  • tr_rif — Lists the token ring RIF routing table.
  • udp — Contains detailed UDP socket information.
  • unix — Lists UNIX domain sockets currently in use.
  • wireless — Lists wireless interface data.
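As an illustration of how the dev file described above can be used, the sketch below prints the received and transmitted byte counters for each interface. It assumes the usual layout, which this guide does not show: two header lines, then one line per interface whose first counter is bytes received and whose ninth counter is bytes transmitted. The function name net_dev_bytes is hypothetical.

```shell
# net_dev_bytes [FILE]
# Print "interface rx_bytes tx_bytes" for every interface listed.
# Assumes two header lines and the usual 16 counters per interface.
net_dev_bytes() {
    awk 'NR > 2 {
        sub(/:/, " ")        # some kernels print "eth0:1234" with no space
        print $1, $2, $10
    }' "${1:-/proc/net/dev}"
}
```

If the column layout differs on a given kernel, the field numbers in the print statement need to be adjusted to match the header lines of the file.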

5.3.8.  /proc/scsi/

This directory is analogous to the /proc/ide/ directory, but it is for connected SCSI devices.
The primary file in this directory is /proc/scsi/scsi, which contains a list of every recognized SCSI device. From this listing, the type of device, as well as the model name, vendor, SCSI channel, and ID data are available.
For example, if a system contains a SCSI CD-ROM, a tape drive, a hard drive, and a RAID controller, this file looks similar to the following:
Attached devices:
Host: scsi1 Channel: 00 Id: 05 Lun: 00
  Vendor: NEC      Model: CD-ROM DRIVE:466 Rev: 1.06
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 06 Lun: 00
  Vendor: ARCHIVE  Model: Python 04106-XXX Rev: 7350
  Type:   Sequential-Access                ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 06 Lun: 00
  Vendor: DELL     Model: 1x6 U2W SCSI BP  Rev: 5.35
  Type:   Processor                        ANSI SCSI revision: 02
Host: scsi2 Channel: 02 Id: 00 Lun: 00
  Vendor: MegaRAID Model: LD0 RAID5 34556R Rev: 1.01
  Type:   Direct-Access                    ANSI SCSI revision: 02
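Because every value in this listing is preceded by a label (Host:, Id:, Type:, and so on), a one-line summary per device can be produced by remembering the most recent host and ID and printing them when the Type: label is reached. The following is only a sketch: the function name scsi_summary is hypothetical, and it relies on nothing beyond the labels shown in the listing above.

```shell
# scsi_summary [FILE]
# Print "host id NN type" for each device in a /proc/scsi/scsi style
# listing. Works whether the labels share a line or appear one per line.
scsi_summary() {
    awk '
        /Host:/ { host = $0; sub(/.*Host:[ \t]*/, "", host); sub(/[ \t].*/, "", host) }
        /Id:/   { id = $0;   sub(/.*Id:[ \t]*/, "", id);     sub(/[ \t].*/, "", id) }
        /Type:/ {
            dtype = $0
            sub(/.*Type:[ \t]*/, "", dtype)    # text after the label
            sub(/[ \t]*ANSI.*/, "", dtype)     # drop a trailing ANSI revision
            sub(/[ \t]+$/, "", dtype)          # drop trailing padding
            print host, "id", id, dtype
        }
    ' "${1:-/proc/scsi/scsi}"
}
```

For the example system, the first line of output would identify the CD-ROM on scsi1 at ID 05.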
Each SCSI driver used by the system has its own directory within /proc/scsi/, which contains files specific to each SCSI controller using that driver. From the previous example, aic7xxx/ and megaraid/ directories are present, since two drivers are in use. The files in each of the directories typically contain an I/O address range, IRQ information, and statistics for the SCSI controller using that driver. Each controller can report a different type and amount of information. The Adaptec AIC-7880 Ultra SCSI host adapter's file in this example system produces the following output:
Adaptec AIC7xxx driver version: 5.1.20/3.2.4
Compile Options:
TCQ Enabled By Default : Disabled
AIC7XXX_PROC_STATS     : Enabled
AIC7XXX_RESET_DELAY    : 5
Adapter Configuration:
SCSI Adapter: Adaptec AIC-7880 Ultra SCSI host adapter
Ultra Narrow Controller     PCI MMAPed
I/O Base: 0xfcffe000
Adapter SEEPROM Config: SEEPROM found and used.
Adaptec SCSI BIOS: Enabled
IRQ: 30
SCBs: Active 0, Max Active 1, Allocated 15, HW 16, Page 255
Interrupts: 33726
BIOS Control Word: 0x18a6
Adapter Control Word: 0x1c5f
Extended Translation: Enabled
Disconnect Enable Flags: 0x00ff
Ultra Enable Flags: 0x0020
Tag Queue Enable Flags: 0x0000
Ordered Queue Tag Flags: 0x0000
Default Tag Queue Depth: 8
Tagged Queue By Device array for aic7xxx host instance 1:
      {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
Actual queue depth per device for aic7xxx host instance 1:
      {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
Statistics:

(scsi1:0:5:0) Device using Narrow/Sync transfers at 20.0 MByte/sec, offset 15
Transinfo settings: current(12/15/0/0), goal(12/15/0/0), user(12/15/0/0)
Total transfers 0 (0 reads and 0 writes)
		< 2K      2K+     4K+     8K+    16K+    32K+    64K+   128K+
Reads:        0       0       0       0       0       0       0       0
Writes:       0       0       0       0       0       0       0       0

(scsi1:0:6:0) Device using Narrow/Sync transfers at 10.0 MByte/sec, offset 15
Transinfo settings: current(25/15/0/0), goal(12/15/0/0), user(12/15/0/0)
Total transfers 132 (0 reads and 132 writes)
		< 2K      2K+     4K+     8K+    16K+    32K+    64K+   128K+
Reads:        0       0       0       0       0       0       0       0
Writes:       0       0       0       1     131       0       0       0
This output reveals the transfer speed to the SCSI devices connected to the controller based on channel ID, as well as detailed statistics concerning the amount and sizes of files read or written by that device. For example, this controller is communicating with the CD-ROM at 20 megabytes per second, while the tape drive is only communicating at 10 megabytes per second.

5.3.9.  /proc/sys/

The /proc/sys/ directory is different from others in /proc/ because it not only provides information about the system but also allows the system administrator to immediately enable and disable kernel features.

Warning

Use caution when changing settings on a production system using the various files in the /proc/sys/ directory. Changing the wrong setting may render the kernel unstable, requiring a system reboot.
For this reason, be sure the options are valid for that file before attempting to change any value in /proc/sys/.
A good way to determine if a particular file can be configured, or if it is only designed to provide information, is to list it with the -l option at the shell prompt. If the file is writable, it may be used to configure the kernel. For example, a partial listing of /proc/sys/fs looks like the following:
-r--r--r--    1 root     root            0 May 10 16:14 dentry-state
-rw-r--r--    1 root     root            0 May 10 16:14 dir-notify-enable
-r--r--r--    1 root     root            0 May 10 16:14 dquot-nr
-rw-r--r--    1 root     root            0 May 10 16:14 file-max
-r--r--r--    1 root     root            0 May 10 16:14 file-nr
In this listing, the files dir-notify-enable and file-max can be written to and, therefore, can be used to configure the kernel. The other files only provide feedback on current settings.
Changing a value within a /proc/sys/ file is done by echoing the new value into the file. For example, to enable the System Request Key on a running kernel, type the command:
echo 1 > /proc/sys/kernel/sysrq
This changes the value for sysrq from 0 (off) to 1 (on).
A few /proc/sys/ configuration files contain more than one value. To correctly send new values to them, place a space character between each value passed with the echo command, such as is done in this example:
echo 4 2 45 > /proc/sys/kernel/acct
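The two steps described above, checking that a file is writable and then echoing space-separated values into it, can be combined in a small wrapper. This is a cautious sketch rather than an established tool: the name proc_write is hypothetical, and the check simply looks at the owner write bit in the ls -l output, as in the listing shown earlier.

```shell
# proc_write FILE VALUE...
# Write one or more space-separated values to FILE, but only if the
# ls -l listing shows the owner write bit. Returns 1 otherwise.
proc_write() {
    target=$1
    shift
    case $(ls -ld "$target" 2>/dev/null) in
        ??w*) ;;                                   # e.g. -rw-r--r--
        *)
            echo "proc_write: $target is not writable" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$*" > "$target"
}

# Possible use, as root:
#   proc_write /proc/sys/kernel/acct 4 2 45
```

Checking the permission bits rather than relying on the write attempt matters for root, because the root user can usually open read-only files for writing in ordinary file systems.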

Note

Any configuration changes made using the echo command disappear when the system is restarted. To make configuration changes take effect after the system is rebooted, refer to Section 5.4, “Using the sysctl Command”.
The /proc/sys/ directory contains several subdirectories controlling different aspects of a running kernel.

5.3.9.1.  /proc/sys/dev/

This directory provides parameters for particular devices on the system. Most systems have at least two directories, cdrom/ and raid/. Customized kernels can have other directories, such as parport/, which provides the ability to share one parallel port between multiple device drivers.
The cdrom/ directory contains a file called info, which reveals a number of important CD-ROM parameters:
CD-ROM information, Id: cdrom.c 3.20 2003/12/17
drive name:             hdc
drive speed:            48
drive # of slots:       1
Can close tray:         1
Can open tray:          1
Can lock tray:          1
Can change speed:       1
Can select disk:        0
Can read multisession:  1
Can read MCN:           1
Reports media changed:  1
Can play audio:         1
Can write CD-R:         0
Can write CD-RW:        0
Can read DVD:           0
Can write DVD-R:        0
Can write DVD-RAM:      0
Can read MRW:           0
Can write MRW:          0
Can write RAM:          0
This file can be quickly scanned to discover the qualities of an unknown CD-ROM. If multiple CD-ROMs are available on a system, each device is given its own column of information.
Various files in /proc/sys/dev/cdrom, such as autoclose and checkmedia, can be used to control the system's CD-ROM. Use the echo command to enable or disable these features.
If RAID support is compiled into the kernel, a /proc/sys/dev/raid/ directory becomes available with at least two files in it: speed_limit_min and speed_limit_max. These settings determine the acceleration of RAID devices for I/O intensive tasks, such as resyncing the disks.

5.3.9.2.  /proc/sys/fs/

This directory contains an array of options and information concerning various aspects of the file system, including quota, file handle, inode, and dentry information.
The binfmt_misc/ directory is used to provide kernel support for miscellaneous binary formats.
The important files in /proc/sys/fs/ include:
  • dentry-state — Provides the status of the directory cache. The file looks similar to the following:
    57411	52939	45	0	0	0
    The first number reveals the total number of directory cache entries, while the second number displays the number of unused entries. The third number tells the number of seconds between when a directory has been freed and when it can be reclaimed, and the fourth measures the pages currently requested by the system. The last two numbers are not used and display only zeros.
  • dquot-nr — Lists the maximum number of cached disk quota entries.
  • file-max — Lists the maximum number of file handles that the kernel allocates. Raising the value in this file can resolve errors caused by a lack of available file handles.
  • file-nr — Lists the number of allocated file handles, used file handles, and the maximum number of file handles.
  • overflowgid and overflowuid — Define the fixed group ID and user ID, respectively, for use with file systems that only support 16-bit group and user IDs.
  • super-max — Controls the maximum number of superblocks available.
  • super-nr — Displays the current number of superblocks in use.
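As a small worked example of the dentry-state description above, the number of directory cache entries currently in use is the first value minus the second. The function name dentry_in_use below is hypothetical.

```shell
# dentry_in_use [FILE]
# Print total directory cache entries minus unused entries, using the
# first two values of a dentry-state file.
dentry_in_use() {
    awk '{ print $1 - $2; exit }' "${1:-/proc/sys/fs/dentry-state}"
}
```

For the sample values 57411 and 52939, this prints 4472.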

5.3.9.3.  /proc/sys/kernel/

This directory contains a variety of different configuration files that directly affect the operation of the kernel. Some of the most important files include:
  • acct — Controls the suspension of process accounting based on the percentage of free space available on the file system containing the log. By default, the file looks like the following:
    4	2	30
    The first value dictates the percentage of free space required for logging to resume, while the second value sets the threshold percentage of free space when logging is suspended. The third value sets the interval, in seconds, that the kernel polls the file system to see if logging should be suspended or resumed.
  • cap-bound — Controls the capability bounding settings, which provides a list of capabilities for any process on the system. If a capability is not listed here, then no process, no matter how privileged, can do it. The idea is to make the system more secure by ensuring that certain things cannot happen, at least beyond a certain point in the boot process.
    For a valid list of values for this virtual file, refer to the following installed documentation:
    /lib/modules/<kernel-version>/build/include/linux/capability.h
  • ctrl-alt-del — Controls whether Ctrl+Alt+Delete gracefully restarts the computer using init (0) or forces an immediate reboot without syncing the dirty buffers to disk (1).
  • domainname — Configures the system domain name, such as example.com.
  • exec-shield — Configures the Exec Shield feature of the kernel. Exec Shield provides protection against certain types of buffer overflow attacks.
    There are two possible values for this virtual file:
    • 0 — Disables Exec Shield.
    • 1 — Enables Exec Shield. This is the default value.

    Important

    If a system is running security-sensitive applications that were started while Exec Shield was disabled, these applications must be restarted when Exec Shield is enabled in order for Exec Shield to take effect.
  • exec-shield-randomize — Enables location randomization of various items in memory. This helps deter potential attackers from locating programs and daemons in memory. Each time a program or daemon starts, it is placed at a different memory location, never at a static or absolute memory address.
    There are two possible values for this virtual file:
    • 0 — Disables randomization of Exec Shield. This may be useful for application debugging purposes.
    • 1 — Enables randomization of Exec Shield. This is the default value. Note: The exec-shield file must also be set to 1 for exec-shield-randomize to be effective.
  • hostname — Configures the system hostname, such as www.example.com.
  • hotplug — Configures the utility to be used when a configuration change is detected by the system. This is primarily used with USB and Cardbus PCI. The default value of /sbin/hotplug should not be changed unless testing a new program to fulfill this role.
  • modprobe — Sets the location of the program used to load kernel modules. The default value is /sbin/modprobe which means kmod calls it to load the module when a kernel thread calls kmod.
  • msgmax — Sets the maximum size of any message sent from one process to another and is set to 8192 bytes by default. Be careful when raising this value, as queued messages between processes are stored in non-swappable kernel memory. Any increase in msgmax would increase RAM requirements for the system.
  • msgmnb — Sets the maximum number of bytes in a single message queue. The default is 16384.
  • msgmni — Sets the maximum number of message queue identifiers. The default is 16.
  • osrelease — Lists the Linux kernel release number. This file can only be altered by changing the kernel source and recompiling.
  • ostype — Displays the type of operating system. By default, this file is set to Linux, and this value can only be changed by changing the kernel source and recompiling.
  • overflowgid and overflowuid — Define the fixed group ID and user ID, respectively, for use with system calls on architectures that only support 16-bit group and user IDs.
  • panic — Defines the number of seconds the kernel postpones rebooting when the system experiences a kernel panic. By default, the value is set to 0, which disables automatic rebooting after a panic.
  • printk — This file controls a variety of settings related to printing or logging error messages. Each error message reported by the kernel has a loglevel associated with it that defines the importance of the message. The loglevel values break down in this order:
    • 0 — Kernel emergency. The system is unusable.
    • 1 — Kernel alert. Action must be taken immediately.
    • 2 — Condition of the kernel is considered critical.
    • 3 — General kernel error condition.
    • 4 — General kernel warning condition.
    • 5 — Kernel notice of a normal but significant condition.
    • 6 — Kernel informational message.
    • 7 — Kernel debug-level messages.
    Four values are found in the printk file:
    6     4     1     7
    Each of these values defines a different rule for dealing with error messages. The first value, called the console loglevel, defines the lowest priority of messages printed to the console. (Note that the lower the priority, the higher the loglevel number.) The second value sets the default loglevel for messages without an explicit loglevel attached to them. The third value sets the lowest possible loglevel configuration for the console loglevel. The last value sets the default value for the console loglevel.
  • random/ directory — Lists a number of values related to generating random numbers for the kernel.
  • rtsig-max — Configures the maximum number of POSIX real-time signals that the system may have queued at any one time. The default value is 1024.
  • rtsig-nr — Lists the current number of POSIX real-time signals queued by the kernel.
  • sem — Configures semaphore settings within the kernel. A semaphore is a System V IPC object that is used to control utilization of a particular process.
  • shmall — Sets the total number of shared memory pages that can be used at one time, system-wide. By default, this value is 2097152.
  • shmmax — Sets the largest shared memory segment size allowed by the kernel. By default, this value is 33554432. However, the kernel supports much larger values than this.
  • shmmni — Sets the maximum number of shared memory segments for the whole system. By default, this value is 4096.
  • sysrq — Activates the System Request Key, if this value is set to anything other than zero (0), the default.
    The System Request Key allows immediate input to the kernel through simple key combinations. For example, the System Request Key can be used to immediately shut down or restart a system, sync all mounted file systems, or dump important information to the console. To initiate a System Request Key, type Alt+SysRq+<system request code>. Replace <system request code> with one of the following system request codes:
    • r — Disables raw mode for the keyboard and sets it to XLATE (a limited keyboard mode which does not recognize modifiers such as Alt, Ctrl, or Shift for all keys).
    • k — Kills all processes active in a virtual console. Also called Secure Access Key (SAK), it is often used to verify that the login prompt is spawned from init and not a Trojan copy designed to capture usernames and passwords.
    • b — Reboots the kernel without first unmounting file systems or syncing disks attached to the system.
    • c — Crashes the system without first unmounting file systems or syncing disks attached to the system.
    • o — Shuts off the system.
    • s — Attempts to sync disks attached to the system.
    • u — Attempts to unmount and remount all file systems as read-only.
    • p — Outputs all flags and registers to the console.
    • t — Outputs a list of processes to the console.
    • m — Outputs memory statistics to the console.
    • 0 through 9 — Sets the log level for the console.
    • e — Kills all processes except init using SIGTERM.
    • i — Kills all processes except init using SIGKILL.
    • l — Kills all processes using SIGKILL (including init). The system is unusable after issuing this System Request Key code.
    • h — Displays help text.
    This feature is most beneficial when using a development kernel or when experiencing system freezes.

    Warning

    The System Request Key feature is considered a security risk because an unattended console provides an attacker with access to the system. For this reason, it is turned off by default.
    Refer to /usr/share/doc/kernel-doc-<version>/Documentation/sysrq.txt for more information about the System Request Key.
  • sysrq-key — Defines the key code for the System Request Key (84 is the default).
  • sysrq-sticky — Defines whether the System Request Key is a chorded key combination. The accepted values are as follows:
    • 0 — Alt+SysRq and the system request code must be pressed simultaneously. This is the default value.
    • 1 — Alt+SysRq must be pressed simultaneously, but the system request code can be pressed anytime before the number of seconds specified in /proc/sys/kernel/sysrq-timer elapses.
  • sysrq-timer — Specifies the number of seconds allowed to pass before the system request code must be pressed. The default value is 10.
  • tainted — Indicates whether a non-GPL module is loaded.
    • 0 — No non-GPL modules are loaded.
    • 1 — At least one module without a GPL license (including modules with no license) is loaded.
    • 2 — At least one module was force-loaded with the command insmod -f.
  • threads-max — Sets the maximum number of threads to be used by the kernel, with a default value of 2048.
  • version — Displays the date and time the kernel was last compiled. The first field in this file, such as #3, relates to the number of times a kernel was built from the source base.
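To make the printk entry above more concrete, the sketch below tests whether a message with a given loglevel would be printed to the console. It relies on the kernel printing only messages whose loglevel number is lower than the console loglevel, the first value in the printk file. The function name reaches_console is hypothetical.

```shell
# reaches_console LEVEL [FILE]
# Succeed (exit status 0) if a message with loglevel LEVEL (0-7) has a
# lower number than the console loglevel, the first value in FILE.
reaches_console() {
    console_level=$(awk '{ print $1; exit }' "${2:-/proc/sys/kernel/printk}")
    [ "$1" -lt "$console_level" ]
}

# Possible use:
#   reaches_console 4 && echo "warnings are shown on the console"
```

With the sample values 6 4 1 7, messages at loglevels 0 through 5 reach the console, while informational (6) and debug (7) messages do not.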

5.3.9.4.  /proc/sys/net/

This directory contains subdirectories concerning various networking topics. Various configurations at the time of kernel compilation make different directories available here, such as ethernet/, ipv4/, ipx/, and ipv6/. By altering the files within these directories, system administrators are able to adjust the network configuration on a running system.
Given the wide variety of possible networking options available with Linux, only the most common /proc/sys/net/ directories are discussed.
The /proc/sys/net/core/ directory contains a variety of settings that control the interaction between the kernel and networking layers. The most important of these files are:
  • message_burst — Sets the amount of time in tenths of a second required to write a new warning message. This setting is used to mitigate Denial of Service (DoS) attacks. The default setting is 50.
  • message_cost — Sets a cost on every warning message. The higher the value of this file (default of 5), the more likely the warning message is ignored. This setting is used to mitigate DoS attacks.
    The idea of a DoS attack is to bombard the targeted system with requests that generate errors and fill up disk partitions with log files or require all of the system's resources to handle the error logging. The settings in message_burst and message_cost are designed to be modified based on the system's acceptable risk versus the need for comprehensive logging.
  • netdev_max_backlog — Sets the maximum number of packets allowed to queue when a particular interface receives packets faster than the kernel can process them. The default value for this file is 300.
  • optmem_max — Configures the maximum ancillary buffer size allowed per socket.
  • rmem_default — Sets the receive socket buffer default size in bytes.
  • rmem_max — Sets the receive socket buffer maximum size in bytes.
  • wmem_default — Sets the send socket buffer default size in bytes.
  • wmem_max — Sets the send socket buffer maximum size in bytes.
The /proc/sys/net/ipv4/ directory contains additional networking settings. Many of these settings, used in conjunction with one another, are useful in preventing attacks on the system or when using the system to act as a router.

Warning

An erroneous change to these files may affect remote connectivity to the system.
The following is a list of some of the more important files within the /proc/sys/net/ipv4/ directory:
  • icmp_destunreach_rate, icmp_echoreply_rate, icmp_paramprob_rate, and icmp_timeexceed_rate — Set the maximum ICMP send packet rate, in 1/100 of a second, to hosts under certain conditions. A setting of 0 removes any delay and is not recommended.
  • icmp_echo_ignore_all and icmp_echo_ignore_broadcasts — Allows the kernel to ignore ICMP ECHO packets from every host or only those originating from broadcast and multicast addresses, respectively. A value of 0 allows the kernel to respond, while a value of 1 ignores the packets.
  • ip_default_ttl — Sets the default Time To Live (TTL), which limits the number of hops a packet may make before reaching its destination. Increasing this value can diminish system performance.
  • ip_forward — Permits interfaces on the system to forward packets to one another. By default, this file is set to 0. Setting this file to 1 enables network packet forwarding.
  • ip_local_port_range — Specifies the range of ports to be used by TCP or UDP when a local port is needed. The first number is the lowest port to be used and the second number specifies the highest port. Any systems that expect to require more ports than the default 1024 to 4999 should use a range from 32768 to 61000.
  • tcp_syn_retries — Provides a limit on the number of times the system re-transmits a SYN packet when attempting to make a connection.
  • tcp_retries1 — Sets the number of permitted re-transmissions attempting to answer an incoming connection. The default value is 3.
  • tcp_retries2 — Sets the number of permitted re-transmissions of TCP packets. The default value is 15.
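As a rough illustration of the ip_local_port_range advice above, the number of local ports in a range is the high value minus the low value, plus one. The following is a minimal sketch in plain shell arithmetic; the function name port_count is made up for this example and is not part of any tool.

```shell
# Count the local ports available in an ip_local_port_range-style pair.
# port_count is a hypothetical helper name used only for illustration.
port_count() {
    low=$1
    high=$2
    echo $(( high - low + 1 ))
}

port_count 1024 4999      # default range mentioned above: prints 3976
port_count 32768 61000    # suggested wider range: prints 28233

# On a live system the current range can be read with:
#   cat /proc/sys/net/ipv4/ip_local_port_range
```

The wider range therefore offers roughly seven times as many local ports as the default one.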
The file called
/usr/share/doc/kernel-doc-<version>/Documentation/networking/ip-sysctl.txt
contains a complete list of files and options available in the /proc/sys/net/ipv4/ directory.
A number of other directories exist within the /proc/sys/net/ipv4/ directory and each covers a different aspect of the network stack. The /proc/sys/net/ipv4/conf/ directory allows each system interface to be configured in different ways, including the use of default settings for unconfigured devices (in the /proc/sys/net/ipv4/conf/default/ subdirectory) and settings that override all special configurations (in the /proc/sys/net/ipv4/conf/all/ subdirectory).
The /proc/sys/net/ipv4/neigh/ directory contains settings for communicating with a host directly connected to the system (called a network neighbor) and also contains different settings for systems more than one hop away.
Routing over IPv4 also has its own directory, /proc/sys/net/ipv4/route/. Unlike conf/ and neigh/, the /proc/sys/net/ipv4/route/ directory contains specifications that apply to routing with any interfaces on the system. Many of these settings, such as max_size, max_delay, and min_delay, relate to controlling the size of the routing cache. To clear the routing cache, write any value to the flush file.
Additional information about these directories and the possible values for their configuration files can be found in:
/usr/share/doc/kernel-doc-<version>/Documentation/filesystems/proc.txt

5.3.9.5.  /proc/sys/vm/

This directory facilitates the configuration of the Linux kernel's virtual memory (VM) subsystem. The kernel makes extensive and intelligent use of virtual memory, which combines physical RAM with disk-backed swap space.
The following files are commonly found in the /proc/sys/vm/ directory:
  • block_dump — Configures block I/O debugging when enabled. All read/write and block dirtying operations done to files are logged accordingly. This can be useful when diagnosing disk spin-ups and spin-downs for laptop battery conservation. All output when block_dump is enabled can be retrieved via dmesg. The default value is 0.

    Note

    If block_dump is enabled at the same time as kernel debugging, it is prudent to stop the klogd daemon, as it generates erroneous disk activity caused by block_dump.
  • dirty_background_ratio — Starts background writeback of dirty data at this percentage of total memory, via a pdflush daemon. The default value is 10.
  • dirty_expire_centisecs — Defines when dirty in-memory data is old enough to be eligible for writeout. Data which has been dirty in-memory for longer than this interval is written out next time a pdflush daemon wakes up. The default value is 3000, expressed in hundredths of a second.
  • dirty_ratio — Starts active writeback of dirty data at this percentage of total memory for the generator of dirty data, via pdflush. The default value is 40.
  • dirty_writeback_centisecs — Defines the interval between pdflush daemon wakeups, which periodically writes dirty in-memory data out to disk. The default value is 500, expressed in hundredths of a second.
  • laptop_mode — Minimizes the number of times that a hard disk needs to spin up by keeping the disk spun down for as long as possible, therefore conserving battery power on laptops. This increases efficiency by combining all future I/O processes together, reducing the frequency of spin-ups. The default value is 0, but the setting is enabled automatically when a laptop runs on battery power.
    This value is controlled automatically by the acpid daemon once the system switches to battery power. No user modifications or interactions are necessary if the laptop supports the ACPI (Advanced Configuration and Power Interface) specification.
    For more information, refer to the following installed documentation:
    /usr/share/doc/kernel-doc-<version>/Documentation/laptop-mode.txt
  • lower_zone_protection — Determines how aggressive the kernel is in defending lower memory allocation zones. This is effective when utilized with machines configured with highmem memory space enabled. The default value is 0, no protection at all. All other integer values are in megabytes, and lowmem memory is therefore protected from being allocated by users.
    For more information, refer to the following installed documentation:
    /usr/share/doc/kernel-doc-<version>/Documentation/filesystems/proc.txt
  • max_map_count — Configures the maximum number of memory map areas a process may have. In most cases, the default value of 65536 is appropriate.
  • min_free_kbytes — Forces the Linux VM (virtual memory manager) to keep a minimum number of kilobytes free. The VM uses this number to compute a pages_min value for each lowmem zone in the system. The default value depends on the total memory on the machine.
  • nr_hugepages — Indicates the current number of configured hugetlb pages in the kernel.
    For more information, refer to the following installed documentation:
    /usr/share/doc/kernel-doc-<version>/Documentation/vm/hugetlbpage.txt
  • nr_pdflush_threads — Indicates the number of pdflush daemons that are currently running. This file is read-only, and should not be changed by the user. Under heavy I/O loads, the default value of two is increased by the kernel.
  • overcommit_memory — Configures the conditions under which a large memory request is accepted or denied. The following three modes are available:
    • 0 — The kernel performs heuristic memory overcommit handling by estimating the amount of memory available and failing requests that are blatantly invalid. Unfortunately, since memory is allocated using a heuristic rather than a precise algorithm, this setting can sometimes allow available memory on the system to be overloaded. This is the default setting.
    • 1 — The kernel performs no memory overcommit handling. Under this setting, the potential for memory overload is increased, but so is performance for memory intensive tasks (such as those executed by some scientific software).
    • 2 — The kernel fails requests for memory that would bring the total committed memory above the sum of all swap space plus the percentage of physical RAM specified in /proc/sys/vm/overcommit_ratio. This setting is best for those who desire less risk of memory overcommitment.

      Note

      This setting is only recommended for systems with swap areas larger than physical memory.
  • overcommit_ratio — Specifies the percentage of physical RAM considered when /proc/sys/vm/overcommit_memory is set to 2. The default value is 50.
  • page-cluster — Sets the number of pages read in a single attempt, expressed as a power of two. The default value of 3, which corresponds to 8 pages (2 to the power of 3), is appropriate for most systems.
  • swappiness — Determines how much a machine should swap. The higher the value, the more swapping occurs. The default value, as a percentage, is set to 60.
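To make the overcommit_memory mode 2 rule more concrete, the limit described above can be worked out as swap plus RAM multiplied by overcommit_ratio and divided by 100. The following is a minimal sketch under that reading; the helper name commit_limit_kb and the memory sizes are assumptions chosen for illustration, and the kernel's own calculation may take further factors into account.

```shell
# Estimate the commit limit that applies when overcommit_memory is set to 2:
#   limit = swap + RAM * overcommit_ratio / 100
# commit_limit_kb is a hypothetical helper; all sizes are in kilobytes.
commit_limit_kb() {
    ram_kb=$1
    swap_kb=$2
    ratio=$3
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# Assumed example figures: 4194304 kB of RAM, 2097152 kB of swap,
# and the default overcommit_ratio of 50.
commit_limit_kb 4194304 2097152 50    # prints 4194304
```

On a running system, the kernel reports its own figure in the CommitLimit field of /proc/meminfo, which is the value to trust over this estimate.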
All kernel-based documentation can be found in the following locally installed location:
/usr/share/doc/kernel-doc-<version>/Documentation/

5.3.10.  /proc/sysvipc/

This directory contains information about System V IPC resources. The files in this directory relate to System V IPC calls for messages (msg), semaphores (sem), and shared memory (shm).

5.3.11.  /proc/tty/

This directory contains information about the available and currently used tty devices on the system. Originally called teletype devices, any character-based data terminals are called tty devices.
In Linux, there are three different kinds of tty devices. Serial devices are used with serial connections, such as over a modem or using a serial cable. Virtual terminals create the common console connection, such as the virtual consoles available when pressing Alt+<F-key> at the system console. Pseudo terminals create a two-way communication that is used by some higher level applications, such as XFree86. The drivers file is a list of the current tty devices in use, as in the following example:
serial               /dev/cua        5  64-127 serial:callout
serial               /dev/ttyS       4  64-127 serial
pty_slave            /dev/pts      136   0-255 pty:slave
pty_master           /dev/ptm      128   0-255 pty:master
pty_slave            /dev/ttyp       3   0-255 pty:slave
pty_master           /dev/pty        2   0-255 pty:master
/dev/vc/0            /dev/vc/0       4       0 system:vtmaster
/dev/ptmx            /dev/ptmx       5       2 system
/dev/console         /dev/console    5       1 system:console
/dev/tty             /dev/tty        5       0 system:/dev/tty
unknown              /dev/vc/%d      4    1-63 console
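The listing above has five whitespace-separated columns, with the device node in the second column and the driver type in the fifth. As a minimal sketch, assuming that layout, the following filter prints the device nodes for one type of tty device; the function name tty_nodes_by_type is hypothetical.

```shell
# Print the device node of every entry whose type (fifth column) starts
# with the given word. tty_nodes_by_type is a hypothetical helper that
# reads drivers-style text on standard input.
tty_nodes_by_type() {
    awk -v want="$1" 'index($5, want) == 1 { print $2 }'
}

# Sample input copied from the listing above.
tty_nodes_by_type serial <<'EOF'
serial               /dev/cua        5  64-127 serial:callout
serial               /dev/ttyS       4  64-127 serial
pty_slave            /dev/pts      136   0-255 pty:slave
EOF
# prints /dev/cua and /dev/ttyS, one per line

# On a live system:
#   tty_nodes_by_type serial < /proc/tty/drivers
```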
The /proc/tty/driver/serial file lists the usage statistics and status of each of the serial tty lines.
In order for tty devices to be used as network devices, the Linux kernel enforces line discipline on the device. This allows the driver to place a specific type of header with every block of data transmitted over the device, making it possible for the remote end of the connection to treat a block of data as just one in a stream of data blocks. SLIP and PPP are common line disciplines, and each is commonly used to connect systems to one another over a serial link.
Registered line disciplines are stored in the ldiscs file, and more detailed information is available within the ldisc/ directory.

5.3.12.  /proc/<PID>/

Out of Memory (OOM) refers to a computing state where all available memory, including swap space, has been allocated. When this situation occurs, it will cause the system to panic and stop functioning as expected. There is a switch that controls OOM behavior in /proc/sys/vm/panic_on_oom. When set to 1 the kernel will panic on OOM. A setting of 0 instructs the kernel to call a function named oom_killer on an OOM. Usually, oom_killer can kill rogue processes and the system will survive.
The easiest way to change this is to echo the new value to /proc/sys/vm/panic_on_oom.
~]# cat /proc/sys/vm/panic_on_oom
1
~]# echo 0 > /proc/sys/vm/panic_on_oom
~]# cat /proc/sys/vm/panic_on_oom
0
It is also possible to prioritize which processes get killed by adjusting the oom_killer score. In /proc/<PID>/ there are two files, oom_adj and oom_score. Valid values for oom_adj are in the range -16 to +15. To see the current oom_killer score, view the oom_score file for the process. oom_killer kills processes with the highest scores first.
This example adjusts the oom_score of a process with a PID of 12465 to make it less likely that oom_killer will kill it.
~]# cat /proc/12465/oom_score
79872
~]# echo -5 > /proc/12465/oom_adj
~]# cat /proc/12465/oom_score
78
There is also a special value of -17, which disables oom_killer for that process. In the example below, oom_score returns a value of 0, indicating that this process would not be killed.
~]# cat /proc/12465/oom_score
78
~]# echo -17 > /proc/12465/oom_adj
~]# cat /proc/12465/oom_score
0
A function called badness() is used to determine the actual score for each process. This is done by adding up 'points' for each examined process. The process scoring is done in the following way:
  1. The basis of each process's score is its memory size.
  2. The memory size of any of the process's children (not including a kernel thread) is also added to the score.
  3. The process's score is increased for 'niced' processes and decreased for long running processes.
  4. Processes with the CAP_SYS_ADMIN and CAP_SYS_RAWIO capabilities have their scores reduced.
  5. The final score is then bitshifted by the value saved in the oom_adj file.
Thus, a process with the highest oom_score value will most probably be a non-privileged, recently started process that, along with its children, uses a large amount of memory, has been 'niced', and handles no raw I/O.
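The final bitshift step can be sketched with shell arithmetic. This models only step 5 of the list above, not the full badness() calculation. The helper name adjust_score and the base score of 2496 are hypothetical, and the direction of the shift (left for positive oom_adj values, right for negative ones) is an assumption based on the 2.6.18 kernel source rather than something stated in this guide.

```shell
# Model only the last scoring step: shifting a base score by oom_adj.
# adjust_score is a hypothetical helper. Assumptions: positive values
# shift left, negative values shift right, and -17 means "never kill".
adjust_score() {
    base=$1
    adj=$2
    if [ "$adj" -eq -17 ]; then
        echo 0
    elif [ "$adj" -gt 0 ]; then
        echo $(( base << adj ))
    elif [ "$adj" -lt 0 ]; then
        echo $(( base >> (0 - adj) ))
    else
        echo "$base"
    fi
}

adjust_score 2496 5      # prints 79872
adjust_score 2496 -5     # prints 78
adjust_score 2496 -17    # prints 0
```

Because each step of oom_adj doubles or halves the score, small changes to the value have a large effect on which process is chosen.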

5.4. Using the sysctl Command

The /sbin/sysctl command is used to view, set, and automate kernel settings in the /proc/sys/ directory.
For a quick overview of all settings configurable in the /proc/sys/ directory, type the /sbin/sysctl -a command as root. This creates a large, comprehensive list, a small portion of which looks something like the following:
net.ipv4.route.min_delay = 2
kernel.sysrq = 0
kernel.sem = 250     32000     32     128
This is the same information seen if each of the files were viewed individually. The only difference is the file location. For example, the /proc/sys/net/ipv4/route/min_delay file is listed as net.ipv4.route.min_delay, with the directory slashes replaced by dots and the proc.sys portion assumed.
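The naming rule described above can be sketched as a small shell function that strips the /proc/sys/ prefix and replaces the remaining slashes with dots. The function name proc_to_sysctl is made up for this example.

```shell
# Convert a /proc/sys/ path to the dotted name that sysctl prints.
# proc_to_sysctl is a hypothetical helper name.
proc_to_sysctl() {
    path=${1#/proc/sys/}              # drop the assumed /proc/sys/ prefix
    printf '%s\n' "$path" | tr '/' '.'
}

proc_to_sysctl /proc/sys/net/ipv4/route/min_delay   # prints net.ipv4.route.min_delay
proc_to_sysctl /proc/sys/kernel/sysrq               # prints kernel.sysrq
```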
The sysctl command can be used in place of echo to assign values to writable files in the /proc/sys/ directory. For example, instead of using the command
echo 1 > /proc/sys/kernel/sysrq
use the equivalent sysctl command as follows:
~]# sysctl -w kernel.sysrq="1"
kernel.sysrq = 1
While quickly setting single values like this in /proc/sys/ is helpful during testing, this method does not work as well on a production system as special settings within /proc/sys/ are lost when the machine is rebooted. To preserve custom settings, add them to the /etc/sysctl.conf file.
Each time the system boots, the init program runs the /etc/rc.d/rc.sysinit script. This script contains a command to execute sysctl using /etc/sysctl.conf to determine the values passed to the kernel. Any values added to /etc/sysctl.conf therefore take effect each time the system boots.
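As a minimal sketch of preserving a setting, the function below appends a "key = value" line to a sysctl-style configuration file unless a line for that key is already present. The helper name persist_sysctl and the default scratch file name are assumptions made for this example; on a real system the target would be /etc/sysctl.conf and the command would be run as root.

```shell
# Append "key = value" to a sysctl-style configuration file unless the
# key is already present. persist_sysctl is a hypothetical helper.
persist_sysctl() {
    key=$1
    value=$2
    conf=${3:-./sysctl.conf.example}   # scratch file by default

    # Look for an existing, uncommented line that sets the same key.
    if [ -f "$conf" ] && awk -F= -v k="$key" '
            { name = $1; gsub(/[ \t]/, "", name) }
            name == k { found = 1 }
            END { exit !found }' "$conf"; then
        echo "$key is already set in $conf" >&2
        return 1
    fi
    printf '%s = %s\n' "$key" "$value" >> "$conf"
}

# Example, run as root on a real system:
#   persist_sysctl kernel.sysrq 1 /etc/sysctl.conf
#   sysctl -p /etc/sysctl.conf     # apply the file without rebooting
```

Refusing to add a duplicate key avoids a file in which two lines disagree about the same setting.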

5.5. Additional Resources

Below are additional sources of information about the proc file system.

5.5.1. Installed Documentation

Some of the best documentation about the proc file system is installed on the system by default.
  • /usr/share/doc/kernel-doc-<version>/Documentation/filesystems/proc.txt — Contains assorted, but limited, information about all aspects of the /proc/ directory.
  • /usr/share/doc/kernel-doc-<version>/Documentation/sysrq.txt — An overview of System Request Key options.
  • /usr/share/doc/kernel-doc-<version>/Documentation/sysctl/ — A directory containing a variety of sysctl tips, including modifying values that concern the kernel (kernel.txt), accessing file systems (fs.txt), and virtual memory use (vm.txt).
  • /usr/share/doc/kernel-doc-<version>/Documentation/networking/ip-sysctl.txt — A detailed overview of IP networking options.

5.5.2. Useful Websites

  • http://www.linuxhq.com/ — This website maintains a complete database of source, patches, and documentation for various versions of the Linux kernel.

Chapter 6. Redundant Array of Independent Disks (RAID)

The basic idea behind RAID is to combine multiple small, inexpensive disk drives into an array to accomplish performance or redundancy goals not attainable with one large and expensive drive. This array of drives appears to the computer as a single logical storage unit or drive.

6.1. What is RAID?

RAID allows information to be spread across several disks. RAID uses techniques such as disk striping (RAID Level 0), disk mirroring (RAID Level 1), and disk striping with parity (RAID Level 5) to achieve redundancy, lower latency, increased bandwidth, and maximized ability to recover from hard disk crashes.
RAID distributes data across each drive in the array by breaking it down into consistently sized chunks (commonly 32 KB or 64 KB, although other values are acceptable). Each chunk is then written to a hard drive in the RAID array according to the RAID level employed. When the data is read, the process is reversed, giving the illusion that the multiple drives in the array are actually one large drive.
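The chunking idea can be illustrated with a deliberately simplified striping model in which chunks are handed out to the member disks in turn. The helper name chunk_location is hypothetical, the 64 KB chunk size and two-disk array are assumed example values, and real arrays may place chunks differently depending on the RAID level and layout.

```shell
# Simplified striping model: chunks are assigned to member disks in turn.
# chunk_location is a hypothetical helper, not a description of how any
# particular RAID implementation lays out data.
chunk_location() {
    offset=$1        # byte offset within the array
    chunk_size=$2    # chunk size in bytes
    disks=$3         # number of member disks
    chunk=$(( offset / chunk_size ))
    echo "disk $(( chunk % disks )) chunk $(( chunk / disks ))"
}

chunk_location 0      65536 2    # prints: disk 0 chunk 0
chunk_location 65536  65536 2    # prints: disk 1 chunk 0
chunk_location 131072 65536 2    # prints: disk 0 chunk 1
```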

6.1.1. Who Should Use RAID?

System administrators and others who manage large amounts of data would benefit from using RAID technology. Primary reasons to deploy RAID include the following:
  • Enhanced speed
  • Increased storage capacity using a single virtual disk
  • Minimized data loss from disk failure

6.1.2. Hardware RAID versus Software RAID

There are two possible RAID approaches: hardware RAID and software RAID.
Hardware RAID
The hardware-based array manages the RAID subsystem independently from the host. It presents a single disk per RAID array to the host.
A hardware RAID device connects to the SCSI controller and presents the RAID arrays as a single SCSI drive. An external RAID system moves all RAID handling intelligence into a controller located in the external disk subsystem. The whole subsystem is connected to the host via a normal SCSI controller and appears to the host as a single disk.
RAID controller cards function like a SCSI controller to the operating system, and handle all the actual drive communications. The user plugs the drives into the RAID controller (just like a normal SCSI controller) and then adds them to the RAID controller's configuration, and the operating system won't know the difference.
Software RAID
Software RAID implements the various RAID levels in the kernel disk (block device) code. It offers the cheapest possible solution, as expensive disk controller cards or hot-swap chassis[1] are not required. Software RAID also works with cheaper IDE disks as well as SCSI disks. With today's faster CPUs, software RAID can also outperform hardware RAID.
The Linux kernel contains an MD driver that allows the RAID solution to be completely hardware independent. The performance of a software-based array depends on the server CPU performance and load.
The key features of software RAID include the following:
  • Threaded rebuild process
  • Kernel-based configuration
  • Portability of arrays between Linux machines without reconstruction
  • Backgrounded array reconstruction using idle system resources
  • Hot-swappable drive support
  • Automatic CPU detection to take advantage of certain CPU optimizations

6.1.3. RAID Levels and Linear Support

RAID supports various configurations, including levels 0, 1, 4, 5, and linear. These RAID types are defined as follows:
Level 0
RAID level 0, often called striping, is a performance-oriented striped data mapping technique. This means the data being written to the array is broken down into strips and written across the member disks of the array, allowing high I/O performance at low inherent cost but providing no redundancy. The storage capacity of a level 0 array is equal to the total capacity of the member disks in a hardware RAID or the total capacity of member partitions in a software RAID.
Level 1
RAID level 1, or mirroring, has been used longer than any other form of RAID. Level 1 provides redundancy by writing identical data to each member disk of the array, leaving a mirrored copy on each disk. Mirroring remains popular due to its simplicity and high level of data availability. Level 1 operates with two or more disks that may use parallel access for high data-transfer rates when reading but more commonly operate independently to provide high I/O transaction rates. Level 1 provides very good data reliability and improves performance for read-intensive applications but at a relatively high cost. The storage capacity of the level 1 array is equal to the capacity of one of the mirrored hard disks in a hardware RAID or one of the mirrored partitions in a software RAID.

Note

RAID level 1 comes at a high cost because you write the same information to all of the disks in the array, which wastes drive space. For example, if you have RAID level 1 set up so that your root (/) partition exists on two 40G drives, you have 80G total but are only able to access 40G of that 80G. The other 40G acts like a mirror of the first 40G.
Level 4
RAID level 4 uses parity[2] concentrated on a single disk drive to protect data. It is better suited to transaction I/O rather than large file transfers. Because the dedicated parity disk represents an inherent bottleneck, level 4 is seldom used without accompanying technologies such as write-back caching. Although RAID level 4 is an option in some RAID partitioning schemes, it is not an option allowed in Red Hat Enterprise Linux RAID installations. The storage capacity of hardware RAID level 4 is equal to the capacity of member disks, minus the capacity of one member disk. The storage capacity of software RAID level 4 is equal to the capacity of the member partitions, minus the size of one of the partitions if they are of equal size.

Note

RAID level 4 takes up the same amount of space as RAID level 5, but level 5 has more advantages. For this reason, level 4 is not supported.
Level 5
RAID level 5 is the most common type of RAID. By distributing parity across some or all of an array's member disk drives, RAID level 5 eliminates the write bottleneck inherent in level 4. The only performance bottleneck is the parity calculation process. With modern CPUs and software RAID, that usually is not a very big problem. As with level 4, the result is asymmetrical performance, with reads substantially outperforming writes. Level 5 is often used with write-back caching to reduce the asymmetry. The storage capacity of hardware RAID level 5 is equal to the capacity of member disks, minus the capacity of one member disk. The storage capacity of software RAID level 5 is equal to the capacity of the member partitions, minus the size of one of the partitions if they are of equal size.
Linear RAID
Linear RAID is a simple grouping of drives to create a larger virtual drive. In linear RAID, the chunks are allocated sequentially from one member drive, going to the next drive only when the first is completely filled. This grouping provides no performance benefit, as it is unlikely that any I/O operations will be split between member drives. Linear RAID also offers no redundancy and, in fact, decreases reliability — if any one member drive fails, the entire array cannot be used. The capacity is the total of all member disks.
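The capacity rules given for each level above can be summarized in a short calculation. This is a minimal sketch that assumes all members are the same size; the function name raid_capacity is made up for this example.

```shell
# Usable capacity for the levels described above, assuming equally sized
# members. raid_capacity is a hypothetical helper name.
raid_capacity() {
    level=$1      # 0, 1, 4, 5, or linear
    members=$2    # number of member disks or partitions
    size=$3       # size of one member, in any unit
    case $level in
        0|linear) echo $(( members * size )) ;;
        1)        echo "$size" ;;
        4|5)      echo $(( (members - 1) * size )) ;;
        *)        echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

raid_capacity 1 2 40    # prints 40, as in the note on level 1 above
raid_capacity 5 3 10    # prints 20
raid_capacity 0 2 10    # prints 20
```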

6.2. Configuring Software RAID

Users can configure software RAID during the graphical installation process, the text-based installation process, or during a kickstart installation. This section discusses software RAID configuration during the installation process using the Disk Druid application, and covers the following steps:
  1. Creating software RAID partitions on physical hard drives.
  2. Creating RAID devices from the software RAID partitions.
  3. (Optional) Configuring LVM from the RAID devices.
  4. Creating file systems from the RAID devices.
To configure software RAID, select Create custom layout from the pulldown list on the Disk Partitioning Setup screen, click the Next button, and follow the instructions in the rest of this section. The example screenshots in this section use two 10 GB disk drives (/dev/hda and /dev/hdb) to illustrate the creation of simple RAID 1 and RAID 0 configurations, and detail how to create a simple RAID configuration by implementing multiple RAID devices.

6.2.1. Creating the RAID Partitions

In a typical situation, the disk drives are new or are formatted. Both drives are shown as raw devices with no partition configuration in Figure 6.1, “Two Blank Drives, Ready For Configuration”.

Figure 6.1. Two Blank Drives, Ready For Configuration

  1. In Disk Druid, click the RAID button to enter the software RAID creation screen.
  2. Choose Create a software RAID partition to create a RAID partition as shown in Figure 6.2, “RAID Partition Options”. Note that no other RAID options (such as entering a mount point) are available until RAID partitions, as well as RAID devices, are created. Click OK to confirm the choice.

    Figure 6.2. RAID Partition Options

  3. A software RAID partition must be constrained to one drive. For Allowable Drives, select the drive to use for RAID. If you have multiple drives, by default all drives are selected and you must deselect the drives you do not want.

    Figure 6.3. Adding a RAID Partition

  4. Edit the Size (MB) field, and enter the size that you want the partition to be (in MB).
  5. Select Fixed Size to specify partition size. Select Fill all space up to (MB) and enter a value (in MB) to specify partition size range. Select Fill to maximum allowable size to let the partition grow to fill the available space on the hard disk. Note that if you make more than one partition growable, they share the available free space on the disk.
  6. Select Force to be a primary partition if you want the partition to be a primary partition. A primary partition is one of the first four partitions on the hard drive. If unselected, the partition is created as a logical partition. If other operating systems are already on the system, unselecting this option should be considered. For more information on primary versus logical/extended partitions, refer to the appendix section of the Red Hat Enterprise Linux Installation Guide.
Repeat these steps to create as many partitions as needed for your RAID setup. Note that not all of the partitions have to be RAID partitions. For example, you can configure only the /boot partition as a software RAID device, leaving the root partition (/), /home, and swap as regular file systems. Figure 6.4, “RAID 1 Partitions Ready, Pre-Device and Mount Point Creation” shows successfully allocated space for the RAID 1 configuration (for /boot), which is now ready for RAID device and mount point creation:

Figure 6.4. RAID 1 Partitions Ready, Pre-Device and Mount Point Creation

6.2.2. Creating the RAID Devices and Mount Points

Once you create all of your partitions as software RAID partitions, you must create the RAID device and mount point.
  1. On the main partitioning screen, click the RAID button. The RAID Options dialog appears as shown in Figure 6.5, “RAID Options”.

    Figure 6.5. RAID Options

  2. Select the Create a RAID device option, and click OK. As shown in Figure 6.6, “Making a RAID Device and Assigning a Mount Point”, the Make RAID Device dialog appears, allowing you to make a RAID device and assign a mount point.

    Figure 6.6. Making a RAID Device and Assigning a Mount Point

  3. Select a mount point from the Mount Point pulldown list.
  4. Choose the file system type for the partition from the File System Type pulldown list. At this point you can either configure a dynamic LVM file system or a traditional static ext2/ext3 file system. For more information on LVM and its configuration during the installation process, refer to Chapter 11, LVM (Logical Volume Manager). If LVM is not required, continue on with the following instructions.
  5. From the RAID Device pulldown list, select a device name such as md0.
  6. From the RAID Level pulldown list, choose the required RAID level.

    Note

    If you are making a RAID partition of /boot, you must choose RAID level 1, and it must use one of the first two drives (IDE first, SCSI second). If you are not creating a separate RAID partition of /boot, and you are making a RAID partition for the root file system (that is, /), it must be RAID level 1 and must use one of the first two drives (IDE first, SCSI second).
  7. The RAID partitions created appear in the RAID Members list. Select which of these partitions should be used to create the RAID device.
  8. If configuring RAID 1 or RAID 5, specify the number of spare partitions in the Number of spares field. If a software RAID partition fails, the spare is automatically used as a replacement. For each spare you want to specify, you must create an additional software RAID partition (in addition to the partitions for the RAID device). Select the partitions for the RAID device and the partition(s) for the spare(s).
  9. Click OK to confirm the setup. The RAID device appears in the Drive Summary list.
  10. Repeat this chapter's entire process for configuring additional partitions, devices, and mount points, such as the root partition (/), home partition (/home), or swap.
After completing the entire configuration, the layout shown in Figure 6.7, “Sample RAID Configuration” resembles the default configuration, except for the use of RAID.

Figure 6.7. Sample RAID Configuration

Figure 6.8, “Sample RAID With LVM Configuration” shows an example of a RAID and LVM configuration.

Figure 6.8. Sample RAID With LVM Configuration

You can proceed with your installation process by clicking Next. Refer to the Red Hat Enterprise Linux Installation Guide for further instructions.

6.3. Managing Software RAID

This section discusses software RAID configuration and management after the installation, and covers the following topics:
  • Reviewing existing software RAID configuration.
  • Creating a new RAID device.
  • Replacing a faulty device in an array.
  • Adding a new device to an existing array.
  • Deactivating and removing an existing RAID device.
  • Saving the configuration.
All examples in this section use the software RAID configuration from the previous section.

6.3.1. Reviewing RAID Configuration

When a software RAID is in use, basic information about all presently active RAID devices is stored in the /proc/mdstat special file. To list these devices, display the content of this file by typing the following at a shell prompt:
cat /proc/mdstat
To determine whether a certain device is a RAID device or a component device, run the command in the following form as root:
mdadm --query device
In order to examine a RAID device in more detail, use the following command:
mdadm --detail raid_device
Similarly, to examine a component device, type:
mdadm --examine component_device
While the mdadm --detail command displays information about a RAID device, mdadm --examine only relays information about a RAID device as it relates to a given component device. This distinction is particularly important when working with a RAID device that itself is a component of another RAID device.
The mdadm --query, mdadm --detail, and mdadm --examine commands all allow you to specify multiple devices at once.

Example 6.1. Reviewing RAID configuration

Assume the system uses configuration from Figure 6.7, “Sample RAID Configuration”. You can verify that /dev/md0 is a RAID device by typing the following at a shell prompt:
~]# mdadm --query /dev/md0
/dev/md0: 125.38MiB raid1 2 devices, 0 spares. Use mdadm --detail for more detail.
/dev/md0: No md super block found, not an md component.
As you can see, the above command produces only a brief overview of the RAID device and its configuration. To display more detailed information, use the following command instead:
~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Tue Jun 28 16:05:49 2011
     Raid Level : raid1
     Array Size : 128384 (125.40 MiB 131.47 MB)
  Used Dev Size : 128384 (125.40 MiB 131.47 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Jun 30 17:06:34 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 49c5ac74:c2b79501:5c28cb9c:16a6dd9f
         Events : 0.6

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1       3       65        1      active sync   /dev/hdb1
Finally, to list all presently active RAID devices, type:
~]$ cat /proc/mdstat
Personalities : [raid0] [raid1]
md0 : active raid1 hdb1[1] hda1[0]
      128384 blocks [2/2] [UU]
      
md1 : active raid0 hdb2[1] hda2[0]
      1573888 blocks 256k chunks

md2 : active raid0 hdb3[1] hda3[0]
      19132928 blocks 256k chunks

unused devices: <none>
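The [UU] field in the /proc/mdstat output above shows one character per member device. As a minimal sketch, assuming the usual md convention that an underscore in this field marks a missing or failed member, a small shell function can list the arrays that are running degraded. The function name mdstat_degraded is hypothetical and is not part of mdadm:

```shell
# Hypothetical helper: read /proc/mdstat-style text on standard input and
# print the name of every array whose status field (for example [U_])
# contains an underscore.
mdstat_degraded() {
    awk '
        # Remember the array name from lines such as "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The status field consists only of U and _ characters in brackets.
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
            name = ""
        }
    '
}
```

To check the running system, type mdstat_degraded < /proc/mdstat. Arrays without redundancy, such as the RAID 0 devices above, have no status field and are therefore never reported.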

6.3.2. Creating a New RAID Device

To create a new RAID device, use the command in the following form as root:
mdadm --create raid_device --level=level --raid-devices=number component_device
This is the simplest way to create a RAID array. There are many more options that allow you to specify the number of spare devices, the block size of a stripe array, whether the array has a write-intent bitmap, and much more. All these options can have a significant impact on performance, but are beyond the scope of this document. For more detailed information, refer to the CREATE MODE section of the mdadm(8) manual page.

Example 6.2. Creating a new RAID device

Assume that the system has two unused SCSI disk drives available, and that each of these devices has exactly one partition of the same size:
~]# ls /dev/sd*
/dev/sda  /dev/sda1  /dev/sdb  /dev/sdb1
To create /dev/md3 as a new RAID level 1 array from /dev/sda1 and /dev/sdb1, run the following command:
~]# mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm: array /dev/md3 started.
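In the command above, the value of --raid-devices must match the number of component devices listed. The following sketch is a dry run that only prints the command and derives that number from its arguments, so it can be reviewed before anything is run as root. The function name mdadm_create_cmd is hypothetical:

```shell
# Hypothetical dry-run helper: print (do not run) an mdadm --create command,
# deriving --raid-devices from the number of component devices given.
mdadm_create_cmd() {
    if [ "$#" -lt 3 ]; then
        echo "usage: mdadm_create_cmd raid_device level component_device..." >&2
        return 1
    fi
    raid_device=$1
    level=$2
    shift 2
    # After the shift, $# is the number of component devices.
    echo "mdadm --create $raid_device --level=$level --raid-devices=$# $*"
}
```

For example, mdadm_create_cmd /dev/md3 1 /dev/sda1 /dev/sdb1 prints the same command that is used in this example.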

6.3.3. Replacing a Faulty Device

To replace a particular device in a software RAID, first make sure it is marked as faulty by running the following command as root:
mdadm raid_device --fail component_device
Then remove the faulty device from the array by using the command in the following form:
mdadm raid_device --remove component_device
Once the device is operational again, you can re-add it to the array:
mdadm raid_device --add component_device

Example 6.3. Replacing a faulty device

Assume the system has an active RAID device, /dev/md3, with the following layout (that is, the RAID device created in Example 6.2, “Creating a new RAID device”):
~]# mdadm --detail /dev/md3 | tail -n 3
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
Imagine the second disk drive fails and needs to be replaced. To do so, first mark the /dev/sdb1 device as faulty:
~]# mdadm /dev/md3 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md3
Then remove it from the RAID device:
~]# mdadm /dev/md3 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
As soon as the hardware is replaced, you can add the device back to the array by using the following command:
~]# mdadm /dev/md3 --add /dev/sdb1
mdadm: added /dev/sdb1

6.3.4. Extending a RAID Device

To add a new device to an existing array, use the command in the following form as root:
mdadm raid_device --add component_device
This will add the device as a spare device. To grow the array to use this device actively, type the following at a shell prompt:
mdadm --grow raid_device --raid-devices=number

Example 6.4. Extending a RAID device

Assume the system has an active RAID device, /dev/md3, with the following layout (that is, the RAID device created in Example 6.2, “Creating a new RAID device”):
~]# mdadm --detail /dev/md3 | tail -n 3
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
Also assume that a new SCSI disk drive, /dev/sdc, has been added and has exactly one partition. To add it to the /dev/md3 array, type the following at a shell prompt:
~]# mdadm /dev/md3 --add /dev/sdc1
mdadm: added /dev/sdc1
This will add /dev/sdc1 as a spare device. To change the size of the array to actually use it, type:
~]# mdadm --grow /dev/md3 --raid-devices=3

6.3.5. Removing a RAID Device

To remove an existing RAID device, first deactivate it by running the following command as root:
mdadm --stop raid_device
Once deactivated, remove the RAID device itself:
mdadm --remove raid_device
Finally, zero superblocks on all devices that were associated with the particular array:
mdadm --zero-superblock component_device

Example 6.5. Removing a RAID device

Assume the system has an active RAID device, /dev/md3, with the following layout (that is, the RAID device as extended in Example 6.4, “Extending a RAID device”):
~]# mdadm --detail /dev/md3 | tail -n 4
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
In order to remove this device, first stop it by typing the following at a shell prompt:
~]# mdadm --stop /dev/md3
mdadm: stopped /dev/md3
Once stopped, you can remove the /dev/md3 device by running the following command:
~]# mdadm --remove /dev/md3
Finally, to remove the superblocks from all associated devices, type:
~]# mdadm --zero-superblock /dev/sda1 /dev/sdb1 /dev/sdc1
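Because a stopped array can no longer be queried with mdadm --detail, it can help to record its component devices before stopping it. The following is a minimal sketch, assuming the member table has the layout shown in this example. The function name detail_members is hypothetical:

```shell
# Hypothetical helper: read "mdadm --detail" output on standard input and
# print the component device from each row of the member table.
detail_members() {
    # Member rows start with a numeric index and end with a /dev/ path.
    awk '$1 ~ /^[0-9]+$/ && $NF ~ /^\/dev\// { print $NF }'
}
```

One possible use is to run members=$(mdadm --detail /dev/md3 | detail_members) before mdadm --stop, and to pass $members to mdadm --zero-superblock afterwards.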

6.3.6. Preserving the Configuration

By default, changes made by the mdadm command only apply to the current session, and will not survive a system restart. At boot time, the mdmonitor service reads the content of the /etc/mdadm.conf configuration file to see which RAID devices to start. If the software RAID was configured during the graphical installation process, this file contains directives listed in Table 6.1, “Common mdadm.conf directives” by default.

Table 6.1. Common mdadm.conf directives

Option Description
ARRAY Allows you to identify a particular array.
DEVICE Allows you to specify a list of devices to scan for a RAID component (for example, /dev/hda1). You can also use the keyword partitions to use all partitions listed in /proc/partitions, or containers to specify an array container.
MAILADDR Allows you to specify an email address to use in case of an alert.
To list what ARRAY lines are presently in use regardless of the configuration, run the following command as root:
mdadm --detail --scan
Use the output of this command to determine which lines to add to the /etc/mdadm.conf file. You can also display the ARRAY line for a particular device:
mdadm --detail --brief raid_device
By redirecting the output of this command, you can add such a line to the configuration file with a single command:
mdadm --detail --brief raid_device >> /etc/mdadm.conf

Example 6.6. Preserving the configuration

By default, the /etc/mdadm.conf file contains the software RAID configuration created during the system installation:
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=49c5ac74:c2b79501:5c28cb9c:16a6dd9f
ARRAY /dev/md1 level=raid0 num-devices=2 UUID=76914c11:5bfa2c00:dc6097d1:a1f4506d
ARRAY /dev/md2 level=raid0 num-devices=2 UUID=2b5d38d0:aea898bf:92be20e2:f9d893c5
Assuming you have created the /dev/md3 device as shown in Example 6.2, “Creating a new RAID device”, you can make it persistent by running the following command:
~]# mdadm --detail --brief /dev/md3 >> /etc/mdadm.conf
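Because the >> redirection always appends, running the command above twice leaves two ARRAY lines for the same device in the file. As a sketch of one way to guard against this, the following function checks whether a configuration file already contains an ARRAY line for a given device. The function name has_array_line is hypothetical:

```shell
# Hypothetical helper: succeed if the given configuration file already
# contains an ARRAY line for the given RAID device.
has_array_line() {
    conf_file=$1
    raid_device=$2
    awk -v dev="$raid_device" '
        $1 == "ARRAY" && $2 == dev { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$conf_file"
}
```

It could be combined with the command above as has_array_line /etc/mdadm.conf /dev/md3 || mdadm --detail --brief /dev/md3 >> /etc/mdadm.conf.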

6.4. Additional Resources

For more information on RAID, refer to the following resources.

6.4.1. Installed Documentation

  • mdadm man page — A manual page for the mdadm utility.
  • mdadm.conf man page — A manual page that provides a comprehensive list of available /etc/mdadm.conf configuration options.


[1] A hot-swap chassis allows you to remove a hard drive without having to power-down your system.
[2] Parity information is calculated based on the contents of the rest of the member disks in the array. This information can then be used to reconstruct data when one disk in the array fails. The reconstructed data can then be used to satisfy I/O requests to the failed disk before it is replaced and to repopulate the failed disk after it has been replaced.

Chapter 7. Swap Space

7.1. What is Swap Space?

Swap space in Linux is used when physical memory (RAM) is full. If the system needs more memory resources and the RAM is full, inactive pages in memory are moved to the swap space. While swap space can help machines with a small amount of RAM, it should not be considered a replacement for more RAM. Swap space is located on hard drives, which have a slower access time than physical memory.
Swap space can be a dedicated swap partition (recommended), a swap file, or a combination of swap partitions and swap files.
In years past, the recommended amount of swap space increased linearly with the amount of RAM in the system. But because the amount of memory in modern systems has increased into the hundreds of gigabytes, it is now recognized that the amount of swap space that a system needs is a function of the memory workload running on that system. However, given that swap space is usually designated at install time, and that it can be difficult to determine beforehand the memory workload of a system, we recommend determining system swap using the following table.

Important

File systems and LVM2 volumes assigned as swap space cannot be in use when being modified. For example, no system process can be using the swap space, and the kernel must not have any of it allocated or in use. Use the free and cat /proc/swaps commands to verify how much and where swap is in use.
The best way to achieve swap space modifications is to boot your system in rescue mode, and then follow the instructions (for each scenario) in the remainder of this chapter. Refer to the Red Hat Enterprise Linux Installation Guide for instructions on booting into rescue mode. When prompted to mount the file system, select Skip.
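The check against /proc/swaps described above can be sketched as a small shell function. It assumes the usual column layout of that file (Filename, Type, Size, Used, Priority, with sizes in kilobytes) and compares names literally. The function name swap_used_kb is hypothetical:

```shell
# Hypothetical helper: read /proc/swaps-style text on standard input and
# print the Used value (in kilobytes) for the named swap area. The exit
# status is non-zero if the name is not listed, that is, not active.
swap_used_kb() {
    awk -v name="$1" '
        # Line 1 is the column header; compare the Filename column exactly.
        NR > 1 && $1 == name { print $4; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

On a running system, swap_used_kb /dev/sda3 < /proc/swaps would report how much of that swap partition is in use. An LVM2 volume may be listed under a different device name than the one used in this chapter, so check the file for the exact name first.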

7.2. Adding Swap Space

Sometimes it is necessary to add more swap space after installation. For example, you may upgrade the amount of RAM in your system from 128 MB to 256 MB, but there is only 256 MB of swap space. It might be advantageous to increase the amount of swap space to 512 MB if you perform memory-intensive operations or run applications that require a large amount of memory.
You have three options: create a new swap partition, create a new swap file, or extend swap on an existing LVM2 logical volume. It is recommended that you extend an existing logical volume.

7.2.1. Extending Swap on an LVM2 Logical Volume

To extend an LVM2 swap logical volume (assuming /dev/VolGroup00/LogVol01 is the volume you want to extend):
  1. Disable swapping for the associated logical volume:
    swapoff -v /dev/VolGroup00/LogVol01
  2. Resize the LVM2 logical volume by 256 MB:
    lvm lvresize /dev/VolGroup00/LogVol01 -L +256M
  3. Format the new swap space:
    mkswap /dev/VolGroup00/LogVol01
  4. Enable the extended logical volume:
    swapon -va
  5. Test that the logical volume has been extended properly:
    cat /proc/swaps
    free

7.2.2. Creating an LVM2 Logical Volume for Swap

To add a swap logical volume (assuming /dev/VolGroup00/LogVol02 is the swap volume you want to add):
  1. Create the LVM2 logical volume of size 256 MB:
    lvm lvcreate VolGroup00 -n LogVol02 -L 256M
  2. Format the new swap space:
    mkswap /dev/VolGroup00/LogVol02
  3. Add the following entry to the /etc/fstab file:
    /dev/VolGroup00/LogVol02   swap     swap    defaults     0 0
  4. Enable the new logical volume:
    swapon -va
  5. Test that the logical volume has been created properly:
    cat /proc/swaps
    free

7.2.3. Creating a Swap File

To add a swap file:
  1. Determine the size of the new swap file in megabytes and multiply by 1024 to determine the number of blocks. For example, the block count of a 64 MB swap file is 65536.
  2. At a shell prompt as root, type the following command with count being equal to the desired block count:
    dd if=/dev/zero of=/swapfile bs=1024 count=65536
  3. Change the permissions of the newly created file:
    chmod 0600 /swapfile
  4. Set up the swap file with the command:
    mkswap /swapfile
  5. To enable the swap file immediately but not automatically at boot time:
    swapon /swapfile
  6. To enable it at boot time, edit /etc/fstab to include the following entry:
    /swapfile          swap            swap    defaults        0 0
    The next time the system boots, it enables the new swap file.
  7. After adding the new swap file and enabling it, verify it is enabled by viewing the output of the command cat /proc/swaps or free.
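The block count calculation from step 1 can be sketched as a one-line shell function. The function name swap_blocks_for_mb is hypothetical:

```shell
# Hypothetical helper: convert a swap file size in megabytes to the number
# of 1024-byte blocks to pass to dd as count= when bs=1024.
swap_blocks_for_mb() {
    echo $(( $1 * 1024 ))
}
```

For a 64 MB swap file, swap_blocks_for_mb 64 prints 65536, so the dd command in step 2 could also be written as dd if=/dev/zero of=/swapfile bs=1024 count=$(swap_blocks_for_mb 64).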

7.3. Removing Swap Space

Sometimes it can be prudent to reduce swap space after installation. For example, say you downgraded the amount of RAM in your system from 1 GB to 512 MB, but there is 2 GB of swap space still assigned. It might be advantageous to reduce the amount of swap space to 1 GB, since the larger 2 GB could be wasting disk space.
You have three options: remove an entire LVM2 logical volume used for swap, remove a swap file, or reduce swap space on an existing LVM2 logical volume.

7.3.1. Reducing Swap on an LVM2 Logical Volume

To reduce an LVM2 swap logical volume (assuming /dev/VolGroup00/LogVol01 is the volume you want to reduce):
  1. Disable swapping for the associated logical volume:
    swapoff -v /dev/VolGroup00/LogVol01
  2. Reduce the LVM2 logical volume by 512 MB:
    lvm lvreduce /dev/VolGroup00/LogVol01 -L -512M
  3. Format the new swap space:
    mkswap /dev/VolGroup00/LogVol01
  4. Enable the resized logical volume:
    swapon -va
  5. Test that the logical volume has been reduced properly:
    cat /proc/swaps
    free

7.3.2. Removing an LVM2 Logical Volume for Swap

The swap logical volume cannot be in use (no system locks or processes on the volume). The easiest way to achieve this is to boot your system in rescue mode. Refer to the Red Hat Enterprise Linux Installation Guide for instructions on booting into rescue mode. When prompted to mount the file system, select Skip.
To remove a swap logical volume (assuming /dev/VolGroup00/LogVol02 is the swap volume you want to remove):
  1. Disable swapping for the associated logical volume:
    swapoff -v /dev/VolGroup00/LogVol02
  2. Remove the LVM2 logical volume of size 512 MB:
    lvm lvremove /dev/VolGroup00/LogVol02
  3. Remove the following entry from the /etc/fstab file:
    /dev/VolGroup00/LogVol02   swap     swap    defaults     0 0
  4. Test that the logical volume has been removed:
    cat /proc/swaps
    free

7.3.3. Removing a Swap File

To remove a swap file:
  1. At a shell prompt as root, execute the following command to disable the swap file (where /swapfile is the swap file):
    swapoff -v /swapfile
  2. Remove its entry from the /etc/fstab file.
  3. Remove the actual file:
    rm /swapfile

7.4. Moving Swap Space

To move swap space from one location to another, follow the steps for removing swap space, and then follow the steps for adding swap space.

Chapter 8. Managing Disk Storage

8.1. Standard Partitions using parted

The utility parted allows users to:
  • View the existing partition table
  • Change the size of existing partitions
  • Add partitions from free space or additional hard drives
If you want to view the system's disk space usage or monitor the disk space usage, refer to Section 42.3, “File Systems”.
By default, the parted package is included when installing Red Hat Enterprise Linux. To start parted, log in as root and type the command parted /dev/sda at a shell prompt (where /dev/sda is the device name for the drive you want to configure).
If you want to remove or resize a partition, the device on which that partition resides must not be in use. Creating a new partition on a device which is in use—while possible—is not recommended.
For a device to not be in use, none of the partitions on the device can be mounted, and any swap space on the device must not be enabled.
As well, the partition table should not be modified while it is in use because the kernel may not properly recognize the changes. If the partition table does not match the actual state of the mounted partitions, information could be written to the wrong partition, resulting in lost and overwritten data.
The easiest way to achieve this is to boot your system in rescue mode. When prompted to mount the file system, select Skip.
Alternatively, if no partition on the drive is in use (that is, no system process is using the file system or preventing it from being unmounted), you can unmount the partitions with the umount command and turn off all the swap space on the hard drive with the swapoff command.
Table 8.1, “parted commands” contains a list of commonly used parted commands. The sections that follow explain some of these commands and arguments in more detail.

Table 8.1. parted commands

Command Description
check minor-num Perform a simple check of the file system
cp from to Copy file system from one partition to another; from and to are the minor numbers of the partitions
help Display list of available commands
mklabel label Create a disk label for the partition table
mkfs minor-num file-system-type Create a file system of type file-system-type
mkpart part-type fs-type start-mb end-mb Make a partition without creating a new file system
mkpartfs part-type fs-type start-mb end-mb Make a partition and create the specified file system
move minor-num start-mb end-mb Move the partition
name minor-num name Name the partition for Mac and PC98 disklabels only
print Display the partition table
quit Quit parted
rescue start-mb end-mb Rescue a lost partition from start-mb to end-mb
resize minor-num start-mb end-mb Resize the partition from start-mb to end-mb
rm minor-num Remove the partition
select device Select a different device to configure
set minor-num flag state Set the flag on a partition; state is either on or off
toggle [NUMBER [FLAG]] Toggle the state of FLAG on partition NUMBER
unit UNIT Set the default unit to UNIT

8.1.1. Viewing the Partition Table

After starting parted, use the command print to view the partition table. A table similar to the following appears:
Model: ATA ST3160812AS (scsi)
Disk /dev/sda: 160GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size    Type      File system  Flags
 1      32.3kB  107MB  107MB   primary   ext3         boot
 2      107MB   105GB  105GB   primary   ext3
 3      105GB   107GB  2147MB  primary   linux-swap
 4      107GB   160GB  52.9GB  extended               root
 5      107GB   133GB  26.2GB  logical   ext3
 6      133GB   133GB  107MB   logical   ext3
 7      133GB   160GB  26.6GB  logical                lvm
The first line contains the disk type, manufacturer, model number and interface, the second line shows the device name and disk size, the third line gives the sector size, and the fourth line displays the disk label type. The remaining output below the fourth line shows the partition table.
In the partition table, the Number column gives the partition number, which this chapter refers to as the minor number. For example, the partition with minor number 1 corresponds to /dev/sda1. The Start and End values show where each partition begins and ends on the disk, with the unit given as part of each value. Valid Type values are metadata, free, primary, extended, or logical. The File system column shows the file system type, which can be any of the following:
  • ext2
  • ext3
  • fat16
  • fat32
  • hfs
  • jfs
  • linux-swap
  • ntfs
  • reiserfs
  • hp-ufs
  • sun-ufs
  • xfs
If the File system column for a partition shows no value, this means that its file system type is unknown.
The Flags column lists the flags set for the partition. Available flags are boot, root, swap, hidden, raid, lvm, or lba.

Note

To select a different device without having to restart parted, use the select command followed by the device name (for example, /dev/sda). Doing so allows you to view or configure the partition table of a device.

8.1.2. Creating a Partition

Warning

Do not attempt to create a partition on a device that is in use.
Before creating a partition, boot into rescue mode (or unmount any partitions on the device and turn off any swap space on the device).
Start parted, where /dev/sda is the device on which to create the partition:
parted /dev/sda
View the current partition table to determine if there is enough free space:
print
If there is not enough free space, you can resize an existing partition. Refer to Section 8.1.4, “Resizing a Partition” for details.

8.1.2.1. Making the Partition

From the partition table, determine the start and end points of the new partition and what partition type it should be. You can only have four primary partitions (with no extended partition) on a device. If you need more than four partitions, you can have three primary partitions, one extended partition, and multiple logical partitions within the extended partition. For an overview of disk partitions, refer to the appendix An Introduction to Disk Partitions in the Red Hat Enterprise Linux Installation Guide.
For example, to create a primary partition with an ext3 file system from 1024 megabytes until 2048 megabytes on a hard drive, type the following command:
mkpart primary ext3 1024 2048

Note

If you use the mkpartfs command instead, the file system is created after the partition is created. However, parted does not support creating an ext3 file system. Thus, if you wish to create an ext3 file system, use mkpart and create the file system with the mkfs command as described later.
The changes start taking place as soon as you press Enter, so review the command before executing it.
After creating the partition, use the print command to confirm that it is in the partition table with the correct partition type, file system type, and size. Also remember the minor number of the new partition so that you can label it. You should also view the output of
cat /proc/partitions
to make sure the kernel recognizes the new partition.
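The check against /proc/partitions can be sketched as a shell function that looks for the partition name in the last column. It assumes the usual four-column layout of that file (major, minor, #blocks, name). The function name partition_listed is hypothetical:

```shell
# Hypothetical helper: read /proc/partitions-style text on standard input
# and succeed if a partition with the given name (for example sda6) is listed.
partition_listed() {
    awk -v name="$1" '
        # Line 1 is the column header; the name is the fourth column.
        NR > 1 && $4 == name { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, partition_listed sda6 < /proc/partitions && echo recognized prints recognized only if the kernel lists sda6. Note that the name is given without the /dev/ prefix.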

8.1.2.2. Formatting the Partition

The partition still does not have a file system. Create the file system:
mkfs -t ext3 /dev/sda6

Warning

Formatting the partition permanently destroys any data that currently exists on the partition.

8.1.2.3. Labeling the Partition

Next, give the partition a label. For example, if the new partition is /dev/sda6 and you want to label it /work:
e2label /dev/sda6 /work
By default, the installation program uses the mount point of the partition as the label to make sure the label is unique. You can use any label you want.

8.1.2.4. Creating the Mount Point

As root, create the mount point:
mkdir /work

8.1.2.5. Add to /etc/fstab

As root, edit the /etc/fstab file to include the new partition. The new line should look similar to the following:
LABEL=/work           /work                 ext3    defaults        1 2
The first column should contain LABEL= followed by the label you gave the partition. The second column should contain the mount point for the new partition, and the next column should be the file system type (for example, ext3 or swap). If you need more information about the format, read the man page with the command man fstab.
If the fourth column is the word defaults, the partition is mounted at boot time. To mount the partition without rebooting, as root, type the command:
mount /work
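As an illustration of the /etc/fstab column layout described above, the following sketch prints an entry for a labeled partition. It only writes to standard output, and the last two fields are fixed at 1 and 2 to match the example line. The function name fstab_label_line is hypothetical:

```shell
# Hypothetical helper: print an /etc/fstab entry that mounts a labeled
# partition with the default options, in the column layout shown above.
fstab_label_line() {
    label=$1
    mount_point=$2
    fs_type=$3
    # Columns: device, mount point, file system type, options, dump, pass.
    printf 'LABEL=%-15s %-21s %-7s %-15s %s %s\n' \
        "$label" "$mount_point" "$fs_type" defaults 1 2
}
```

For example, fstab_label_line /work /work ext3 prints an entry with the same six fields as the line shown above. Review the output before adding it to /etc/fstab.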

8.1.3. Removing a Partition

Warning

Do not attempt to remove a partition on a device that is in use.
Before removing a partition, boot into rescue mode (or unmount any partitions on the device and turn off any swap space on the device).
Start parted, where /dev/sda is the device on which to remove the partition:
parted /dev/sda
View the current partition table to determine the minor number of the partition to remove:
print
Remove the partition with the command rm. For example, to remove the partition with minor number 3:
rm 3
The changes start taking place as soon as you press Enter, so review the command before committing to it.
After removing the partition, use the print command to confirm that it is removed from the partition table. You should also view the output of
cat /proc/partitions
to make sure the kernel knows the partition is removed.
The last step is to remove it from the /etc/fstab file. Find the line that declares the removed partition, and remove it from the file.

8.1.4. Resizing a Partition

Warning

Do not attempt to resize a partition on a device that is in use.
Before resizing a partition, boot into rescue mode (or unmount any partitions on the device and turn off any swap space on the device).
Start parted, where /dev/sda is the device on which to resize the partition:
parted /dev/sda
View the current partition table to determine the minor number of the partition to resize as well as the start and end points for the partition:
print
To resize the partition, use the resize command followed by the minor number for the partition, the starting place in megabytes, and the end place in megabytes. For example:
resize 3 1024 2048

Warning

A partition cannot be made larger than the space available on the device.
After resizing the partition, use the print command to confirm that the partition has been resized correctly, is the correct partition type, and is the correct file system type.
After rebooting the system into normal mode, use the command df to make sure the partition was mounted and is recognized with the new size.

8.2. LVM Partition Management

The following commands can be found by issuing lvm help at a command prompt.

Table 8.2. LVM commands

Command Description
dumpconfig Dump the active configuration
formats List the available metadata formats
help Display the help commands
lvchange Change the attributes of logical volume(s)
lvcreate Create a logical volume
lvdisplay Display information about a logical volume
lvextend Add space to a logical volume
lvmchange Due to use of the device mapper, this command has been deprecated
lvmdiskscan List devices that may be used as physical volumes
lvmsadc Collect activity data
lvmsar Create activity report
lvreduce Reduce the size of a logical volume
lvremove Remove logical volume(s) from the system
lvrename Rename a logical volume
lvresize Resize a logical volume
lvs Display information about logical volumes
lvscan List all logical volumes in all volume groups
pvchange Change attributes of physical volume(s)
pvcreate Initialize physical volume(s) for use by LVM
pvdata Display the on-disk metadata for physical volume(s)
pvdisplay Display various attributes of physical volume(s)
pvmove Move extents from one physical volume to another
pvremove Remove LVM label(s) from physical volume(s)
pvresize Resize a physical volume in use by a volume group
pvs Display information about physical volumes
pvscan List all physical volumes
segtypes List available segment types
vgcfgbackup Backup volume group configuration
vgcfgrestore Restore volume group configuration
vgchange Change volume group attributes
vgck Check the consistency of a volume group
vgconvert Change volume group metadata format
vgcreate Create a volume group
vgdisplay Display volume group information
vgexport Unregister a volume group from the system
vgextend Add physical volumes to a volume group
vgimport Register exported volume group with system
vgmerge Merge volume groups
vgmknodes Create the special files for volume group devices in /dev/
vgreduce Remove a physical volume from a volume group
vgremove Remove a volume group
vgrename Rename a volume group
vgs Display information about volume groups
vgscan Search for all volume groups
vgsplit Move physical volumes into a new volume group
version Display software and driver version information

Chapter 9. Implementing Disk Quotas

Disk space can be restricted by implementing disk quotas which alert a system administrator before a user consumes too much disk space or a partition becomes full.
Disk quotas can be configured for individual users as well as user groups. This makes it possible to manage the space allocated for user-specific files (such as email) separately from the space allocated to the projects a user works on (assuming the projects are given their own groups).
In addition, quotas can be set not just to control the number of disk blocks consumed but to control the number of inodes (data structures that contain information about files in UNIX file systems). Because inodes are used to contain file-related information, this allows control over the number of files that can be created.
The quota RPM must be installed to implement disk quotas.

Note

For more information on installing RPM packages, refer to Part II, “Package Management”.

9.1. Configuring Disk Quotas

To implement disk quotas, use the following steps:
  1. Enable quotas per file system by modifying the /etc/fstab file.
  2. Remount the file system(s).
  3. Create the quota database files and generate the disk usage table.
  4. Assign quota policies.
Each of these steps is discussed in detail in the following sections.

9.1.1. Enabling Quotas

As root, using a text editor, edit the /etc/fstab file. Add the usrquota and/or grpquota options to the file systems that require quotas:
/dev/VolGroup00/LogVol00 /         ext3    defaults        1 1
LABEL=/boot              /boot     ext3    defaults        1 2
none                     /dev/pts  devpts  gid=5,mode=620  0 0
none                     /dev/shm  tmpfs   defaults        0 0
none                     /proc     proc    defaults        0 0
none                     /sys      sysfs   defaults        0 0
/dev/VolGroup00/LogVol02 /home     ext3    defaults,usrquota,grpquota  1 2
/dev/VolGroup00/LogVol01 swap      swap    defaults        0 0
. . .
In this example, the /home file system has both user and group quotas enabled.
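To confirm which entries have quotas enabled after editing the file, the options column can be inspected with a short shell function. This is a sketch that matches only the exact option names usrquota and grpquota. The function name quota_mount_points is hypothetical:

```shell
# Hypothetical helper: read fstab-style text on standard input and print the
# mount point of every entry whose options include usrquota or grpquota.
quota_mount_points() {
    awk '
        # Skip comments and lines without an options field.
        /^[ \t]*#/ || NF < 4 { next }
        {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] == "usrquota" || opts[i] == "grpquota") {
                    print $2
                    break
                }
            }
        }
    '
}
```

With the file shown above, quota_mount_points < /etc/fstab prints /home.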

Note

The following examples assume that a separate /home partition was created during the installation of Red Hat Enterprise Linux. The root (/) partition can be used for setting quota policies in the /etc/fstab file.

9.1.2. Remounting the File Systems

After adding the usrquota and/or grpquota options, remount each file system whose fstab entry has been modified. If the file system is not in use by any process, use one of the following methods:
  • Issue the umount command followed by the mount command to remount the file system. (See the man pages for both umount and mount for the specific syntax for mounting and unmounting various file system types.)
  • Issue the mount -o remount <file-system> command (where <file-system> is the name of the file system) to remount the file system. For example, to remount the /home file system, the command to issue is mount -o remount /home.
If the file system is currently in use, the easiest method for remounting the file system is to reboot the system.

9.1.3. Creating the Quota Database Files

After each quota-enabled file system is remounted, the system is capable of working with disk quotas. However, the file system itself is not yet ready to support quotas. The next step is to run the quotacheck command.
The quotacheck command examines quota-enabled file systems and builds a table of the current disk usage per file system. The table is then used to update the operating system's copy of disk usage. In addition, the file system's disk quota files are updated.
To create the quota files (aquota.user and aquota.group) on the file system, use the -c option of the quotacheck command. For example, if user and group quotas are enabled for the /home file system, create the files in the /home directory:
quotacheck -cug /home
The -c option creates the quota files for each file system with quotas enabled, the -u option checks for user quotas, and the -g option checks for group quotas.
If neither the -u nor the -g option is specified, only the user quota file is created. If only -g is specified, only the group quota file is created.
After the files are created, run the following command to generate the table of current disk usage per file system with quotas enabled:
quotacheck -avug
The options used are as follows:
  • a — Check all quota-enabled, locally-mounted file systems
  • v — Display verbose status information as the quota check proceeds
  • u — Check user disk quota information
  • g — Check group disk quota information
After quotacheck has finished running, the quota files corresponding to the enabled quotas (user and/or group) are populated with data for each quota-enabled locally-mounted file system such as /home.
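A quick way to confirm that the quota files named above were created is to test for them at the top of the file system. This is a minimal sketch with a hypothetical function name, quota_files_present; it only checks that the files exist and says nothing about their contents. If only one kind of quota is enabled, the other file is expected to be missing.

```shell
#!/bin/sh
# quota_files_present is a hypothetical helper, not part of the quota package.
# $1 is the mount point of a quota-enabled file system. The function reports
# whether aquota.user and aquota.group exist there and returns non-zero
# if either file is missing.
quota_files_present() {
    status=0
    for f in aquota.user aquota.group; do
        if [ -f "$1/$f" ]; then
            echo "found:   $1/$f"
        else
            echo "missing: $1/$f"
            status=1
        fi
    done
    return "$status"
}

# Example, after running quotacheck -cug /home:
#   quota_files_present /home
```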

9.1.4. Assigning Quotas per User

The last step is assigning the disk quotas with the edquota command.
To configure the quota for a user, execute the following command as root at a shell prompt:
edquota username
Perform this step for each user who needs a quota. For example, if a quota is enabled in /etc/fstab for the /home partition (/dev/VolGroup00/LogVol02 in the example below) and the command edquota testuser is executed, the following is shown in the editor configured as the default for the system:
Disk quotas for user testuser (uid 501):
Filesystem                blocks     soft     hard    inodes   soft   hard
/dev/VolGroup00/LogVol02  440436        0        0     37418      0      0

Note

The text editor defined by the EDITOR environment variable is used by edquota. To change the editor, set the EDITOR environment variable in your ~/.bash_profile file to the full path of the editor of your choice.
The first column is the name of the file system that has a quota enabled for it. The second column shows how many blocks the user is currently using. The next two columns are used to set soft and hard block limits for the user on the file system. The inodes column shows how many inodes the user is currently using. The last two columns are used to set the soft and hard inode limits for the user on the file system.
The hard block limit is the absolute maximum amount of disk space that a user or group can use. Once this limit is reached, no further disk space can be used.
The soft block limit defines the maximum amount of disk space that can be used. However, unlike the hard limit, the soft limit can be exceeded for a certain amount of time. That time is known as the grace period. The grace period can be expressed in seconds, minutes, hours, days, weeks, or months.
A value of 0 means that the corresponding limit is not set. In the text editor, change the desired limits. For example:
Disk quotas for user testuser (uid 501):
Filesystem                blocks     soft     hard   inodes   soft   hard
/dev/VolGroup00/LogVol02  440436   500000   550000    37418      0      0
To verify that the quota for the user has been set, use the command:
quota testuser
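Before entering values in the editor, it may help to check that a soft and hard limit pair is consistent. The sketch below rests on one assumption that the text above does not state: a soft limit larger than a non-zero hard limit is treated as a mistake, because the hard limit would be reached first. The function name check_limits is hypothetical.

```shell
#!/bin/sh
# check_limits is a hypothetical helper, not part of the quota package.
# $1 is the soft limit and $2 the hard limit; 0 means "no limit".
# Returns 2 for non-numeric input and 1 when a non-zero soft limit
# exceeds a non-zero hard limit (an assumption of this sketch).
check_limits() {
    soft=$1
    hard=$2
    for v in "$soft" "$hard"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "limits must be non-negative integers" >&2
                return 2
                ;;
        esac
    done
    if [ "$soft" -ne 0 ] && [ "$hard" -ne 0 ] && [ "$soft" -gt "$hard" ]; then
        echo "soft limit $soft exceeds hard limit $hard" >&2
        return 1
    fi
    echo "ok: soft=$soft hard=$hard"
}

check_limits 500000 550000   # block limits from the example above
# prints: ok: soft=500000 hard=550000
```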

9.1.5. Assigning Quotas per Group

Quotas can also be assigned on a per-group basis. For example, to set a group quota for the devel group (the group must exist prior to setting the group quota), use the command:
edquota -g devel
This command displays the existing quota for the group in the text editor:
Disk quotas for group devel (gid 505):
Filesystem                blocks    soft     hard    inodes    soft    hard
/dev/VolGroup00/LogVol02  440400       0        0     37418       0       0
Modify the limits, then save the file.
To verify that the group quota has been set, use the command:
quota -g devel

9.1.6. Setting the Grace Period for Soft Limits

If soft limits are set for a given quota (whether inode or block, and for either users or groups), the grace period, or the amount of time a soft limit can be exceeded, should be set with the command:
edquota -t
While other edquota commands operate on a particular user's or group's quota, the -t option operates on every file system with quotas enabled.

9.2. Managing Disk Quotas

If quotas are implemented, they need some maintenance — mostly in the form of watching to see if the quotas are exceeded and making sure the quotas are accurate.
Of course, if users repeatedly exceed their quotas or consistently reach their soft limits, a system administrator has a few choices to make depending on what type of users they are and how much disk space impacts their work. The administrator can either help the user determine how to use less disk space or increase the user's disk quota.

9.2.1. Enabling and Disabling

It is possible to disable quotas without setting them to 0. To turn all user and group quotas off, use the following command:
quotaoff -vaug
If neither the -u nor the -g option is specified, only the user quotas are disabled. If only -g is specified, only group quotas are disabled. The -v switch causes verbose status information to display as the command executes.
To enable quotas again, use the quotaon command with the same options.
For example, to enable user and group quotas for all file systems, use the following command:
quotaon -vaug
To enable quotas for a specific file system, such as /home, use the following command:
quotaon -vug /home
If neither the -u nor the -g option is specified, only the user quotas are enabled. If only -g is specified, only group quotas are enabled.

9.2.2. Reporting on Disk Quotas

Creating a disk usage report entails running the repquota utility. For example, the command repquota /home produces this output:
*** Report for user quotas on device /dev/mapper/VolGroup00-LogVol02
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --      36       0       0              4     0     0
kristin   --     540       0       0            125     0     0
testuser  --  440400  500000  550000          37418     0     0
To view the disk usage report for all (option -a) quota-enabled file systems, use the command:
repquota -a
While the report is easy to read, a few points should be explained. The -- displayed after each user is a quick way to determine whether the block or inode limits have been exceeded. If either soft limit is exceeded, a + appears in place of the corresponding -; the first - represents the block limit, and the second represents the inode limit.
The grace columns are normally blank. If a soft limit has been exceeded, the column contains a time specification equal to the amount of time remaining on the grace period. If the grace period has expired, none appears in its place.
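The two-character flag field described above lends itself to simple filtering. The following sketch lists users whose report line contains a +, assuming the report layout shown in the example; the function name over_soft_limit and the user alice in the sample input are hypothetical.

```shell
#!/bin/sh
# over_soft_limit is a hypothetical helper, not part of the quota package.
# It reads a repquota report on standard input and prints each user whose
# flag field (second column: "--", "+-", "-+" or "++") contains a "+".
# The first character refers to the block limit, the second to the inode limit.
over_soft_limit() {
    awk '
        $2 ~ /^[-+][-+]$/ && $2 ~ /\+/ {
            what = ""
            if (substr($2, 1, 1) == "+") what = what " block"
            if (substr($2, 2, 1) == "+") what = what " inode"
            print $1 ":" what
        }
    '
}

# Sample report; the alice line is invented for illustration.
over_soft_limit <<'EOF'
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --      36       0       0              4     0     0
testuser  --  440400  500000  550000          37418     0     0
alice     +-  510000  500000  550000  6days     120     0     0
EOF
# prints: alice: block
```

On a live system the report could be piped in directly, for example repquota /home | over_soft_limit, run as root.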

9.2.3. Keeping Quotas Accurate

Whenever a file system is not unmounted cleanly (due to a system crash, for example), it is necessary to run quotacheck. However, quotacheck can be run on a regular basis, even if the system has not crashe