Chapter 7. File Systems

Read this chapter for an overview of the file systems supported for use with Red Hat Enterprise Linux, and how to optimize their performance.

7.1. Tuning Considerations for File Systems

There are several tuning considerations common to all file systems: formatting and mount options selected on your system, and actions available to applications that can improve their performance on a given system.

7.1.1. Formatting Options

File system block size

Block size can be selected at mkfs time. The range of valid sizes depends on the system: the upper limit is the maximum page size of the host system, while the lower limit depends on the file system used. The default block size is appropriate for most use cases.

If you expect to create many files smaller than the default block size, you can set a smaller block size to minimize the amount of space wasted on disk. Note, however, that setting a smaller block size may limit the maximum size of the file system, and can cause additional runtime overhead, particularly for files greater than the selected block size.
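
The space trade-off can be estimated before formatting. The following is a minimal sketch rather than a Red Hat tool: the slack_bytes function name and the 600-byte file size are illustrative assumptions, and the calculation ignores metadata overhead. It prints the space left unused in the final block of a file for two candidate block sizes.

```shell
# slack_bytes SIZE BLOCK_SIZE
# Prints the bytes left unused in the last block of a file of SIZE bytes
# on a file system with BLOCK_SIZE-byte blocks (metadata is ignored).
slack_bytes() {
    size=$1
    block=$2
    remainder=$(( size % block ))
    if [ "$remainder" -eq 0 ]; then
        echo 0
    else
        echo $(( block - remainder ))
    fi
}

# Compare a hypothetical 600-byte file at two candidate block sizes.
for bs in 1024 4096; do
    echo "block size $bs: $(slack_bytes 600 "$bs") bytes unused"
done
```

A smaller block size is then passed at format time, for example with the -b option of mkfs.ext4. Check the mke2fs(8) man page for the values that are valid on your system.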

File system geometry

If your system uses striped storage such as RAID5, you can improve performance by aligning data and metadata with the underlying storage geometry at mkfs time. For software RAID (LVM or MD) and some enterprise hardware storage, this information is queried and set automatically, but in many cases the administrator must specify this geometry manually with mkfs at the command line.

Refer to the Storage Administration Guide for further information about creating and maintaining these file systems.
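
When the geometry must be supplied manually, the values can be derived from the RAID chunk size, the file system block size, and the number of data-bearing disks (for RAID5, the total number of disks minus one). The following is a sketch under assumed values: the raid_geometry function name, the 64 KiB chunk size, and the four-disk array are hypothetical, and DEVICE is a placeholder. The script only prints the mkfs commands; it does not run them.

```shell
# raid_geometry CHUNK_KIB BLOCK_BYTES DATA_DISKS
# Prints "stride stripe_width", both expressed in file system blocks.
raid_geometry() {
    chunk_kib=$1
    block_bytes=$2
    data_disks=$3
    stride=$(( chunk_kib * 1024 / block_bytes ))
    stripe_width=$(( stride * data_disks ))
    echo "$stride $stripe_width"
}

# Hypothetical array: 4-disk RAID5 (3 data disks), 64 KiB chunk, 4 KiB blocks.
geometry=$(raid_geometry 64 4096 3)
stride=${geometry% *}
width=${geometry#* }

# Print the commands for review instead of running them.
echo "ext4: mkfs.ext4 -E stride=$stride,stripe-width=$width DEVICE"
echo "XFS:  mkfs.xfs -d su=64k,sw=3 DEVICE"
```

For XFS, su is the stripe unit (the chunk size) and sw is the number of data disks, so no conversion to blocks is needed. Confirm the option names against the mke2fs(8) and mkfs.xfs(8) man pages for your release before formatting.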

External journals

Metadata-intensive workloads mean that the log section of a journaling file system (such as ext4 and XFS) is updated extremely frequently. To minimize seek time from file system to journal, you can place the journal on dedicated storage. Note, however, that placing the journal on external storage that is slower than the primary file system can nullify any potential advantage associated with using external storage.

Warning

Ensure that your external journal is reliable. The loss of an external journal device will cause file system corruption.

External journals are created at mkfs time, with journal devices being specified at mount time. Refer to the mke2fs(8), mkfs.xfs(8), and mount(8) man pages for further information.
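
As an illustration of the sequence, the following sketch prints (but does not run) the commands typically involved. The function names, device names, and mount point are hypothetical placeholders, and the options shown should be checked against the man pages listed above before use.

```shell
# xfs_external_log_cmds DATA_DEV LOG_DEV MOUNT_POINT
# Prints the commands for an XFS file system whose log is on LOG_DEV.
xfs_external_log_cmds() {
    echo "mkfs.xfs -l logdev=$2 $1"
    echo "mount -o logdev=$2 $1 $3"
}

# ext4_external_journal_cmds DATA_DEV JOURNAL_DEV
# Prints the commands that format JOURNAL_DEV as a journal device and
# then create an ext4 file system on DATA_DEV that uses it.
ext4_external_journal_cmds() {
    echo "mke2fs -O journal_dev $2"
    echo "mkfs.ext4 -J device=$2 $1"
}

# Placeholder device names; substitute real devices only after review.
xfs_external_log_cmds /dev/sdX1 /dev/sdY1 /mnt/data
ext4_external_journal_cmds /dev/sdX1 /dev/sdY1
```

Printing the commands first gives the administrator a chance to confirm that the journal device is the intended one, since formatting destroys any data on it.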

7.1.2. Mount Options

Barriers

A write barrier is a kernel mechanism used to ensure that file system metadata is correctly written and ordered on persistent storage, even when storage devices with volatile write caches lose power. File systems with write barriers enabled also ensure that any data transmitted via fsync() persists across a power outage. Red Hat Enterprise Linux enables barriers by default on all hardware that supports them.

However, enabling write barriers slows some applications significantly; specifically, applications that use fsync() heavily, or create and delete many small files. For storage with no volatile write cache, or in the rare case where file system inconsistencies and data loss after a power loss are acceptable, barriers can be disabled by using the nobarrier mount option. For further information, refer to the Storage Administration Guide.

Access Time (noatime)

Historically, every read of a file required an update to that file's access time (atime) in the inode metadata, which involves additional write I/O. If accurate atime metadata is not required, mount the file system with the noatime option to eliminate these metadata updates. In most cases, however, atime is not a large overhead due to the default relative atime (or relatime) behavior in the Red Hat Enterprise Linux 6 kernel. The relatime behavior only updates atime if the previous atime is older than the modification time (mtime) or status change time (ctime).

Note

Enabling the noatime option also enables nodiratime behavior; there is no need to set both noatime and nodiratime.
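
To confirm which access time behavior is in effect for a mounted file system, the options recorded in /proc/mounts can be inspected. The following is a minimal sketch: the mount_has_option function name is an assumption made for this example, and it simply searches the fourth field of a mounts-format file for an exact option name.

```shell
# mount_has_option MOUNT_POINT OPTION [MOUNTS_FILE]
# Succeeds if MOUNT_POINT is listed in MOUNTS_FILE (default /proc/mounts)
# with OPTION among its comma-separated mount options.
mount_has_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "${3:-/proc/mounts}"
}

# Report the atime-related options on the root file system, if any.
for opt in noatime relatime; do
    if mount_has_option / "$opt"; then
        echo "/ is mounted with $opt"
    fi
done
```

The optional third argument exists so that the function can be pointed at a saved copy of the mount table. The way other options are displayed in /proc/mounts can vary between file systems and kernel versions, so treat a negative result with care.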

Increased read-ahead support

Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk. Some workloads, such as those involving heavy streaming of sequential I/O, benefit from high read-ahead values.

The tuned tool and the use of LVM striping elevate the read-ahead value, but this is not always sufficient for some workloads. Additionally, Red Hat Enterprise Linux is not always able to set an appropriate read-ahead value based on what it can detect of your file system. For example, if a powerful storage array presents itself to Red Hat Enterprise Linux as a single powerful LUN, the operating system will not treat it as a powerful LUN array, and therefore will not by default make full use of the read-ahead advantages potentially available to the storage.

Use the blockdev command to view and edit the read-ahead value. To view the current read-ahead value for a particular block device, run:

# blockdev --getra device

To modify the read-ahead value for that block device, run the following command. N represents the number of 512-byte sectors.

# blockdev --setra N device

Note that the value selected with the blockdev command will not persist between boots. We recommend creating a run level init.d script to set this value during boot.
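
Because blockdev expects the value in 512-byte sectors, it can help to convert from a more familiar unit first. The following is a sketch only: the kib_to_sectors function name, the 4 MiB target, and the /dev/sdX device name are assumptions made for illustration. The script prints the resulting command rather than running it.

```shell
# kib_to_sectors KIB
# Converts a size in KiB to 512-byte sectors, the unit blockdev uses.
kib_to_sectors() {
    echo $(( $1 * 1024 / 512 ))
}

# Hypothetical target: 4 MiB of read-ahead on a placeholder device.
device=/dev/sdX
sectors=$(kib_to_sectors 4096)

# Print the command for review; run it as root once the value is confirmed.
echo "blockdev --setra $sectors $device"
```

A line of this kind, with a real device name, is what a boot-time script would run to reapply the setting after each restart.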

7.1.3. File System Maintenance

Discard unused blocks

Batch discard and online discard operations are features of mounted file systems that discard blocks which are not in use by the file system. These operations are useful for both solid-state drives and thinly-provisioned storage.

Batch discard operations are run explicitly by the user with the fstrim command. This command discards all unused blocks in a file system that match the user's criteria. Both operation types are supported for use with the XFS and ext4 file systems in Red Hat Enterprise Linux 6.2 and later as long as the block device underlying the file system supports physical discard operations. Physical discard operations are supported if the value of /sys/block/device/queue/discard_max_bytes is not zero.

Online discard operations are specified at mount time with the -o discard option (either in /etc/fstab or as part of the mount command), and run in real time without user intervention. Online discard operations only discard blocks that are transitioning from used to free. Online discard operations are supported on ext4 file systems in Red Hat Enterprise Linux 6.2 and later, and on XFS file systems in Red Hat Enterprise Linux 6.4 and later.

Red Hat recommends batch discard operations unless the system's workload is such that batch discard is not feasible, or online discard operations are necessary to maintain performance.
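
Before relying on either discard method, the discard_max_bytes check described above can be scripted. The following is a minimal sketch: the supports_discard function name and its optional second argument (an alternative sysfs root, useful for testing) are assumptions made for this example.

```shell
# supports_discard DEVICE [SYSFS_ROOT]
# Succeeds if DEVICE reports a non-zero discard_max_bytes value.
supports_discard() {
    f="${2:-/sys}/block/$1/queue/discard_max_bytes"
    [ -r "$f" ] && [ "$(cat "$f")" -gt 0 ]
}

# Report discard support for every block device known to the kernel.
for path in /sys/block/*; do
    [ -e "$path" ] || continue
    dev=${path##*/}
    if supports_discard "$dev"; then
        echo "$dev: physical discard supported"
    else
        echo "$dev: physical discard not supported"
    fi
done
```

On a device that reports support, a batch discard can then be run against the mounted file system with the fstrim command, for example with the -v option to report how much space was discarded.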

7.1.4. Application Considerations

Pre-allocation

The ext4, XFS, and GFS2 file systems support efficient space pre-allocation via the fallocate(2) glibc call. Where write patterns would otherwise leave files badly fragmented, leading to poor read performance, pre-allocation can be a useful technique. Pre-allocation marks disk space as allocated to a file without writing any data into that space. Until real data is written to a pre-allocated block, read operations return zeroes.
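
Applications call fallocate(2) directly, but the behavior can be observed from the shell with the fallocate command-line utility, which is assumed here to be installed. The following is a sketch only: the preallocate function name and the 1 MiB size are illustrative, and the script reports an error instead of failing if the underlying file system does not support the operation.

```shell
# preallocate FILE BYTES
# Reserves BYTES of space for FILE without writing any data.
preallocate() {
    fallocate -l "$2" "$1"
}

demo=$(mktemp)
if preallocate "$demo" 1048576; then
    # Apparent size, then the number of blocks actually allocated.
    stat -c 'size=%s bytes, allocated=%b blocks' "$demo"
    # The pre-allocated range reads back as zeroes.
    od -A d -t x1 -N 16 "$demo"
else
    echo "fallocate is not available or not supported here" >&2
fi
rm -f "$demo"
```

Pre-allocating the full expected size of a file before writing to it gives the file system the opportunity to choose contiguous space up front.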