Chapter 2. Recommendations for GFS2 usage

This section provides general recommendations about GFS2 usage.

2.1. Mount Options: noatime and nodiratime

It is generally recommended to mount GFS2 file systems with the noatime and nodiratime arguments. This allows GFS2 to spend less time updating disk inodes for every access. For more information on the effect of these arguments on GFS2 file system performance, see GFS2 Node Locking.

2.2. Configuring atime Updates

Each file inode and directory inode has three time stamps associated with it:

  • ctime — The last time the inode status was changed
  • mtime — The last time the file (or directory) data was modified
  • atime — The last time the file (or directory) data was accessed

If atime updates are enabled as they are by default on GFS2 and other Linux file systems then every time a file is read, its inode needs to be updated.

Because few applications use the information provided by atime, those updates can require a significant amount of unnecessary write traffic and file locking traffic. That traffic can degrade performance; therefore, it may be preferable to turn off or reduce the frequency of atime updates.

Two methods of reducing the effects of atime updating are available:

  • Mount with relatime (relative atime), which updates the atime if the previous atime update is older than the mtime or ctime update.
  • Mount with noatime, which disables atime updates on that file system.

Mount with relatime

The relatime (relative atime) Linux mount option can be specified when the file system is mounted. This specifies that the atime is updated if the previous atime update is older than the mtime or ctime update. Usage

mount  BlockDevice MountPoint -o relatime
BlockDevice
Specifies the block device where the GFS2 file system resides.
MountPoint
Specifies the directory where the GFS2 file system should be mounted.

Example In this example, the GFS2 file system resides on /dev/vg01/lvol0 and is mounted on directory /mygfs2. The atime updates take place only if the previous atime update is older than the mtime or ctime update.

# mount /dev/vg01/lvol0 /mygfs2 -o relatime

Mount with noatime

The noatime Linux mount option can be specified when the file system is mounted, which disables atime updates on that file system. Usage

mount BlockDevice MountPoint -o noatime
BlockDevice
Specifies the block device where the GFS2 file system resides.
MountPoint
Specifies the directory where the GFS2 file system should be mounted.

Example In this example, the GFS2 file system resides on /dev/vg01/lvol0 and is mounted on directory /mygfs2 with atime updates turned off.

# mount /dev/vg01/lvol0 /mygfs2 -o noatime

2.3. VFS tuning options: research and experiment

Like all Linux file systems, GFS2 sits on top of a layer called the virtual file system (VFS). You can tune the VFS layer to improve underlying GFS2 performance by using the sysctl(8) command. For example, the values for dirty_background_ratio and vfs_cache_pressure may be adjusted depending on your situation. To fetch the current values, use the following commands:

# sysctl -n vm.dirty_background_ratio
# sysctl -n vm.vfs_cache_pressure

The following commands adjust the values:

# sysctl -w vm.dirty_background_ratio=20
# sysctl -w vm.vfs_cache_pressure=500

You can permanently change the values of these parameters by editing the /etc/sysctl.conf file.

To find the optimal values for your use cases, research the various VFS options and experiment on a test cluster before deploying into full production.

2.4. SELinux on GFS2

Use of Security Enhanced Linux (SELinux) with GFS2 incurs a small performance penalty. To avoid this overhead, you may choose not to use SELinux with GFS2 even on a system with SELinux in enforcing mode. When mounting a GFS2 file system, you can ensure that SELinux will not attempt to read the seclabel element on each file system object by using one of the context options as described on the mount(8) man page; SELinux will assume that all content in the file system is labeled with the seclabel element provided in the context mount options. This will also speed up processing as it avoids another disk read of the extended attribute block that could contain seclabel elements.

For example, on a system with SELinux in enforcing mode, you can use the following mount command to mount the GFS2 file system if the file system is going to contain Apache content. This label will apply to the entire file system; it remains in memory and is not written to disk.

# mount -t gfs2 -o context=system_u:object_r:httpd_sys_content_t:s0 /dev/mapper/xyz/mnt/gfs2

If you are not sure whether the file system will contain Apache content, you can use the labels public_content_rw_t or public_content_t, or you could define a new label altogether and define a policy around it.

Note that in a Pacemaker cluster you should always use Pacemaker to manage a GFS2 file system. You can specify the mount options when you create a GFS2 file system resource.

2.5. Setting up NFS over GFS2

Due to the added complexity of the GFS2 locking subsystem and its clustered nature, setting up NFS over GFS2 requires taking many precautions and careful configuration. This section describes the caveats you should take into account when configuring an NFS service over a GFS2 file system.

Warning

If the GFS2 file system is NFS exported, and NFS client applications use POSIX locks, then you must mount the file system with the localflocks option. The effect of this is to force both POSIX locks and flocks from each server to be local: non-clustered, independent of each other. This is necessary because a number of problems exist if GFS2 attempts to implement POSIX locks from NFS across the nodes of a cluster. For applications running on NFS clients, localized POSIX locks means that two clients can hold the same lock concurrently if the two clients are mounting from different servers. For this reason, when using NFS over GFS2, it is always safest to specify the -o localflocks mount option so that NFS can coordinate both POSIX locks and the flocks among all clients mounting NFS.

For all other (non-NFS) GFS2 applications, do not mount your file system using localflocks, so that GFS2 will manage the POSIX locks and flocks between all the nodes in the cluster (on a cluster-wide basis). If you specify localflocks and do not use NFS, the other nodes in the cluster will not have knowledge of each other’s POSIX locks and flocks, thus making them unsafe in a clustered environment

In addition to the locking considerations, you should take the following into account when configuring an NFS service over a GFS2 file system.

  • Red Hat supports only Red Hat High Availability Add-On configurations using NFSv3 with locking in an active/passive configuration with the following characteristics:

    • The back-end file system is a GFS2 file system running on a 2 to 16 node cluster.
    • An NFSv3 server is defined as a service exporting the entire GFS2 file system from a single cluster node at a time.
    • The NFS server can fail over from one cluster node to another (active/passive configuration).
    • No access to the GFS2 file system is allowed except through the NFS server. This includes both local GFS2 file system access as well as access through Samba or Clustered Samba.
    • There is no NFS quota support on the system.

      This configuration provides High Availability (HA) for the file system and reduces system downtime since a failed node does not result in the requirement to execute the fsck command when failing the NFS server from one node to another.

  • The fsid= NFS option is mandatory for NFS exports of GFS2.
  • If problems arise with your cluster (for example, the cluster becomes inquorate and fencing is not successful), the clustered logical volumes and the GFS2 file system will be frozen and no access is possible until the cluster is quorate. You should consider this possibility when determining whether a simple failover solution such as the one defined in this procedure is the most appropriate for your system.

2.6. Samba (SMB or Windows) file serving over GFS2

You can use Samba (SMB or Windows) file serving from a GFS2 file system with CTDB, which allows active/active configurations.

Simultaneous access to the data in the Samba share from outside of Samba is not supported. There is currently no support for GFS2 cluster leases, which slows Samba file serving.

2.7. Configuring virtual machines for GFS2

When using a GFS2 file system with a virtual machine, it is important that your VM storage settings on each node be configured properly in order to force the cache off. For example, including these settings for cache and io in the libvirt domain should allow GFS2 to behave as expected.

<driver name='qemu' type='raw' cache='none' io='native'/>

Alternately, you can configure the shareable attribute within the device element. This indicates that the device is expected to be shared between domains (as long as hypervisor and OS support this). If shareable is used, cache='no' should be used for that device.

2.8. Block allocation

This section provides a summary of issues related to block allocation in GFS2 file systems. Even though applications that only write data typically do not care how or where a block is allocated, some knowledge of how block allocation works can help you optimize performance.

2.8.1. Leave free space in the file system

When a GFS2 file system is nearly full, the block allocator starts to have a difficult time finding space for new blocks to be allocated. As a result, blocks given out by the allocator tend to be squeezed into the end of a resource group or in tiny slices where file fragmentation is much more likely. This file fragmentation can cause performance problems. In addition, when a GFS2 file system is nearly full, the GFS2 block allocator spends more time searching through multiple resource groups, and that adds lock contention that would not necessarily be there on a file system that has ample free space. This also can cause performance problems.

For these reasons, it is recommended that you not run a file system that is more than 85 percent full, although this figure may vary depending on workload.

2.8.2. Have each node allocate its own files, if possible

When developing applications for use with GFS2 file systems, it is recommended that you have each node allocate it own files, if possible. Due to the way the distributed lock manager (DLM) works, there will be more lock contention if all files are allocated by one node and other nodes need to add blocks to those files.

With DLM, the first node to lock a resource (like a file) becomes the “lock master” for that lock. Other nodes may lock that resource, but they have to ask permission from the lock master first. Each node knows which locks for which it is the lock master, and each node knows which node it has lent a lock to. Locking a lock on the master node is much faster than locking one on another node that has to stop and ask permission from the lock’s master.

As in many file systems, the GFS2 allocator tries to keep blocks in the same file close to one another to reduce the movement of disk heads and boost performance. A node that allocates blocks to a file will likely need to use and lock the same resource groups for the new blocks (unless all the blocks in that resource group are in use). The file system will run faster if the lock master for the resource group containing the file allocates its data blocks (it is faster to have the node that first opened the file do all the writing of new blocks).

2.8.3. Preallocate, if possible

If files are preallocated, block allocations can be avoided altogether and the file system can run more efficiently. GFS2 includes the fallocate(1) system call, which you can use to preallocate blocks of data.