Global File System 2
Red Hat Global File System 2
Abstract
Introduction
1. Audience
- Linux system administration procedures, including kernel configuration
- Installation and configuration of shared storage networks, such as Fibre Channel SANs
2. Related Documentation
- Installation Guide — Documents relevant information regarding the installation of Red Hat Enterprise Linux 6.
- Deployment Guide — Documents relevant information regarding the deployment, configuration and administration of Red Hat Enterprise Linux 6.
- Storage Administration Guide — Provides instructions on how to effectively manage storage devices and file systems on Red Hat Enterprise Linux 6.
- High Availability Add-On Overview — Provides a high-level overview of the Red Hat High Availability Add-On.
- Cluster Administration — Provides information about installing, configuring and managing the High Availability Add-On.
- Logical Volume Manager Administration — Provides a description of the Logical Volume Manager (LVM), including information on running LVM in a clustered environment.
- DM Multipath — Provides information about using the Device-Mapper Multipath feature of Red Hat Enterprise Linux.
- Load Balancer Administration — Provides information on configuring high-performance systems and services with the Load Balancer Add-On, a set of integrated software components that provide Linux Virtual Servers (LVS) for balancing IP load across a set of real servers.
- Release Notes — Provides information about the current release of Red Hat products.
3. We Need Feedback!
rh-gfs2(EN)-6 (2017-3-8T15:15)
Chapter 1. GFS2 Overview
Note
Note
Running the fsck.gfs2 command on a very large file system can take a long time and consume a large amount of memory. Additionally, in the event of a disk or disk-subsystem failure, recovery time is limited by the speed of your backup media. For information on the amount of memory the fsck.gfs2 command requires, see Section 4.11, “Repairing a File System”.
The clvmd daemon manages LVM logical volumes in a cluster. The daemon makes it possible to use LVM2 to manage logical volumes across a cluster, allowing all nodes in the cluster to share the logical volumes. For information on the LVM volume manager, see Logical Volume Manager Administration.
The gfs2.ko kernel module implements the GFS2 file system and is loaded on GFS2 cluster nodes.
Note
1.1. New and Changed Features
1.1.1. New and Changed Features for Red Hat Enterprise Linux 6.0
- For the Red Hat Enterprise Linux 6 release, Red Hat does not support the use of GFS2 as a single-node file system.
- For the Red Hat Enterprise Linux 6 release, the gfs2_convert command to upgrade from a GFS to a GFS2 file system has been enhanced. For information on this command, see Appendix B, Converting a File System from GFS to GFS2.
- The Red Hat Enterprise Linux 6 release supports the discard, nodiscard, barrier, nobarrier, quota_quantum, statfs_quantum, and statfs_percent mount options. For information about mounting a GFS2 file system, see Section 4.2, “Mounting a File System”.
- The Red Hat Enterprise Linux 6 version of this document contains a new section, Section 2.9, “GFS2 Node Locking”. This section describes some of the internals of GFS2 file systems.
1.1.2. New and Changed Features for Red Hat Enterprise Linux 6.1
- As of the Red Hat Enterprise Linux 6.1 release, GFS2 supports the standard Linux quota facilities. GFS2 quota management is documented in Section 4.5, “GFS2 Quota Management”. For earlier releases of Red Hat Enterprise Linux, GFS2 required the gfs2_quota command to manage quotas. Documentation for the gfs2_quota command is now provided in Appendix A, GFS2 Quota Management with the gfs2_quota Command.
- This document now contains a new chapter, Chapter 5, Diagnosing and Correcting Problems with GFS2 File Systems.
- Small technical corrections and clarifications have been made throughout the document.
1.1.3. New and Changed Features for Red Hat Enterprise Linux 6.2
- As of the Red Hat Enterprise Linux 6.2 release, GFS2 supports the tunegfs2 command, which replaces some of the features of the gfs2_tool command. For further information, see the tunegfs2 man page. The following sections have been updated to provide administrative procedures that do not require the use of the gfs2_tool command:
  - Section 4.5.4, “Synchronizing Quotas with the quotasync Command” and Section A.3, “Synchronizing Quotas with the gfs2_quota Command” now describe how to change the quota_quantum parameter from its default value of 60 seconds by using the quota_quantum= mount option.
  - Section 4.10, “Suspending Activity on a File System” now describes how to suspend write activity to a file system using the dmsetup suspend command.
- This document includes a new appendix, Appendix C, GFS2 tracepoints and the debugfs glocks File. This appendix describes the glock debugfs interface and the GFS2 tracepoints. It is intended for advanced users who are familiar with file system internals and who would like to learn more about the design of GFS2 and how to debug GFS2-specific issues.
1.1.4. New and Changed Features for Red Hat Enterprise Linux 6.3
1.1.5. New and Changed Features for Red Hat Enterprise Linux 6.4
1.1.6. New and Changed Features for Red Hat Enterprise Linux 6.6
1.2. Before Setting Up GFS2
- GFS2 nodes
- Determine which nodes in the cluster will mount the GFS2 file systems.
- Number of file systems
- Determine how many GFS2 file systems to create initially. (More file systems can be added later.)
- File system name
- Determine a unique name for each file system. The name must be unique for all lock_dlm file systems over the cluster. Each file system name is required in the form of a parameter variable. For example, this book uses file system names mydata1 and mydata2 in some example procedures.
- Journals
- Determine the number of journals for your GFS2 file systems. One journal is required for each node that mounts a GFS2 file system. GFS2 allows you to add journals dynamically at a later point as additional servers mount a file system. For information on adding journals to a GFS2 file system, see Section 4.7, “Adding Journals to a File System”.
- Storage devices and partitions
- Determine the storage devices and partitions to be used for creating logical volumes (by means of CLVM) in the file systems.
Note
1.3. Installing GFS2
Install the gfs2-utils package for GFS2 and the lvm2-cluster package for the Clustered Logical Volume Manager (CLVM). The lvm2-cluster and gfs2-utils packages are part of the ResilientStorage channel, which must be enabled before installing the packages.
Use the following yum install command to install the Red Hat High Availability Add-On software packages:
# yum install rgmanager lvm2-cluster gfs2-utils
1.4. Differences between GFS and GFS2
GFS file systems are converted to GFS2 with the gfs2_convert utility. For information on the gfs2_convert utility, see Appendix B, Converting a File System from GFS to GFS2.
1.4.1. GFS2 Command Names
Table 1.1. GFS and GFS2 Commands
GFS Command | GFS2 Command | Description |
---|---|---|
mount | mount | Mount a file system. The system can determine whether the file system is a GFS or GFS2 file system type. For information on the GFS2 mount options see the gfs2_mount(8) man page. |
umount | umount | Unmount a file system. |
gfs_fsck | fsck.gfs2 | Check and repair an unmounted file system. |
gfs_grow | gfs2_grow | Grow a mounted file system. |
gfs_jadd | gfs2_jadd | Add a journal to a mounted file system. |
gfs_mkfs | mkfs.gfs2 | Create a file system on a storage device. |
gfs_quota | gfs2_quota | Manage quotas on a mounted file system. As of the Red Hat Enterprise Linux 6.1 release, GFS2 supports the standard Linux quota facilities. For further information on quota management in GFS2, see Section 4.5, “GFS2 Quota Management”. |
gfs_tool | tunegfs2, mount parameters, dmsetup suspend | Configure, tune, or gather information about a file system. The tunegfs2 command is supported as of the Red Hat Enterprise Linux 6.2 release. There is also a gfs2_tool command. |
gfs_edit | gfs2_edit | Display, print, or edit file system internal structures. The gfs2_edit command can be used for GFS file systems as well as GFS2 file systems. |
gfs_tool setflag jdata/inherit_jdata | chattr +j (preferred) | Enable journaling on a file or directory. |
setfacl/getfacl | setfacl/getfacl | Set or get file access control list for a file or directory. |
setfattr/getfattr | setfattr/getfattr | Set or get the extended attributes of a file. |
1.4.2. Additional Differences Between GFS and GFS2
Context-Dependent Path Names
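GFS2 file systems do not support Context-Dependent Path Names (CDPNs). Similar functionality is available through the bind option of the mount command. For information on bind mounts and context-dependent pathnames in GFS2, see Section 4.12, “Bind Mounts and Context-Dependent Path Names”.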
gfs2.ko Module
The kernel module that implements the GFS file system is gfs.ko. The kernel module that implements the GFS2 file system is gfs2.ko.
Enabling Quota Enforcement in GFS2
Data Journaling
GFS2 file systems support the chattr command to set and clear the j flag on a file or directory. Setting the +j flag on a file enables data journaling on that file. Setting the +j flag on a directory means "inherit jdata", which indicates that all files and directories subsequently created in that directory are journaled. Using the chattr command is the preferred way to enable and disable data journaling on a file.
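As a minimal sketch of the commands involved, the following sets the j flag on a file and on a directory and then lists the directory's attributes. The paths under /mygfs2 are hypothetical, and the run wrapper is an illustrative helper (not part of GFS2) that only prints each command unless DRY_RUN=0 is set, so the sketch can be read through safely on a machine without a GFS2 mount.

```shell
#!/bin/sh
# Illustrative helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical paths on a mounted GFS2 file system.
JFILE=/mygfs2/data/journaled_file
JDIR=/mygfs2/data/journaled_dir

run chattr +j "$JFILE"   # enable data journaling on one file
run chattr +j "$JDIR"    # "inherit jdata": new entries in JDIR are journaled
run lsattr -d "$JDIR"    # show the directory's own attributes
run chattr -j "$JFILE"   # clear the flag again
```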
Adding Journals Dynamically
atime_quantum parameter removed
GFS2 does not support the atime_quantum tunable parameter, which can be used by the GFS file system to specify how often atime updates occur. In its place, GFS2 supports the relatime and noatime mount options. The relatime mount option is recommended to achieve similar behavior to setting the atime_quantum parameter in GFS.
The data= option of the mount command
When mounting GFS2 file systems, you can specify the data=ordered or data=writeback option of the mount command. When data=ordered is set, the user data modified by a transaction is flushed to the disk before the transaction is committed to disk. This should prevent the user from seeing uninitialized blocks in a file after a crash. When data=writeback is set, the user data is written to the disk at any time after it is dirtied. This does not provide the same consistency guarantee as ordered mode, but it should be slightly faster for some workloads. The default is ordered mode.
The gfs2_tool command
The gfs2_tool command supports a different set of options for GFS2 than the gfs_tool command supports for GFS:
- The gfs2_tool command supports a journals parameter that prints out information about the currently configured journals, including how many journals the file system contains.
- The gfs2_tool command does not support the counters flag, which the gfs_tool command uses to display GFS statistics.
- The gfs2_tool command does not support the inherit_jdata flag. To flag a directory as "inherit jdata", you can set the jdata flag on the directory or you can use the chattr command to set the +j flag on the directory. Using the chattr command is the preferred way to enable and disable data journaling on a file.
Note
As of the Red Hat Enterprise Linux 6.2 release, GFS2 supports the tunegfs2 command, which replaces some of the features of the gfs2_tool command. For further information, refer to the tunegfs2(8) man page. The settune and gettune functions of the gfs2_tool command have been replaced by command line options of the mount command, which allows them to be set by means of the fstab file when required.
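For example, a tunable such as quota_quantum can be carried in the mount options field of an /etc/fstab entry. The device, mount point, and option values below are illustrative assumptions, not recommended settings.

```
# device            mount point   type   options                      dump  fsck
/dev/vg01/lvol0     /mygfs2       gfs2   noatime,quota_quantum=60     0     0
```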
The gfs2_edit command
The gfs2_edit command supports a different set of options for GFS2 than the gfs_edit command supports for GFS. For information on the specific options each version of the command supports, see the gfs2_edit and gfs_edit man pages.
1.4.3. GFS2 Performance Improvements
- Better performance for heavy usage in a single directory
- Faster synchronous I/O operations
- Faster cached reads (no locking overhead)
- Faster direct I/O with preallocated files (provided I/O size is reasonably large, such as 4M blocks)
- Faster I/O operations in general
- Faster execution of the df command, because of faster statfs calls
- Improved atime mode to reduce the number of write I/O operations generated by atime when compared with GFS
- GFS2 is part of the upstream kernel (integrated into 2.6.19).
- GFS2 supports the following features.
  - extended file attributes (xattr)
  - the lsattr() and chattr() attribute settings by means of standard ioctl() calls
  - nanosecond timestamps
- GFS2 uses less kernel memory.
- GFS2 requires no metadata generation numbers. Allocating GFS2 metadata does not require reads. Copies of metadata blocks in multiple journals are managed by revoking blocks from the journal before lock release.
- GFS2 includes a much simpler log manager that knows nothing about unlinked inodes or quota changes.
- The gfs2_grow and gfs2_jadd commands use locking to prevent multiple instances running at the same time.
- The ACL code has been simplified for calls like creat() and mkdir().
- Unlinked inodes, quota changes, and statfs changes are recovered without remounting the journal.
Chapter 2. GFS2 Configuration and Operational Considerations
Important
2.1. Formatting Considerations
2.1.1. File System Size: Smaller is Better
- Less time is required to back up each file system.
- Less time is required if you need to check the file system with the fsck.gfs2 command.
- Less memory is required if you need to check the file system with the fsck.gfs2 command.
2.1.2. Block Size: Default (4K) Blocks Are Preferred
The mkfs.gfs2 command attempts to estimate an optimal block size based on device topology. In general, 4K blocks are the preferred block size because 4K is the default page size (memory) for Linux. Unlike some other file systems, GFS2 does most of its operations using 4K kernel buffers. If your block size is 4K, the kernel has to do less work to manipulate the buffers.
2.1.3. Number of Journals: One for Each Node that Mounts
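GFS2 requires one journal for each node that mounts the file system. If an additional node needs to mount the file system later, you can add a journal with the gfs2_jadd command. With GFS2, you can add journals on the fly.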
2.1.4. Journal Size: Default (128MB) Is Usually Optimal
When you run the mkfs.gfs2 command to create a GFS2 file system, you may specify the size of the journals. If you do not specify a size, it will default to 128MB, which should be optimal for most applications.
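Because each journal takes its own space, the total reserved for journals grows with the node count. The arithmetic below is a rough sketch: the node count is a hypothetical input, and 128 is the default journal size in megabytes stated above.

```shell
#!/bin/sh
# Estimate the space consumed by journals: one journal per mounting node.
journal_space_mb() {
    nodes=$1              # number of nodes that will mount the file system
    size_mb=${2:-128}     # journal size in MB (default 128)
    echo $(( nodes * size_mb ))
}

journal_space_mb 8        # eight nodes at the default journal size
journal_space_mb 8 256    # eight nodes with larger, 256 MB journals
```

These inputs would correspond to the -j and -J options of the mkfs.gfs2 command, for example -j 8 -J 256.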
2.1.5. Size and Number of Resource Groups
When you create a GFS2 file system with the mkfs.gfs2 command, it divides the storage into uniform slices known as resource groups. It attempts to estimate an optimal resource group size (ranging from 32MB to 2GB). You can override the default with the -r option of the mkfs.gfs2 command.
- First, when a resource group is completely full, it remembers that and tries to avoid checking it for future allocations (until a block is freed from it). If you never delete files, contention will be less severe. However, if your application is constantly deleting blocks and allocating new blocks on a file system that is mostly full, contention will be very high and this will severely impact performance.
- Second, when new blocks are added to an existing file (for example, appending) GFS2 will attempt to group the new blocks together in the same resource group as the file. This is done to increase performance: on a spinning disk, seeks take less time when they are physically close together.
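To get a feel for how the resource group size affects the number of resource groups, the following sketch divides a device size by a resource group size. This is an approximation that ignores journals and other metadata, and the 1 TB device size is a hypothetical example.

```shell
#!/bin/sh
# Approximate number of resource groups for a device of a given size.
rg_count() {
    device_mb=$1    # device size in MB
    rg_mb=$2        # resource group size in MB (32 to 2048)
    if [ "$rg_mb" -lt 32 ] || [ "$rg_mb" -gt 2048 ]; then
        echo "resource group size must be between 32 and 2048 MB" >&2
        return 1
    fi
    echo $(( device_mb / rg_mb ))
}

rg_count 1048576 256     # 1 TB device, 256 MB resource groups
rg_count 1048576 2048    # same device, largest resource group size
```

Fewer, larger resource groups mean fewer candidates to scan for free blocks, while more, smaller ones give nodes more independent regions to allocate from.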
2.2. File System Fragmentation
2.3. Block Allocation Issues
2.3.1. Leave Free Space in the File System
2.3.2. Have Each Node Allocate its Own Files, If Possible
2.3.3. Preallocate, If Possible
GFS2 supports preallocation through fallocate(1), which you can use to preallocate blocks of data.
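A minimal sketch of preallocating space with the fallocate command follows. A temporary directory stands in for a GFS2 mount point here, and the file name and size are arbitrary; the prealloc function is an illustrative wrapper, not a GFS2 tool.

```shell
#!/bin/sh
# Preallocate space for a file before the application starts writing to it.
prealloc() {
    path=$1      # file to create or extend
    length=$2    # size to reserve, for example 4M or 1G
    fallocate -l "$length" "$path"
}

# Stand-in for a directory on a GFS2 file system.
dir=$(mktemp -d)
prealloc "$dir/prealloc.dat" 4M
stat -c %s "$dir/prealloc.dat"   # report the resulting file size in bytes
rm -rf "$dir"
```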
2.4. Cluster Considerations
2.5. Usage Considerations
2.5.1. Mount Options: noatime and nodiratime
It is generally recommended to mount GFS2 file systems with the noatime and nodiratime arguments. This allows GFS2 to spend less time updating disk inodes for every access.
2.5.2. DLM Tuning Options: Increase DLM Table Sizes
echo 1024 > /sys/kernel/config/dlm/cluster/lkbtbl_size
echo 1024 > /sys/kernel/config/dlm/cluster/rsbtbl_size
echo 1024 > /sys/kernel/config/dlm/cluster/dirtbl_size
2.5.3. VFS Tuning Options: Research and Experiment
You can adjust VFS parameters with the sysctl(8) command. For example, the values for dirty_background_ratio and vfs_cache_pressure may be adjusted depending on your situation. To fetch the current values, use the following commands:
sysctl -n vm.dirty_background_ratio
sysctl -n vm.vfs_cache_pressure
The following commands adjust the values:
sysctl -w vm.dirty_background_ratio=20
sysctl -w vm.vfs_cache_pressure=500
You can permanently change the values of these parameters by editing the /etc/sysctl.conf file.
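A persistent version of the two settings might look like the following fragment. The values simply mirror the sysctl -w example in this section and are not tuning recommendations; running sysctl -p loads the file without a reboot.

```
# /etc/sysctl.conf
vm.dirty_background_ratio = 20
vm.vfs_cache_pressure = 500
```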
2.5.4. SELinux: Avoid SELinux on GFS2
You can prevent SELinux from looking for the seclabel element on each file system object by using one of the context options as described on the mount(8) man page; SELinux will assume that all content in the file system is labeled with the seclabel element provided in the context mount options. This will also speed up processing as it avoids another disk read of the extended attribute block that could contain seclabel elements.
For example, you can use the following mount command to mount the GFS2 file system if the file system is going to contain Apache content. This label will apply to the entire file system; it remains in memory and is not written to disk.
# mount -t gfs2 -o context=system_u:object_r:httpd_sys_content_t:s0 /dev/mapper/xyz /mnt/gfs2
If you are not sure whether the file system will contain Apache content, you can use the labels public_content_rw_t or public_content_t, or you could define a new label altogether and define a policy around it.
2.5.5. Setting Up NFS Over GFS2
Warning
If the GFS2 file system is NFS exported, and NFS client applications use POSIX locks, then you must mount the file system with the localflocks option. The intended effect of this is to force POSIX locks from each server to be local: that is, non-clustered, independent of each other. (A number of problems exist if GFS2 attempts to implement POSIX locks from NFS across the nodes of a cluster.) For applications running on NFS clients, localized POSIX locks means that two clients can hold the same lock concurrently if the two clients are mounting from different servers. If all clients mount NFS from one server, then the problem of separate servers granting the same locks independently goes away. If you are not sure whether to mount your file system with the localflocks option, you should not use the option; it is always safer to have the locks working on a clustered basis.
- Red Hat supports only Red Hat High Availability Add-On configurations using NFSv3 with locking in an active/passive configuration with the following characteristics:
  - The back-end file system is a GFS2 file system running on a 2 to 16 node cluster.
  - An NFSv3 server is defined as a service exporting the entire GFS2 file system from a single cluster node at a time.
  - The NFS server can fail over from one cluster node to another (active/passive configuration).
  - No access to the GFS2 file system is allowed except through the NFS server. This includes both local GFS2 file system access as well as access through Samba or Clustered Samba.
  - There is no NFS quota support on the system.
  This configuration provides HA for the file system and reduces system downtime since a failed node does not result in the requirement to execute the fsck command when failing the NFS server from one node to another.
- The fsid= NFS option is mandatory for NFS exports of GFS2.
- If problems arise with your cluster (for example, the cluster becomes inquorate and fencing is not successful), the clustered logical volumes and the GFS2 file system will be frozen and no access is possible until the cluster is quorate. You should consider this possibility when determining whether a simple failover solution such as the one defined in this procedure is the most appropriate for your system.
2.5.6. Samba (SMB or Windows) File Serving over GFS2
2.6. File System Backups
echo -n 3 > /proc/sys/vm/drop_caches
rsync command on node-specific directories.
-o lockproto=lock_nolock since it will not be in a cluster.
2.7. Hardware Considerations
- Use Higher-Quality Storage Options
- GFS2 can operate on cheaper shared-storage options, such as iSCSI or Fibre Channel over Ethernet (FCoE), but you will get better performance if you buy higher-quality storage with larger caching capacity. Red Hat performs most quality, sanity, and performance tests on SAN storage with Fibre Channel interconnect. As a general rule, it is always better to deploy something that has been tested first.
- Test Network Equipment Before Deploying
- Higher-quality, faster network equipment makes cluster communications and GFS2 run faster with better reliability. However, you do not have to purchase the most expensive hardware. Some of the most expensive network switches have problems passing multicast packets, which are used for passing fcntl locks (flocks), whereas cheaper commodity network switches are sometimes faster and more reliable. It is a general best practice to try equipment before deploying it into full production.
2.8. Performance Issues: Check the Red Hat Customer Portal
2.9. GFS2 Node Locking
write system call).
Note
- An inode is used in a read only fashion across all nodes.
- An inode is written or modified from a single node only.
If you mmap() a file on GFS2 with a read/write mapping, but only read from it, this only counts as a read. On GFS though, it counts as a write, so GFS2 is much more scalable with mmap() I/O.
If you do not set the noatime mount parameter, then reads will also result in writes to update the file timestamps. We recommend that all GFS2 users mount with noatime unless they have a specific requirement for atime.
2.9.1. Issues with Posix Locking
- Use of Flocks will yield faster processing than use of Posix locks.
- Programs using Posix locks in GFS2 should avoid using the GETLK function since, in a clustered environment, the process ID may be for a different node in the cluster.
2.9.2. Performance Tuning With GFS2
A mail server is a typical example: mail is often stored either in a spool directory containing a file for each user (mbox), or with a directory for each user containing a file for each message (maildir). When requests arrive over IMAP, the ideal arrangement is to give each user an affinity to a particular node. That way their requests to view and delete email messages will tend to be served from the cache on that one node. Obviously if that node fails, then the session can be restarted on a different node.
imap or smtp.
echo -n 3 >/proc/sys/vm/drop_caches
2.9.3. Troubleshooting GFS2 Performance with the GFS2 Lock Dump
The GFS2 lock dump information can be gathered from the debugfs file, which can be found at the following path name, assuming that debugfs is mounted on /sys/kernel/debug/:
/sys/kernel/debug/gfs2/fsname/glocks
A practical way to use the debugfs file is to use the cat command to take a copy of the complete content of the file (it might take a long time if you have a large amount of RAM and a lot of cached inodes) while the application is experiencing problems, and then look through the resulting data at a later date.
Note
It can be useful to make two copies of the debugfs file, one a few seconds or even a minute or two after the other. By comparing the holder information in the two traces relating to the same glock number, you can tell whether the workload is making progress (that is, it is just slow) or whether it has become stuck (which is always a bug and should be reported to Red Hat support immediately).
Lines in the debugfs file starting with H: (holders) represent lock requests either granted or waiting to be granted. The flags field on the holders line (f:) shows which: the 'W' flag refers to a waiting request, and the 'H' flag refers to a granted request. The glocks which have large numbers of waiting requests are likely to be those which are experiencing particular contention.
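As a rough sketch of how to find such glocks, the following awk script counts holder lines carrying the W flag under each glock in a saved copy of the glocks file. It assumes the dump layout in which each glock starts with a G: line whose n: field is type/number and is followed by indented H: lines; the sample lines are invented for illustration and their trailing fields are abbreviated.

```shell
#!/bin/sh
# Count waiting holders (W in the f: field of an H: line) per glock,
# printing the most contended glocks first.
count_waiters() {
    awk '
        $1 == "G:" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^n:/) glock = substr($i, 3)
        }
        $1 == "H:" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^f:/ && index($i, "W") > 0) waiting[glock]++
        }
        END {
            for (g in waiting) print waiting[g], g
        }
    ' "$@" | sort -rn
}

# Invented sample; on a real node, pass a saved copy of the glocks file instead.
count_waiters <<'EOF'
G:  s:EX n:2/183f5 f:lDpq t:UN d:UN/0 a:0 r:5
 H: s:EX f:H e:0 p:4466 [imapd]
 H: s:SH f:W e:0 p:4471 [imapd]
 H: s:SH f:W e:0 p:4480 [imapd]
G:  s:SH n:5/183f5 f:Iq t:SH d:EX/0 a:0 r:3
 H: s:SH f:H e:0 p:4466 [imapd]
EOF
```

For the sample input this reports two waiters on glock 2/183f5 and nothing for the glock whose only holder is granted.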
Table 2.1. Glock flags
Flag | Name | Meaning |
---|---|---|
b | Blocking | Valid when the locked flag is set, and indicates that the operation that has been requested from the DLM may block. This flag is cleared for demotion operations and for "try" locks. The purpose of this flag is to allow gathering of stats of the DLM response time independent from the time taken by other nodes to demote locks. |
d | Pending demote | A deferred (remote) demote request |
D | Demote | A demote request (local or remote) |
f | Log flush | The log needs to be committed before releasing this glock |
F | Frozen | Replies from remote nodes ignored - recovery is in progress. This flag is not related to file system freeze, which uses a different mechanism, but is used only in recovery. |
i | Invalidate in progress | In the process of invalidating pages under this glock |
I | Initial | Set when DLM lock is associated with this glock |
l | Locked | The glock is in the process of changing state |
L | LRU | Set when the glock is on the LRU list |
o | Object | Set when the glock is associated with an object (that is, an inode for type 2 glocks, and a resource group for type 3 glocks) |
p | Demote in progress | The glock is in the process of responding to a demote request |
q | Queued | Set when a holder is queued to a glock, and cleared when the glock is held, but there are no remaining holders. Used as part of the algorithm that calculates the minimum hold time for a glock. |
r | Reply pending | Reply received from remote node is awaiting processing |
y | Dirty | Data needs flushing to disk before releasing this glock |
Table 2.2. Glock holder flags
Flag | Name | Meaning |
---|---|---|
a | Async | Do not wait for glock result (will poll for result later) |
A | Any | Any compatible lock mode is acceptable |
c | No cache | When unlocked, demote DLM lock immediately |
e | No expire | Ignore subsequent lock cancel requests |
E | Exact | Must have exact lock mode |
F | First | Set when holder is the first to be granted for this lock |
H | Holder | Indicates that requested lock is granted |
p | Priority | Enqueue holder at the head of the queue |
t | Try | A "try" lock |
T | Try 1CB | A "try" lock that sends a callback |
W | Wait | Set while waiting for request to complete |
find -inum number
where number is the inode number converted from the hex format in the glocks file into decimal.
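The hexadecimal-to-decimal conversion can be done in the shell. In this sketch the inode number 183f5 is an invented example value and hex_to_dec is an illustrative helper name.

```shell
#!/bin/sh
# Convert a hexadecimal inode number from the glocks file to decimal.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

inum=$(hex_to_dec 183f5)
echo "$inum"
# The decimal value can then be passed to find, for example:
#   find /mygfs2 -xdev -inum "$inum"
```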
Note
If you run the find command on a file system when it is experiencing lock contention, you are likely to make the problem worse. It is a good idea to stop the application before running the find command when you are looking for contended inodes.
Table 2.3. Glock types
Type number | Lock type | Use |
---|---|---|
1 | Trans | Transaction lock |
2 | Inode | Inode metadata and data |
3 | Rgrp | Resource group metadata |
4 | Meta | The superblock |
5 | Iopen | Inode last closer detection |
6 | Flock | flock(2) syscall |
8 | Quota | Quota operations |
9 | Journal | Journal mutex |
gfs2_grow command to expand the file system.
Chapter 3. Getting Started
3.1. Prerequisite Tasks
- Make sure that you have noted the key characteristics of the GFS2 nodes (see Section 1.2, “Before Setting Up GFS2”).
- Make sure that the clocks on the GFS2 nodes are synchronized. It is recommended that you use the Network Time Protocol (NTP) software provided with your Red Hat Enterprise Linux distribution.
Note
The system clocks in GFS2 nodes must be within a few minutes of each other to prevent unnecessary inode time-stamp updating. Unnecessary inode time-stamp updating severely impacts cluster performance.
- In order to use GFS2 in a clustered environment, you must configure your system to use the Clustered Logical Volume Manager (CLVM), a set of clustering extensions to the LVM Logical Volume Manager. In order to use CLVM, the Red Hat Cluster Suite software, including the clvmd daemon, must be running. For information on using CLVM, see Logical Volume Manager Administration. For information on installing and administering Red Hat Cluster Suite, see Cluster Administration.
3.2. Initial Setup Tasks
- Setting up logical volumes.
- Making a GFS2 file system.
- Mounting file systems.
- Using LVM, create a logical volume for each Red Hat GFS2 file system.
Note
You can use init.d scripts included with Red Hat Cluster Suite to automate activating and deactivating logical volumes. For more information about init.d scripts, see Configuring and Managing a Red Hat Cluster.
- Create GFS2 file systems on logical volumes created in Step 1. Choose a unique name for each file system. For more information about creating a GFS2 file system, see Section 4.1, “Making a File System”. You can use either of the following formats to create a clustered GFS2 file system:
mkfs.gfs2 -p lock_dlm -t ClusterName:FSName -j NumberJournals BlockDevice
mkfs -t gfs2 -p lock_dlm -t LockTableName -j NumberJournals BlockDevice
- At each node, mount the GFS2 file systems. For more information about mounting a GFS2 file system, see Section 4.2, “Mounting a File System”. Command usage:
mount BlockDevice MountPoint
mount -o acl BlockDevice MountPoint
The -o acl mount option allows manipulating file ACLs. If a file system is mounted without the -o acl mount option, users are allowed to view ACLs (with getfacl), but are not allowed to set them (with setfacl).
Note
You can use init.d scripts included with the Red Hat High Availability Add-On to automate mounting and unmounting GFS2 file systems.
Chapter 4. Managing GFS2
4.1. Making a File System
You create a GFS2 file system with the mkfs.gfs2 command. You can also use the mkfs command with the -t gfs2 option specified. A file system is created on an activated LVM volume. The following information is required to run the mkfs.gfs2 command:
- Lock protocol/module name (the lock protocol for a cluster is lock_dlm)
- Cluster name (when running as part of a cluster configuration)
- Number of journals (one journal required for each node that may be mounting the file system)
When creating a GFS2 file system, you can use the mkfs.gfs2 command directly, or you can use the mkfs command with the -t parameter specifying a file system of type gfs2, followed by the gfs2 file system options.
Note
Once you have created a GFS2 file system with the mkfs.gfs2 command, you cannot decrease the size of the file system. You can, however, increase the size of an existing file system with the gfs2_grow command, as described in Section 4.6, “Growing a File System”.
Usage
mkfs.gfs2 -p LockProtoName -t LockTableName -j NumberJournals BlockDevice
mkfs -t gfs2 -p LockProtoName -t LockTableName -j NumberJournals BlockDevice
Note
mkfs.gfs2 -p LockProtoName -j NumberJournals BlockDevice
mkfs -t gfs2 -p LockProtoName -j NumberJournals BlockDevice
Warning
Make sure that you are very familiar with using the LockProtoName and LockTableName parameters. Improper use of the LockProtoName and LockTableName parameters may cause file system or lock space corruption.
LockProtoName
- Specifies the name of the locking protocol to use. The lock protocol for a cluster is lock_dlm.
LockTableName
- This parameter is specified for a GFS2 file system in a cluster configuration. It has two parts separated by a colon (no spaces) as follows: ClusterName:FSName
  - ClusterName, the name of the cluster for which the GFS2 file system is being created.
  - FSName, the file system name, can be 1 to 16 characters long. The name must be unique for all lock_dlm file systems over the cluster, and for all file systems (lock_dlm and lock_nolock) on each local node.
Number
- Specifies the number of journals to be created by the mkfs.gfs2 command. One journal is required for each node that mounts the file system. For GFS2 file systems, more journals can be added later without growing the file system, as described in Section 4.7, “Adding Journals to a File System”.
BlockDevice
- Specifies a logical or physical volume.
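Because a malformed lock table name is easy to type, a small check before running mkfs.gfs2 can help. The sketch below tests the ClusterName:FSName form and the 1 to 16 character limit on the file system name described above; valid_lock_table is a hypothetical helper, and rejecting names with more than one colon is an assumption of this sketch rather than a documented rule.

```shell
#!/bin/sh
# Check that a lock table name has the form ClusterName:FSName
# with a file system name of 1 to 16 characters and no spaces.
valid_lock_table() {
    case $1 in
        *:*:*|*" "*) return 1 ;;   # assumption: reject extra colons; spaces are not allowed
        ?*:?*) ;;                  # non-empty cluster and file system parts
        *) return 1 ;;
    esac
    fsname=${1#*:}
    [ "${#fsname}" -le 16 ]
}

valid_lock_table alpha:mydata1 && echo "alpha:mydata1 ok"
valid_lock_table alpha:this_name_is_far_too_long || echo "name too long"
```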
Examples
In these examples, lock_dlm is the locking protocol that the file system uses, since this is a clustered file system. The cluster name is alpha, and the file system name is mydata1. The file system contains eight journals and is created on /dev/vg01/lvol0.
mkfs.gfs2 -p lock_dlm -t alpha:mydata1 -j 8 /dev/vg01/lvol0
mkfs -t gfs2 -p lock_dlm -t alpha:mydata1 -j 8 /dev/vg01/lvol0
In these examples, a second lock_dlm file system is made, which can be used in cluster alpha. The file system name is mydata2. The file system contains eight journals and is created on /dev/vg01/lvol1.
mkfs.gfs2 -p lock_dlm -t alpha:mydata2 -j 8 /dev/vg01/lvol1
mkfs -t gfs2 -p lock_dlm -t alpha:mydata2 -j 8 /dev/vg01/lvol1
Complete Options
Table 4.1, “Command Options: mkfs.gfs2” describes the mkfs.gfs2 command options (flags and parameters).
Table 4.1. Command Options: mkfs.gfs2
Flag | Parameter | Description |
---|---|---|
-c | Megabytes | Sets the initial size of each journal's quota change file to Megabytes. |
-D | | Enables debugging output. |
-h | | Help. Displays available options. |
-J | MegaBytes | Specifies the size of the journal in megabytes. Default journal size is 128 megabytes. The minimum size is 8 megabytes. Larger journals improve performance, although they use more memory than smaller journals. |
-j | Number | Specifies the number of journals to be created by the mkfs.gfs2 command. One journal is required for each node that mounts the file system. If this option is not specified, one journal will be created. For GFS2 file systems, you can add additional journals at a later time without growing the file system. |
-O | | Prevents the mkfs.gfs2 command from asking for confirmation before writing the file system. |
-p | LockProtoName | Specifies the name of the locking protocol to use. The lock protocol for a cluster is lock_dlm. |
-q | | Quiet. Do not display anything. |
-r | MegaBytes | Specifies the size of the resource groups in megabytes. The minimum resource group size is 32 MB. The maximum resource group size is 2048 MB. A large resource group size may increase performance on very large file systems. If this is not specified, mkfs.gfs2 chooses the resource group size based on the size of the file system: average size file systems will have 256 MB resource groups, and bigger file systems will have bigger resource groups for better performance. |
-t | LockTableName | Specifies the lock table name in the form ClusterName:FSName. |
-u | MegaBytes | Specifies the initial size of each journal's unlinked tag file. |
-V | | Displays command version information. |
4.2. Mounting a File System
Note
Attempting to mount a GFS2 file system when the Cluster Manager (cman) has not been started produces the following error message:
[root@gfs-a24c-01 ~]# mount -t gfs2 -o noatime /dev/mapper/mpathap1 /mnt
gfs_controld join connect error: Connection refused
error mounting lockproto lock_dlm
To manipulate file ACLs, you must mount the file system with the -o acl mount option. If a file system is mounted without the -o acl mount option, users are allowed to view ACLs (with getfacl), but are not allowed to set them (with setfacl).
Usage
mount BlockDevice MountPoint
mount -o acl BlockDevice MountPoint
-o acl
- GFS2-specific option to allow manipulating file ACLs.
BlockDevice
- Specifies the block device where the GFS2 file system resides.
MountPoint
- Specifies the directory where the GFS2 file system should be mounted.
Example
In this example, the GFS2 file system on /dev/vg01/lvol0 is mounted on the /mygfs2 directory.
mount /dev/vg01/lvol0 /mygfs2
Complete Usage
mount BlockDevice MountPoint -o option
The -o option argument consists of GFS2-specific options (see Table 4.2, “GFS2-Specific Mount Options”) or acceptable standard Linux mount -o options, or a combination of both. Multiple option parameters are separated by a comma and no spaces.
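As an illustration of combining GFS2-specific and standard options in one -o argument, the following sketch joins a list of options with commas and no spaces. The join_opts function is a hypothetical helper, and the script only prints the resulting mount command rather than running it; the device and mount point are the ones used in the earlier example.

```shell
#!/bin/bash
# join_opts is a hypothetical helper: it joins its arguments with commas
# and no spaces, which is the form the -o argument requires.
join_opts() {
    local IFS=,
    echo "$*"
}

# acl and quota=on are GFS2-specific options; noatime is a standard
# Linux mount option.
opts=$(join_opts acl noatime quota=on)

# Print the command instead of running it.
echo "mount -o $opts /dev/vg01/lvol0 /mygfs2"
```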
Note
The mount command is a Linux system command. In addition to using GFS2-specific options described in this section, you can use other, standard, mount command options (for example, -r). For information about other Linux mount command options, see the Linux mount man page.
Table 4.2, “GFS2-Specific Mount Options” describes the GFS2-specific -o option values that can be passed to GFS2 at mount time.
Note
Table 4.2. GFS2-Specific Mount Options
Option | Description |
---|---|
acl | Allows manipulating file ACLs. If a file system is mounted without the acl mount option, users are allowed to view ACLs (with getfacl), but are not allowed to set them (with setfacl). |
data=[ordered|writeback] | When data=ordered is set, the user data modified by a transaction is flushed to the disk before the transaction is committed to disk. This should prevent the user from seeing uninitialized blocks in a file after a crash. When data=writeback mode is set, the user data is written to the disk at any time after it is dirtied; this does not provide the same consistency guarantee as ordered mode, but it should be slightly faster for some workloads. The default value is ordered mode. |
ignore_local_fs | Forces GFS2 to treat the file system as a multihost file system. By default, using lock_nolock automatically turns on the localflocks flag. |
localflocks | Tells GFS2 to let the VFS (virtual file system) layer do all flock and fcntl. The localflocks flag is automatically turned on by lock_nolock. |
lockproto=LockModuleName | Allows the user to specify which locking protocol to use with the file system. If LockModuleName is not specified, the locking protocol name is read from the file system superblock. |
locktable=LockTableName | Allows the user to specify which locking table to use with the file system. |
quota=[off/account/on] | Turns quotas on or off for a file system. Setting the quotas to be in the account state causes the per UID/GID usage statistics to be correctly maintained by the file system; limit and warn values are ignored. The default value is off. |
errors=panic|withdraw | When errors=panic is specified, file system errors will cause a kernel panic. The default behavior, which is the same as specifying errors=withdraw, is for the system to withdraw from the file system and make it inaccessible until the next reboot; in some cases the system may remain running. For information on the GFS2 withdraw function, see Section 4.14, “The GFS2 Withdraw Function”. |
discard/nodiscard | Causes GFS2 to generate "discard" I/O requests for blocks that have been freed. These can be used by suitable hardware to implement thin provisioning and similar schemes. |
barrier/nobarrier | Causes GFS2 to send I/O barriers when flushing the journal. The default value is on. This option is automatically turned off if the underlying device does not support I/O barriers. Use of I/O barriers with GFS2 is highly recommended at all times unless the block device is designed so that it cannot lose its write cache content (for example, if it is on a UPS or it does not have a write cache). |
quota_quantum=secs | Sets the number of seconds for which a change in the quota information may sit on one node before being written to the quota file. This is the preferred way to set this parameter. The value is an integer number of seconds greater than zero. The default is 60 seconds. Shorter settings result in faster updates of the lazy quota information and less likelihood of someone exceeding their quota. Longer settings make file system operations involving quotas faster and more efficient. |
statfs_quantum=secs | Setting statfs_quantum to 0 is the preferred way to set the slow version of statfs. The default value is 30 seconds, which sets the maximum time period before statfs changes will be synced to the master statfs file. This can be adjusted to allow for faster, less accurate statfs values or slower, more accurate values. When this option is set to 0, statfs will always report the true values. |
statfs_percent=value | Provides a bound on the maximum percentage change in the statfs information on a local basis before it is synced back to the master statfs file, even if the time period has not expired. If the setting of statfs_quantum is 0, then this setting is ignored. |
4.3. Unmounting a File System
The GFS2 file system can be unmounted the same way as any Linux file system, by using the umount command.
Note
The umount command is a Linux system command. Information about this command can be found in the Linux umount command man pages.
Usage
umount MountPoint
MountPoint
- Specifies the directory where the GFS2 file system is currently mounted.
4.4. Special Considerations when Mounting GFS2 File Systems
GFS2 file systems that have been mounted manually rather than automatically through an entry in the fstab file will not be known to the system when file systems are unmounted at system shutdown. As a result, the GFS2 script will not unmount the GFS2 file system. After the GFS2 shutdown script is run, the standard shutdown process kills off all remaining user processes, including the cluster infrastructure, and tries to unmount the file system. This unmount will fail without the cluster infrastructure and the system will hang.
- Always use an entry in the fstab file to mount the GFS2 file system.
- If a GFS2 file system has been mounted manually with the mount command, be sure to unmount the file system manually with the umount command before rebooting or shutting down the system.
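As an illustration of the first recommendation, an /etc/fstab entry for a GFS2 file system might look like the following. This is a sketch only: the device and mount point are the ones used in the earlier examples in this chapter, and the noatime option is an optional choice, not a requirement.

```
# <device>        <mount point>  <type>  <options>          <dump> <fsck>
/dev/vg01/lvol0   /mygfs2        gfs2    defaults,noatime   0      0
```

The final two columns are set to 0 so that the file system is not checked at boot time.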
4.5. GFS2 Quota Management
When a GFS2 file system is mounted with the quota=on or quota=account option, GFS2 keeps track of the space used by each user and group even when there are no limits in place. GFS2 updates quota information in a transactional way so system crashes do not require quota usages to be reconstructed.
Note
gfs2_quota
command to manage quotas. For information on using the gfs2_quota
command, see Appendix A, GFS2 Quota Management with the gfs2_quota
Command.
4.5.1. Configuring Disk Quotas
- Set up quotas in enforcement or accounting mode.
- Initialize the quota database file with current block usage information.
- Assign quota policies. (In accounting mode, these policies are not enforced.)
4.5.1.1. Setting Up Quotas in Enforcement or Accounting Mode
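To enable quota enforcement for a file system, mount the file system with the quota=on option specified.
To maintain quota accounting for every user and group without enforcing the limits, mount the file system with the quota=account option specified.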
Usage
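To mount a file system with quotas enabled, mount the file system with the quota=on option specified.
mount -o quota=on BlockDevice MountPoint
To mount a file system with quota accounting maintained, even though the quota limits are not enforced, mount the file system with the quota=account option specified.
mount -o quota=account BlockDevice MountPoint
To mount a file system with quotas disabled, mount the file system with the quota=off option specified. This is the default setting.
mount -o quota=off BlockDevice MountPoint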
quota={on|off|account}
- on: Specifies that quotas are enabled when the file system is mounted.
- off: Specifies that quotas are disabled when the file system is mounted.
- account: Specifies that user and group usage statistics are maintained by the file system, even though the quota limits are not enforced.
BlockDevice
- Specifies the block device where the GFS2 file system resides.
MountPoint
- Specifies the directory where the GFS2 file system should be mounted.
Examples
In this example, the GFS2 file system on /dev/vg01/lvol0 is mounted on the /mygfs2 directory with quotas enabled.
mount -o quota=on /dev/vg01/lvol0 /mygfs2
In this example, the GFS2 file system on /dev/vg01/lvol0 is mounted on the /mygfs2 directory with quota accounting maintained, but not enforced.
mount -o quota=account /dev/vg01/lvol0 /mygfs2
4.5.1.2. Creating the Quota Database Files
To initialize the quota database files with current usage information, run the quotacheck command.
The quotacheck command examines quota-enabled file systems and builds a table of the current disk usage per file system. The table is then used to update the operating system's copy of disk usage. In addition, the file system's disk quota files are updated.
To create the quota files on the file system, use the -u and the -g options of the quotacheck command; both of these options must be specified for user and group quotas to be initialized. For example, if quotas are enabled for the /home file system, create the files in the /home directory:
quotacheck -ug /home
4.5.1.3. Assigning Quotas per User
Assign the disk quotas with the edquota command. Note that if you have mounted your file system in accounting mode (with the quota=account option specified), the quotas are not enforced.
edquota username
For example, if a quota is enabled in /etc/fstab for the /home partition (/dev/VolGroup00/LogVol02 in the example below) and the command edquota testuser is executed, the following is shown in the editor configured as the default for the system:
Disk quotas for user testuser (uid 501):
Filesystem                 blocks   soft   hard   inodes   soft   hard
/dev/VolGroup00/LogVol02   440436      0      0
Note
The text editor defined by the EDITOR environment variable is used by edquota. To change the editor, set the EDITOR environment variable in your ~/.bash_profile file to the full path of the editor of your choice.
Disk quotas for user testuser (uid 501):
Filesystem                 blocks     soft     hard   inodes   soft   hard
/dev/VolGroup00/LogVol02   440436   500000   550000
quota testuser
4.5.1.4. Assigning Quotas per Group
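Quotas can also be assigned on a per-group basis. Note that if you have mounted your file system in accounting mode (with the quota=account option specified), the quotas are not enforced.
To set a group quota for the devel group (the group must exist prior to setting the group quota), use the following command: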
edquota -g devel
Disk quotas for group devel (gid 505):
Filesystem                 blocks   soft   hard   inodes   soft   hard
/dev/VolGroup00/LogVol02   440400      0      0
quota -g devel
4.5.2. Managing Disk Quotas
You can create a disk usage report by running the repquota utility. For example, the command repquota /home produces this output:
*** Report for user quotas on device /dev/mapper/VolGroup00-LogVol02
Block grace time: 7days; Inode grace time: 7days
                        Block limits                  File limits
User            used     soft     hard  grace    used   soft   hard  grace
----------------------------------------------------------------------
root      --      36        0        0              4      0      0
kristin   --     540        0        0            125      0      0
testuser  --  440400   500000   550000          37418      0      0
To view the disk usage report for all (option -a) quota-enabled file systems, use the command:
repquota -a
The -- displayed after each user is a quick way to determine whether the block limits have been exceeded. If the block soft limit is exceeded, a + appears in place of the first - in the output. The second - indicates the inode limit, but GFS2 file systems do not support inode limits, so that character will remain as -. GFS2 file systems do not support a grace period, so the grace column will remain blank.
Note that the repquota command is not supported over NFS, irrespective of the underlying file system.
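The flag column described above can be checked mechanically. The following sketch filters repquota output for users whose first flag character is +, that is, users over their block soft limit. The function name over_block_soft_limit is hypothetical, and the sketch assumes the report layout shown in the sample output, with the user name in the first field and the two-character flag in the second.

```shell
#!/bin/bash
# over_block_soft_limit is a hypothetical helper. It reads repquota output
# on standard input and prints the users whose two-character flag field
# starts with "+", which marks an exceeded block soft limit.
over_block_soft_limit() {
    awk '$2 ~ /^[+-][+-]$/ && substr($2, 1, 1) == "+" { print $1 }'
}

# Usage on a live system, as root:
#   repquota /home | over_block_soft_limit
```

Header and separator lines are skipped because their second field is not a two-character flag.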
4.5.3. Keeping Quotas Accurate
If you enable quotas on your file system after a period of time when you have been running with quotas disabled, you should run the quotacheck command to create, check, and repair quota files. Additionally, you may want to run the quotacheck command if you think your quota files may not be accurate, as may occur when a file system is not unmounted cleanly after a system crash.
For more information about the quotacheck command, see the quotacheck man page.
Note
Run quotacheck when the file system is relatively idle on all nodes because disk activity may affect the computed quota values.
4.5.4. Synchronizing Quotas with the quotasync Command
The normal time period between quota synchronizations is a tunable parameter, quota_quantum. You can change this from its default value of 60 seconds using the quota_quantum= mount option, as described in Table 4.2, “GFS2-Specific Mount Options”. The quota_quantum parameter must be set on each node and each time the file system is mounted. Changes to the quota_quantum parameter are not persistent across unmounts. You can update the quota_quantum value with the mount -o remount command.
You can use the quotasync command to synchronize the quota information from a node to the on-disk quota file between the automatic updates performed by GFS2.
Usage
quotasync [-ug] -a|mntpnt
...
u
- Sync the user quota files.
g
- Sync the group quota files.
a
- Sync all file systems that are currently quota-enabled and support sync. When -a is absent, a file system mountpoint should be specified.
mntpnt
- Specifies the GFS2 file system to which the actions apply.
mount -o quota_quantum=secs,remount BlockDevice MountPoint
MountPoint
- Specifies the GFS2 file system to which the actions apply.
secs
- Specifies the new time period between regular quota-file synchronizations by GFS2. Smaller values may increase contention and slow down performance.
Examples
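This example synchronizes the cached quota information from the node it is run on to the on-disk quota file for the file system /mnt/mygfs2.
# quotasync -ug /mnt/mygfs2
This example changes the time period between regular quota-file updates to one hour (3600 seconds) for the file system /mnt/mygfs2 when remounting that file system on logical volume /dev/volgroup/logical_volume.
# mount -o quota_quantum=3600,remount /dev/volgroup/logical_volume /mnt/mygfs2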
4.6. Growing a File System
The gfs2_grow command is used to expand a GFS2 file system after the device where the file system resides has been expanded. Running a gfs2_grow command on an existing GFS2 file system fills all spare space between the current end of the file system and the end of the device with a newly initialized GFS2 file system extension. When the fill operation is completed, the resource index for the file system is updated. All nodes in the cluster can then use the extra storage space that has been added.
Warning
kernel-2.6.32-754.el6
that can cause the gfs2_grow
command to fail and potentially cause GFS2 metadata corruption. Before running the gfs2_grow
command, ensure that you update to kernel-2.6.32-754.6.3.el6
.
The gfs2_grow command must be run on a mounted file system, but only needs to be run on one node in a cluster. All the other nodes sense that the expansion has occurred and automatically start using the new space.
Note
Once you have created a GFS2 file system with the mkfs.gfs2 command, you cannot decrease the size of the file system.
Usage
gfs2_grow MountPoint
MountPoint
- Specifies the GFS2 file system to which the actions apply.
Comments
Before running the gfs2_grow command:
- Back up important data on the file system.
- Determine the volume that is used by the file system to be expanded by running a df MountPoint command.
- Expand the underlying cluster volume with LVM. For information on administering LVM volumes, see Logical Volume Manager Administration.
After running the gfs2_grow command, run a df command to check that the new space is now available in the file system.
Examples
In this example, the file system on the /mygfs2fs directory is expanded.
[root@dash-01 ~]# gfs2_grow /mygfs2fs
FS: Mount Point: /mygfs2fs
FS: Device: /dev/mapper/gfs2testvg-gfs2testlv
FS: Size: 524288 (0x80000)
FS: RG size: 65533 (0xfffd)
DEV: Size: 655360 (0xa0000)
The file system grew by 512MB.
gfs2_grow complete.
Complete Usage
gfs2_grow [Options] {MountPoint|Device} [MountPoint|Device]
MountPoint
- Specifies the directory where the GFS2 file system is mounted.
Device
- Specifies the device node of the file system.
Table 4.3. GFS2-specific Options Available While Expanding A File System
Option | Description |
---|---|
-h | Help. Displays a short usage message. |
-q | Quiet. Turns down the verbosity level. |
-r MegaBytes | Specifies the size of the new resource group. The default size is 256MB. |
-T | Test. Do all calculations, but do not write any data to the disk and do not expand the file system. |
-V | Displays command version information. |
4.7. Adding Journals to a File System
The gfs2_jadd command is used to add journals to a GFS2 file system. You can add journals to a GFS2 file system dynamically at any point without expanding the underlying logical volume. The gfs2_jadd command must be run on a mounted file system, but it needs to be run on only one node in the cluster. All the other nodes sense that the expansion has occurred.
Note
If a GFS2 file system is full, the gfs2_jadd command will fail, even if the logical volume containing the file system has been extended and is larger than the file system. This is because in a GFS2 file system, journals are plain files rather than embedded metadata, so simply extending the underlying logical volume will not provide space for the journals.
Before adding journals to a GFS2 file system, you can use the journals option of the gfs2_tool command to find out how many journals the GFS2 file system currently contains. The following example displays the number and size of the journals in the file system mounted at /mnt/gfs2.
[root@roth-01 ../cluster/gfs2]# gfs2_tool journals /mnt/gfs2
journal2 - 128MB
journal1 - 128MB
journal0 - 128MB
3 journal(s) found.
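Because one journal is required for each node that mounts the file system, the summary line of this output can be compared with the number of nodes. The following sketch parses output in the format shown above; journal_count and journals_needed are hypothetical helper names, and the node count is a value you supply.

```shell
#!/bin/bash
# journal_count is a hypothetical helper. It reads the output of
# "gfs2_tool journals MountPoint" on standard input and prints the number
# from the "N journal(s) found." summary line.
journal_count() {
    awk '/journal\(s\) found/ { print $1 }'
}

# journals_needed prints how many journals must be added so that a file
# system with $1 journals can be mounted by $2 nodes (0 if none are needed).
journals_needed() {
    local have=$1 nodes=$2
    if [ "$nodes" -gt "$have" ]; then
        echo $(( nodes - have ))
    else
        echo 0
    fi
}

# Usage on a live system, as root:
#   have=$(gfs2_tool journals /mnt/gfs2 | journal_count)
#   journals_needed "$have" 5
```

A nonzero result is the value to pass to the -j option of gfs2_jadd.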
Usage
gfs2_jadd -j Number MountPoint
Number
- Specifies the number of new journals to be added.
MountPoint
- Specifies the directory where the GFS2 file system is mounted.
Examples
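In this example, one journal is added to the file system on the /mygfs2 directory.
gfs2_jadd -j1 /mygfs2
In this example, two journals are added to the file system on the /mygfs2 directory.
gfs2_jadd -j2 /mygfs2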
Complete Usage
gfs2_jadd [Options] {MountPoint|Device} [MountPoint|Device]
MountPoint
- Specifies the directory where the GFS2 file system is mounted.
Device
- Specifies the device node of the file system.
Table 4.4. GFS2-specific Options Available When Adding Journals
Flag | Parameter | Description |
---|---|---|
-h | | Help. Displays short usage message. |
-J | MegaBytes | Specifies the size of the new journals in megabytes. Default journal size is 128 megabytes. The minimum size is 32 megabytes. To add journals of different sizes to the file system, the gfs2_jadd command must be run for each size journal. The size specified is rounded down so that it is a multiple of the journal-segment size that was specified when the file system was created. |
-j | Number | Specifies the number of new journals to be added by the gfs2_jadd command. The default value is 1. |
-q | | Quiet. Turns down the verbosity level. |
-V | | Displays command version information. |
4.8. Data Journaling
An fsync() call on a file causes the file's data to be written to disk immediately. The call returns when the disk reports that all data is safely written.
Data journaling can result in a reduced fsync() time for very small files because the file data is written to the journal in addition to the metadata. This advantage rapidly reduces as the file size increases. Writing to medium and larger files will be much slower with data journaling turned on.
Applications that rely on fsync() to sync file data may see improved performance by using data journaling. Data journaling can be enabled automatically for any GFS2 files created in a flagged directory (and all its subdirectories). Existing files with zero length can also have data journaling turned on or off.
You can enable and disable data journaling on a file with the chattr command.
The following commands enable data journaling on the /mnt/gfs2/gfs2_dir/newfile file and then check whether the flag has been set properly.
[root@roth-01 ~]# chattr +j /mnt/gfs2/gfs2_dir/newfile
[root@roth-01 ~]# lsattr /mnt/gfs2/gfs2_dir
---------j--- /mnt/gfs2/gfs2_dir/newfile
The following commands disable data journaling on the /mnt/gfs2/gfs2_dir/newfile file and then check whether the flag has been set properly.
[root@roth-01 ~]# chattr -j /mnt/gfs2/gfs2_dir/newfile
[root@roth-01 ~]# lsattr /mnt/gfs2/gfs2_dir
------------- /mnt/gfs2/gfs2_dir/newfile
You can also use the chattr command to set the j flag on a directory. When you set this flag for a directory, all files and directories subsequently created in that directory are journaled. The following set of commands sets the j flag on the gfs2_dir directory, then checks whether the flag has been set properly. After this, the commands create a new file called newfile in the /mnt/gfs2/gfs2_dir directory and then check whether the j flag has been set for the file. Since the j flag is set for the directory, newfile should also have journaling enabled.
[root@roth-01 ~]# chattr +j /mnt/gfs2/gfs2_dir
[root@roth-01 ~]# lsattr /mnt/gfs2
---------j--- /mnt/gfs2/gfs2_dir
[root@roth-01 ~]# touch /mnt/gfs2/gfs2_dir/newfile
[root@roth-01 ~]# lsattr /mnt/gfs2/gfs2_dir
---------j--- /mnt/gfs2/gfs2_dir/newfile
4.9. Configuring atime Updates
Each file inode and directory inode has three time stamps associated with it:
- ctime: The last time the inode status was changed
- mtime: The last time the file (or directory) data was modified
- atime: The last time the file (or directory) data was accessed
If atime updates are enabled, as they are by default on GFS2 and other Linux file systems, then every time a file is read, its inode needs to be updated.
Because few applications use the information provided by atime, those updates can require a significant amount of unnecessary write traffic and file locking traffic. That traffic can degrade performance; therefore, it may be preferable to turn off or reduce the frequency of atime updates.
Two methods of reducing the effects of atime updating are available:
- Mount with relatime (relative atime), which updates the atime if the previous atime update is older than the mtime or ctime update.
- Mount with noatime, which disables atime updates on that file system.
4.9.1. Mount with relatime
The relatime (relative atime) Linux mount option can be specified when the file system is mounted. This specifies that the atime is updated if the previous atime update is older than the mtime or ctime update.
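The relatime rule can be expressed as a comparison of the three time stamps. The following sketch is only an illustration of the rule as it is described in this section; relatime_would_update is a hypothetical helper, the time stamps are given in seconds since the epoch, and the kernel's own implementation may apply additional conditions.

```shell
#!/bin/bash
# relatime_would_update is a hypothetical helper that applies the rule
# described above to three time stamps in seconds since the epoch.
# It returns 0 (true) when the previous atime is older than the mtime
# or older than the ctime.
relatime_would_update() {
    local atime=$1 mtime=$2 ctime=$3
    [ "$atime" -lt "$mtime" ] || [ "$atime" -lt "$ctime" ]
}

# The file was modified (mtime 200) after it was last read (atime 100),
# so a read would update the atime.
if relatime_would_update 100 200 50; then
    echo "atime would be updated"
fi
```

On a system with GNU coreutils, the three values for a real file can be obtained with stat -c '%X %Y %Z' FileName.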
Usage
mount BlockDevice MountPoint -o relatime
BlockDevice
- Specifies the block device where the GFS2 file system resides.
MountPoint
- Specifies the directory where the GFS2 file system should be mounted.
Example
In this example, the GFS2 file system resides on /dev/vg01/lvol0 and is mounted on directory /mygfs2. The atime updates take place only if the previous atime update is older than the mtime or ctime update.
mount /dev/vg01/lvol0 /mygfs2 -o relatime
4.9.2. Mount with noatime
The noatime Linux mount option can be specified when the file system is mounted, which disables atime updates on that file system.
Usage
mount BlockDevice MountPoint -o noatime
BlockDevice
- Specifies the block device where the GFS2 file system resides.
MountPoint
- Specifies the directory where the GFS2 file system should be mounted.
Example
In this example, the GFS2 file system resides on /dev/vg01/lvol0 and is mounted on directory /mygfs2 with atime updates turned off.
mount /dev/vg01/lvol0 /mygfs2 -o noatime
4.10. Suspending Activity on a File System
You can suspend write activity to a file system by using the dmsetup suspend command. Suspending write activity allows hardware-based device snapshots to be used to capture the file system in a consistent state. The dmsetup resume command ends the suspension.
Usage
dmsetup suspend MountPoint
dmsetup resume MountPoint
MountPoint
- Specifies the file system.
Examples
This example suspends writes to file system /mygfs2.
# dmsetup suspend /mygfs2
This example ends suspension of writes to file system /mygfs2.
# dmsetup resume /mygfs2
4.11. Repairing a File System
You can repair a GFS2 file system by using the fsck.gfs2 command.
Important
The fsck.gfs2 command must be run only on a file system that is unmounted from all nodes.
Important
You should not check a GFS2 file system at boot time with the fsck.gfs2 command. The fsck.gfs2 command cannot determine at boot time whether the file system is mounted by another node in the cluster. You should run the fsck.gfs2 command manually only after the system boots.
To ensure that the fsck.gfs2 command does not run on a GFS2 file system at boot time, modify the /etc/fstab file so that the final two columns for a GFS2 file system mount point show "0 0" rather than "1 1" (or any other numbers), as in the following example:
/dev/VG12/lv_svr_home /svr_home gfs2 defaults,noatime,nodiratime,noquota 0 0
Note
The fsck.gfs2 command differs from some earlier releases of gfs_fsck in the following ways:
- Pressing Ctrl+C while running the fsck.gfs2 command interrupts processing and displays a prompt asking whether you would like to abort the command, skip the rest of the current pass, or continue processing.
- You can increase the level of verbosity by using the -v flag. Adding a second -v flag increases the level again.
- You can decrease the level of verbosity by using the -q flag. Adding a second -q flag decreases the level again.
- The -n option opens a file system as read-only and answers no to any queries automatically. The option provides a way of trying the command to reveal errors without actually allowing the fsck.gfs2 command to take effect.
Refer to the fsck.gfs2 man page for additional information about other command options.
Running the fsck.gfs2 command requires system memory above and beyond the memory used for the operating system and kernel. Each block in the GFS2 file system itself requires approximately five bits of additional memory, or 5/8 of a byte. So to estimate how many bytes of memory you will need to run the fsck.gfs2 command on your file system, determine how many blocks the file system contains and multiply that number by 5/8.
For example, to determine approximately how much memory is required to run the fsck.gfs2 command on a GFS2 file system that is 16TB with a block size of 4K, first determine how many blocks the file system contains by dividing 16TB by 4K:
17592186044416 / 4096 = 4294967296
Since this file system contains 4294967296 blocks, multiply that number by 5/8 to determine how many bytes of memory are required:
4294967296 * 5/8 = 2684354560
This file system requires approximately 2.6GB of free memory to run the fsck.gfs2 command. Note that if the block size was 1K, running the fsck.gfs2 command would require four times the memory, or approximately 11GB.
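The same estimate can be computed with shell arithmetic. The following is a minimal sketch of the calculation described above; fsck_gfs2_mem_bytes is a hypothetical helper name, and both arguments are given in bytes.

```shell
#!/bin/bash
# fsck_gfs2_mem_bytes is a hypothetical helper. It estimates the additional
# memory, in bytes, that fsck.gfs2 needs: (size / block size) * 5/8.
# Arguments: file system size in bytes, block size in bytes.
fsck_gfs2_mem_bytes() {
    local fs_bytes=$1 block_bytes=$2
    local blocks=$(( fs_bytes / block_bytes ))
    echo $(( blocks * 5 / 8 ))
}

fsck_gfs2_mem_bytes 17592186044416 4096   # 16TB file system, 4K blocks
fsck_gfs2_mem_bytes 17592186044416 1024   # same file system, 1K blocks
```

The result is an estimate of the extra memory for fsck.gfs2 only; memory used by the operating system and kernel comes on top of it.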
Usage
fsck.gfs2 -y BlockDevice
-y
- The -y flag causes all questions to be answered with yes. With the -y flag specified, the fsck.gfs2 command does not prompt you for an answer before making changes.
BlockDevice
- Specifies the block device where the GFS2 file system resides.
Example
In this example, the GFS2 file system residing on block device /dev/testvg/testlv is repaired. All queries to repair are automatically answered with yes.
[root@dash-01 ~]# fsck.gfs2 -y /dev/testvg/testlv
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Clearing journals (this may take a while)...
Journals cleared.
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
Writing changes to disk
fsck.gfs2 complete
4.12. Bind Mounts and Context-Dependent Path Names
GFS2 file systems do not support Context-Dependent Path Names. To get similar functionality, you can use the bind option of the mount command.
The bind option of the mount command allows you to remount part of a file hierarchy at a different location while it is still available at the original location. The format of this command is as follows.
mount --bind olddir newdir
After executing this command, the contents of the olddir directory are available at two locations: olddir and newdir. You can also use this option to make an individual file available at two locations.
For example, after executing the following commands the contents of /root/tmp will be identical to the contents of the previously mounted /var/log directory.
[root@menscryfa ~]# cd ~root
[root@menscryfa ~]# mkdir ./tmp
[root@menscryfa ~]# mount --bind /var/log /root/tmp
Alternately, you can use an entry in the /etc/fstab file to achieve the same results at mount time. The following /etc/fstab entry will result in the contents of /root/tmp being identical to the contents of the /var/log directory.
/var/log /root/tmp none bind 0 0
After you have mounted the file system, you can use the mount command to see that the file system has been mounted, as in the following example.
[root@menscryfa ~]# mount | grep /tmp
/var/log on /root/tmp type none (rw,bind)
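With Context-Dependent Path Names, you might have defined the /bin directory as a Context-Dependent Path Name that would resolve to one of the following paths, depending on the system architecture.
/usr/i386-bin
/usr/x86_64-bin
/usr/ppc64-bin
You can achieve this same functionality by creating an empty /bin directory. Then, using a script or an entry in the /etc/fstab file, you can mount each of the individual architecture directories onto the /bin directory with a mount --bind command. For example, you can use the following command as a line in a script.
mount --bind /usr/i386-bin /bin
Alternately, you can use the following entry in the /etc/fstab file.
/usr/i386-bin /bin none bind 0 0
A bind mount can provide greater flexibility than a Context-Dependent Path Name, since you can use this feature to mount different directories according to any criteria you define (such as the value of %fill for the file system). Context-Dependent Path Names are more limited in what they can encompass. Note, however, that you will need to write your own script to mount according to criteria such as the value of %fill.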
Warning
When you mount a file system with the bind option and the original file system was mounted rw, the new file system will also be mounted rw even if you use the ro flag; the ro flag is silently ignored. In this case, the new file system might be marked as ro in the /proc/mounts file, which may be misleading.
4.13. Bind Mounts and File System Mount Order
When you use the bind option of the mount command, you must be sure that the file systems are mounted in the correct order. In the following example, the /var/log directory must be mounted before executing the bind mount on the /tmp directory:
# mount --bind /var/log /tmp
- In general, file system mount order is determined by the order in which the file systems appear in the fstab file. The exceptions to this ordering are file systems mounted with the _netdev flag or file systems that have their own init scripts.
- A file system with its own init script is mounted later in the initialization process, after the file systems in the fstab file.
- File systems mounted with the _netdev flag are mounted when the network has been enabled on the system.
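If your configuration requires that you create a bind mount on which to mount a GFS2 file system, you can order your fstab file as follows: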
- Mount local file systems that are required for the bind mount.
- Bind mount the directory on which to mount the GFS2 file system.
- Mount the GFS2 file system.
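An fstab file ordered this way might look like the following sketch. Every device and directory name here is hypothetical and chosen only to show the ordering; the file system types and options for the local entries will differ on a real system.

```
# 1. Local file system that is required for the bind mount (hypothetical)
/dev/sdb1          /local/data   ext4   defaults           0 0
# 2. Bind mount that provides the directory for the GFS2 mount
/local/data/gfs    /mnt/gfs2a    none   bind               0 0
# 3. The GFS2 file system itself
/dev/vg01/lvol0    /mnt/gfs2a    gfs2   defaults,noatime   0 0
```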
If your configuration requires that you bind mount a local directory or file system onto a GFS2 file system, listing the file systems in the correct order in the fstab file will not mount the file systems correctly since the GFS2 file system will not be mounted until the GFS2 init script is run. In this case, you should write an init script to execute the bind mount so that the bind mount will not take place until after the GFS2 file system is mounted.
The following script is an example of a custom init script. This script performs a bind mount of two directories onto two directories of a GFS2 file system. In this example, there is an existing GFS2 mount point at /mnt/gfs2a, which is mounted when the GFS2 init script runs, after cluster startup.
In this example script, the values of the chkconfig statement indicate the following:
- 345 indicates the run levels that the script will be started in
- 29 is the start priority, which in this case indicates that the script will run at startup time after the GFS2 init script, which has a start priority of 26
- 73 is the stop priority, which in this case indicates that the script will be stopped during shutdown before the GFS2 script, which has a stop priority of 74
The start and stop values indicate that you can manually perform the indicated action by executing a service start and a service stop command. For example, if the script is named fredwilma, then you can execute service fredwilma start.
This script should be put in the /etc/init.d directory with the same permissions as the other scripts in that directory. You can then execute a chkconfig on command to link the script to the indicated run levels. For example, if the script is named fredwilma, then you can execute chkconfig fredwilma on.
#!/bin/bash
#
# chkconfig: 345 29 73
# description: mount/unmount my custom bind mounts onto a gfs2 subdirectory
#
#
### BEGIN INIT INFO
# Provides:
### END INIT INFO

. /etc/init.d/functions
case "$1" in
  start)
        # In this example, fred and wilma want their home directories
        # bind-mounted over the gfs2 directory /mnt/gfs2a, which has
        # been mounted as /mnt/gfs2a
        mkdir -p /mnt/gfs2a/home/fred &> /dev/null
        mkdir -p /mnt/gfs2a/home/wilma &> /dev/null
        /bin/mount --bind /mnt/gfs2a/home/fred /home/fred
        /bin/mount --bind /mnt/gfs2a/home/wilma /home/wilma
        ;;

  stop)
        /bin/umount /mnt/gfs2a/home/fred
        /bin/umount /mnt/gfs2a/home/wilma
        ;;

  status)
        ;;

  restart)
        $0 stop
        $0 start
        ;;

  reload)
        $0 start
        ;;
  *)
        echo $"Usage: $0 {start|stop|restart|reload|status}"
        exit 1
esac

exit 0
4.14. The GFS2 Withdraw Function
fsck.gfs2
command. The GFS withdraw function is less severe than a kernel panic, which would cause another node to fence the node.
gfs2
startup script enabled and the GFS2 file system is included in the /etc/fstab
file, the GFS2 file system will be remounted when you reboot. If the GFS2 file system withdrew because of perceived file system corruption, it is recommended that you run the fsck.gfs2
command before remounting the file system. In this case, in order to prevent your file system from remounting at boot time, you can perform the following procedure:
- Temporarily disable the startup script on the affected node with the following command:
# chkconfig gfs2 off
- Reboot the affected node, starting the cluster software. The GFS2 file system will not be mounted.
- Unmount the file system from every node in the cluster.
- Run the fsck.gfs2 command on the file system from one node only to ensure there is no file system corruption.
- Re-enable the startup script on the affected node by running the following command:
# chkconfig gfs2 on
- Remount the GFS2 file system from all nodes in the cluster.
-o errors=panic
option specified. When this option is specified, any errors that would normally cause the system to withdraw cause the system to panic instead. This stops the node's cluster communications, which causes the node to be fenced.
gfs_controld
daemon requesting withdraw. The gfs_controld
daemon runs the dmsetup
program to place the device mapper error target underneath the file system preventing further access to the block device. It then tells the kernel that this has been completed. This is the reason for the GFS2 support requirement to always use a CLVM device under GFS2, since otherwise it is not possible to insert a device mapper target.
dmsetup
program to insert the error target as requested. This can happen if there is a shortage of memory at the point of the withdraw and memory cannot be reclaimed due to the problem that triggered the withdraw in the first place.
Chapter 5. Diagnosing and Correcting Problems with GFS2 File Systems
5.1. GFS2 File System Shows Slow Performance
5.2. GFS2 File System Hangs and Requires Reboot of One Node
- The gfs2 lock dump for the file system on each node:
cat /sys/kernel/debug/gfs2/fsname/glocks >glocks.fsname.nodename
- The DLM lock dump for the file system on each node. You can get this information with the dlm_tool command:
dlm_tool lockdebug -sv lsname
In this command, lsname is the lockspace name used by DLM for the file system in question. You can find this value in the output from the group_tool command.
- The output from the sysrq -t command.
- The contents of the /var/log/messages file.
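Collecting the lock dump on every node is easier with a small helper. The sketch below covers only the first item in the list; the function name save_glock_dump and its arguments are hypothetical, and the debugfs location is passed as a parameter in case it is not mounted at /sys/kernel/debug on your system.

```shell
#!/bin/sh
# Sketch: save the GFS2 lock dump for one file system on the local node,
# using the glocks.fsname.nodename naming shown above.
#   $1 = file system name as it appears under the gfs2 debugfs directory
#   $2 = directory in which to store the dump
#   $3 = debugfs mount point (optional, defaults to /sys/kernel/debug)
save_glock_dump() {
    fsname=$1
    outdir=$2
    debugfs=${3:-/sys/kernel/debug}
    src="$debugfs/gfs2/$fsname/glocks"
    dst="$outdir/glocks.$fsname.$(uname -n)"
    if [ ! -r "$src" ]; then
        echo "cannot read $src (is debugfs mounted and the file system mounted?)" >&2
        return 1
    fi
    # Copy the dump and print the name of the file that was written.
    cat "$src" > "$dst" && echo "$dst"
}
```

Run the helper on each node at the time of the hang, so that the dumps from all nodes describe the same moment.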
5.3. GFS2 File System Hangs and Requires Reboot of All Nodes
- You may have had a failed fence. GFS2 file systems will freeze to ensure data integrity in the event of a failed fence. Check the messages logs to see if there are any failed fences at the time of the hang. Ensure that fencing is configured correctly.
- The GFS2 file system may have withdrawn. Check through the messages logs for the word withdraw and check for any messages and calltraces from GFS2 indicating that the file system has been withdrawn. A withdraw is indicative of file system corruption, a storage failure, or a bug. Unmount the file system, update the gfs2-utils package, and execute the fsck command on the file system to return it to service. Open a support ticket with Red Hat Support. Inform them you experienced a GFS2 withdraw and provide sosreports with logs.
For information on the GFS2 withdraw function, see Section 4.14, “The GFS2 Withdraw Function”.
- This error may be indicative of a locking problem or bug. Gather data during one of these occurrences and open a support ticket with Red Hat Support, as described in Section 5.2, “GFS2 File System Hangs and Requires Reboot of One Node”.
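The log check for a withdraw can be sketched as a one-line search. The function name find_gfs2_withdraws is hypothetical, and the pattern only assumes that the relevant kernel messages contain "GFS2" followed later on the same line by "withdraw"; adjust it if your logs are formatted differently.

```shell
#!/bin/sh
# Sketch: list log lines that mention a GFS2 withdraw.
#   $1 = log file to scan (for example /var/log/messages)
# The match is case-insensitive and assumes "GFS2" appears on the line
# before the word "withdraw".
find_gfs2_withdraws() {
    grep -i 'gfs2.*withdraw' "$1"
}
```

Running find_gfs2_withdraws /var/log/messages on each node shows which node withdrew and when, which is useful context for the support ticket.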
5.4. GFS2 File System Does Not Mount on Newly-Added Cluster Node
spectator
mount option set, since these do not require a journal). You can add journals to a GFS2 file system with the gfs2_jadd
command, as described in Section 4.7, “Adding Journals to a File System”.
5.5. Space Indicated as Used in Empty File System
df
command will show that there is space being taken up. This is because GFS2 file system journals consume space (number of journals * journal size) on disk. If you created a GFS2 file system with a large number of journals or specified a large journal size, then you will see (number of journals * journal size) as already in use when you execute the df
command. Even if you did not specify a large number of journals or large journals, small GFS2 file systems (in the 1GB or less range) will show a large amount of space as being in use with the default GFS2 journal size.
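The arithmetic behind this can be sketched with two small helpers. The function names are hypothetical and not part of gfs2-utils; the journal size is passed in explicitly rather than assumed, so check the mkfs.gfs2 documentation for the default on your release.

```shell
#!/bin/sh
# Sketch: estimate the space that journals occupy on a GFS2 file system.
#   $1 = number of journals, $2 = journal size in MB
journal_overhead_mb() {
    echo $(( $1 * $2 ))
}

# Percentage of the file system taken by journals (integer, rounded down).
#   $1 = number of journals, $2 = journal size in MB, $3 = file system size in MB
journal_overhead_pct() {
    echo $(( $1 * $2 * 100 / $3 ))
}
```

For example, journal_overhead_mb 2 128 prints 256, and journal_overhead_pct 2 128 1024 prints 25: on a 1GB file system, two 128MB journals already account for a quarter of the space reported as used.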
Chapter 6. Configuring a GFS2 File System in a Pacemaker Cluster
- On each node in the cluster, install the High Availability and Resilient Storage packages.
# yum groupinstall 'High Availability' 'Resilient Storage'
- Create the Pacemaker cluster and configure fencing for the cluster. For information on configuring a Pacemaker cluster, see Configuring the Red Hat High Availability Add-On with Pacemaker.
- On each node in the cluster, enable the clvmd service. If you will be using cluster-mirrored volumes, enable the cmirrord service.
# chkconfig clvmd on
# chkconfig cmirrord on
After you enable these daemons, when starting and stopping Pacemaker or the cluster through normal means using pcs cluster start, pcs cluster stop, service pacemaker start, or service pacemaker stop, the clvmd and cmirrord daemons will be started and stopped as needed.
- On one node in the cluster, perform the following steps:
- Set the global Pacemaker parameter no-quorum-policy to freeze.
Note
By default, the value of no-quorum-policy is set to stop, indicating that once quorum is lost, all the resources on the remaining partition will immediately be stopped. Typically this default is the safest option, but unlike most resources, GFS2 requires quorum to function. When quorum is lost, both the applications using the GFS2 mounts and the GFS2 mount itself cannot be correctly stopped. Any attempts to stop these resources without quorum will fail, which will ultimately result in the entire cluster being fenced every time quorum is lost.
To address this situation, you can set no-quorum-policy=freeze when GFS2 is in use. This means that when quorum is lost, the remaining partition will do nothing until quorum is regained.
# pcs property set no-quorum-policy=freeze
- After ensuring that the locking type is set to 3 in the /etc/lvm/lvm.conf file to support clustered locking, create the clustered LV and format the volume with a GFS2 file system. Ensure that you create enough journals for each of the nodes in your cluster.
# pvcreate /dev/vdb
# vgcreate -Ay -cy cluster_vg /dev/vdb
# lvcreate -L5G -n cluster_lv cluster_vg
# mkfs.gfs2 -j2 -p lock_dlm -t rhel7-demo:gfs2-demo /dev/cluster_vg/cluster_lv
- Configure a clusterfs resource.
You should not add the file system to the /etc/fstab file because it will be managed as a Pacemaker cluster resource. Mount options can be specified as part of the resource configuration with options=options. Run the pcs resource describe Filesystem command for full configuration options.
This cluster resource creation command specifies the noatime mount option.
# pcs resource create clusterfs Filesystem device="/dev/cluster_vg/cluster_lv" directory="/var/mountpoint" fstype="gfs2" "options=noatime" op monitor interval=10s on-fail=fence clone interleave=true
- Verify that GFS2 is mounted as expected.
# mount | grep /mnt/gfs2-demo
/dev/mapper/cluster_vg-cluster_lv on /mnt/gfs2-demo type gfs2 (rw,noatime,seclabel)
- (Optional) Reboot all cluster nodes to verify GFS2 persistence and recovery.
Appendix A. GFS2 Quota Management with the gfs2_quota Command
gfs2_quota
command to manage quotas. This appendix documents the use of the gfs2_quota
command for managing GFS2 file system quotas.
A.1. Setting Quotas with the gfs2_quota command
gfs2_quota
command. The command only needs to be run on a single node where GFS2 is mounted.
quota=
of the mount
command when mounting the GFS2 file system, as described in Section A.4, “Enabling/Disabling Quota Enforcement”.
Usage
gfs2_quota limit -u User -l Size -f MountPoint
gfs2_quota limit -g Group -l Size -f MountPoint
gfs2_quota warn -u User -l Size -f MountPoint
gfs2_quota warn -g Group -l Size -f MountPoint
User
- A user ID to limit or warn. It can be either a user name from the password file or the UID number.
Group
- A group ID to limit or warn. It can be either a group name from the group file or the GID number.
Size
- Specifies the new value to limit or warn. By default, the value is in units of megabytes. The additional -k, -s and -b flags change the units to kilobytes, sectors, and file system blocks, respectively.
MountPoint
- Specifies the GFS2 file system to which the actions apply.
Examples
/mygfs2
.
# gfs2_quota limit -u Bert -l 1024 -f /mygfs2
/mygfs2
.
# gfs2_quota warn -g 21 -l 50 -k -f /mygfs2
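To see how the same limit looks in each unit, the conversion can be sketched as follows. The function name mb_to_quota_units is hypothetical; the sketch assumes 512-byte sectors and, unless told otherwise, a 4096-byte file system block size, so confirm both against your file system before relying on the numbers.

```shell
#!/bin/sh
# Sketch: convert a size in megabytes (the gfs2_quota default unit) into
# the units selected by the -k, -s and -b flags.
#   $1 = size in MB
#   $2 = unit: k (kilobytes), s (sectors), b (file system blocks)
#   $3 = file system block size in bytes, used only for "b"
#        (4096 is an assumption, not read from the file system)
mb_to_quota_units() {
    mb=$1
    unit=$2
    bsize=${3:-4096}
    case "$unit" in
        k) echo $(( mb * 1024 )) ;;
        s) echo $(( mb * 1024 * 1024 / 512 )) ;;    # assumes 512-byte sectors
        b) echo $(( mb * 1024 * 1024 / bsize )) ;;
        *) echo "unknown unit: $unit" >&2; return 1 ;;
    esac
}
```

For example, mb_to_quota_units 50 k prints 51200, the value you would pass with -k to express a 50 megabyte limit in kilobytes.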
A.2. Displaying Quota Limits and Usage with the gfs2_quota Command
gfs2_quota get
command. The entire contents of the quota file can also be displayed using the gfs2_quota list
command, in which case all IDs with a non-zero hard limit, soft limit, or value are listed.
Usage
gfs2_quota get -u User -f MountPoint
gfs2_quota get -g Group -f MountPoint
gfs2_quota list -f MountPoint
User
- A user ID to display information about a specific user. It can be either a user name from the password file or the UID number.
Group
- A group ID to display information about a specific group. It can be either a group name from the group file or the GID number.
MountPoint
- Specifies the GFS2 file system to which the actions apply.
Command Output
gfs2_quota
command is displayed as follows:
user User: limit:LimitSize warn:WarnSize value:Value
group Group: limit:LimitSize warn:WarnSize value:Value
LimitSize
, WarnSize
, and Value
numbers (values) are in units of megabytes by default. Adding the -k
, -s
, or -b
flags to the command line changes the units to kilobytes, sectors, or file system blocks, respectively.
User
- A user name or ID to which the data is associated.
Group
- A group name or ID to which the data is associated.
LimitSize
- The hard limit set for the user or group. This value is zero if no limit has been set.
WarnSize
- The warn (soft) limit set for the user or group. This value is zero if no limit has been set.
Value
- The actual amount of disk space used by the user or group.
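Because each output line carries the limit and the current value, the listing can be filtered for IDs that have reached their hard limit. This is a sketch only: the function name quota_over_limit is hypothetical, and the parsing assumes the field layout shown above (user or group, an ID followed by a colon, then limit:, warn: and value: fields), with any amount of spacing.

```shell
#!/bin/sh
# Sketch: read gfs2_quota-style lines on stdin and print the IDs whose
# usage (value) has reached a non-zero hard limit.
# Assumed input layout, spacing may vary:
#   user Bert: limit:1024.0 warn:0.0 value:1100.0
quota_over_limit() {
    # Put a space after every colon so that labels and numbers are
    # always separate fields, then compare limit and value per line.
    sed 's/:/: /g' | awk '
        $1 == "user" || $1 == "group" {
            id = $2; sub(/:$/, "", id)
            limit = 0; value = 0
            for (i = 3; i < NF; i++) {
                if ($i == "limit:") limit = $(i + 1) + 0
                if ($i == "value:") value = $(i + 1) + 0
            }
            # A limit of zero means that no limit has been set.
            if (limit > 0 && value >= limit) print $1, id
        }'
}
```

A typical use would be gfs2_quota list -f /mygfs2 | quota_over_limit, which prints one line such as "user Bert" for each ID at or over its limit.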
Comments
gfs2_quota
command does not resolve UIDs and GIDs into names if the -n
option is added to the command line.
-d
option to the command line. This is useful when trying to match the numbers from gfs2_quota
with the results of a du
command.
Examples
/mygfs2
.
# gfs2_quota list -f /mygfs2
users
on file system /mygfs2
.
# gfs2_quota get -g users -f /mygfs2 -s
A.3. Synchronizing Quotas with the gfs2_quota Command
quota_quantum
. You can change this from its default value of 60 seconds using the quota_quantum=
mount option, as described in Table 4.2, “GFS2-Specific Mount Options”. The quota_quantum
parameter must be set on each node and each time the file system is mounted. Changes to the quota_quantum
parameter are not persistent across unmounts. You can update the quota_quantum
value with the mount -o remount
.
gfs2_quota sync
command to synchronize the quota information from a node to the on-disk quota file between the automatic updates performed by GFS2.
Usage
gfs2_quota sync -f MountPoint
MountPoint
- Specifies the GFS2 file system to which the actions apply.
mount -o quota_quantum=secs,remount BlockDevice MountPoint
MountPoint
- Specifies the GFS2 file system to which the actions apply.
secs
- Specifies the new time period between regular quota-file synchronizations by GFS2. Smaller values may increase contention and slow down performance.
Examples
/mygfs2
.
# gfs2_quota sync -f /mygfs2
/mnt/mygfs2
when remounting that file system on logical volume /dev/volgroup/logical_volume
.
# mount -o quota_quantum=3600,remount /dev/volgroup/logical_volume /mnt/mygfs2
A.4. Enabling/Disabling Quota Enforcement
quota=on
option specified.
Usage
mount -o quota=on BlockDevice MountPoint
quota=off
option specified. This is the default setting.
mount -o quota=off BlockDevice MountPoint
-o quota={on|off}
- Specifies that quota enforcement is enabled or disabled when the file system is mounted.
BlockDevice
- Specifies the block device where the GFS2 file system resides.
MountPoint
- Specifies the directory where the GFS2 file system should be mounted.
Examples
/dev/vg01/lvol0
is mounted on the /mygfs2
directory with quota enforcement enabled.
# mount -o quota=on /dev/vg01/lvol0 /mygfs2
A.5. Enabling Quota Accounting
quota=account
option specified.
Usage
mount -o quota=account BlockDevice MountPoint
-o quota=account
- Specifies that user and group usage statistics are maintained by the file system, even though the quota limits are not enforced.
BlockDevice
- Specifies the block device where the GFS2 file system resides.
MountPoint
- Specifies the directory where the GFS2 file system should be mounted.
Example
/dev/vg01/lvol0
is mounted on the /mygfs2
directory with quota accounting enabled.
# mount -o quota=account /dev/vg01/lvol0 /mygfs2
Appendix B. Converting a File System from GFS to GFS2
gfs2_convert
command. Note that you must perform this conversion procedure on a Red Hat Enterprise Linux 5 system before upgrading to Red Hat Enterprise Linux 6.
Note
Warning
gfs_fsck
command to check the file system and fix any errors.
fsck.gfs2
command on the file system until the conversion is complete.
B.1. Conversion of Context-Dependent Path Names
bind
option of the mount
command.
gfs2_convert
command identifies CDPNs and replaces them with empty directories with the same name. In order to configure bind mounts to replace the CDPNs, however, you need to know the full paths of the link targets of the CDPNs you are replacing. Before converting your file system, you can use the find
command to identify the links.
hostname
CDPN:
[root@smoke-01 gfs]# find /mnt/gfs -lname @hostname
/mnt/gfs/log
find
command for other CDPNs (mach
, os
, sys
, uid
, gid
, jid
). Note that since CDPN names can be of the form @hostname
or {hostname}
, you will need to run the find
command for each variant.
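Running the search once per CDPN name and once per form is repetitive, so it can be wrapped in a loop. The sketch below is an illustration, with a hypothetical function name; it relies on the same -lname test of the find command used in the example above.

```shell
#!/bin/sh
# Sketch: list symbolic links under a directory whose target is a CDPN,
# checking both the @name and {name} forms for every CDPN name.
#   $1 = mount point of the GFS file system (for example /mnt/gfs)
# Each output line is "<CDPN target> <path of the link>".
find_cdpn_links() {
    for name in hostname mach os sys uid gid jid; do
        for pattern in "@$name" "{$name}"; do
            find "$1" -lname "$pattern" | while read -r link; do
                echo "$pattern $link"
            done
        done
    done
}
```

Save the output of find_cdpn_links /mnt/gfs before converting; it records the link names and CDPN targets you need when configuring the replacement bind mounts.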
B.2. GFS to GFS2 Conversion Procedure
- On a Red Hat Enterprise Linux system, make a backup of your existing GFS file system.
- Unmount the GFS file system from all nodes in the cluster.
- Execute the gfs_fsck command on the GFS file system to ensure there is no file system corruption.
- Execute gfs2_convert gfsfilesystem. The system will display warnings and confirmation questions before converting gfsfilesystem to GFS2.
- Upgrade to Red Hat Enterprise Linux 6.
/dev/shell_vg/500g
to a GFS2 file system.
[root@shell-01 ~]# /root/cluster/gfs2/convert/gfs2_convert /dev/shell_vg/500g
gfs2_convert version 2 (built May 10 2010 10:05:40)
Copyright (C) Red Hat, Inc. 2004-2006 All rights reserved.
Examining file system..................
This program will convert a gfs1 filesystem to a gfs2 filesystem.
WARNING: This can't be undone. It is strongly advised that you:
1. Back up your entire filesystem first.
2. Run gfs_fsck first to ensure filesystem integrity.
3. Make sure the filesystem is NOT mounted from any node.
4. Make sure you have the latest software versions.
Convert /dev/shell_vg/500g from GFS1 to GFS2? (y/n)y
Converting resource groups...................
Converting inodes.
24208 inodes from 1862 rgs converted.
Fixing file and directory information.
18 cdpn symlinks moved to empty directories.
Converting journals.
Converting journal space to rg space.
Writing journal #1...done.
Writing journal #2...done.
Writing journal #3...done.
Writing journal #4...done.
Building GFS2 file system structures.
Removing obsolete GFS1 file system structures.
Committing changes to disk.
/dev/shell_vg/500g: filesystem converted successfully to gfs2.
Appendix C. GFS2 tracepoints and the debugfs glocks File
debugfs
interface and the GFS2 tracepoints. It is intended for advanced users who are familiar with file system internals and who would like to learn more about the design of GFS2 and how to debug GFS2-specific issues.
C.1. GFS2 tracepoint Types
blktrace
infrastructure and the blktrace
tracepoints can be used in combination with those of GFS2 to gain a fuller picture of the system performance. Due to the level at which the tracepoints operate, they can produce large volumes of data in a very short period of time. They are designed to put a minimum load on the system when they are enabled, but it is inevitable that they will have some effect. Filtering events with a variety of means can help reduce the volume of data and help focus on obtaining just the information which is useful for understanding any particular situation.
C.2. Tracepoints
/sys/kernel/debug/tracing/
directory assuming that debugfs
is mounted in the standard place at the /sys/kernel/debug
directory. The events
subdirectory contains all the tracing events that may be specified and, provided the gfs2
module is loaded, there will be a gfs2
subdirectory containing further subdirectories, one for each GFS2 event. The contents of the /sys/kernel/debug/tracing/events/gfs2
directory should look roughly like the following:
[root@chywoon gfs2]# ls
enable gfs2_bmap gfs2_glock_queue gfs2_log_flush
filter gfs2_demote_rq gfs2_glock_state_change gfs2_pin
gfs2_block_alloc gfs2_glock_put gfs2_log_blocks gfs2_promote
[root@chywoon gfs2]# echo -n 1 >/sys/kernel/debug/tracing/events/gfs2/enable
enable
file in each of the individual event subdirectories. The same is true of the filter
file which can be used to set an event filter for each event or set of events. The meaning of the individual events is explained in more detail below.
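Enabling a single event through its own enable file can be sketched as a small helper. The function name gfs2_trace_set is hypothetical; the tracing directory is a parameter because the sketch only assumes, as the text above does, that debugfs is mounted at /sys/kernel/debug.

```shell
#!/bin/sh
# Sketch: switch a single GFS2 tracepoint on or off by writing to its
# per-event enable file.
#   $1 = event name, a subdirectory of events/gfs2
#        (for example gfs2_glock_state_change)
#   $2 = 1 to enable, 0 to disable
#   $3 = tracing directory (optional, defaults to the standard location)
gfs2_trace_set() {
    event=$1
    state=$2
    tracing=${3:-/sys/kernel/debug/tracing}
    file="$tracing/events/gfs2/$event/enable"
    if [ ! -w "$file" ]; then
        echo "cannot write $file (is debugfs mounted and gfs2 loaded?)" >&2
        return 1
    fi
    printf '%s' "$state" > "$file"
}
```

Enabling only the events you need, for example gfs2_trace_set gfs2_demote_rq 1, keeps the volume of trace data lower than enabling the whole gfs2 group.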
[root@chywoon gfs2]# cat /sys/kernel/debug/tracing/trace
/sys/kernel/debug/tracing/trace_pipe
, can be used when all the output is required. Events are read from this file as they occur; there is no historical information available through this interface. The format of the output is the same from both interfaces and is described for each of the GFS2 events in the later sections of this appendix.
trace-cmd
is available for reading tracepoint data. For more information on this utility, see the link in Section C.10, “References”. The trace-cmd
utility can be used in a similar way to the strace
utility, for example to run a command while gathering trace data from various sources.
C.3. Glocks
Table C.1. Glock Modes and DLM Lock Modes
Glock mode | DLM lock mode | Notes |
---|---|---|
UN | IV/NL | Unlocked (no DLM lock associated with glock or NL lock depending on I flag) |
SH | PR | Shared (protected read) lock |
EX | EX | Exclusive lock |
DF | CW | Deferred (concurrent write) used for Direct I/O and file system freeze |
Note
lock_dlm
lock module (not to be confused with the DLM itself) into GFS2.
Note
Table C.2. Glock Modes and Data Types
Glock mode | Cache Data | Cache Metadata | Dirty Data | Dirty Metadata |
---|---|---|---|---|
UN | No | No | No | No |
SH | Yes | Yes | No | No |
DF | No | Yes | No | No |
EX | Yes | Yes | Yes | Yes |
C.4. The glock debugfs Interface
debugfs
interface allows the visualization of the internal state of the glocks and the holders, and it also includes some summary details of the objects being locked in some cases. Each line of the file either begins with G: and no indentation (which refers to the glock itself), or it begins with a different letter, indented with a single space, and refers to the structures associated with the glock immediately above it in the file (H: is a holder, I: an inode, and R: a resource group). Here is an example of what the content of this file might look like:
G: s:SH n:5/75320 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G: s:EX n:3/258028 f:yI t:EX d:EX/0 a:3 r:4
 H: s:EX f:tH e:0 p:4466 [postmark] gfs2_inplace_reserve_i+0x177/0x780 [gfs2]
 R: n:258028 f:05 b:22256/22256 i:16800
G: s:EX n:2/219916 f:yfI t:EX d:EX/0 a:0 r:3
 I: n:75661/219916 t:8 f:0x10 d:0x00000000 s:7522/7522
G: s:SH n:5/127205 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G: s:EX n:2/50382 f:yfI t:EX d:EX/0 a:0 r:2
G: s:SH n:5/302519 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G: s:SH n:5/313874 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G: s:SH n:5/271916 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G: s:SH n:5/312732 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
cat /sys/kernel/debug/gfs2/unity:myfs/glocks >my.lock
during a run of the postmark benchmark on a single node GFS2 file system. The glocks in the figure have been selected in order to show some of the more interesting features of the glock dumps.
iopen
glock which relates to inode 75320. In the case of inode and iopen
glocks, the glock number is always identical to the inode's disk block number.
Note
blktrace
for example) and with output from stat
(1).
debugfs
interface.
Table C.3. Glock types
Type number | Lock type | Use |
---|---|---|
1 | trans | Transaction lock |
2 | inode | Inode metadata and data |
3 | rgrp | Resource group metadata |
4 | meta | The superblock |
5 | iopen | Inode last closer detection |
6 | flock | flock(2) syscall |
8 | quota | Quota operations |
9 | journal | Journal mutex |
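The type numbers in Table C.3 can be used to summarize a glocks dump by lock type. The following is a minimal sketch with hypothetical function names; it assumes only that each G: line carries an n: field of the form type/number, as in the example dump.

```shell
#!/bin/sh
# Map a glock type number to its lock type name, following Table C.3.
glock_type_name() {
    case "$1" in
        1) echo trans ;;
        2) echo inode ;;
        3) echo rgrp ;;
        4) echo meta ;;
        5) echo iopen ;;
        6) echo flock ;;
        8) echo quota ;;
        9) echo journal ;;
        *) echo unknown ;;
    esac
}

# Summarize a glocks dump read from stdin: count the G: lines per lock
# type and print "<type name> <count>", ordered by type number.
glock_type_summary() {
    awk '$1 == "G:" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^n:/) { split(substr($i, 3), a, "/"); print a[1] }
         }' | sort -n | uniq -c |
    while read -r count type; do
        echo "$(glock_type_name "$type") $count"
    done
}
```

For example, glock_type_summary <my.lock gives a quick view of whether a dump is dominated by inode, iopen, or resource group glocks.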
Table C.4. Glock flags
Flag | Name | Meaning |
---|---|---|
d | Pending demote | A deferred (remote) demote request |
D | Demote | A demote request (local or remote) |
f | Log flush | The log needs to be committed before releasing this glock |
F | Frozen | Replies from remote nodes ignored - recovery is in progress. |
i | Invalidate in progress | In the process of invalidating pages under this glock |
I | Initial | Set when DLM lock is associated with this glock |
l | Locked | The glock is in the process of changing state |
L | LRU | Set when the glock is on the LRU list |
o | Object | Set when the glock is associated with an object (that is, an inode for type 2 glocks, and a resource group for type 3 glocks) |
p | Demote in progress | The glock is in the process of responding to a demote request |
q | Queued | Set when a holder is queued to a glock, and cleared when the glock is held, but there are no remaining holders. Used as part of the algorithm that calculates the minimum hold time for a glock. |
r | Reply pending | Reply received from remote node is awaiting processing |
y | Dirty | Data needs flushing to disk before releasing this glock |
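When reading a dump, the f: field of a G: line can be expanded into the flag names from Table C.4. The helper below is an illustrative sketch with a hypothetical name; it knows only the flags listed in the table and reports anything else as unknown.

```shell
#!/bin/sh
# Expand the f: field of a G: line into flag names, following Table C.4.
#   $1 = flag string, for example "yfI"
decode_glock_flags() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        remainder=${rest#?}          # everything after the first character
        c=${rest%"$remainder"}       # the first character itself
        rest=$remainder
        case "$c" in
            d) name="Pending demote" ;;
            D) name="Demote" ;;
            f) name="Log flush" ;;
            F) name="Frozen" ;;
            i) name="Invalidate in progress" ;;
            I) name="Initial" ;;
            l) name="Locked" ;;
            L) name="LRU" ;;
            o) name="Object" ;;
            p) name="Demote in progress" ;;
            q) name="Queued" ;;
            r) name="Reply pending" ;;
            y) name="Dirty" ;;
            *) name="unknown($c)" ;;
        esac
        out="${out:+$out, }$name"
    done
    echo "$out"
}
```

For the glock with f:yfI in the example dump, decode_glock_flags yfI prints "Dirty, Log flush, Initial".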
C.5. Glock Holders
Table C.5. Glock holder flags
Flag | Name | Meaning |
---|---|---|
a | Async | Do not wait for glock result (will poll for result later) |
A | Any | Any compatible lock mode is acceptable |
c | No cache | When unlocked, demote DLM lock immediately |
e | No expire | Ignore subsequent lock cancel requests |
E | Exact | Must have exact lock mode |
F | First | Set when holder is the first to be granted for this lock |
H | Holder | Indicates that requested lock is granted |
p | Priority | Enqueue holder at the head of the queue |
t | Try | A "try" lock |
T | Try 1CB | A "try" lock that sends a callback |
W | Wait | Set while waiting for request to complete |
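The holder flags are most useful when looking for requests that have not been granted. The sketch below, with a hypothetical function name, prints every glock in a dump that has at least one holder with the W (wait) flag set, together with those waiting holders. It assumes the dump layout described in Section C.4, with H: lines following the G: line they belong to.

```shell
#!/bin/sh
# Sketch: read a glocks dump on stdin and print each glock that has at
# least one holder still waiting (W in the holder's f: field), followed
# by its waiting holders.
waiting_holders() {
    awk '
        $1 == "G:" { glock = $0; shown = 0; next }
        $1 == "H:" {
            flags = ""
            for (i = 2; i <= NF; i++)
                if ($i ~ /^f:/) { flags = substr($i, 3); break }
            if (index(flags, "W") > 0) {
                # Print the owning glock once, before its first waiter.
                if (!shown) { print glock; shown = 1 }
                print $0
            }
        }'
}
```

Running waiting_holders <glocks.fsname.nodename on the dumps from each node narrows a large dump down to the glocks that processes are queued on.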
try 1CB
) lock, on the other hand, is identical to the t lock except that the DLM will send a single callback to current incompatible lock holders. One use of the T (try 1CB
) lock is with the iopen
locks, which are used to arbitrate among the nodes when an inode's i_nlink
count is zero, and determine which of the nodes will be responsible for deallocating the inode. The iopen
glock is normally held in the shared state, but when the i_nlink
count becomes zero and ->delete_inode
() is called, it will request an exclusive lock with T (try 1CB
) set. It will continue to deallocate the inode if the lock is granted. If the lock is not granted it will result in the node(s) which were preventing the grant of the lock marking their glock(s) with the D (demote) flag, which is checked at ->drop_inode
() time in order to ensure that the deallocation is not forgotten.
close
() occurs. Also, at the same time as the inode's link count is decremented to zero, the inode is marked as being in the special state of having zero link count but still in use in the resource group bitmap. This functions like the ext3 file system's orphan list in that it allows any subsequent reader of the bitmap to know that there is potentially space that might be reclaimed, and to attempt to reclaim it.
C.6. Glock tracepoints
gfs2_glock_state_change
tracepoint is the most important one to understand. It tracks every state change of the glock from initial creation right through to the final demotion which ends with gfs2_glock_put
and the final NL to unlocked transition. The l (locked) glock flag is always set before a state change occurs and will not be cleared until after it has finished. There are never any granted holders (the H glock holder flag) during a state change. If there are any queued holders, they will always be in the W (waiting) state. When the state change is complete then the holders may be granted which is the final operation before the l glock flag is cleared.
gfs2_demote_rq
tracepoint keeps track of demote requests, both local and remote. Assuming that there is enough memory on the node, the local demote requests will rarely be seen, and most often they will be created by umount or by occasional memory reclaim. The number of remote demote requests is a measure of the contention between nodes for a particular inode or resource group.
gfs2_promote
is called. This occurs in the final stages of a state change or when a lock is requested which can be granted immediately because the glock state already caches a lock of a suitable mode. If the holder is the first one to be granted for this glock, then the F (first) flag is set on that holder. This is currently used only by resource groups.
C.7. Bmap tracepoints
gfs2_bmap
tracepoint is called twice for each bmap operation: once at the start to display the bmap request, and once at the end to display the result. This makes it easy to match the requests and results together and measure the time taken to map blocks in different parts of the file system, different file offsets, or even different files. It is also possible to compare the average extent sizes being returned with those being requested.
gfs2_block_alloc
is called not only on allocations, but also on freeing of blocks. Since the allocations are all referenced according to the inode for which the block is intended, this can be used to track which physical blocks belong to which files in a live file system. This is particularly useful when combined with blktrace
, which will show problematic I/O patterns that may then be referred back to the relevant inodes using the mapping gained by means of this tracepoint.
C.8. Log tracepoints
gfs2_pin
), as well as the time taken to commit the transactions to the log (gfs2_log_flush
). This can be very useful when trying to debug journaling performance issues.
gfs2_log_blocks
tracepoint keeps track of the reserved blocks in the log, which can help show if the log is too small for the workload, for example.
gfs2_ail_flush
tracepoint (Red Hat Enterprise Linux 6.2 and later) is similar to the gfs2_log_flush
tracepoint in that it keeps track of the start and end of flushes of the AIL list. The AIL list contains buffers which have been through the log, but have not yet been written back in place and this is periodically flushed in order to release more log space for use by the filesystem, or when a process requests a sync or fsync.
C.9. Glock Statistics
- dcount, which counts the number of DLM operations requested. This shows how much data has gone into the mean/variance calculations.
- qcount, which counts the number of syscall level operations requested. Generally qcount will be equal to or greater than dcount.
- srtt/srttvar: Smoothed round trip time for non-blocking operations
- srttb/srttvarb: Smoothed round trip time for blocking operations
- irtt/irttvar: Inter-request time (for example, time between DLM requests)
sysfs
files:
- The glstats file. This file is similar to the glocks file, except that it contains statistics, with one glock per line. The data is initialized from "per cpu" data for that glock type for which the glock is created (aside from counters, which are zeroed). This file may be very large.
- The lkstats file. This contains "per cpu" stats for each glock type. It contains one statistic per line, in which each column is a cpu core. There are eight lines per glock type, with types following on from each other.
C.10. References
glocks
file, see the following resources:
- For information on glock internal locking rules, see http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/gfs2-glocks.txt;h=0494f78d87e40c225eb1dc1a1489acd891210761;hb=HEAD.
- For information on event tracing, see http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/trace/events.txt;h=09bd8e9029892e4e1d48078de4d076e24eff3dd2;hb=HEAD.
- For information on the
trace-cmd
utility, see http://lwn.net/Articles/341902/.
Appendix D. Revision History
Revision | Date |
---|---|
Revision 9.1-4 | Thu Oct 26 2017 |
Revision 9.1-2 | Wed Mar 8 2017 |
Revision 9.1-1 | Fri Dec 16 2016 |
Revision 8.1-5 | Wed Apr 27 2016 |
Revision 8.1-2 | Wed Mar 9 2016 |
Revision 7.1-5 | Wed Jul 8 2015 |
Revision 7.1-4 | Mon Apr 13 2015 |
Revision 7.0-9 | Wed Oct 8 2014 |
Revision 7.0-8 | Thu Aug 7 2014 |
Revision 6.0-6 | Wed Nov 13 2013 |
Revision 6.0-5 | Fri Sep 27 2013 |
Revision 5.0-7 | Mon Feb 18 2013 |
Revision 5.0-5 | Mon Nov 26 2012 |
Revision 4.0-2 | Thu Mar 28 2012 |
Revision 3.0-2 | Thu Dec 1 2011 |
Revision 3.0-1 | Mon Sep 19 2011 |
Revision 2.0-1 | Thu May 19 2011 |
Revision 1.0-1 | Wed Nov 15 2010 |
Index
A
- acl mount option, Mounting a File System
- adding journals to a file system, Adding Journals to a File System
- atime, configuring updates, Configuring atime Updates
- mounting with noatime , Mount with noatime
- mounting with relatime , Mount with relatime
- audience, Audience
B
- bind mount
- mount order, Bind Mounts and File System Mount Order
- bind mounts, Bind Mounts and Context-Dependent Path Names
C
- Configuration considerations, GFS2 Configuration and Operational Considerations
- configuration, before, Before Setting Up GFS2
- configuration, initial, Getting Started
- prerequisite tasks, Prerequisite Tasks
- Context-Dependent Path Names (CDPNs)
- GFS to GFS2 Conversion, Conversion of Context-Dependent Path Names
D
- data journaling, Data Journaling
- debugfs, GFS2 tracepoints and the debugfs glocks File
- debugfs file, Troubleshooting GFS2 Performance with the GFS2 Lock Dump
- disk quotas
- additional resources, References
- assigning per group, Assigning Quotas per Group
- assigning per user, Assigning Quotas per User
- enabling, Configuring Disk Quotas
- creating quota files, Creating the Quota Database Files
- quotacheck, running, Creating the Quota Database Files
- hard limit, Assigning Quotas per User
- management of, Managing Disk Quotas
- quotacheck command, using to check, Keeping Quotas Accurate
- reporting, Managing Disk Quotas
- soft limit, Assigning Quotas per User
F
- features, new and changed, New and Changed Features
- feedback
- contact information for this manual, We Need Feedback!
- file system
- adding journals, Adding Journals to a File System
- atime, configuring updates, Configuring atime Updates
- mounting with noatime , Mount with noatime
- mounting with relatime , Mount with relatime
- bind mounts, Bind Mounts and Context-Dependent Path Names
- context-dependent path names (CDPNs), Bind Mounts and Context-Dependent Path Names
- data journaling, Data Journaling
- growing, Growing a File System
- making, Making a File System
- mount order, Bind Mounts and File System Mount Order
- mounting, Mounting a File System, Special Considerations when Mounting GFS2 File Systems
- quota management, GFS2 Quota Management, Setting Up Quotas in Enforcement or Accounting Mode, GFS2 Quota Management with the gfs2_quota Command
- displaying quota limits, Displaying Quota Limits and Usage with the gfs2_quota Command
- enabling quota accounting, Enabling Quota Accounting
- enabling/disabling quota enforcement, Enabling/Disabling Quota Enforcement
- setting quotas, Setting Quotas with the gfs2_quota command
- synchronizing quotas, Synchronizing Quotas with the quotasync Command, Synchronizing Quotas with the gfs2_quota Command
- repairing, Repairing a File System
- suspending activity, Suspending Activity on a File System
- unmounting, Unmounting a File System, Special Considerations when Mounting GFS2 File Systems
- fsck.gfs2 command, Repairing a File System
G
- GFS2
- atime, configuring updates, Configuring atime Updates
- mounting with noatime, Mount with noatime
- mounting with relatime, Mount with relatime
- Configuration considerations, GFS2 Configuration and Operational Considerations
- managing, Managing GFS2
- Operation, GFS2 Configuration and Operational Considerations
- quota management, GFS2 Quota Management, Setting Up Quotas in Enforcement or Accounting Mode, GFS2 Quota Management with the gfs2_quota Command
- displaying quota limits, Displaying Quota Limits and Usage with the gfs2_quota Command
- enabling quota accounting, Enabling Quota Accounting
- enabling/disabling quota enforcement, Enabling/Disabling Quota Enforcement
- setting quotas, Setting Quotas with the gfs2_quota command
- synchronizing quotas, Synchronizing Quotas with the quotasync Command, Synchronizing Quotas with the gfs2_quota Command
- withdraw function, The GFS2 Withdraw Function
- GFS2 file system maximum size, GFS2 Overview
- GFS2-specific options for adding journals table, Complete Usage
- GFS2-specific options for expanding file systems table, Complete Usage
- gfs2_grow command, Growing a File System
- gfs2_jadd command, Adding Journals to a File System
- gfs2_quota command, GFS2 Quota Management with the gfs2_quota Command
- glock, GFS2 tracepoints and the debugfs glocks File
- glock flags, Troubleshooting GFS2 Performance with the GFS2 Lock Dump, The glock debugfs Interface
- glock holder flags, Troubleshooting GFS2 Performance with the GFS2 Lock Dump, Glock Holders
- glock types, Troubleshooting GFS2 Performance with the GFS2 Lock Dump, The glock debugfs Interface
- growing a file system, Growing a File System
I
- initial tasks
- setup, initial, Initial Setup Tasks
- introduction, Introduction
- audience, Audience
M
- making a file system, Making a File System
- managing GFS2, Managing GFS2
- maximum size, GFS2 file system, GFS2 Overview
- mkfs command, Making a File System
- mkfs.gfs2 command options table, Complete Options
- mount command, Mounting a File System
- mount table, Complete Usage
- mounting a file system, Mounting a File System, Special Considerations when Mounting GFS2 File Systems
N
- node locking, GFS2 Node Locking
O
- overview, GFS2 Overview
- configuration, before, Before Setting Up GFS2
- features, new and changed, New and Changed Features
P
- path names, context-dependent (CDPNs), Bind Mounts and Context-Dependent Path Names
- performance tuning, Performance Tuning With GFS2
- POSIX locking, Issues with Posix Locking
- preface (see introduction)
- prerequisite tasks
- configuration, initial, Prerequisite Tasks
Q
- quota management, GFS2 Quota Management, Setting Up Quotas in Enforcement or Accounting Mode, GFS2 Quota Management with the gfs2_quota Command
- displaying quota limits, Displaying Quota Limits and Usage with the gfs2_quota Command
- enabling quota accounting, Enabling Quota Accounting
- enabling/disabling quota enforcement, Enabling/Disabling Quota Enforcement
- setting quotas, Setting Quotas with the gfs2_quota command
- synchronizing quotas, Synchronizing Quotas with the quotasync Command, Synchronizing Quotas with the gfs2_quota Command
- quota= mount option, Setting Quotas with the gfs2_quota command
- quotacheck, Creating the Quota Database Files
- quotacheck command
- checking quota accuracy with, Keeping Quotas Accurate
- quota_quantum tunable parameter, Synchronizing Quotas with the quotasync Command, Synchronizing Quotas with the gfs2_quota Command
R
- repairing a file system, Repairing a File System
S
- setup, initial
- initial tasks, Initial Setup Tasks
- suspending activity on a file system, Suspending Activity on a File System
- system hang at unmount, Special Considerations when Mounting GFS2 File Systems
T
- tables
- GFS2-specific options for adding journals, Complete Usage
- GFS2-specific options for expanding file systems, Complete Usage
- mkfs.gfs2 command options, Complete Options
- mount options, Complete Usage
- tracepoints, GFS2 tracepoints and the debugfs glocks File
- tuning, performance, Performance Tuning With GFS2
U
- umount command, Unmounting a File System
- unmount, system hang, Special Considerations when Mounting GFS2 File Systems
- unmounting a file system, Unmounting a File System, Special Considerations when Mounting GFS2 File Systems
W
- withdraw function, GFS2, The GFS2 Withdraw Function