2.9. GFS2 Node Locking

In order to get the best performance from a GFS2 file system, it is very important to understand some of the basic theory of its operation. A single node file system is implemented alongside a cache, the purpose of which is to eliminate latency of disk accesses when using frequently requested data. In Linux the page cache (and historically the buffer cache) provide this caching function.
With GFS2, each node has its own page cache which may contain some portion of the on-disk data. GFS2 uses a locking mechanism called glocks (pronounced gee-locks) to maintain the integrity of the cache between nodes. The glock subsystem provides a cache management function which is implemented using the distributed lock manager (DLM) as the underlying communication layer.
The glocks provide protection for the cache on a per-inode basis, so there is one lock per inode which is used for controlling the caching layer. If that glock is granted in shared mode (DLM lock mode: PR) then the data under that glock may be cached upon one or more nodes at the same time, so that all the nodes may have local access to the data.
If the glock is granted in exclusive mode (DLM lock mode: EX) then only a single node may cache the data under that glock. This mode is used by all operations which modify the data (such as the write system call).
If another node requests a glock which cannot be granted immediately, then the DLM sends a message to the node or nodes which currently hold the glocks blocking the new request to ask them to drop their locks. Dropping glocks can be (by the standards of most file system operations) a long process. Dropping a shared glock requires only that the cache be invalidated, which is relatively quick and proportional to the amount of cached data.
Dropping an exclusive glock requires a log flush, and writing back any changed data to disk, followed by the invalidation as per the shared glock.
The difference between a single node file system and GFS2, then, is that a single node file system has a single cache and GFS2 has a separate cache on each node. In both cases, latency to access cached data is of a similar order of magnitude, but the latency to access uncached data is much greater in GFS2 if another node has previously cached that same data.

Note

Due to the way in which GFS2's caching is implemented the best performance is obtained when either of the following takes place:
  • An inode is used in a read only fashion across all nodes.
  • An inode is written or modified from a single node only.
Note that inserting and removing entries from a directory during file creation and deletion counts as writing to the directory inode.
It is possible to break this rule provided that it is broken relatively infrequently. Ignoring this rule too often will result in a severe performance penalty.
If you mmap() a file on GFS2 with a read/write mapping, but only read from it, this only counts as a read. On GFS though, it counts as a write, so GFS2 is much more scalable with mmap() I/O.
If you do not set the noatime mount parameter, then reads will also result in writes to update the file timestamps. We recommend that all GFS2 users should mount with noatime unless they have a specific requirement for atime.

2.9.1. Issues with Posix Locking

When using Posix locking, you should take the following into account:
  • Use of Flocks will yield faster processing than use of Posix locks.
  • Programs using Posix locks in GFS2 should avoid using the GETLK function since, in a clustered environment, the process ID may be for a different node in the cluster.