4.14. Repairing a File System

When nodes fail with the file system mounted, file system journaling allows fast recovery. However, if a storage device loses power or is physically disconnected, file system corruption may occur. (Journaling cannot be used to recover from storage subsystem failures.) When that type of corruption occurs, you can recover the GFS file system by using the gfs_fsck command.


The gfs_fsck command must be run only on a file system that is unmounted from all nodes.


You should not check a GFS file system at boot time with the gfs_fsck command. The gfs_fsck command cannot determine at boot time whether the file system is mounted by another node in the cluster. Run the gfs_fsck command manually, and only after the system boots.
To ensure that the gfs_fsck command does not run on a GFS file system at boot time, modify the /etc/fstab file so that the final two columns for a GFS file system mount point show "0 0" rather than "1 1" (or any other numbers), as in the following example:
/dev/VG12/lv_svr_home	/svr_home	gfs	defaults,noatime,nodiratime,noquota	0 0
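As a quick way to audit an existing configuration, the following sketch lists the mount points of GFS entries whose last two columns are not both 0. It is an illustration only: the function name check_gfs_fstab is hypothetical, and it assumes a whitespace-separated fstab in which the third column holds the file system type.

```shell
#!/bin/sh
# check_gfs_fstab (hypothetical helper): print the mount point of every
# GFS entry whose fifth or sixth column is non-zero.
# The fstab path is an optional argument so the check can be tried on a
# copy of the file first; it defaults to /etc/fstab.
check_gfs_fstab() {
    awk '
        $1 ~ /^#/ { next }                            # skip comment lines
        $3 == "gfs" && ($5 + 0 != 0 || $6 + 0 != 0) { # missing columns count as 0
            print $2
        }
    ' "${1:-/etc/fstab}"
}

# Usage (not run here): check_gfs_fstab /etc/fstab
```

No output means that every GFS entry found already ends in "0 0". Any mount point that is printed needs its final two columns changed as described above.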


The gfs_fsck command has changed from previous releases of Red Hat GFS in the following ways:
  • Pressing Ctrl+C while running the gfs_fsck command interrupts processing and displays a prompt asking whether you would like to abort the command, skip the rest of the current pass, or continue processing.
  • You can increase the level of verbosity by using the -v flag. Adding a second -v flag increases the level again.
  • You can decrease the level of verbosity by using the -q flag. Adding a second -q flag decreases the level again.
  • The -n option opens a file system as read-only and answers no to any queries automatically. This option lets you run the command to reveal errors without allowing the gfs_fsck command to change the file system.
Refer to the gfs_fsck man page, gfs_fsck(8), for additional information about other command options.
Running the gfs_fsck command requires system memory above and beyond the memory used for the operating system and kernel. Each block in the file system requires approximately one byte of additional memory. To estimate the amount of memory you will need to run the gfs_fsck command on your file system, divide the file system size (in bytes) by the block size (in bytes).
For example, for a GFS file system that is 16TB with a block size of 4K, divide 16TB by 4K:
 17592186044416 / 4096 = 4294967296
This file system requires approximately 4GB of free memory to run the gfs_fsck command. Note that if the block size were 1K, running the gfs_fsck command would require four times the memory, or 16GB.
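The same estimate can be scripted. The sketch below only restates the division described above; the function name estimate_fsck_memory is hypothetical, and it assumes a shell with 64-bit integer arithmetic.

```shell
#!/bin/sh
# estimate_fsck_memory (hypothetical helper): approximate the memory, in
# bytes, needed to run gfs_fsck, at roughly one byte per file system block.
#   $1 = file system size in bytes
#   $2 = block size in bytes
estimate_fsck_memory() {
    echo $(( $1 / $2 ))
}

# 16TB file system with a 4K block size
estimate_fsck_memory 17592186044416 4096    # prints 4294967296 (about 4GB)

# The same file system with a 1K block size
estimate_fsck_memory 17592186044416 1024    # prints 17179869184 (about 16GB)
```

The result is an approximation of the additional memory required, not an exact figure, so allow some headroom.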


gfs_fsck -y BlockDevice
The -y flag causes all questions to be answered with yes. With the -y flag specified, the gfs_fsck command does not prompt you for an answer before making changes.
BlockDevice specifies the block device where the GFS file system resides.


In this example, the GFS file system residing on block device /dev/gfsvg/gfslv is repaired. All queries to repair are automatically answered with yes. Because this example uses the -v (verbose) option, the sample output is extensive and repetitive lines have been elided.
[root@tng3-1]# gfs_fsck -v -y /dev/gfsvg/gfslv
Initializing fsck
Initializing lists...
Initializing special inodes...
Validating Resource Group index.
Level 1 check.
92 resource groups found.
Setting block ranges...
Creating a block list of size 9175040...
Clearing journals (this may take a while)Clearing journal 0
Clearing journal 1
Clearing journal 2
Clearing journal 10

Journals cleared.
Starting pass1
Checking metadata in Resource Group 0
Checking metadata in Resource Group 1
Checking metadata in Resource Group 91
Pass1 complete      
Starting pass1b
Looking for duplicate blocks...
No duplicate blocks found
Pass1b complete      
Starting pass1c
Looking for inodes containing ea blocks...
Pass1c complete      
Starting pass2
Checking directory inodes.
Pass2 complete      
Starting pass3
Marking root inode connected
Checking directory linkage.
Pass3 complete      
Starting pass4
Checking inode reference counts.
Pass4 complete      
Starting pass5
Updating Resource Group 92
Pass5 complete      
Writing changes to disk
Syncing the device.
Freeing buffers.