XFS not recovering space on read-only filesystems following a system crash on RHEL

Environment

  • Red Hat Enterprise Linux 7
  • kernel versions prior to kernel-3.10.0-862.el7
  • XFS filesystems

Issue

  • After a system crash, allocated space is not being reclaimed on my root filesystem.

  • After a system crash, xfs_repair -nv returns 1, indicating that corruption was detected. Shouldn't log recovery prevent this?

    [root@rhel7 ~]# xfs_repair -nv $IMAGE
    Phase 1 - find and verify superblock...
        - block cache size set to 89000 entries
    Phase 2 - using internal log
        - zero log...
    zero_log: head block 101 tail block 101
        - scan filesystem freespace and inode maps...
    agi unlinked bucket 3 is 67 in ag 0 (inode=67)
        - found root inode chunk
    Phase 3 - for each AG...
    ...
    Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
    disconnected inode 67, would move to lost+found
    Phase 7 - verify link counts...
    would have reset inode 67 nlinks from 0 to 1
    No modify flag set, skipping filesystem flush and exiting.
    

Resolution

This issue was resolved in kernel-3.10.0-862.el7 via errata RHSA-2018:1062.
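
To check whether a given system is still running a kernel older than the fixed build, the release string can be compared against 862. The following is a minimal sketch, not a Red Hat tool: the function name kernel_is_affected is hypothetical, and it assumes the release string has the usual RHEL 7 form 3.10.0-<build>.el7.<arch>.

```shell
# Sketch: report whether a RHEL 7 kernel release predates the fixed build (862).
# kernel_is_affected is a hypothetical helper name used only for illustration.
kernel_is_affected() {
    release="${1:-$(uname -r)}"
    case "$release" in
        3.10.0-*) ;;                  # RHEL 7 kernel series
        *) return 2 ;;                # some other kernel; this check does not apply
    esac
    build="${release#3.10.0-}"        # e.g. 514.el7.x86_64
    build="${build%%.*}"              # e.g. 514
    [ "$build" -lt 862 ]              # true only for builds older than 862
}

# With no argument the function inspects the running kernel.
if kernel_is_affected; then
    echo "running kernel $(uname -r) predates kernel-3.10.0-862.el7"
fi
```

Only the first numeric component after 3.10.0- is compared, so z-stream releases of build 862 and later are treated as fixed.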

Root Cause

XFS log recovery was not processing unlinked inodes on read-only mounts.

The boot process initially mounts the root filesystem read-only before remounting it read-write, so root filesystems are the most affected.

Diagnostic Steps

[root@rhel7 ~]# uname -r
3.10.0-514.el7.x86_64
[root@rhel7 ~]# IMAGE=$(mktemp /tmp/xfs.image.XXX)
[root@rhel7 ~]# truncate -s 1G $IMAGE
[root@rhel7 ~]# mkfs.xfs $IMAGE
meta-data=/tmp/xfs.image.8as     isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@rhel7 ~]# mount $IMAGE /mnt
[root@rhel7 ~]# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0     1014M   33M  982M   4% /mnt
[root@rhel7 ~]# fallocate -l 800M /mnt/a_file
[root@rhel7 ~]# df -h /mnt/a_file
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0     1014M  833M  182M  83% /mnt
[root@rhel7 ~]# tail -f /mnt/a_file > /dev/null &
[1] 3488
[root@rhel7 ~]# jobs
[1]+  Running                 tail -f /mnt/a_file > /dev/null &
[root@rhel7 ~]# rm -f /mnt/a_file
[root@rhel7 ~]# ls /mnt
[root@rhel7 ~]# 

If lsof is installed, the file can be seen as still open but in a deleted state.

[root@rhel7 ~]# lsof /mnt
COMMAND  PID USER   FD   TYPE DEVICE  SIZE/OFF NODE NAME
tail    3686 root    3r   REG    7,0 838860800   67 /mnt/a_file (deleted)
[root@rhel7 ~]# 
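
If lsof is not installed, similar information can usually be read from /proc, where the symlink for a deleted-but-open file ends in " (deleted)". The following is a rough sketch under that assumption; list_deleted_open is a hypothetical helper name, and it only sees processes whose /proc/<pid>/fd entries the caller is allowed to read.

```shell
# Sketch: print "PID path (deleted)" for deleted files still held open
# below a given directory, using only /proc and readlink.
# list_deleted_open is a hypothetical helper name used only for illustration.
list_deleted_open() {
    mountpoint="$1"
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors that vanished or that we may not read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$mountpoint"/*" (deleted)")
                pid=${fd#/proc/}      # strip the leading /proc/
                pid=${pid%%/*}        # keep only the PID component
                printf '%s %s\n' "$pid" "$target"
                ;;
        esac
    done
}

list_deleted_open /mnt
```

Run as root so that the descriptors of all processes are visible.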

Force a shutdown of the filesystem to simulate a crash or other unclean shutdown.

[root@rhel7 ~]# xfs_io -x -c "shutdown -f" /mnt
[root@rhel7 ~]# umount /mnt
umount: /mnt: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
[root@rhel7 ~]# kill %1
[root@rhel7 ~]# umount /mnt
[1]+  Terminated              tail -f /mnt/a_file > /dev/null

Mount the image read-only.

Note: mount can usually create the loop device automatically, but if the read-only option is used, the loop device is also created read-only, which prevents log recovery from taking place. Create the loop device explicitly with losetup instead.

[root@rhel7 ~]# losetup --find --show $IMAGE
/dev/loop0
[root@rhel7 ~]# mount -o ro /dev/loop0 /mnt
[root@rhel7 ~]# mount -o remount,rw /dev/loop0 /mnt
[root@rhel7 ~]# dmesg | tail -3
[  490.130228] XFS (loop0): Mounting V5 Filesystem
[  490.158399] XFS (loop0): Starting recovery (logdev: internal)
[  490.158866] XFS (loop0): Ending recovery (logdev: internal)
[root@rhel7 ~]# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0     1014M  801M  214M  79% /mnt
[root@rhel7 ~]# ls /mnt
[root@rhel7 ~]# umount /mnt
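
The symptom in the df output above is that used space stays high while ls shows no files. One way to quantify that gap is to subtract what du can account for from what df reports as used. This is a sketch only: the helper names are hypothetical, and some difference is always expected because df also counts filesystem metadata such as the internal log (the empty filesystem above already showed 33M used).

```shell
# Sketch: compare space that df reports as used with space du can find.
# All three helper names are hypothetical and used only for illustration.

# Used space on the filesystem containing $1, in KiB (POSIX output format).
used_kb_df() { df -Pk "$1" | awk 'NR==2 {print $3}'; }

# Space reachable through the directory tree at $1, in KiB, one filesystem only.
used_kb_du() { du -skx "$1" 2>/dev/null | awk '{print $1}'; }

# KiB in use that no visible file accounts for.
unaccounted_kb() {
    echo $(( $(used_kb_df "$1") - $(used_kb_du "$1") ))
}
```

For example, running unaccounted_kb /mnt while the image is mounted prints the difference in KiB. A value far above the baseline of a freshly created filesystem suggests that unlinked inodes are still holding blocks.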

[root@rhel7 ~]# xfs_repair -nv $IMAGE
Phase 1 - find and verify superblock...
        - block cache size set to 89000 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 101 tail block 101
        - scan filesystem freespace and inode maps...
agi unlinked bucket 3 is 67 in ag 0 (inode=67)
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 67, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 67 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Thu Jan 10 00:33:42 2019

Phase       Start       End     Duration
Phase 1:    01/10 00:33:42  01/10 00:33:42  
Phase 2:    01/10 00:33:42  01/10 00:33:42  
Phase 3:    01/10 00:33:42  01/10 00:33:42  
Phase 4:    01/10 00:33:42  01/10 00:33:42  
Phase 5:    Skipped
Phase 6:    01/10 00:33:42  01/10 00:33:42  
Phase 7:    01/10 00:33:42  01/10 00:33:42  

Total run time: 

[root@rhel7 ~]# echo $?
1
[root@rhel7 ~]# man xfs_repair
...
       xfs_repair -n (no modify node)  will  return  a  status  of  1  if
       filesystem  corruption was detected and 0 if no filesystem corrup‐
       tion was detected.  xfs_repair run  without  the  -n  option  will
       always return a status code of 0.
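
The check above can be scripted by capturing the exit status of xfs_repair -n and pulling the affected inode numbers out of its report. The following is a minimal sketch: both function names are hypothetical, the status meanings for 0 and 1 are taken from the man page excerpt above, and the line format matched by sed is assumed to be the one shown in this article's output.

```shell
# Sketch: summarise a no-modify xfs_repair run.
# Both helper names are hypothetical and used only for illustration.

# Read an xfs_repair -n report on stdin and print the disconnected inode numbers.
disconnected_inodes() {
    sed -n 's/^disconnected inode \([0-9][0-9]*\), would move to lost+found$/\1/p'
}

# Translate the exit status of xfs_repair -n into a short message.
describe_repair_status() {
    case "$1" in
        0) echo "no corruption detected" ;;
        1) echo "corruption detected" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Runs only when xfs_repair is installed and IMAGE names an unmounted image.
if command -v xfs_repair >/dev/null 2>&1 && [ -n "${IMAGE:-}" ]; then
    status=0
    report=$(xfs_repair -n "$IMAGE" 2>&1) || status=$?
    describe_repair_status "$status"
    printf '%s\n' "$report" | disconnected_inodes
fi
```

Standard error is merged into the captured report so that the inode lines are included whichever stream xfs_repair writes them to.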

Running xfs_repair without -n moves the unlinked file to lost+found, allowing it to be removed manually after mounting.

[root@rhel7 ~]# xfs_repair $IMAGE
...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 67, moving to lost+found
Phase 7 - verify and correct link counts...
done

[root@rhel7 ~]# mount $IMAGE /mnt
[root@rhel7 ~]# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop1     1014M  833M  182M  83% /mnt
[root@rhel7 ~]# ls -lh /mnt/lost+found/
total 800M
-rw-r--r--. 1 root root 800M Jan 10 00:32 67
[root@rhel7 ~]# rm -f /mnt/lost+found/67 
