Why did my GFS2 filesystem withdraw with: function = get_leaf, file = fs/gfs2/dir.c, line = 763
Environment
- Red Hat Enterprise Linux Server 5.8 or 5.9 (with the High Availability Add-On)
- Affected kernels are between 2.6.18-274.12.1.el5 and 2.6.18-348.1.1.el5 (not including either of these kernels)
- GFS2 filesystems
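To check whether a node is running a kernel inside the affected range stated above, a comparison along the following lines may help. This is only a sketch: the helper names `cmp_dotted` and `kernel_affected` are made up for illustration, and it assumes `uname -r` reports a release of the form `2.6.18-<numbers>.el5`, optionally followed by a variant suffix.

```shell
# Print -1, 0 or 1 depending on how the dotted numeric string $1 compares to $2.
# Missing fields count as 0, so "348" sorts before "348.1.1".
cmp_dotted() (
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
        x=${x:-0}; y=${y:-0}
        if [ "$x" -lt "$y" ]; then echo -1; exit; fi
        if [ "$x" -gt "$y" ]; then echo 1; exit; fi
    done
    echo 0
)

# Succeed only if kernel release $1 lies strictly between
# 2.6.18-274.12.1.el5 and 2.6.18-348.1.1.el5.
kernel_affected() (
    rel=${1#2.6.18-}
    [ "$rel" != "$1" ] || exit 1     # not a 2.6.18 kernel at all
    rel=${rel%%.el5*}                # drop ".el5" and anything after it
    [ "$(cmp_dotted "$rel" 274.12.1)" = 1 ] &&
        [ "$(cmp_dotted "$rel" 348.1.1)" = -1 ]
)

if kernel_affected "$(uname -r)"; then
    echo "$(uname -r): inside the affected range"
else
    echo "$(uname -r): outside the affected range"
fi
```

The comparison is written in plain POSIX shell rather than relying on `sort -V`, which may not be available in older coreutils releases.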
Issue
- A GFS2 withdraw occurs with a message similar to the following:
GFS2: fsid=MyCluster:MyGFS.0: function = get_leaf, file = fs/gfs2/dir.c, line = 763
Resolution
- Update to kernel-2.6.18-348.1.1.el5 or later in RHEL 5 Update 9
- Update to kernel-2.6.18-371.el5 or later in RHEL 5 Update 10 and above
- Install the newer kernel on all cluster nodes and perform a full cluster restart.
- Update to the latest gfs2-utils package
- Run fsck on the GFS2 filesystem using the steps in "How can I recover from a GFS2 withdrawal and fix any filesystem corruption that might exist in a Red Hat Enterprise Linux 5, 6, or 7 Resilient Storage cluster?". You can skip gathering diagnostic data, as this issue is already fully explained.
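The steps above could be outlined per node roughly as follows. This is a dry-run sketch, not the procedure from the referenced article: the device path `/dev/myvg/mylv`, the `DRY_RUN` switch, and the helper names `run` and `resolution_steps` are assumptions made for illustration, and the choice of `fsck.gfs2 -y` (answer yes to all repair prompts) should be checked against that article before use.

```shell
DEVICE=${DEVICE:-/dev/myvg/mylv}   # hypothetical GFS2 block device

# Print a command; execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

resolution_steps() {
    # 1. On every cluster node: install the fixed kernel and current gfs2-utils.
    run yum update kernel gfs2-utils
    run rpm -q kernel gfs2-utils
    # 2. Restart the whole cluster so that every node boots the new kernel.
    echo "# reboot all nodes into the new kernel (full cluster restart)"
    # 3. With the filesystem unmounted on ALL nodes, check and repair it once.
    echo "# make sure $DEVICE is not mounted on any node before continuing"
    run fsck.gfs2 -y "$DEVICE"
}

resolution_steps
```

By default the script only prints the commands, which makes it safe to review the sequence before running anything against a production cluster.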
Root Cause
- GFS2 filesystems will withdraw when they encounter corruption to prevent additional corruption from occurring.
- More information about this fix is available in the RHEL5.9 release notes:
Previously, GFS2 did not properly free directory hash table memory from cache when the directory was removed from cache. If the same
GFS2 inode was later reused as another directory, the stale directory hash table was reused instead of reading the correct
information from the media. If the GFS2 hash table was not reused, a small amount of memory was lost until the next reboot.
If the hash table was reused, the directory could become corrupt. Later, GFS2 could discover the file system inconsistency and
withdraw from the file system, making it unavailable until the system was rebooted. This update applies a patch to the kernel that
frees the directory hash table correctly from cache and prevents this file system corruption.
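The stale-cache behaviour described in the release note can be pictured with a toy model. The following is purely illustrative and has nothing to do with the actual kernel code: "disk" and "cache" are temporary directories, each inode number is a file name, and `read_hash_table` is a hypothetical helper that prefers the cached copy.

```shell
# Toy model only: one file per inode number, on "disk" and in "cache".
work=$(mktemp -d)
mkdir "$work/disk" "$work/cache"

# Return the hash table for inode $1, reading from disk only on a cache miss.
read_hash_table() {
    if [ ! -f "$work/cache/$1" ]; then
        cp "$work/disk/$1" "$work/cache/$1"
    fi
    cat "$work/cache/$1"
}

echo "hash table of directory A" > "$work/disk/1234"
read_hash_table 1234 > /dev/null     # first access populates the cache

# Directory A is removed and inode 1234 is reused for directory B,
# but the cached table for 1234 is never freed (the bug).
echo "hash table of directory B" > "$work/disk/1234"
buggy=$(read_hash_table 1234)

# The fix: free the cached table when the directory leaves the cache.
rm -f "$work/cache/1234"
fixed=$(read_hash_table 1234)

echo "buggy: $buggy"   # buggy: hash table of directory A
echo "fixed: $fixed"   # fixed: hash table of directory B
rm -rf "$work"
```

In the buggy case the lookup for directory B is answered with directory A's table, which is the kind of mismatch that can later surface as an inconsistency and trigger a withdraw.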
Diagnostic Steps
If the following symptoms exist, this solution may apply:
- Filesystem withdraws do not occur when running kernel 2.6.18-274.12.1.el5 or earlier, or kernel 2.6.18-348.1.1.el5 or later.
- Review the /var/log/messages file(s) and look for a GFS2 withdraw message similar to:
kernel: GFS2: fsid=rhcluster:app02.2: fatal: invalid metadata block
kernel: GFS2: fsid=rhcluster:app02.2: bh = 9120530 (magic number)
kernel: GFS2: fsid=rhcluster:app02.2: function = get_leaf, file = fs/gfs2/dir.c, line = 763
kernel: GFS2: fsid=rhcluster:app02.2: about to withdraw this file system
kernel: GFS2: fsid=rhcluster:app02.2: telling LM to withdraw
kernel: GFS2: fsid=rhcluster:app02.2: withdrawn
kernel:
kernel: Call Trace:
kernel: [<ffffffff8890c764>] :gfs2:gfs2_lm_withdraw+0xd3/0x100
kernel: [<ffffffff80063a2a>] __wait_on_bit+0x60/0x6e
kernel: [<ffffffff8001558e>] sync_buffer+0x0/0x3f
kernel: [<ffffffff88902d8f>] :gfs2:gfs2_dirent_find+0x0/0x4d
kernel: [<ffffffff80063aa4>] out_of_line_wait_on_bit+0x6c/0x78
kernel: [<ffffffff800a34d5>] wake_bit_function+0x0/0x23
kernel: [<ffffffff8001aaeb>] submit_bh+0x10d/0x114
kernel: [<ffffffff88920803>] :gfs2:gfs2_meta_check_ii+0x2c/0x38
kernel: [<ffffffff889023ca>] :gfs2:get_leaf+0x6b/0xa8
kernel: [<ffffffff889029e6>] :gfs2:get_first_leaf+0x2a/0x31
kernel: [<ffffffff88902a70>] :gfs2:gfs2_dirent_search+0x83/0x16e
kernel: [<ffffffff8890403a>] :gfs2:gfs2_dir_search+0x21/0x73
kernel: [<ffffffff8000daa3>] permission+0x81/0xc8
kernel: [<ffffffff8890ac80>] :gfs2:gfs2_lookupi+0x12e/0x16b
kernel: [<ffffffff8890ac3e>] :gfs2:gfs2_lookupi+0xec/0x16b
kernel: [<ffffffff88917a08>] :gfs2:gfs2_lookup+0x26/0xa7
kernel: [<ffffffff889089db>] :gfs2:gfs2_glock_put+0xfd/0x115
kernel: [<ffffffff80022970>] d_alloc+0x176/0x1ab
kernel: [<ffffffff8000d09c>] do_lookup+0x126/0x227
kernel: [<ffffffff8000a295>] __link_path_walk+0x9e6/0xf25
kernel: [<ffffffff8000eb23>] link_path_walk+0x45/0xb8
kernel: [<ffffffff8000cdf6>] do_path_lookup+0x294/0x310
kernel: [<ffffffff8002380c>] __path_lookup_intent_open+0x56/0x97
kernel: [<ffffffff8001b0e7>] open_namei+0x73/0x6ba
kernel: [<ffffffff800275c8>] do_filp_open+0x1c/0x38
kernel: [<ffffffff80019f9a>] do_sys_open+0x44/0xbe
kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
- The filesystem is marked as withdrawn:
# for locktable in $(ls /sys/fs/gfs2/); do echo -n "Checking $locktable: "; if [ $(cat /sys/fs/gfs2/$locktable/withdraw) -eq 1 ]; then echo "Withdrawn"; else echo "OK"; fi; done
Checking rhcluster:app01: OK
Checking rhcluster:app02: Withdrawn
Checking rhcluster:app03: OK
- The issue recurs on another filesystem, or on the same filesystem again, after fsck.gfs2 has run and fixed any corruption present.
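To scan log files for the withdraw signature shown in the excerpt above, something like the following may be used. It is a minimal sketch: `withdrawn_fsids` is a hypothetical helper name, and the pattern deliberately leaves out the source line number, on the assumption that it could differ between kernel builds.

```shell
# List the unique fsid values that logged the get_leaf withdraw signature.
# Usage: withdrawn_fsids /var/log/messages*
withdrawn_fsids() {
    sed -n 's/.*GFS2: fsid=\([^ ]*\): function = get_leaf, file = fs\/gfs2\/dir\.c.*/\1/p' "$@" |
        sort -u
}

# Only scan the default log when it exists and is readable.
if [ -r /var/log/messages ]; then
    withdrawn_fsids /var/log/messages
fi
```

Each output line is an fsid in the `cluster:filesystem.journal` form seen in the log excerpt, which identifies the filesystem and the journal of the node that withdrew.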
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.