Why did my GFS2 filesystem withdraw with: function = get_leaf, file = fs/gfs2/dir.c, line = 763

Solution Verified

Environment

  • Red Hat Enterprise Linux Server 5.8 or 5.9 (with the High Availability Add-On)
  • Kernels later than 2.6.18-274.12.1.el5 and earlier than 2.6.18-348.1.1.el5 (neither of these two kernels is affected)
  • GFS2 filesystems
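The affected kernel range above can be checked with a small shell sketch. It assumes the 2.6.18-<release>.el5[variant] naming that RHEL 5 kernels report through uname -r, and compares the release fields numerically. The helper names vercmp and gfs2_get_leaf_affected are hypothetical and are not part of any Red Hat tool.

```shell
#!/bin/bash
# Sketch only: classify a RHEL 5 kernel release against the affected range
# (later than 2.6.18-274.12.1.el5, earlier than 2.6.18-348.1.1.el5).
# Assumes "2.6.18-<release>.el5[variant]" strings; hypothetical helper names.

# Compare two dotted numeric strings field by field; prints -1, 0 or 1.
vercmp() {
    local -a a b
    local i x y
    IFS=. read -r -a a <<< "$1"
    IFS=. read -r -a b <<< "$2"
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=${a[i]:-0}   # a missing field counts as 0 (348 == 348.0.0)
        y=${b[i]:-0}
        if ((10#$x < 10#$y)); then echo -1; return; fi
        if ((10#$x > 10#$y)); then echo 1; return; fi
    done
    echo 0
}

# Print "affected" or "not affected"; defaults to the running kernel.
gfs2_get_leaf_affected() {
    local rel=${1:-$(uname -r)}
    rel=${rel#2.6.18-}   # drop the upstream kernel version
    rel=${rel%%.el5*}    # drop ".el5" plus any variant suffix such as xen or PAE
    if [ "$(vercmp "$rel" 274.12.1)" -gt 0 ] && [ "$(vercmp "$rel" 348.1.1)" -lt 0 ]; then
        echo "affected"
    else
        echo "not affected"
    fi
}
```

For example, gfs2_get_leaf_affected 2.6.18-308.el5 prints "affected", while 2.6.18-274.12.1.el5 and 2.6.18-348.1.1.el5 both print "not affected". Called with no argument, it checks the kernel that is currently running.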

Issue

  • A GFS2 filesystem withdraws with a message similar to the following:
GFS2: fsid=MyCluster:MyGFS.0: function = get_leaf, file = fs/gfs2/dir.c, line = 763

Resolution

Update to kernel-2.6.18-348.1.1.el5 or later on RHEL 5 Update 9.

Update to kernel-2.6.18-371.el5 or later on RHEL 5 Update 10 and above. In either case, reboot into the updated kernel for the fix to take effect.

Root Cause

Previously, when a directory was removed from the inode cache, GFS2 did not free that directory's in-memory hash table. If the same
GFS2 inode was later reused for another directory, the stale hash table was used instead of the correct information being read
from disk. If the stale hash table was never reused, a small amount of memory was leaked until the next reboot.

If the stale hash table was reused, the directory could become corrupt. GFS2 could later detect the filesystem inconsistency and
withdraw the filesystem, making it unavailable until the system was rebooted. The updated kernels include a patch that frees the
directory hash table correctly when the directory is removed from cache, which prevents this filesystem corruption.
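The failure mode described above (a per-inode cache entry that outlives the inode) can be illustrated with a toy model. This is not GFS2 source code and does not reflect how the kernel stores hash tables; every name in it (disk_table, cached_table, FREE_ON_EVICT, and the functions) is hypothetical and chosen only to show why a reused inode number can see the previous directory's data.

```shell
#!/bin/bash
# Toy model only, NOT GFS2 code: a cached "hash table" that survives inode
# eviction is handed to the next directory that reuses the inode number.
# All names are hypothetical.

declare -a disk_table    # what is actually on disk, indexed by inode number
declare -a cached_table  # in-memory copy of each directory's hash table
FREE_ON_EVICT=0          # 0 = buggy behaviour, 1 = fixed behaviour

# Load inode $1's hash table into HT, reading "disk" only on a cache miss.
read_hash_table() {
    local ino=$1
    if [ -z "${cached_table[$ino]+set}" ]; then
        cached_table[$ino]=${disk_table[$ino]}
    fi
    HT=${cached_table[$ino]}
}

# Evict inode $1. The buggy path leaves its hash table behind (and, if the
# inode is never reused, that memory is simply never freed).
evict_inode() {
    local ino=$1
    if [ "$FREE_ON_EVICT" -eq 1 ]; then
        unset "cached_table[$ino]"
    fi
}

# Directory A uses inode 42, is evicted, then inode 42 is reused for directory B.
run_scenario() {
    disk_table=()
    cached_table=()
    disk_table[42]="leaves-of-dir-A"
    read_hash_table 42
    evict_inode 42
    disk_table[42]="leaves-of-dir-B"   # the reused inode's real on-disk table
    read_hash_table 42
    echo "inode 42 hash table: $HT"
}

run_scenario            # buggy: prints "inode 42 hash table: leaves-of-dir-A"
FREE_ON_EVICT=1
run_scenario            # fixed: prints "inode 42 hash table: leaves-of-dir-B"
```

In the buggy mode the second lookup is a cache hit on stale data, which mirrors how GFS2 could end up following the wrong directory structure and later withdraw; freeing the entry on eviction forces a fresh read instead.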

Diagnostic Steps

If the following symptoms exist, this solution may apply:

  • Filesystem withdraws do not occur when running kernel 2.6.18-274.12.1.el5 or earlier, or kernel 2.6.18-348.1.1.el5 or later.
  • Review the /var/log/messages file(s) for a GFS2 withdraw message similar to the following:
kernel: GFS2: fsid=rhcluster:app02.2: fatal: invalid metadata block
kernel: GFS2: fsid=rhcluster:app02.2:   bh = 9120530 (magic number)
kernel: GFS2: fsid=rhcluster:app02.2:   function = get_leaf, file = fs/gfs2/dir.c, line = 763
kernel: GFS2: fsid=rhcluster:app02.2: about to withdraw this file system
kernel: GFS2: fsid=rhcluster:app02.2: telling LM to withdraw
kernel: GFS2: fsid=rhcluster:app02.2: withdrawn
kernel:
kernel: Call Trace:
kernel:  [<ffffffff8890c764>] :gfs2:gfs2_lm_withdraw+0xd3/0x100
kernel:  [<ffffffff80063a2a>] __wait_on_bit+0x60/0x6e
kernel:  [<ffffffff8001558e>] sync_buffer+0x0/0x3f
kernel:  [<ffffffff88902d8f>] :gfs2:gfs2_dirent_find+0x0/0x4d
kernel:  [<ffffffff80063aa4>] out_of_line_wait_on_bit+0x6c/0x78
kernel:  [<ffffffff800a34d5>] wake_bit_function+0x0/0x23
kernel:  [<ffffffff8001aaeb>] submit_bh+0x10d/0x114
kernel:  [<ffffffff88920803>] :gfs2:gfs2_meta_check_ii+0x2c/0x38
kernel:  [<ffffffff889023ca>] :gfs2:get_leaf+0x6b/0xa8
kernel:  [<ffffffff889029e6>] :gfs2:get_first_leaf+0x2a/0x31
kernel:  [<ffffffff88902a70>] :gfs2:gfs2_dirent_search+0x83/0x16e
kernel:  [<ffffffff8890403a>] :gfs2:gfs2_dir_search+0x21/0x73
kernel:  [<ffffffff8000daa3>] permission+0x81/0xc8
kernel:  [<ffffffff8890ac80>] :gfs2:gfs2_lookupi+0x12e/0x16b
kernel:  [<ffffffff8890ac3e>] :gfs2:gfs2_lookupi+0xec/0x16b
kernel:  [<ffffffff88917a08>] :gfs2:gfs2_lookup+0x26/0xa7
kernel:  [<ffffffff889089db>] :gfs2:gfs2_glock_put+0xfd/0x115
kernel:  [<ffffffff80022970>] d_alloc+0x176/0x1ab
kernel:  [<ffffffff8000d09c>] do_lookup+0x126/0x227
kernel:  [<ffffffff8000a295>] __link_path_walk+0x9e6/0xf25
kernel:  [<ffffffff8000eb23>] link_path_walk+0x45/0xb8
kernel:  [<ffffffff8000cdf6>] do_path_lookup+0x294/0x310
kernel:  [<ffffffff8002380c>] __path_lookup_intent_open+0x56/0x97
kernel:  [<ffffffff8001b0e7>] open_namei+0x73/0x6ba
kernel:  [<ffffffff800275c8>] do_filp_open+0x1c/0x38
kernel:  [<ffffffff80019f9a>] do_sys_open+0x44/0xbe
kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
  • The filesystem is marked as withdrawn:
# for fs in /sys/fs/gfs2/*; do echo -n "Checking ${fs##*/}: "; if [ "$(cat "$fs/withdraw")" -eq 1 ]; then echo "Withdrawn"; else echo "OK"; fi; done
Checking rhcluster:app01: OK
Checking rhcluster:app02: Withdrawn
Checking rhcluster:app03: OK
  • The issue recurs on another filesystem, or on the same filesystem after fsck.gfs2 has run and repaired any corruption present.
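The log review in the steps above can be partly automated with a short shell sketch. It assumes the "GFS2: fsid=<cluster>:<fsname>.<journal>:" prefix shown in the example messages and prints each <cluster>:<fsname> lock table that logged the get_leaf withdraw; in the example above, that is the same name that appears under /sys/fs/gfs2/ in the withdraw check. The function name find_get_leaf_withdraws is hypothetical, and the match uses only the function and file fields, not the line number.

```shell
#!/bin/bash
# Sketch only: list lock tables (cluster:fsname) that logged the get_leaf
# withdraw. Assumes the "GFS2: fsid=<cluster>:<fs>.<journal>:" prefix.
# Defaults to /var/log/messages*; pass other log files as arguments.
find_get_leaf_withdraws() {
    local -a files=("$@")
    if [ ${#files[@]} -eq 0 ]; then
        files=(/var/log/messages*)   # current and rotated syslog files
    fi
    grep -h 'function = get_leaf, file = fs/gfs2/dir.c' "${files[@]}" 2>/dev/null |
        sed -n 's/.*GFS2: fsid=\([^:]*:[^.:]*\)\.[0-9]*:.*/\1/p' |   # keep cluster:fsname
        sort -u
}
```

Running find_get_leaf_withdraws against the example log above would print rhcluster:app02, which can then be compared with the sysfs withdraw status for that lock table.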

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
