Processes running on GFS2 Filesystem cluster resources become deadlocked during glock demotion


Issue

While running RHEL 8.8z kernel 4.18.0-477.89.1.el8_8, RHEL 8.10z kernel 4.18.0-553.33.1.el8_10, or RHEL 9.5z kernel 5.14.0-503.21.1.el9_5, or a later z-stream kernel of any of these, several processes active on the gfs2 filesystem can become blocked indefinitely. There is no clear way to get these processes unblocked.

The errors below may be observed in /var/log/messages when the issue occurs, and the same thread may be seen blocked across multiple hung-task reports (a quick way to spot such threads is sketched after the traces):

$ cat /var/log/messages
-----------------------------------------8<----------------------------------------- 
Jan 13 08:17:05 rhel8-node1 kernel: INFO: task thread1:2064907 blocked for more than 120 seconds.
Jan 13 08:17:05 rhel8-node1 kernel:      Not tainted 4.18.0-553.33.1.el8_10.x86_64 #1
Jan 13 08:17:05 rhel8-node1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 13 08:17:05 rhel8-node1 kernel: task:perl            state:D stack:0     pid:2064907 ppid:2064906 flags:0x10004080
Jan 13 08:17:05 rhel8-node1 kernel: Call Trace:
Jan 13 08:17:05 rhel8-node1 kernel: __schedule+0x2d1/0x870
Jan 13 08:17:05 rhel8-node1 kernel: ? gdlm_lock+0x1f6/0x2e0 [gfs2]
Jan 13 08:17:05 rhel8-node1 kernel: ? bit_wait_io+0x50/0x50
Jan 13 08:17:05 rhel8-node1 kernel: schedule+0x55/0xf0
Jan 13 08:17:05 rhel8-node1 kernel: bit_wait+0xd/0x50
Jan 13 08:17:05 rhel8-node1 kernel: __wait_on_bit+0x2d/0x90
Jan 13 08:17:05 rhel8-node1 kernel: out_of_line_wait_on_bit+0x91/0xb0
Jan 13 08:17:05 rhel8-node1 kernel: ? var_wake_function+0x30/0x30
Jan 13 08:17:05 rhel8-node1 kernel: gfs2_glock_wait+0x3b/0x90 [gfs2]
Jan 13 08:17:05 rhel8-node1 kernel: __gfs2_lookup+0x9d/0x150 [gfs2]
Jan 13 08:17:05 rhel8-node1 kernel: ? __gfs2_lookup+0x95/0x150 [gfs2]
Jan 13 08:17:05 rhel8-node1 kernel: __lookup_slow+0x97/0x160
Jan 13 08:17:05 rhel8-node1 kernel: lookup_slow+0x35/0x50
Jan 13 08:17:05 rhel8-node1 kernel: walk_component+0x1c3/0x300
Jan 13 08:17:05 rhel8-node1 kernel: ? nd_jump_root+0xb9/0xf0
Jan 13 08:17:05 rhel8-node1 kernel: path_lookupat.isra.43+0x79/0x220
Jan 13 08:17:05 rhel8-node1 kernel: ? audit_copy_inode+0x94/0xd0
Jan 13 08:17:05 rhel8-node1 kernel: filename_lookup.part.58+0xa0/0x170
Jan 13 08:17:05 rhel8-node1 kernel: ? getname_flags+0x4a/0x1e0
Jan 13 08:17:05 rhel8-node1 kernel: ? __check_object_size+0xac/0x173
Jan 13 08:17:05 rhel8-node1 kernel: ? path_get+0x11/0x30
Jan 13 08:17:05 rhel8-node1 kernel: ? audit_alloc_name+0x132/0x150
Jan 13 08:17:05 rhel8-node1 kernel: ? __audit_getname+0x2d/0x50
Jan 13 08:17:05 rhel8-node1 kernel: vfs_statx+0x74/0xe0
Jan 13 08:17:05 rhel8-node1 kernel: __do_sys_newlstat+0x39/0x70
Jan 13 08:17:05 rhel8-node1 kernel: ? syscall_trace_enter+0x1ff/0x2d0
Jan 13 08:17:05 rhel8-node1 kernel: do_syscall_64+0x5b/0x1a0
Jan 13 08:17:05 rhel8-node1 kernel: entry_SYSCALL_64_after_hwframe+0x66/0xcb
Jan 13 08:17:05 rhel8-node1 kernel: RIP: 0033:0x7f122812cbb9
-----------------------------------------8<----------------------------------------- 
Jan 13 08:35:30 rhel8-node1 kernel: INFO: task thread1:2064907 blocked for more than 120 seconds.
Jan 13 08:35:30 rhel8-node1 kernel:      Not tainted 4.18.0-553.33.1.el8_10.x86_64 #1
Jan 13 08:35:30 rhel8-node1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 13 08:35:30 rhel8-node1 kernel: task:perl            state:D stack:0     pid:2064907 ppid:2064906 flags:0x10004080
Jan 13 08:35:30 rhel8-node1 kernel: Call Trace:
Jan 13 08:35:30 rhel8-node1 kernel: __schedule+0x2d1/0x870
Jan 13 08:35:30 rhel8-node1 kernel: ? gdlm_lock+0x1f6/0x2e0 [gfs2]
Jan 13 08:35:30 rhel8-node1 kernel: ? bit_wait_io+0x50/0x50
Jan 13 08:35:30 rhel8-node1 kernel: schedule+0x55/0xf0
Jan 13 08:35:30 rhel8-node1 kernel: bit_wait+0xd/0x50
Jan 13 08:35:30 rhel8-node1 kernel: __wait_on_bit+0x2d/0x90
Jan 13 08:35:30 rhel8-node1 kernel: out_of_line_wait_on_bit+0x91/0xb0
Jan 13 08:35:30 rhel8-node1 kernel: ? var_wake_function+0x30/0x30
Jan 13 08:35:30 rhel8-node1 kernel: gfs2_glock_wait+0x3b/0x90 [gfs2]
Jan 13 08:35:30 rhel8-node1 kernel: __gfs2_lookup+0x9d/0x150 [gfs2]
Jan 13 08:35:30 rhel8-node1 kernel: ? __gfs2_lookup+0x95/0x150 [gfs2]
Jan 13 08:35:30 rhel8-node1 kernel: __lookup_slow+0x97/0x160
Jan 13 08:35:30 rhel8-node1 kernel: lookup_slow+0x35/0x50
Jan 13 08:35:30 rhel8-node1 kernel: walk_component+0x1c3/0x300
Jan 13 08:35:30 rhel8-node1 kernel: ? nd_jump_root+0xb9/0xf0
Jan 13 08:35:30 rhel8-node1 kernel: path_lookupat.isra.43+0x79/0x220
Jan 13 08:35:30 rhel8-node1 kernel: ? audit_copy_inode+0x94/0xd0
Jan 13 08:35:30 rhel8-node1 kernel: filename_lookup.part.58+0xa0/0x170
Jan 13 08:35:30 rhel8-node1 kernel: ? getname_flags+0x4a/0x1e0
Jan 13 08:35:30 rhel8-node1 kernel: ? __check_object_size+0xac/0x173
Jan 13 08:35:30 rhel8-node1 kernel: ? path_get+0x11/0x30
Jan 13 08:35:30 rhel8-node1 kernel: ? audit_alloc_name+0x132/0x150
Jan 13 08:35:30 rhel8-node1 kernel: ? __audit_getname+0x2d/0x50
Jan 13 08:35:30 rhel8-node1 kernel: vfs_statx+0x74/0xe0
Jan 13 08:35:30 rhel8-node1 kernel: __do_sys_newlstat+0x39/0x70
Jan 13 08:35:30 rhel8-node1 kernel: ? syscall_trace_enter+0x1ff/0x2d0
Jan 13 08:35:30 rhel8-node1 kernel: do_syscall_64+0x5b/0x1a0
Jan 13 08:35:30 rhel8-node1 kernel: entry_SYSCALL_64_after_hwframe+0x66/0xcb
Jan 13 08:35:30 rhel8-node1 kernel: RIP: 0033:0x7f122812cbb9
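
A blocked glock holder sits in uninterruptible sleep, which is the "state:D" shown in the traces above. Below is a minimal sketch for spotting such threads and dumping glock state; the fsid rhel8-cluster1:gfs2-lv1 is a placeholder taken from the logs in this article (yours takes the form cluster-name:fs-name), and debugfs may need to be mounted first:

# List tasks currently in uninterruptible sleep (state D)
$ ps -eo pid,state,wchan:32,comm | awk '$2 == "D"'

# Mount debugfs if it is not already mounted
$ mount -t debugfs none /sys/kernel/debug

# Dump glock state for the affected filesystem (placeholder fsid)
$ cat /sys/kernel/debug/gfs2/rhel8-cluster1:gfs2-lv1/glocks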

In some rare cases, this issue may additionally result in a filesystem withdrawal. If a withdrawal is observed, running fsck.gfs2 on the filesystem is additionally recommended (a sketch follows the log below):

$ cat /var/log/messages
-----------------------------------------8<----------------------------------------- 
Jan 15 03:40:17 rhel8-node1 kernel: [434827.820631] gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: fatal: invalid metadata block
Jan 15 03:40:17 rhel8-node1 kernel: [434827.820631]   bh = 176983977 (magic number)
Jan 15 03:40:17 rhel8-node1 kernel: [434827.820631]   function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 499
Jan 15 03:40:17 rhel8-node1 kernel: [434827.820709] gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: about to withdraw this file system
Jan 15 03:40:17 rhel8-node1 kernel: gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: fatal: invalid metadata block#012  bh = 176983977 (magic number)#012  function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 499
Jan 15 03:40:17 rhel8-node1 kernel: gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: about to withdraw this file system
Jan 15 03:40:22 rhel8-node1 kernel: [434833.111628] gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: Requesting recovery of jid 2.
Jan 15 03:40:22 rhel8-node1 kernel: gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: Requesting recovery of jid 2.
Jan 15 03:40:22 rhel8-node1 kernel: [434833.578206] gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: Journal recovery complete for jid 2.
Jan 15 03:40:22 rhel8-node1 kernel: [434833.578210] gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: Glock dequeues delayed: 0
Jan 15 03:40:22 rhel8-node1 kernel: [434833.588805] gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: telling LM to unmount
Jan 15 03:40:22 rhel8-node1 kernel: [434833.588856] dlm: gfs2-lv1: leaving the lockspace group...
Jan 15 03:40:22 rhel8-node1 kernel: [434833.597029] gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: recover_prep ignored due to withdraw.
Jan 15 03:40:22 rhel8-node1 kernel: [434833.597218] dlm: gfs2-lv1: group event done 0 0
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603408] dlm: gfs2-lv1: release_lockspace final free
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603437] gfs2: fsid=rhel8-cluster1:gfs2-lv1.2: File system withdrawn
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603469] CPU: 0 PID: 4018548 Comm: lsof Kdump: loaded Not tainted 4.18.0-553.33.1.el8_10.x86_64 #1
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603472] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603473] Call Trace:
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603503]  dump_stack+0x41/0x60
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603511]  gfs2_withdraw.cold.14+0xc5/0x418 [gfs2]
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603532]  gfs2_meta_check_ii+0x2f/0x50 [gfs2]
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603545]  gfs2_meta_buffer+0x10b/0x120 [gfs2]
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603557]  ? bit_wait_io+0x50/0x50
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603561]  gfs2_inode_refresh+0x34/0x280 [gfs2]
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603573]  ? bit_wait_io+0x50/0x50
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603574]  inode_go_instantiate+0x1c/0x40 [gfs2]
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603585]  gfs2_instantiate+0x85/0xc0 [gfs2]
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603595]  gfs2_glock_holder_ready.part.48+0xe/0x30 [gfs2]
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603605]  gfs2_xattr_get+0x192/0x1d0 [gfs2]
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603614]  ? nd_jump_link+0x53/0xd0
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603619]  ? lockref_put_or_lock+0x5f/0x80
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603623]  ? gfs2_xattr_get+0x188/0x1d0 [gfs2]
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603632]  __vfs_getxattr+0x54/0x70
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603636]  inode_doinit_use_xattr+0x63/0x170
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603640]  inode_doinit_with_dentry+0x350/0x500
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603644]  __inode_security_revalidate+0x45/0x80
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603647]  selinux_inode_getattr+0x5a/0x90
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603651]  security_inode_getattr+0x30/0x50
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603654]  vfs_getattr+0x1c/0x50
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603659]  vfs_statx+0x8a/0xe0
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603662]  __do_sys_newstat+0x39/0x70
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603665]  ? syscall_trace_enter+0x1ff/0x2d0
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603669]  ? vfs_read+0x121/0x150
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603671]  do_syscall_64+0x5b/0x1a0
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603675]  entry_SYSCALL_64_after_hwframe+0x66/0xcb
Jan 15 03:40:22 rhel8-node1 kernel: [434833.603679] RIP: 0033:0x7fcf4f598b09
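
As a minimal sketch of the recommended fsck after a withdrawal, assuming a Pacemaker-managed filesystem; the resource name clusterfs and the device path /dev/vg_cluster/gfs2-lv1 are hypothetical, and the filesystem must be unmounted on every node before the check runs:

# Disable the Filesystem resource so the device is unmounted cluster-wide
$ pcs resource disable clusterfs --wait

# Run the filesystem check from a single node (hypothetical device path)
$ fsck.gfs2 -y /dev/vg_cluster/gfs2-lv1

# Bring the filesystem back online across the cluster
$ pcs resource enable clusterfs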

Environment

  • Red Hat Enterprise Linux Server 8 or 9 (with the High Availability and Resilient Storage Add-Ons)
  • A Global File System 2 (GFS2) filesystem
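
To check whether a node is running one of the affected kernel ranges listed in the Issue section, a minimal sketch:

# Show the running kernel version
$ uname -r

# List all installed kernel packages
$ rpm -q kernel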
