GFS2 access blocks throughout cluster following a withdrawal and the logs on those nodes show "Trying to acquire journal lock...Busy" in RHEL 6
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the Resilient Storage Add On
- kernel releases starting with 2.6.32-431.el6 in RHEL 6 Update 5, or with 2.6.32-358.14.1.el6 in RHEL 6 Update 4
- GFS2
- A condition is present which has resulted in a GFS2 withdrawal on one node
Issue
- After GFS2 withdrew on one node, the rest of the cluster was unable to access the file system
- One node had an I/O error and we saw a withdrawal; the other node in the cluster was then blocked after showing:
Jun 12 12:07:15 node1 kernel: GFS2: fsid=gfs2-cluster:gfs-data-lv00.0: jid=1: Trying to acquire journal lock...
Jun 12 12:07:15 node1 kernel: GFS2: fsid=gfs2-cluster:gfs-data-lv00.0: jid=1: Busy
Resolution
- Update to kernel-2.6.32-504.30.3.el6 or later.
- Workaround to prevent this issue: Set errors=panic in the mount options for any GFS2 file systems, to cause a node to panic and be fenced rather than withdraw. For example, when mounting via /etc/fstab:
/dev/clustervg/lv1 /mnt/lv1 gfs2 defaults,errors=panic 0 0
Or when using a clusterfs resource in /etc/cluster/cluster.conf:
<clusterfs name="clustervg-lv1-gfs2" device="/dev/clustervg/lv1" mountpoint="/mnt/lv1" fstype="gfs2" options="defaults,errors=panic" fsid="1234" force_unmount="0" self_fence="1"/>
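As a quick check after applying the workaround, the active mount options can be read from /proc/mounts (a changed fstab option only takes effect after a remount, so this is a minimal sketch for verifying it is actually in force; the paths above are examples):

```shell
# List every mounted GFS2 file system with its active mount options;
# in /proc/mounts, field 2 is the mount point and field 4 the options.
# "errors=panic" should appear in the options column once the change
# has taken effect:
awk '$3 == "gfs2" {print $2, $4}' /proc/mounts
```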
Root Cause
This issue is being investigated by Red Hat in Bugzilla #1110846.
In kernels prior to those listed in the Environment section above, GFS2 withdrawal handling had issues that would result in the kernel on the withdrawn node releasing the DLM lockspace for that GFS2 file system before the rest of the cluster had set themselves up to handle recovery; the end result was that other nodes and processes could quickly acquire locks that the withdrawn node had held before any node had a chance to replay the journal and complete recovery. This could cause unexpected behavior, so GFS2 was modified to wait for the rest of the cluster to be ready before the DLM lockspace would be freed. This change was made via Bugzilla #908093 (6.5) and #927308 (6.4.z).
With this change came an unexpected race condition: if the withdrawing node takes some time to carry out its withdrawal procedures (the slowest part of which is often the 'dmsetup suspend' it must carry out), its release of the DLM lockspace may be delayed slightly. In that case, another node may attempt journal recovery before the lockspace has been released, and its lock request for that journal is denied. This operation is not retried, so this node and any other remaining nodes (besides the withdrawn node) are stuck waiting indefinitely for a recovery that will never complete.
Diagnostic Steps
- Look in all nodes' /var/log/messages for signs of a GFS2 withdrawal, which should also be accompanied by a backtrace
- Look in other nodes' (besides the withdrawn node) /var/log/messages for evidence that they attempted to get a journal lock, and found it was busy:
Jun 12 12:07:15 node1 kernel: GFS2: fsid=gfs2-cluster:gfs-data-lv00.0: jid=1: Trying to acquire journal lock...
Jun 12 12:07:15 node1 kernel: GFS2: fsid=gfs2-cluster:gfs-data-lv00.0: jid=1: Busy
If all nodes show the same symptom, then this bug is likely what occurred.
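As a sketch, a single grep pattern can count both journal-recovery messages; the sample lines below are from this case (the fsid and jid will differ per cluster), and the same grep can be run directly against /var/log/messages on each node:

```shell
# Sample messages from this case, as they appear in /var/log/messages:
log='Jun 12 12:07:15 node1 kernel: GFS2: fsid=gfs2-cluster:gfs-data-lv00.0: jid=1: Trying to acquire journal lock...
Jun 12 12:07:15 node1 kernel: GFS2: fsid=gfs2-cluster:gfs-data-lv00.0: jid=1: Busy'

# Count how many of the journal-recovery messages are present; a node
# hitting this bug logs both:
echo "$log" | grep -cE 'jid=[0-9]+: (Trying to acquire journal lock|Busy)'
# → prints 2
```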
- Look in cman_tool services output on blocked nodes and determine if the GFS2 mountgroup shows "blocked":
# cman_tool services
[...]
gfs mountgroups
name lv1
id 0x0ba5c4d3
flags 0x0000004c mounted,blocked
change member 1 joined 0 remove 1 failed 0 seq 3,3
members 1
If so, this bug may be at fault.
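The check above can be scripted with a simple grep on the flags line; this minimal sketch runs it against the sample output from this case (on a real node, pipe cman_tool services into the same grep):

```shell
# Sample 'cman_tool services' mountgroup section from a blocked node:
svc='gfs mountgroups
name lv1
id 0x0ba5c4d3
flags 0x0000004c mounted,blocked
change member 1 joined 0 remove 1 failed 0 seq 3,3
members 1'

# A mountgroup stuck in recovery shows "blocked" in its flags line:
echo "$svc" | grep -q 'flags.*blocked' && echo 'mountgroup is blocked'
# → prints "mountgroup is blocked"
```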