fs or clusterfs resource fails to stop when a process has its current working directory (cwd) within the resource's mountpoint in a RHEL 6 High Availability cluster
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
- resource-agents releases starting with 3.9.2-40.el6 up to, but not including, 3.9.2-40.el6_5.5
- One or more <fs/>, <clusterfs/>, or <netfs/> resources in a service in /etc/cluster/cluster.conf
- One or more processes that change directories to, or set their current working directory to, a location on the mountpoint of one of those resources
Issue
- My fs resource fails to stop, even though I have force_unmount enabled
- If a process has a cwd on the mountpoint for a cluster-managed fs or clusterfs resource, rgmanager can't stop that resource and the node self-fences
Mar 08 18:21:04 rgmanager Stopping service service:myService
Mar 08 18:21:26 rgmanager [fs] unmounting /myFS
Mar 08 18:21:26 rgmanager [fs] umount failed: 1
Mar 08 18:21:26 rgmanager [fs] Sending SIGTERM to processes on /myFS
Mar 08 18:21:31 rgmanager [fs] unmounting /myFS
Mar 08 18:21:31 rgmanager [fs] umount failed: 1
Mar 08 18:21:31 rgmanager [fs] Sending SIGKILL to processes on /myFS
Mar 08 18:21:36 rgmanager [fs] unmounting /myFS
Mar 08 18:21:36 rgmanager [fs] umount failed: 1
Mar 08 18:21:37 rgmanager [fs] Sending SIGKILL to processes on /myFS
Mar 08 18:21:37 rgmanager [fs] 'umount /myFS' failed, error=1
Mar 08 18:21:37 rgmanager [fs] umount failed - REBOOTING
Resolution
- Update to resource-agents-3.9.2-40.el6_5.5 or later, or to resource-agents-3.9.5-12.el6 or later.
- Also see the general recommendations for preventing a file-system-based resource from failing to stop.
Root Cause
This issue was resolved by Red Hat in Bugzilla #1051115 for RHEL 6 Update 6 and in #1051185 for RHEL 6 Update 5 with an asynchronous erratum.
A change was made in RHEL 6 Update 5 (resource-agents-3.9.2-40.el6) to the file system utility library used by several resource agents (fs, clusterfs, netfs) that altered how those resources detect processes using the mountpoint in question and kill them if force_unmount is set. This change was needed to address a separate issue that could cause a stop operation on one of these resource types to block if there was an unresponsive NFS mount anywhere on the system. This change to the fs utility library introduced a bug in that the resource agent would not detect or kill processes that did not directly have files open on the mountpoint but instead just had their current working directory ("cwd") on that mountpoint. The end result was that if the file system could not be unmounted during a stop operation because a process still resided on that mountpoint, that process may not be killed and thus the <fs/>, <clusterfs/>, or <netfs/> resource may fail to stop, even though force_unmount is enabled. If self_fence is enabled, this failure to stop would trigger the node to reboot itself.
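To illustrate the distinction described above, the following is a minimal Python sketch that separates "has its cwd on the mountpoint" from "holds an open file on the mountpoint" for a single process by reading the Linux /proc filesystem. It is not the resource agent's code; the function names and the proc_root parameter are hypothetical, and paths are matched by simple prefix, so nested mounts are not distinguished.

```python
import os


def _under(path, mountpoint):
    """True if path equals mountpoint or lies beneath it (prefix match only)."""
    root = mountpoint.rstrip("/") or "/"
    prefix = root if root == "/" else root + "/"
    return path == root or path.startswith(prefix)


def cwd_on_mount(pid, mountpoint, proc_root="/proc"):
    """True if the process's current working directory is under mountpoint."""
    cwd = os.readlink(os.path.join(proc_root, str(pid), "cwd"))
    return _under(cwd, mountpoint)


def open_files_on_mount(pid, mountpoint, proc_root="/proc"):
    """Sorted targets of the process's file descriptors under mountpoint."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor went away while we were iterating
        if _under(target, mountpoint):
            targets.append(target)
    return sorted(targets)


def cwd_only(pid, mountpoint, proc_root="/proc"):
    """True for the case in this article: cwd on the mount, no open files there."""
    return (cwd_on_mount(pid, mountpoint, proc_root)
            and not open_files_on_mount(pid, mountpoint, proc_root))
```

Calling cwd_only(4400, "/myFS") as root on a cluster node would return True for a process like the myApp example in this article, assuming it holds no file descriptors on /myFS. Memory-mapped files do not appear under /proc/PID/fd, so this sketch does not cover them.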
A similar issue was later discovered affecting processes that utilize shared memory backed by the mountpoint managed by the resource, which is described in a separate solution.
Diagnostic Steps
- While the <fs/>, <clusterfs/>, or <netfs/> resource is started, run lsof and look for any processes that list a directory that resides within the resource's mountpoint and where the FD column lists cwd. For example:

  COMMAND  PID USER  FD TYPE DEVICE SIZE/OFF NODE NAME
  myApp   4400 root cwd  DIR  253,0     4096    2 /myFS

  - For any process that is found, look through the lsof output to see if any other entries are listed for that process where it has open a file somewhere on that mountpoint.
  - If any process exists that does have a cwd on that mountpoint but does not hold any other file open, the resource is susceptible to failing to stop because of this issue.
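The manual check in the steps above can be scripted. The following is a minimal sketch, not part of the original solution: it parses lsof's field output (as produced by lsof -F pfn) and reports PIDs whose only entry under the mountpoint is their cwd. The function name cwd_only_pids and the sample data are hypothetical, paths are matched by simple prefix, and the sketch assumes Python 3, which is not the default interpreter on RHEL 6.

```python
def cwd_only_pids(lsof_field_output, mountpoint):
    """Return sorted PIDs whose only lsof entry under mountpoint is their cwd.

    Expects the text produced by `lsof -F pfn`, where each line starts with
    a one-letter tag: p = PID, f = file descriptor, n = file name.
    """
    root = mountpoint.rstrip("/") or "/"
    prefix = root if root == "/" else root + "/"

    def on_mount(path):
        return path == root or path.startswith(prefix)

    has_cwd = set()    # PIDs with cwd under the mountpoint
    has_other = set()  # PIDs with any non-cwd entry under the mountpoint
    pid = fd = None
    for line in lsof_field_output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":
            pid, fd = int(value), None
        elif tag == "f":
            fd = value
        elif tag == "n" and pid is not None and on_mount(value):
            (has_cwd if fd == "cwd" else has_other).add(pid)
    return sorted(has_cwd - has_other)


# Hypothetical sample: 4400 only has its cwd on /myFS, 4500 also holds a file.
sample = "\n".join([
    "p4400", "fcwd", "n/myFS", "frtd", "n/", "ftxt", "n/usr/bin/myApp",
    "p4500", "fcwd", "n/myFS/data", "f3", "n/myFS/data/app.log",
    "p4600", "fcwd", "n/root", "f1", "n/dev/null",
])
print(cwd_only_pids(sample, "/myFS"))  # [4400]
```

To use it against a live system, the lsof -F pfn output can be saved to a file while the resource is started and passed to cwd_only_pids along with the resource's mountpoint. Any PID returned matches the condition described in the last step.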