NFSv4 server restart causes a long pause in the NFS client when trying to open a file under the mount point
Environment
- Red Hat Enterprise Linux 5, 6, 7, 8, 9
- NFSv4.0
Issue
- An NFSv4 server restart causes a long pause in the NFS client when trying to cat a text file under the mount point.
- Set up a simple NFS export on a RHEL server:
/tmp *(rw,no_root_squash,fsid=0)
- Mount that export on another RHEL 6 server:
# mount -t nfs4 x.x.x.x:/ /mnt/tmp
- Use the cat command on a text file under the mount. If the NFS service on the server with the export is restarted, there is a long pause when trying to cat the same text file under the mount point. It looks like the NFS client is failing to renew its session and is being forced to wait for the 90-second grace period.
Resolution
To decrease the grace period, follow the steps below for your type of environment.
Red Hat Enterprise Linux 5 or later
- Change the values of the following kernel parameters while the NFS service is stopped.
- Note: The lease time should be set to the same value as the grace period. Because the lease time is the interval within which each client must renew its state (including its file locks) with the server, decreasing it makes clients renew more often, which increases network traffic between client and server.
# service nfs stop
# echo 10 > /proc/sys/fs/nfs/nlm_grace_period
# echo 10 > /proc/fs/nfsd/nfsv4gracetime
# echo 10 > /proc/fs/nfsd/nfsv4leasetime
# service nfs start
- Where echo 10 (10 seconds) is just an example value.
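The steps above can be wrapped in a small function so that the NLM grace period and the NFSv4 grace and lease times are always written with the same value and read back for confirmation. This is a minimal sketch, not a supported tool: the function name set_nfs_grace and the NFSD_PROC and NLM_GRACE_FILE variables are hypothetical, introduced only so the paths can be redirected (for example, to a scratch directory for testing). It assumes the NFS service has already been stopped, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: write one value (in seconds) to the NLM grace period
# and to the NFSv4 lease and grace times, then print the values read back.
# Assumes the NFS server is already stopped (e.g. "service nfs stop").
# NFSD_PROC and NLM_GRACE_FILE are illustrative overrides; the defaults are
# the paths used in the commands above.
NFSD_PROC=${NFSD_PROC:-/proc/fs/nfsd}
NLM_GRACE_FILE=${NLM_GRACE_FILE:-/proc/sys/fs/nfs/nlm_grace_period}

set_nfs_grace() {
    secs=$1
    # Reject anything that is not a positive integer.
    case $secs in
        ''|*[!0-9]*|0) echo "usage: set_nfs_grace SECONDS" >&2; return 1 ;;
    esac
    echo "$secs" > "$NLM_GRACE_FILE" || return 1
    # Keep the lease and grace times identical, as the note above recommends.
    echo "$secs" > "$NFSD_PROC/nfsv4leasetime" || return 1
    echo "$secs" > "$NFSD_PROC/nfsv4gracetime" || return 1
    printf 'nlm_grace_period=%s nfsv4leasetime=%s nfsv4gracetime=%s\n' \
        "$(cat "$NLM_GRACE_FILE")" \
        "$(cat "$NFSD_PROC/nfsv4leasetime")" \
        "$(cat "$NFSD_PROC/nfsv4gracetime")"
}
```

On a real server it would be run as root between the stop and start commands, for example with set_nfs_grace 10.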
Red Hat Cluster Suite and clusters based on Red Hat Enterprise Linux 6 with rgmanager
- For the NFSv4 lease and grace times to take effect on failover, the grace time must be applied when the NFS service starts.
- Edit /etc/sysconfig/nfs and add the following option:
NFSD_V4_GRACE=10
- Note: Although this is the correct configuration for modifying the NFSv4 grace time on service start, there are currently two open bugs reporting that the above option does not take effect when applied (tracked via Bugzilla #1063087 and #1063088). This is because nfsd enforces that the grace time is always greater than or equal to the lease time and adjusts the value accordingly. The init script that starts nfsd only echoes the value into nfsv4gracetime; it does not echo it into nfsv4leasetime.
- To work around this behavior, it's recommended to manually modify the /etc/rc.d/init.d/nfs init script from:
# Set v4 grace period if requested
[ -n "$NFSD_V4_GRACE" ] && {
echo "$NFSD_V4_GRACE" > /proc/fs/nfsd/nfsv4gracetime
}
to:
# Set v4 grace period if requested
[ -n "$NFSD_V4_GRACE" ] && {
echo "$NFSD_V4_GRACE" > /proc/fs/nfsd/nfsv4leasetime
echo "$NFSD_V4_GRACE" > /proc/fs/nfsd/nfsv4gracetime
}
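After the modified init script has started nfsd, it may be worth confirming that both values were kept. The following is a sketch of such a check under stated assumptions: the function name check_nfsv4_times and the NFSD_PROC override are hypothetical, and the check only reads the two proc files written by the init script.

```shell
# Hypothetical post-start check: compare the NFSv4 lease and grace times
# exposed by nfsd with the NFSD_V4_GRACE value from /etc/sysconfig/nfs.
# NFSD_PROC defaults to the real nfsd proc directory.
NFSD_PROC=${NFSD_PROC:-/proc/fs/nfsd}

check_nfsv4_times() {
    want=$1
    lease=$(cat "$NFSD_PROC/nfsv4leasetime") || return 2
    grace=$(cat "$NFSD_PROC/nfsv4gracetime") || return 2
    echo "nfsv4leasetime=$lease nfsv4gracetime=$grace (expected $want)"
    # Succeed only if both values match the requested setting.
    [ "$lease" = "$want" ] && [ "$grace" = "$want" ]
}
```

For example, after sourcing /etc/sysconfig/nfs, check_nfsv4_times "$NFSD_V4_GRACE" returns non-zero if nfsd is still using a different lease or grace time.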
Clusters based on Red Hat Enterprise Linux 6 with pacemaker and Red Hat Enterprise Linux 7, 8, 9
- In pacemaker-based clusters, the resource ocf:heartbeat:nfsserver is used. Any custom parameters for /etc/sysconfig/nfs, such as the grace time, should be entered into the configuration of the nfsserver resource itself. This is because in pacemaker-based clusters, the NFSD configuration file /etc/sysconfig/nfs is dynamically generated from the cluster-wide configuration (see CIB/Cluster Information Base).
# pcs resource update your_nfs_server nfsd_args="--grace-time 10"
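Following the earlier note that the lease time should match the grace time, the nfsd_args value could carry both settings. rpc.nfsd provides --grace-time and --lease-time options, but whether the installed nfs-utils version supports them should be confirmed in the rpc.nfsd man page on the cluster nodes. The helper build_nfsd_args below is hypothetical and only assembles the string passed to pcs.

```shell
# Hypothetical helper: build an nfsd_args string that sets the NFSv4 grace
# and lease times to the same number of seconds.
# Assumes the installed rpc.nfsd supports --grace-time and --lease-time;
# check "man rpc.nfsd" on the cluster nodes before relying on it.
build_nfsd_args() {
    case $1 in
        ''|*[!0-9]*) echo "usage: build_nfsd_args SECONDS" >&2; return 1 ;;
    esac
    printf '%s\n' "--grace-time $1 --lease-time $1"
}
```

Example usage, with your_nfs_server being the name of the nfsserver resource:
# pcs resource update your_nfs_server nfsd_args="$(build_nfsd_args 10)"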
From the description of the nfsserver resource:
# pcs resource describe nfsserver
Assumed agent name 'ocf:heartbeat:nfsserver' (deduced from 'nfsserver')
ocf:heartbeat:nfsserver - Manages an NFS server
Nfsserver helps to manage the Linux nfs server as a failover-able resource in Linux-HA.
It depends on Linux specific NFS implementation details, so is considered not portable to other platforms yet.
Resource options:
nfs_init_script: The default init script shipped with the Linux distro. The nfsserver resource agent offloads the start/stop/monitor work to the init script because the procedure to start/stop/monitor nfsserver varies on different Linux distro. In the event that this
option is not set, this agent will attempt to use an init script at this location, /etc/init.d/nfs, or detect a systemd unit-file to use in the event that no init script is detected.
nfs_no_notify: Do not send reboot notifications to NFSv3 clients during server startup.
nfs_notify_foreground: Keeps the sm-notify attached to its controlling terminal and running in the foreground.
nfs_smnotify_retry_time: Specifies the length of sm-notify retry time, in minutes, to continue retrying notifications to unresponsive hosts. If this option is not specified, sm-notify attempts to send notifications for 15 minutes. Specifying a value of 0 causes sm-notify
to continue sending notifications to unresponsive peers until it is manually killed.
nfs_ip: Comma separated list of floating IP addresses used to access the nfs service
nfsd_args: Specifies what arguments to pass to the nfs daemon on startup. View the rpc.nfsd man page for information on what arguments are available. Note that setting this value will override all settings placed in the local /etc/sysconfig/nfs file.
lockd_udp_port: The udp port lockd should listen on. Note that setting this value will override all settings placed in the local /etc/sysconfig/nfs file.
lockd_tcp_port: The tcp port lockd should listen on. Note that setting this value will override all settings placed in the local /etc/sysconfig/nfs file.
statd_outgoing_port: The source port number sm-notify uses when sending reboot notifications. Note that setting this value will override all settings placed in the local /etc/sysconfig/nfs file.
statd_port: The port number used for RPC listener sockets. Note that setting this value will override all settings placed in the local /etc/sysconfig/nfs file.
mountd_port: The port number used for rpc.mountd listener sockets. Note that setting this value will override all settings placed in the local /etc/sysconfig/nfs file.
rquotad_port: The port number used for rpc.rquotad. Note that setting this value will override all settings placed in the local /etc/sysconfig/nfs file.
nfs_shared_infodir: The nfsserver resource agent will save nfs related information in this specific directory. And this directory must be able to fail-over before nfsserver itself.
rpcpipefs_dir: The mount point for the sunrpc file system. Default is /var/lib/nfs/rpc_pipefs. This script will mount (bind) nfs_shared_infodir on /var/lib/nfs/ (cannot be changed), and this script will mount the sunrpc file system on /var/lib/nfs/rpc_pipefs (default, can
be changed by this parameter). If you want to move only rpc_pipefs/ (e.g. to keep rpc_pipefs/ local) from default, please set this value.
Default operations:
start: interval=0s timeout=40
stop: interval=0s timeout=20s
monitor: interval=10 timeout=20s
Root Cause
- The delay is caused by the server's 90-second grace period.
- The purpose of the grace period is to give clients enough time to notice that the server has rebooted and to reclaim their existing locks without the danger of another client taking those locks from them. Keeping the grace period is strongly recommended, because it prevents data corruption in mailboxes, databases, log files, and any other data that relies on those locks. The NFSv4 RFC says:
During the grace period, the server must reject READ and WRITE operations and non-reclaim locking requests (i.e., other LOCK and OPEN operations) with an error of NFS4ERR_GRACE.
- The client's retransmit interval after an NFS4ERR_GRACE reply is 0.1*2^n seconds (maximum 15 seconds), so clients may need to wait for more than 90 seconds.
- NFSv4.1 defines the RECLAIM_COMPLETE operation, which lets a client notify the server that it has finished reclaiming its state. If all NFS clients send RECLAIM_COMPLETE, the server does not need to keep delaying its responses until the grace period expires. The NFSv4.1 RFC says:
A RECLAIM_COMPLETE operation is used to indicate that the client has reclaimed all of the locking state that it will recover, when it is recovering state due to either a server restart or the transfer of a file system to another server.
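To illustrate why the wait can be longer than the grace period itself, the retransmit schedule described above can be modelled with awk. This sketch assumes that the first OPEN is rejected at the very start of the grace period, that each retry waits min(0.1 * 2^n, 15) seconds, and that network round-trip time is negligible; the function name first_retry_after_grace is hypothetical.

```shell
# Hypothetical model of the NFS4ERR_GRACE back-off: the client waits
# min(0.1 * 2^n, 15) seconds before retry n+1. Prints the elapsed time
# (seconds since the first rejected call) of the first retry sent after
# a grace period of the given length has ended.
first_retry_after_grace() {
    awk -v grace="$1" 'BEGIN {
        delay = 0.1; elapsed = 0
        while (elapsed <= grace) {
            elapsed += delay      # time of the next retry
            delay *= 2            # exponential back-off
            if (delay > 15) delay = 15
        }
        printf "%.1f\n", elapsed
    }'
}
```

Under these assumptions, first_retry_after_grace 90 prints 100.5: once the delay reaches its 15-second cap, the first retry after a 90-second grace period lands about 10 seconds after the period ends. With a 10-second grace period, first_retry_after_grace 10 prints 12.7.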
Diagnostic Steps
- Capture a tcpdump on the NFS client using the command:
# tcpdump -s0 -i INTERFACE host NFS.SERVER.IP -w /tmp/tcpdump.pcap
- Where INTERFACE is the network interface that communicates with the NFS server and NFS.SERVER.IP is the IP address of the NFS server.
- Open the capture file with Wireshark and look for NFS4ERR_GRACE replies to outgoing OPEN calls:
55 2014-11-08 15:43:21.569853 10.12.13.14 -> 10.12.13.25 NFS 330 V4 Call OPEN DH: 0x1178f166/foo
56 2014-11-08 15:43:21.569895 10.12.13.14 -> 10.12.13.25 NFS 122 V4 Reply (Call In 55) OPEN Status: NFS4ERR_GRACE
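For long captures, summarizing the NFS4ERR_GRACE replies from the command line can be quicker than scrolling through Wireshark. The sketch below assumes one-line packet summaries in the format shown above (frame number, date, time, ...), such as those produced by tshark -r /tmp/tcpdump.pcap -t ad; the function name grace_reply_summary is hypothetical.

```shell
# Hypothetical helper: summarise NFS4ERR_GRACE replies in one-line packet
# summaries whose third field is the time of day, as in the lines above
# (e.g. output of: tshark -r /tmp/tcpdump.pcap -t ad).
grace_reply_summary() {
    awk '/NFS4ERR_GRACE/ {
        n++
        if (n == 1) first = $3   # time of the first rejected call
        last = $3                # time of the most recent rejected call
    }
    END {
        if (n == 0) { print "no NFS4ERR_GRACE replies"; exit }
        printf "NFS4ERR_GRACE replies: %d (first at %s, last at %s)\n", n, first, last
    }'
}
```

Example usage:
# tshark -r /tmp/tcpdump.pcap -t ad | grace_reply_summary
The gap between the first and last timestamps gives a rough idea of how long the client was held back by the grace period.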
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.