RHEL7.4: Linux NFS server's DRC memory limits can cause NFS client mount command hangs with repeated CREATE_SESSION / NFS4ERR_DELAY
Issue
After updating our dev/qa servers to RHEL 7.4 last week, several servers can no longer mount an NFSv4 share. The mount command hangs, and /var/log/messages shows kernel: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO
# mount -vvv -o rw,nosuid,soft,intr,rsize=8192,wsize=8192,vers=4.1,tcp foo.example.com:/export /mnt
mount.nfs: timeout set for Mon Aug 7 13:33:20 2017
mount.nfs: trying text-based options
A packet capture shows the NFS server replying with NFS4ERR_DELAY and the following error messages (see also the attached pcaps):
NFS reply xid 3282429166 reply ok 44 getattr ERROR: Request couldn't be completed in time
NFS4ERR_DELAY
If we specify vers=3 or vers=4.0 as a mount option, the mount succeeds. It fails only with vers=4.1, with vers=4.2, or when no version is specified (which then defaults to 4.1 as of the 7.4 release).
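The symptom check and the version fallback described above can be sketched in shell. This is a minimal sketch under stated assumptions, not a fix: the function name has_trunking_error is hypothetical, the log path and mount options are the ones quoted in this issue, and mounting with vers=4.0 only avoids the NFSv4.1 session setup on the client side.

```shell
#!/bin/sh
# has_trunking_error is a hypothetical helper name, not part of nfs-utils.
# It returns 0 if the given log file contains the kernel message quoted
# in this issue, and non-zero otherwise.
has_trunking_error() {
    grep -qF -- 'nfs4_discover_server_trunking unhandled error -512' "$1"
}

# Example use (run as root; left commented out so that sourcing this
# file never mounts anything by itself):
#
# if has_trunking_error /var/log/messages; then
#     mount -o rw,nosuid,soft,intr,rsize=8192,wsize=8192,vers=4.0,tcp \
#         foo.example.com:/export /mnt
# fi
```

The mount command is the one from the report with only vers changed from 4.1 to 4.0, which the report states is accessible.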
Environment
- Red Hat Enterprise Linux 7.4 (NFS server)
  - kernels from kernel-3.10.0-693.el7.x86_64 up to, but not including, kernel-3.10.0-693.21.1.el7
- NFS client
  - any NFS client using NFSv4.1
  - seen on a RHEL 7.4 Linux NFS client with nfs-utils-1.3.0-0.48.el7.x86_64 (default NFS version changed to NFSv4.1)
- NFSv4.1
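
To check whether an NFS server's kernel falls in the range listed above, something like the following could be used. It is only a sketch: kernel_in_affected_range is a hypothetical helper, not a Red Hat tool, and it assumes the range means 3.10.0-693.el7 inclusive up to 3.10.0-693.21.1.el7 exclusive. It also assumes GNU sort with the -V option is available.

```shell
#!/bin/sh
# kernel_in_affected_range is a hypothetical helper, not a Red Hat tool.
# Assumption: affected means 3.10.0-693.el7 <= kernel < 3.10.0-693.21.1.el7.
# Argument: a kernel release string such as the output of `uname -r`.
kernel_in_affected_range() {
    # Drop the ".el7..." suffix so only the numeric version-release is left,
    # e.g. "3.10.0-693.11.6.el7.x86_64" becomes "3.10.0-693.11.6".
    ver=${1%%.el7*}
    low=3.10.0-693
    high=3.10.0-693.21.1

    # The upper bound itself is outside the range.
    [ "$ver" = "$high" ] && return 1

    # sort -V orders version strings. The kernel is in range when it sorts
    # at or after $low and before $high.
    first=$(printf '%s\n%s\n' "$low" "$ver" | sort -V | head -n 1)
    last=$(printf '%s\n%s\n' "$ver" "$high" | sort -V | tail -n 1)
    [ "$first" = "$low" ] && [ "$last" = "$high" ]
}

# Example use on the NFS server:
#   kernel_in_affected_range "$(uname -r)" && echo "kernel is in the listed range"
```

The suffix is stripped before comparing because a plain version sort of the full release strings would order ".el7" after ".21.1", which does not match RPM's ordering.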