RHEL6: NFS4 server incorrectly returning NFS4ERR_EXPIRED to WRITE due to wraparound of current_fileid leads to infinite protocol loop with NFS4 client
Issue
- NFS4 client hangs when accessing an NFS server that has been up a long time or has handled OPENs for 2^32 or more files
- Linux NFS4 client hangs on a single WRITE with the Linux NFS4 server returning NFS4ERR_EXPIRED to the WRITE repeatedly even after error recovery.
- In the Linux NFS server, current_fileid is a 32-bit counter. If it wraps around, the client and server can get into an infinite protocol loop of:
  - WRITE / NFS4ERR_EXPIRED
  - RENEW / NFS4_OK
  - OPEN / NFS4_OK (with a new stateid; the OPEN uses the same clientid that was sent in the RENEW)
  - WRITE / NFS4ERR_EXPIRED (uses the new stateid returned in the OPEN reply)
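The wraparound itself can be illustrated with a minimal sketch. This is not the kernel's nfsd code; `issue_ids` and `MASK32` are hypothetical names used only to show how an unsigned 32-bit counter behaves once it has been incremented past 2^32 - 1, which is the condition described above for current_fileid.

```python
MASK32 = 0xFFFFFFFF  # largest value an unsigned 32-bit counter can hold


def issue_ids(start, count):
    """Return `count` consecutive values from a hypothetical 32-bit counter.

    The mask mimics C unsigned 32-bit arithmetic: after 0xFFFFFFFF the
    counter wraps to 0 and starts handing out values it has issued before.
    """
    ids = []
    counter = start
    for _ in range(count):
        ids.append(counter)
        counter = (counter + 1) & MASK32
    return ids


if __name__ == "__main__":
    # Start two steps before the limit to show the wrap.
    print(issue_ids(0xFFFFFFFE, 4))
```

After the wrap the counter restarts at 0, so values issued early in the server's uptime are issued again. This is why the problem only appears on servers that have been up a long time or have handled 2^32 or more OPENs.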
Environment
- Red Hat Enterprise Linux 6 (NFS server)
  - any kernel prior to kernel-2.6.32-675.el6
  - any kernel prior to kernel-2.6.32-642.15.1.el6
  - seen on 2.6.32-573.1.1.el6 (a RHEL 6.7.z kernel) and other RHEL 6 kernels
- NFS4
- RHEL used as NFS server