RHEL7: Possible data corruption when NFSv4.1 client does not handle LOCK reply of NFS4ERR_DENIED or other errors during recovery
Issue
Two NFSv4.1 clients are connected to an NFS server and attempt to write to the same file, which is protected by a lock. Under normal operation, one NFS client obtains the lock and begins writing to the file, while the other NFS client repeatedly tries to obtain the lock. If a network partition occurs between the first client and the NFS server, and the first client's lease expires, the second client will obtain the lock and start writing to the file. When the partition between the first client and the NFS server is later resolved, the first client should handle the resulting error properly and fail any outstanding writes with EIO (for more information, see https://access.redhat.com/solutions/1179643). However, it is possible for the first client to receive NFS4ERR_DENIED in reply to a LOCK request during recovery and not handle this error properly, which may result in data corruption.
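The locking pattern described above can be sketched from the application's point of view. This is a minimal illustration, not part of the original report: the function name `write_under_lock` and its retry parameters are hypothetical, and it assumes the application uses POSIX byte-range locks through Python's `fcntl.lockf` and treats EIO on write or fsync as a sign that the lock may have been lost.

```python
import errno
import fcntl
import os
import time


def write_under_lock(path, data, retries=50, delay=0.1):
    """Append data to path while holding an exclusive POSIX lock.

    Hypothetical helper for illustration only. Returns True on success,
    False if the lock was never obtained or the write failed with EIO.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Poll for the lock, as the second client in the scenario does.
        for _ in range(retries):
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except OSError as exc:
                # EACCES/EAGAIN mean another holder has the lock; retry.
                if exc.errno not in (errno.EACCES, errno.EAGAIN):
                    raise
                time.sleep(delay)
        else:
            return False  # lock not obtained within the retry budget

        try:
            # Short writes are not retried in this sketch.
            os.write(fd, data)
            # fsync surfaces errors from writes that were still cached.
            os.fsync(fd)
        except OSError as exc:
            if exc.errno == errno.EIO:
                # Assumption: treat EIO as "lock may be lost" and stop writing.
                return False
            raise
        return True
    finally:
        # Closing the descriptor also releases the POSIX lock.
        os.close(fd)
```

An application written this way stops writing as soon as the client reports EIO. It cannot protect against the case described here, where the client does not handle the error during recovery and no EIO is returned.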
- An NFSv4.1 client had obtained a lock and was writing to a file when a network partition occurred, and the following message was seen in the log:
NFS: nfs4_reclaim_locks: unhandled error
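To check whether a client has logged this message, a saved kernel log can be scanned for it. The following is a rough sketch: the helper name `find_unhandled_reclaim_errors` is hypothetical, and it assumes the log is available as plain text lines (for example from /var/log/messages) and matches only the substring quoted above.

```python
def find_unhandled_reclaim_errors(lines):
    """Return the log lines that contain the unhandled-reclaim message.

    Hypothetical helper: matches only the substring quoted in this article.
    """
    marker = "nfs4_reclaim_locks: unhandled error"
    return [line.rstrip("\n") for line in lines if marker in line]
```

It accepts any iterable of strings, so an open file object can be passed directly, as in `find_unhandled_reclaim_errors(open("/var/log/messages"))`.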
Environment
- Red Hat Enterprise Linux 7 (NFS client)
  - seen on kernel-3.10.0-693*.el7
  - seen on kernel-3.10.0-514*.el7
  - seen on kernel-3.10.0-327*.el7
- NFSv4.1
  - seen with NetApp NFS server (version 8.3.2)
  - other NFS servers are likely affected
    - may require write delegation support on the NFS server
  - Linux NFS server is not believed to be affected
    - the issue could not be reproduced
    - likely due to different NFSv4 responses during recovery and the lack of write delegation support
