NFS becomes unresponsive after nfsd errors related to TCP shutting down socket
Environment
- Red Hat Enterprise Linux 5
- Versions of the nfs-utils package older than 1.0.9-47
Issue
- After NFS errors scroll on the console for a few minutes, the system becomes unresponsive.
- The following error messages are observed on the NFS client:
kernel: nfs: server server.hostname not responding, still trying
kernel: nfs: server server.hostname OK
- The following error message is observed on the NFS server:
kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
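The number in the server message is a negated errno value; on Linux, 104 is ECONNRESET (connection reset by peer). To gauge how often the server is hitting this, the matching lines can be counted. The following is a minimal sketch, assuming kernel messages are written to /var/log/messages; the function name count_nfsd_resets is hypothetical. Because syslog collapses consecutive duplicates into "last message repeated N times" lines, the result is a lower bound.

```shell
#!/bin/bash
# count_nfsd_resets is a hypothetical helper name, not part of nfs-utils.
# It counts log lines reporting that nfsd shut down a socket after error -104.
# Argument 1: log file to scan (defaults to /var/log/messages).
count_nfsd_resets() {
    local logfile="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'nfsd: got error -104 when sending' "$logfile"
}

# Example (run on the NFS server as root):
#   count_nfsd_resets /var/log/messages
```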
Resolution
- Update nfs-utils to version 1.0.9-47 or higher:
# umount /nfs/share
# yum update nfs-utils
# mount nfs.example.com:/share /nfs/share
# rpm -q nfs-utils ==> shows the upgraded version of nfs-utils
- Try increasing the TCP socket buffer sizes. Refer to "How can I tune the TCP Socket Buffers?"
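Before changing any buffer sizes, it can help to record the current values. The sketch below only reads the settings and does not recommend values, which are left to the referenced article. The function name show_tcp_buffers and its optional directory argument are hypothetical; the four sysctl keys are standard Linux ones.

```shell
#!/bin/bash
# Print the current TCP socket buffer limits.
# Argument 1 (hypothetical, optional): the sysctl tree to read, default
# /proc/sys. It exists only so the function can be tried against a copy.
show_tcp_buffers() {
    local base="${1:-/proc/sys}"
    local key
    for key in net/core/rmem_max net/core/wmem_max \
               net/ipv4/tcp_rmem net/ipv4/tcp_wmem; do
        # Skip keys that are missing or unreadable on this system.
        if [ -r "$base/$key" ]; then
            printf '%s = %s\n' "$(echo "$key" | tr / .)" "$(cat "$base/$key")"
        fi
    done
}

# Example:
#   show_tcp_buffers
```

Saving this output before and after tuning makes it easy to revert if the change does not help.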
Root Cause
- This issue occurs when the NFS threads cannot service requests as fast as they arrive from the clients.
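One way to check whether the nfsd threads are saturated is the "th" line of /proc/net/rpc/nfsd on the server. On kernels of this era, its first field is the number of nfsd threads and its second is how many times all threads were busy at once. This check is not part of the original article; it is a sketch under that assumption, and the function name nfsd_thread_stats is hypothetical.

```shell
#!/bin/bash
# Summarise the "th" line of the nfsd statistics file.
# Argument 1: statistics file (defaults to /proc/net/rpc/nfsd).
nfsd_thread_stats() {
    local statfile="${1:-/proc/net/rpc/nfsd}"
    # Field 2: number of nfsd threads.
    # Field 3: number of times every thread was in use simultaneously.
    awk '$1 == "th" { print "threads=" $2 " all_busy=" $3 }' "$statfile"
}

# Example (run on the NFS server):
#   nfsd_thread_stats
```

An all_busy value that keeps climbing between runs would be consistent with the threads falling behind the incoming requests.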
Diagnostic Steps
- Check for the following errors in the /var/log/messages file:
kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
last message repeated 3 times
kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
last message repeated 2 times
last message repeated 13 times
- We need to find out what is making nfsd go into a loop. Capture a kernel dump after the problem occurs; it can show where the nfsd threads are hanging. For detailed steps on enabling kexec/kdump on RHEL, see:
How do I configure kexec/kdump on Red Hat Enterprise Linux?
- Enable the SysRq facility on the system and collect a sysrq-t trace while nfsd is hanging, then attach the output that is dumped to the log. Run the sysrq-t trace again after nfsd stops hanging. See:
What is the SysRq facility and how do I use it?
- Also capture network dumps on the server when the issue occurs. For detailed steps, see:
How to capture network packets with tcpdump?
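The SysRq and packet capture steps above can be sketched as follows. This is a minimal outline and does not replace the referenced articles. The function names, the PROC_ROOT override, the interface eth0, and the output path are all assumptions; 2049 is the standard NFS port.

```shell
#!/bin/bash
# Run as root on the NFS server. PROC_ROOT is a hypothetical override that
# defaults to /proc; it is only there so the functions can be dry-run.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Enable all SysRq functions (equivalent to setting kernel.sysrq = 1).
enable_sysrq() {
    echo 1 > "$PROC_ROOT/sys/kernel/sysrq"
}

# Ask the kernel to dump every task's stack trace to the kernel log
# (the sysrq-t trace). Check dmesg or /var/log/messages for the output.
dump_task_traces() {
    echo t > "$PROC_ROOT/sysrq-trigger"
}

# Capture full-length packets on the NFS port until interrupted with Ctrl-C.
# Argument 1: interface (assumed eth0). Argument 2: capture file (assumed path).
capture_nfs_traffic() {
    local iface="${1:-eth0}" out="${2:-/tmp/nfs-server.pcap}"
    tcpdump -i "$iface" -s 0 -w "$out" port 2049
}

# Example sequence while nfsd is hanging:
#   enable_sysrq
#   dump_task_traces
#   capture_nfs_traffic eth0 /tmp/nfs-server.pcap
```

Running dump_task_traces once during the hang and once after it clears gives the two traces requested above.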
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
