NFS becomes unresponsive after nfsd errors related to TCP shutting down socket

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 5
  • Versions of nfs-utils package older than 1.0.9-47

Issue

  • After NFS errors have scrolled on the console for a few minutes, the system becomes unresponsive.
  • The following error messages are observed on the NFS client:
kernel: nfs: server server.hostname not responding, still trying
kernel: nfs: server server.hostname OK
  • The following error message is observed on the NFS server:
kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket

Resolution

  • Update nfs-utils to version 1.0.9-47 or later:
 # umount /nfs/share
 # yum update nfs-utils
 # mount nfs.example.com:/share /nfs/share

  • Verify that the upgraded version of nfs-utils is installed:
 # rpm -q nfs-utils
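
The version check above can also be scripted. The following is a minimal sketch, not part of the original solution: the helper names ver_ge and check_nfs_utils are hypothetical, and the comparison is a simplified numeric field-by-field check rather than rpm's own version-comparison algorithm. It avoids sort -V, which is assumed to be unavailable in the coreutils shipped with RHEL 5.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if version-release string $1 >= $2.
# Fields are split on runs of non-digit characters and compared numerically,
# so "1.0.9-47.el5" is treated as the fields 1 0 9 47 5.
# This is a simplification of rpm's real comparison rules.
ver_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, /[^0-9]+/)
        nb = split(b, y, /[^0-9]+/)
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0          # missing or empty fields count as 0
            yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                     # all fields equal
    }'
}

# Hypothetical wrapper: query the installed package and report whether it
# is at least 1.0.9-47, the version named in this solution.
check_nfs_utils() {
    installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' nfs-utils) || return 2
    if ver_ge "$installed" "1.0.9-47"; then
        echo "nfs-utils $installed: 1.0.9-47 or later is installed"
    else
        echo "nfs-utils $installed: older than 1.0.9-47, update required"
        return 1
    fi
}
```

Running check_nfs_utils on the NFS server after the update should report that 1.0.9-47 or later is installed.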

Root Cause

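  • This issue occurs because the nfsd threads cannot service requests as fast as the clients send them. Error -104 in the server log is ECONNRESET (connection reset by peer): the client had already reset the TCP connection by the time nfsd tried to send its reply.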
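
One way to observe whether the nfsd threads are saturated is to read the "th" line of /proc/net/rpc/nfsd on the server. The sketch below is an illustration under assumptions, not part of the original solution: the function name nfsd_thread_stats is hypothetical, and it assumes the RHEL 5 kernel layout of that line, where the first number is the thread count and the second is how many times all threads were busy at once.

```shell
#!/bin/sh
# Hypothetical helper: print the nfsd thread count and the number of times
# every thread was busy, taken from the "th" line of the nfsd statistics file.
# Assumed line layout (RHEL 5 kernels): th <threads> <all-busy count> <histogram...>
# The optional argument lets the function read a saved copy of the file.
nfsd_thread_stats() {
    stats_file="${1:-/proc/net/rpc/nfsd}"
    awk '$1 == "th" { printf "threads=%s all_busy_count=%s\n", $2, $3 }' "$stats_file"
}
```

An all_busy_count that keeps growing between two runs would be consistent with the threads falling behind the incoming requests.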

Diagnostic Steps

  • Check /var/log/messages on the NFS server for the following errors:
kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
last message repeated 3 times
kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
last message repeated 2 times
last message repeated 13 times
  • To determine why nfsd is looping, capture a kernel dump after the problem occurs. The dump shows where the nfsd threads are hanging. For detailed steps on enabling kexec/kdump on RHEL, see the following article:
    How do I configure kexec/kdump on Red Hat Enterprise Linux?

  • Enable the SysRq facility on the system and collect a sysrq-t trace while nfsd is hanging, then attach the output that is written to the log. Collect a second sysrq-t trace after nfsd stops hanging. For details, see the following article:
    What is the SysRq facility and how do I use it?

  • Capture network traffic on the server while the issue occurs. For detailed steps, see the following article:
    How to capture network packets with tcpdump?
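
The log check, SysRq trace, and packet capture described above can be sketched as shell functions to run as root on the NFS server. This is a minimal sketch under assumptions: the function names, the eth0 interface, and the /tmp/nfs-server.pcap output path are hypothetical, and the capture filter assumes NFS on its standard port 2049. The linked articles remain the reference for the full procedures.

```shell
#!/bin/sh
# Count log lines that report the -104 send error. syslog collapses repeats
# into "last message repeated N times", so the true number of errors can be
# higher than this count.
count_nfsd_resets() {
    log_file="${1:-/var/log/messages}"
    grep -c 'nfsd: got error -104' "$log_file"
}

# Enable SysRq and request a task-state dump (sysrq-t). The trace is written
# to the kernel log. Run once while nfsd hangs and once after it recovers.
dump_task_states() {
    echo 1 > /proc/sys/kernel/sysrq
    echo t > /proc/sysrq-trigger
}

# Capture full NFS packets on the server until interrupted with Ctrl-C.
# Interface and output file are assumed defaults; adjust them as needed.
capture_nfs_traffic() {
    iface="${1:-eth0}"
    out="${2:-/tmp/nfs-server.pcap}"
    tcpdump -i "$iface" -s 0 -w "$out" port 2049
}
```

For example, count_nfsd_resets with no argument reads /var/log/messages, while capture_nfs_traffic eth1 /var/tmp/nfs.pcap would capture on a different interface.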

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
