Why do NFSv3 mounts from a NAS hang on Red Hat Enterprise Linux 5?

Solution Verified - Updated -

Environment

  • NFS client:

    • Red Hat Enterprise Linux 5.5
    • kernel 2.6.18-194.el5
  • NFS server:

    • NAS

Issue

  • NFSv3 mounts periodically enter a hung state. Attempts to remount the NFS file systems also hang.
  • The file systems can be unmounted with the '-l' flag, but cannot be remounted until the host is rebooted.
  • A forced unmount has to be attempted about 4-5 times before the file system is actually unmounted.
  • The problem occurs on several different servers.

Resolution

  • The solution is to fix the networking on the NAS box.
  • Because the NAS is not responding, nothing can be fixed on the RHEL (OS) side.

Root Cause

  • The NAS was not completing the TCP handshake with the RHEL hosts because of high load on the NAS side.

Diagnostic Steps

The netstat output on the RHEL hosts shows connections in the SYN_SENT state. This means the host has requested a session (sent a SYN) and is still waiting for a reply from the remote server (in this case, the NAS).

HOST1:

$ netstat -taupen | egrep 2049
tcp        0      1 <HOST_IP>:855     <NAS_IP>:2049     SYN_SENT    0          13530      -     on (87.97/5/0)

HOST2:

$ netstat -taupen | egrep 2049
tcp        0      1 <HOST_IP>:922   <NAS_IP>:2049          SYN_SENT    0          10743   -     on (5.47/2/0)
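
The check above can be repeated across several hosts with a small helper. The following is a minimal sketch, not part of the original diagnosis: the function name count_syn_sent is hypothetical, and it assumes the usual netstat column layout, where field 5 is the remote address and field 6 is the connection state.

```shell
# Count TCP connections to the NFS port (2049) that are stuck in SYN_SENT.
# Reads netstat-style output on stdin. Assumed layout: field 1 is the
# protocol, field 5 the remote address:port, field 6 the connection state.
# Example use (hypothetical helper name): netstat -tan | count_syn_sent
count_syn_sent() {
    awk '$1 ~ /^tcp/ && $5 ~ /:2049$/ && $6 == "SYN_SENT" { n++ }
         END { print n + 0 }'
}
```

A count that stays above zero over repeated runs suggests that connection requests to the NAS are not being answered.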

  • The RHEL host requests the session (SYN), but the NAS does not answer. Only after a long time is the handshake completed:

140554 HOST1 -> NAS TCP 74 50852 > sunrpc [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=3811972621 TSecr=0 WS=128
141569 NAS -> HOST1 TCP 78 sunrpc > 50852 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=8960 SACK_PERM=1 WS=8 TSval=144100286 TSecr=3811972621
141581 HOST1 -> NAS TCP 66 50852 > sunrpc [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSval=3811972622 TSecr=144100286
141612 HOST1 -> NAS Portmap 126 V2 GETPORT Call[Packet size limited during capture]
9472 438.142182 NAS -> HOST1 TCP 66 [TCP Window Update] sunrpc > 50852 [ACK] Seq=1 Ack=1 Win=139264 Len=0 TSval=144100286 TSecr=3811972622
142420 NAS -> HOST1 Portmap 98 V2 GETPORT Reply (Call In 9471)
142427 HOST1 -> NAS TCP 66 50852 > sunrpc [ACK] Seq=61 Ack=33 Win=5888 Len=0 TSval=3811972623 TSecr=144100286
142442 HOST1 -> NAS TCP 66 50852 > sunrpc [FIN, ACK] Seq=61 Ack=33 Win=5888 Len=0 TSval=3811972623 TSecr=144100286
143081 NAS -> HOST1 TCP 66 sunrpc > 50852 [ACK] Seq=33 Ack=62 Win=139200 Len=0 TSval=144100286 TSecr=3811972623
143090 NAS -> HOST1 TCP 66 sunrpc > 50852 [FIN, ACK] Seq=33 Ack=62 Win=139200 Len=0 TSval=144100286 TSecr=3811972623
143101 HOST1 -> NAS TCP 66 50852 > sunrpc [ACK] Seq=62 Ack=34 Win=5888 Len=0 TSval=3811972624 TSecr=144100286

There is a connectivity issue on the NAS end. The RHEL hosts show no lost packets, errors, drops, or connection failures. The handshake requests (SYN) are queueing up, which indicates that the remote server (NAS) answers too late or not at all.
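
One rough way to quantify unanswered handshakes is to compare the number of SYN packets with the number of SYN, ACK replies in a text export of the capture. The sketch below is illustrative only: the function name syn_summary is hypothetical, and it assumes one packet per line with the flags printed as [SYN] and [SYN, ACK], as in the trace above.

```shell
# Summarise a text packet capture: how many connection requests (SYN) were
# seen and how many replies (SYN, ACK) came back. Reads one packet per
# line on stdin. Assumes flags appear literally as [SYN] and [SYN, ACK].
syn_summary() {
    awk '
        /\[SYN, ACK\]/ { synack++; next }   # a reply to a connection request
        /\[SYN\]/      { syn++ }            # a connection request or its retry
        END { printf "syn=%d synack=%d\n", syn, synack }
    '
}
```

If the syn count is clearly higher than the synack count, the requests are being retransmitted without an answer, which is consistent with the remote end not responding.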
