- Red Hat Enterprise Linux 5.5
- kernel 2.6.18-194.el5
- Periodically, NFSv3 mounts enter a hung state. Attempts to remount the NFS mounts also hang.
- File systems can be unmounted with the '-l' (lazy) flag, but cannot be remounted until the host is rebooted.
- A forced unmount has to be repeated about 4-5 times before the file system is actually unmounted.
- The problem occurs on several different servers.
- The solution is to fix the networking on the NAS box.
- Because the NAS is not responding, nothing can be done or fixed on the RHEL (OS) side.
- The NAS was not completing the TCP handshake with RHEL because of high load on the NAS side.
The netstat output on the servers shows connections in the SYN_SENT state, which means the server has requested a session and is still waiting for a reply from the remote server (in this case the NAS).
$ netstat -taupen | egrep 2049
tcp 0 1 <HOST_IP>:855 <NAS_IP>:2049 SYN_SENT 0 13530 - on (87.97/5/0)
$ netstat -taupen | egrep 2049
tcp 0 1 <HOST_IP>:922 <NAS_IP>:2049 SYN_SENT 0 10743 - on (5.47/2/0)
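To watch for this condition over time, the SYN_SENT sockets toward the NFS port can be counted instead of read by eye. The following is a minimal sketch, not part of the original diagnosis: the function name count_syn_sent is hypothetical, and it assumes the usual netstat column layout in which the foreign address is field 5 and the state is field 6, as in the output above.

```shell
# Count TCP sockets stuck in SYN_SENT toward a given remote port.
# Reads netstat-style lines on stdin; the port defaults to 2049 (NFS).
# Assumption: field 5 is the foreign address, field 6 is the state.
count_syn_sent() {
    awk -v port="${1:-2049}" '
        $1 ~ /^tcp/ && $6 == "SYN_SENT" && $5 ~ (":" port "$") { n++ }
        END { print n + 0 }'
}

# Usage (on the affected host):
#   netstat -tan | count_syn_sent 2049
```

A count that stays above zero across repeated runs would be consistent with the NAS not answering the handshake, although it does not by itself show where the packets are lost.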
- The RHEL server requests the session (SYN) but gets no answer from the NAS. Only after a long time is the handshake completed:
140554 HOST1 -> NAS TCP 74 50852 > sunrpc [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=3811972621 TSecr=0 WS=128
141569 NAS -> HOST1 TCP 78 sunrpc > 50852 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=8960 SACK_PERM=1 WS=8 TSval=144100286 TSecr=3811972621
141581 HOST1 -> NAS TCP 66 50852 > sunrpc [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSval=3811972622 TSecr=144100286
141612 HOST1 -> NAS Portmap 126 V2 GETPORT Call[Packet size limited during capture]
9472 438.142182 NAS -> HOST1 TCP 66 [TCP Window Update] sunrpc > 50852 [ACK] Seq=1 Ack=1 Win=139264 Len=0 TSval=144100286 TSecr=3811972622
142420 NAS -> HOST1 Portmap 98 V2 GETPORT Reply (Call In 9471)
142427 HOST1 -> NAS TCP 66 50852 > sunrpc [ACK] Seq=61 Ack=33 Win=5888 Len=0 TSval=3811972623 TSecr=144100286
142442 HOST1 -> NAS TCP 66 50852 > sunrpc [FIN, ACK] Seq=61 Ack=33 Win=5888 Len=0 TSval=3811972623 TSecr=144100286
143081 NAS -> HOST1 TCP 66 sunrpc > 50852 [ACK] Seq=33 Ack=62 Win=139200 Len=0 TSval=144100286 TSecr=3811972623
143090 NAS -> HOST1 TCP 66 sunrpc > 50852 [FIN, ACK] Seq=33 Ack=62 Win=139200 Len=0 TSval=144100286 TSecr=3811972623
143101 HOST1 -> NAS TCP 66 50852 > sunrpc [ACK] Seq=62 Ack=34 Win=5888 Len=0 TSval=3811972624 TSecr=144100286
The connectivity issue is on the NAS end. On the servers there are no lost packets, errors, drops, or connection failures. Handshake requests (SYN) queue up, which indicates that the remote server (NAS) answers late or not at all.
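The client-side part of that conclusion (no errors or drops on the servers) can be spot-checked from the per-interface counters. This is a rough sketch under stated assumptions, not the method used in the original analysis: the function name iface_errors is hypothetical, and it assumes the standard /proc/net/dev layout with two header lines followed by one line per interface.

```shell
# Print receive/transmit error and drop counters for each interface.
# Reads /proc/net/dev-formatted text on stdin.
# Assumption: two header lines, then "iface: <8 rx fields> <8 tx fields>".
iface_errors() {
    awk 'NR > 2 {
        # Older kernels may print "eth0:12345" with no space after the colon.
        sub(/:/, " ")
        printf "%s rx_errs=%s rx_drop=%s tx_errs=%s tx_drop=%s\n", $1, $4, $5, $12, $13
    }'
}

# Usage (on the affected host):
#   iface_errors < /proc/net/dev
```

Counters that stay flat while SYN_SENT sockets accumulate would support the reading that the client is sending correctly and the delay is on the remote side.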