NFS over UDP performance problem on RHEL6

Dear RedHat users

I'm using NFSv3 over UDP for the home directories on a large RHEL6 client server.
-> Version 3 because my NFS server only supports v3; it is Fedora based and a little bit old.
-> UDP because it is an HA solution and, with TCP, NFS freezes when failing over to the rescue server.
All this will change in the coming months, but I need time to set up a new HA solution.

With the home directory NFS mounted, working on the server is very slow: characters appear slowly, one by one, when typing commands in a terminal... But the server runs fast when we launch applications on it (20 cores and 512 GB RAM), so the server is not overloaded.
When connected as root, with the home directory on the local disk, the problem does not occur.
Mounting with the TCP protocol does not show the problem, but I cannot use TCP in the HA context. It does show, however, that this is not a network bandwidth problem (10 Gb Ethernet).

All other clients are CentOS 6 or openSUSE based and do not show this behavior, so it is not an NFS server overload problem either.

I would appreciate any advice on RHEL NFS tuning for the UDP protocol, or on how to investigate this strange problem.

Patrick

Responses

I would focus on resolving your TCP HA issues. NFS over UDP isn't supported at all in RHEL 7, so UDP isn't a sustainable path moving forward.

Are you fencing/rebooting the "old" server when you move clients away to a "new" server? That's required for NFS over TCP, so that the "old" server's TCP sessions don't get stuck if the clients move back to that server.

To troubleshoot UDP, I'd look at the mount options in use (refer to man nfs to understand them) and do a packet capture to see what's going back and forth between the NFS Client and NFS Server at times when the slowness is seen. Look for packet loss on the client and server with ethtool -S ethX and netstat -s while the slowness is happening. If you've applied any sysctls on the client or server, try removing them and rebooting to restore the default settings; there are many settings in there that can cause poor UDP performance. You can also check whether you need to increase the number of NFS Server threads. I know you said CentOS/openSUSE NFS Clients work fine, but it's worth being scientific and investigating both ends; keep an open mind as to where the issue may lie.
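As a rough sketch, those checks could look like this on a RHEL6 box (the interface name eth0, the server name nfsserver, and the capture path are placeholders to adapt):

# Mount options currently in effect on the client
nfsstat -m

# NIC-level drops/errors on client and server (interface name is an example)
ethtool -S eth0 | grep -iE 'drop|error|discard'

# UDP statistics; watch for packet receive errors and buffer overflows
netstat -s -u

# Capture the NFS traffic (port 2049) from the client while the slowness is visible
tcpdump -i eth0 -s 0 -w /tmp/nfs-slow.pcap host nfsserver and port 2049

# On the NFS server, the thread count on RHEL/CentOS 6 is set via RPCNFSDCOUNT
grep RPCNFSDCOUNT /etc/sysconfig/nfs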

Hi Jamie

The HA NFS solution is built mainly to avoid service disruption when the main server crashes. In case of a power loss or system crash, a clean shutdown of this server cannot be counted on, so moving to TCP for NFS may be more complicated, maybe with a virtual appliance...

The mount options I use are the same for all clients (Red Hat/CentOS/openSUSE): rw,intr,hard,proto=udp,vers=3,wsize=1500,rsize=1500,retrans=6,sloppy

These wsize and rsize values are needed because this network interface is set up with jumbo frames (for another NFS server with very large data sets, using TCP for NFS and without HA), while the HA NFS server is not.

I'll try to capture the Ethernet traffic between the faulty server and the Red Hat client today.
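For reference, a sketch of that mount as an /etc/fstab entry with the options above; the server name hanfs and the export path are placeholders:

# Example /etc/fstab entry (hanfs:/home is a placeholder)
hanfs:/home  /home  nfs  rw,intr,hard,proto=udp,vers=3,wsize=1500,rsize=1500,retrans=6,sloppy  0 0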

Thanks

Patrick

I think you have misunderstood. Whenever TCP NFS Clients move off an HA NFS Server, that server should be fenced or rebooted to clear its old TCP session state. This is why NFS hangs when you move the clients back to that server: the NFS Client's TCP connection state no longer matches the old NFS Server's, because the client has been talking to another NFS Server in the meantime.
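One rough way to see this in practice (the host and interface names are examples; 2049 is the standard NFS port):

# On the client, after failing back: list the TCP connection(s) to the NFS server
netstat -tn | grep :2049

# On the wire, a stuck session shows up as the client retransmitting on the old
# connection without getting useful replies from the old server
tcpdump -i eth0 -n host old-nfs-server and tcp port 2049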

Those are strange rsize and wsize values; they will be grown up to 4096 anyway, and that is the absolute smallest block size, so it will incur a large performance penalty. You don't need to change the NFS block size to accommodate the interface MTU; IP will fragment packets to the MTU as required. Depending on your usage, you probably want to make the block size larger, or you could just leave it at the default and it will be negotiated with the NFS Server.
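To check what is actually in effect after negotiation, the client can show it directly; for example:

# Show the rsize/wsize actually negotiated for each NFS mount
nfsstat -m

# The same values are visible in /proc/mounts
grep nfs /proc/mounts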
