RHEL 6 NFS clients become unresponsive due to invalid rpc_xprt->snd_task pointer

Solution Verified - Updated 2014-08-25T15:31:10+00:00

Issue

We have seen several (10 or so) cases since our 6.3 upgrade where an NFS mount on a client becomes unresponsive. df hangs, and any command that accesses the mount hangs.
New mounts to the same cluster through new nodes will succeed. New mount requests to the same cluster through the stuck server-node will hang.
Multiple servers randomly but regularly lose access to 9 NFS shares. The NFS server is an EMC filer. The shares use TCP; UDP shares are not affected. The same TCP shares on other, not yet affected servers are fine. During the issue the NFS server is reachable over the network and responds correctly to "showmount -e" requests.
NFS clients become unresponsive, with "failed to lock transport" messages appearing for each RPC request made on the stuck RPC transport (if verbose RPC debugging is enabled).

Originally seen on Red Hat Enterprise Linux 6.3 (NFS Client)
- Originally seen on kernel-2.6.32-279.14.1.el6; other kernels between 2.6.32-220.el6 and 2.6.32-358.2.1.el6 are believed to be affected
Mostly seen with the NFSv3 protocol; however, because the bug is in the SUNRPC layer, other protocols (NFSv4, NFSv4.1, NFSACL) are susceptible as well.
Seen with Isilon NFS server cluster and EMC NFS server, as well as NetApp filers
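
As a rough aid for checking whether a client runs a kernel inside the range listed above, the following Python sketch compares a kernel release string against 2.6.32-220.el6 and 2.6.32-358.2.1.el6. It is only an illustration under stated assumptions: the function names are hypothetical, both ends of the range are treated as inclusive (the text only says "between"), and the comparison is a simplified numeric one rather than full RPM version ordering.

```python
import platform
import re

# Range quoted in the list above. Treating both ends as inclusive is an
# assumption; the article only says kernels "between" these are believed affected.
LOW = "2.6.32-220.el6"
HIGH = "2.6.32-358.2.1.el6"


def release_key(release):
    """Convert e.g. '2.6.32-279.14.1.el6.x86_64' to ((2, 6, 32), (279, 14, 1)).

    Simplified numeric comparison key, not the full RPM version algorithm.
    Raises ValueError for anything that does not look like an el6 kernel.
    """
    match = re.match(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el6", release)
    if match is None:
        raise ValueError("not an el6 kernel release: %r" % release)
    version, build = match.groups()
    return (
        tuple(int(part) for part in version.split(".")),
        tuple(int(part) for part in build.split(".")),
    )


def possibly_affected(release):
    """Return True if the release falls inside the range listed in this article."""
    return release_key(LOW) <= release_key(release) <= release_key(HIGH)


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    try:
        verdict = possibly_affected(running)
    except ValueError:
        print("%s: not a RHEL 6 kernel, range check does not apply" % running)
    else:
        print("%s: %s" % (running, "in listed range" if verdict else "outside listed range"))
```

A kernel inside the range is not proof that a given hang is this bug; the check only narrows down whether the client matches the environment described here.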
