Red Hat Enterprise Linux 5 NFSv4 clients report: v4 server returned a bad sequence-id error!

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 5.4 (NFS client)
  • Solaris (NFS server)
  • NetApp Filer - ONTAP 7.3.2P6D1 (NFS server)
  • NFSv4 (not Kerberized)

Issue

  • NFS mounts hang when accessed, and the messages log shows errors such as the following:

Feb 10 18:51:01 hostname kernel: NFS: v4 server returned a bad sequence-id error!
Feb 10 18:52:01 hostname kernel: Error: state recovery failed on NFSv4 server *.*.*.* with error 2
Feb 10 18:52:32 hostname last message repeated 123273 times
Feb 10 18:53:33 hostname last message repeated 264847 times
Feb 10 18:54:34 hostname last message repeated 267357 times
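
To gauge how often these errors occur, the log lines can be tallied with a short script. The following is a minimal sketch, not a Red Hat tool: the function name `summarize` is hypothetical, and it assumes that each syslog "last message repeated N times" line refers to the message logged immediately before it.

```python
import re

BAD_SEQID = "NFS: v4 server returned a bad sequence-id error!"
RECOVERY_FAILED = re.compile(
    r"state recovery failed on NFSv4 server (\S+) with error (-?\d+)"
)
REPEATED = re.compile(r"last message repeated (\d+) times")


def summarize(lines):
    """Count the two NFSv4 kernel messages, folding in syslog repeat lines."""
    counts = {"bad_seqid": 0, "recovery_failed": 0}
    last = None  # which tracked message, if any, was logged most recently
    for line in lines:
        repeat = REPEATED.search(line)
        if repeat:
            # Assumption: the repeat count belongs to the preceding message.
            if last is not None:
                counts[last] += int(repeat.group(1))
            continue
        if BAD_SEQID in line:
            last = "bad_seqid"
        elif RECOVERY_FAILED.search(line):
            last = "recovery_failed"
        else:
            last = None  # unrelated message; later repeats are not ours
            continue
        counts[last] += 1
    return counts
```

For example, `summarize(open("/var/log/messages"))` returns a dictionary with one total per message type.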

Resolution

Root Cause

  • Summary:

    • There are two bugs. First, the RHEL NFS client does not implement RELEASE_LOCKOWNER. This exhausts the NFS server's stateids, and the server responds to the client with NFS4ERR_RESOURCE. Second, the NFS client does not handle NFS4ERR_RESOURCE properly, so its sequence ID falls out of step with the server's and the server returns NFS4ERR_BAD_SEQID (logged as "NFS: v4 server returned a bad sequence-id error!").
  • Details of the 1st bug: "Current versions of Linux have an issue when you use file locking: they can end up using a lot of stateids if you have one or several applications does a lot of lock/unlock cycles but where the file stateid never gets CLOSEd (which can happen if at least one application has the file open at any point in time). That is a bug that the above patch set aims to fix by adding client side support for the RELEASE_LOCKOWNER operation." (http://article.gmane.org/gmane.linux.nfs/33685)

  • Details of the 2nd bug: "RFC 3530 states that when we recieve the error NFS4ERR_RESOURCE, we are supposed to bump the sequence number on OPEN, LOCK, LOCKU, CLOSE, etc operations. The problem is that we map that error into EREMOTEIO in the XDR layer, and so the NFSv4 middle-layer routines like seqid_mutating_err(), and nfs_increment_seqid() don't recognise it." (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=52567b03ca38b6e556ced450d64dba8d66e23b0e)
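
The "don't recognise it" part of the second bug can be illustrated with a toy model. This sketch is not kernel code: `lossy_decode`, `is_special_cased`, and `SPECIAL_CASED` are hypothetical stand-ins, and it deliberately does not model how the sequence ID itself is updated. It only shows that a check keyed on NFSv4 status codes stops matching once the code has been translated into a generic errno.

```python
NFS4ERR_RESOURCE = 10018  # NFSv4 status code defined in RFC 3530
EREMOTEIO = 121           # generic "remote I/O error" errno on Linux

# Hypothetical stand-in for a middle-layer routine that treats certain
# NFSv4 status codes specially when deciding how to update the seqid.
SPECIAL_CASED = {NFS4ERR_RESOURCE}


def is_special_cased(status):
    """Return True if the status is one the middle layer knows about."""
    return status in SPECIAL_CASED


def lossy_decode(status):
    """Toy decode step that collapses NFS4ERR_RESOURCE into EREMOTEIO."""
    return EREMOTEIO if status == NFS4ERR_RESOURCE else status


# Checked before translation, the protocol code is recognised.
recognised_raw = is_special_cased(NFS4ERR_RESOURCE)
# Checked after translation, the same reply no longer matches.
recognised_mapped = is_special_cased(lossy_decode(NFS4ERR_RESOURCE))
```

In this model the ordering is the whole problem: once the lower layer has already performed the mapping, the later check cannot tell that the server originally sent NFS4ERR_RESOURCE.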

Diagnostic Steps

  • Use the reproducer attached to: https://bugzilla.redhat.com/show_bug.cgi?id=620502
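
The attached reproducer is not reproduced here. As a rough sketch of the access pattern described under Root Cause (many lock/unlock cycles on a file that is never closed in between), something like the following could be run against a file on the affected NFSv4 mount. This is an assumption-laden illustration, not the Bugzilla reproducer: the function name `lock_unlock_cycles` is hypothetical, and whether it triggers the problem depends on the client kernel and the server.

```python
import fcntl
import os


def lock_unlock_cycles(path, cycles):
    """Hold `path` open and take/release a POSIX lock `cycles` times."""
    completed = 0
    # The descriptor stays open for the whole loop, so the file is never
    # closed between lock cycles.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        for _ in range(cycles):
            fcntl.lockf(fd, fcntl.LOCK_EX)  # acquire a whole-file lock
            fcntl.lockf(fd, fcntl.LOCK_UN)  # release it again
            completed += 1
    finally:
        os.close(fd)
    return completed
```

A call such as `lock_unlock_cycles("/mnt/nfs4/testfile", 100000)` would exercise the pattern; the mount path and cycle count are placeholders, and this should only be tried on a test system.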

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
