RHEL6: NFSv4 flock regression with 2.6.32-279.22.1 or 2.6.32-358.el6 (Lock reclaim failed!): 2.6.32-279.19.1 is ok

Updated 2013-11-25T15:59:59+00:00

Issue

  • NFSv4: flock() hangs instead of failing with an error
  • Regression: NFSv4 errors when logging in or locking files
  • In our environment, home directories are mounted over NFS. On the latest kernel, logging in to a graphical session causes a stream of "nfs4_reclaim_open_state: Lock reclaim failed!" kernel messages to be logged, many messages within the same second. This has caused serious problems, as the system log file grows so large that the root filesystem can fill within a day or two.
  • We are seeing what looks like a file locking (flock) problem over NFSv4 on clients running the 2.6.32-279.22.1 kernel. The problem may manifest itself in other ways, but here is one example from running pidgin:
===============
$ strace pidgin
...
stat("foo", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
open("foo", O_RDONLY) = 14
flock(14, LOCK_EX
===============
And it just hangs there waiting for the lock.
  • Booting the client into the previous 2.6.32-279.19.1 kernel makes the problem go away.
  • While the flock() is hung, the messages file fills with the following kernel errors, many sharing the same one-second timestamp, which indicates that a tight loop is being executed:
kernel: nfs4_reclaim_open_state: Lock reclaim failed!
kernel: message repeated 210204 times: [nfs4_reclaim_open_state: Lock reclaim failed!]
  • A tcpdump taken at the time of the failure shows a repeated sequence of LOCK requests, all failing with NFS4ERR_OPENMODE (10038), with timestamps very close together (within the same second):
  1 2013-03-01 09:18:56.208360 192.168.122.131 -> 192.168.122.121 NFS V4 COMP Call LOCK
  2 2013-03-01 09:18:56.209034 192.168.122.121 -> 192.168.122.131 NFS V4 COMP Reply (Call In 1) LOCK(10038)
  3 2013-03-01 09:18:56.209421 192.168.122.131 -> 192.168.122.121 NFS V4 COMP Call SAVEFH OPEN DELEGRETURN Unknown
  4 2013-03-01 09:18:56.210049 192.168.122.121 -> 192.168.122.131 NFS V4 COMP Reply (Call In 3) SAVEFH OPEN[Malformed Packet]
  5 2013-03-01 09:18:56.210319 192.168.122.131 -> 192.168.122.121 NFS V4 COMP Call LOCK
  6 2013-03-01 09:18:56.211024 192.168.122.121 -> 192.168.122.131 NFS V4 COMP Reply (Call In 5) LOCK(10038)
  7 2013-03-01 09:18:56.215670 192.168.122.131 -> 192.168.122.121 NFS V4 COMP Call SAVEFH OPEN DELEGRETURN Unknown
  8 2013-03-01 09:18:56.216375 192.168.122.121 -> 192.168.122.131 NFS V4 COMP Reply (Call In 7) SAVEFH OPEN[Malformed Packet]
  9 2013-03-01 09:18:56.216596 192.168.122.131 -> 192.168.122.121 NFS V4 COMP Call LOCK
 10 2013-03-01 09:18:56.217332 192.168.122.121 -> 192.168.122.131 NFS V4 COMP Reply (Call In 9) LOCK(10038)
 11 2013-03-01 09:18:56.217674 192.168.122.131 -> 192.168.122.121 NFS V4 COMP Call SAVEFH OPEN DELEGRETURN Unknown
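
The access pattern visible in the strace output above (a read-only open followed by a blocking exclusive flock()) can be sketched as a small standalone script. This is a minimal sketch, not the code pidgin actually runs: the function name flock_readonly is hypothetical, and the path you pass in is assumed to be a file on the NFSv4 mount under test. On a local filesystem or an unaffected kernel the call is expected to return promptly; on an affected client it may hang as described above, so run it from a session you can afford to lose.

```python
import fcntl
import os


def flock_readonly(path):
    """Open `path` read-only and take a blocking exclusive flock on it.

    Mirrors the open(O_RDONLY) + flock(LOCK_EX) sequence from the strace
    output. `flock_readonly` is a hypothetical helper name used only for
    this illustration.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Blocking request: this is the call that hangs in the report.
        fcntl.flock(fd, fcntl.LOCK_EX)
        # Release the lock again so repeated runs start from a clean state.
        fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
    return True
```

Calling flock_readonly("foo") against a file in an NFS-mounted home directory, while watching the messages file on the client, should be enough to tell whether that client shows the symptom.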

Environment

  • Red Hat Enterprise Linux 6.3 - 6.4
    • kernel >= 2.6.32-279.22.1 and < 2.6.32-279.25.1.el6
    • kernel >= 2.6.32-358.el6 and < 2.6.32-358.6.1.el6
  • NFSv4 Client
  • NOTE: Red Hat Enterprise Linux 5 is not affected by this issue (as of kernel 2.6.18-348.el5)
  • Any NFS Server
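  • An application issuing flock() on an NFS file whose open mode differs from the mode of the lock request (for example, an exclusive lock on a file opened read-only, as in the strace output above).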
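
The kernel ranges listed above can be checked mechanically. The following is a rough sketch under stated assumptions: the helper names release_tuple and is_affected are hypothetical, the input is assumed to look like the output of uname -r on RHEL 6 (for example 2.6.32-279.22.1.el6.x86_64), and comparing the numeric release components as tuples only approximates RPM version comparison for strings of this shape.

```python
import re

# Half-open ranges (inclusive lower bound, exclusive upper bound) taken from
# the Environment list: numeric components of the release after "2.6.32-".
AFFECTED_RANGES = [
    ((279, 22, 1), (279, 25, 1)),
    ((358,), (358, 6, 1)),
]


def release_tuple(kernel):
    """Return the numeric release components of a 2.6.32 kernel string.

    "2.6.32-279.22.1.el6.x86_64" -> (279, 22, 1)
    "2.6.32-358.el6.x86_64"      -> (358,)
    Returns None for anything that is not a 2.6.32 kernel.
    """
    match = re.match(r"2\.6\.32-(\d+(?:\.\d+)*)", kernel)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_affected(kernel):
    """True if the kernel falls inside one of the ranges listed above."""
    release = release_tuple(kernel)
    if release is None:
        return False
    return any(low <= release < high for low, high in AFFECTED_RANGES)
```

For instance, is_affected(platform.release()) could be run on each NFSv4 client to flag machines that fall inside the listed ranges; treat the result as a hint and confirm against the installed kernel package.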
