RHEL6: NFS4 WRITE continuously sent and completing with NFS4ERR_BAD_STATEID (10025) with NetApp due to multiple filehandles for same file

Solution Unverified - Updated 2024-08-05T07:32:20+00:00

Issue

A hung task timeout and/or kernel panic occurs. The process that triggers it is closing a file on an NFS mount, flushing its dirty pages, and waiting for page writeback to complete.
The hung task backtrace is similar to the following:

INFO: task foo:4347 blocked for more than 720 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
foo         D 0000000000000012     0  4347    798 0x00000080
 ffff88101240fc78 0000000000000082 0000000000000000 ffff88020acfd840
 ffff88101240fd08 ffffffff8112dd37 ffff88101240fc58 0000000000000282
 ffff881012791098 ffff88101240ffd8 000000000000fb88 ffff881012791098
Call Trace:
 [<ffffffff8150e3e3>] io_schedule+0x73/0xc0
 [<ffffffff81119d3d>] sync_page+0x3d/0x50
 [<ffffffff8150ed9f>] __wait_on_bit+0x5f/0x90
 [<ffffffff81119f73>] wait_on_page_bit+0x73/0x80
 [<ffffffff8111a39b>] wait_on_page_writeback_range+0xfb/0x190
 [<ffffffff8111a568>] filemap_write_and_wait_range+0x78/0x90
 [<ffffffff811b1ace>] vfs_fsync_range+0x7e/0xe0
 [<ffffffff811b1b9d>] vfs_fsync+0x1d/0x20
 [<ffffffffa03a6670>] nfs_file_flush+0x70/0xa0 [nfs]
...
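
To check whether a saved kernel log contains this signature, the matching can be sketched as below. This is a minimal illustration, not a supported tool: the function name `find_nfs_flush_hangs` and the sample text are hypothetical, and the regular expressions assume the log lines look like the excerpt above.

```python
import re

# Header line printed by the hung-task detector, as in the excerpt above.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)
# One call-trace frame: " [<address>] symbol+0xoff/0xlen [module]".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?P<sym>[A-Za-z0-9_.]+)\+0x")


def find_nfs_flush_hangs(log_text):
    """Return (comm, pid, secs) for each hung-task report whose
    call trace contains nfs_file_flush (hypothetical helper)."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            # Start collecting frames for a new hung-task report.
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "secs": int(header.group("secs")),
                "syms": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["syms"].append(frame.group("sym"))
    return [
        (r["comm"], r["pid"], r["secs"])
        for r in reports
        if "nfs_file_flush" in r["syms"]
    ]


# Shortened sample built from the excerpt above.
SAMPLE = """INFO: task foo:4347 blocked for more than 720 seconds.
Call Trace:
 [<ffffffff8150e3e3>] io_schedule+0x73/0xc0
 [<ffffffff811b1b9d>] vfs_fsync+0x1d/0x20
 [<ffffffffa03a6670>] nfs_file_flush+0x70/0xa0 [nfs]
"""
```

Calling `find_nfs_flush_hangs(SAMPLE)` yields one entry for task `foo`. A hung task that never reaches `nfs_file_flush` is ignored, which keeps unrelated I/O stalls out of the result.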

Just before the process becomes blocked, the following message is sometimes logged:

nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
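
The number in this message is a negated NFSv4 status code. As a reading aid, the sketch below maps it to its symbolic name using a small subset of the status codes defined in RFC 3530. The helper name `decode_unhandled_error` is hypothetical. Note that 10026 is NFS4ERR_BAD_SEQID, which is a different code from the NFS4ERR_BAD_STATEID (10025) returned for the WRITE operations.

```python
import re

# Subset of NFSv4 status codes (RFC 3530) related to state handling.
NFS4_ERRORS = {
    10011: "NFS4ERR_EXPIRED",
    10022: "NFS4ERR_STALE_CLIENTID",
    10023: "NFS4ERR_STALE_STATEID",
    10024: "NFS4ERR_OLD_STATEID",
    10025: "NFS4ERR_BAD_STATEID",
    10026: "NFS4ERR_BAD_SEQID",
}

# Matches the "unhandled error -NNNNN" part of the kernel message.
UNHANDLED_RE = re.compile(r"unhandled error -(\d+)")


def decode_unhandled_error(line):
    """Return the symbolic name for the error in an 'unhandled error'
    log line, or None if the line does not contain one."""
    match = UNHANDLED_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return NFS4_ERRORS.get(code, "unknown (%d)" % code)


LINE = "nfs4_reclaim_open_state: unhandled error -10026. Zeroing state"
print(decode_unhandled_error(LINE))
```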

In addition, some time before the problem occurs, one or more bad sequence-id messages may be logged:

NFS: v4 server nfs-server  returned a bad sequence-id error!

Environment

Red Hat Enterprise Linux 6 (NFS Client)
- seen on kernels 2.6.32-358.12.1.el6 and 2.6.32-431.5.1.el6
- other kernels likely affected
NetApp (NFS Server)
- Ontap 8.1.2P4
- delegations disabled
NFSv3 and NFSv4 enabled and active on the same NetApp volume
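
To see from one client whether the same export is mounted with more than one NFS version, the contents of `/proc/mounts` can be grouped by export. The sketch below is only an illustration under stated assumptions: the helper name `nfs_versions_by_export`, the server name `netapp`, and the sample lines are hypothetical, and it assumes NFSv4 mounts appear with filesystem type `nfs4` while other versions appear as type `nfs` with a `vers=` option. It cannot detect access from other clients or confirm the server-side configuration.

```python
def nfs_versions_by_export(mounts_text):
    """Map each NFS export (server:/path) to the set of NFS versions
    it is mounted with, given text in /proc/mounts format."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fstype, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        device, fstype = fields[0], fields[2]
        options = fields[3].split(",")
        # Assumption: type nfs4 implies version 4 unless an option says otherwise.
        version = "4" if fstype == "nfs4" else "unknown"
        for opt in options:
            if opt.startswith("vers=") or opt.startswith("nfsvers="):
                version = opt.split("=", 1)[1]
        result.setdefault(device, set()).add(version)
    return result


# Hypothetical sample; on a real client, read the text from /proc/mounts.
SAMPLE_MOUNTS = (
    "netapp:/vol/data /mnt/a nfs rw,vers=3,proto=tcp 0 0\n"
    "netapp:/vol/data /mnt/b nfs4 rw,proto=tcp 0 0\n"
    "/dev/sda1 / ext4 rw 0 0\n"
)

mixed = {
    export: versions
    for export, versions in nfs_versions_by_export(SAMPLE_MOUNTS).items()
    if len(versions) > 1
}
```

Here `mixed` holds only exports that this client has mounted with more than one version. An empty result does not rule out the condition, since another client may be using the other protocol version.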

