RHEL6.4: delayed NFS RENEW response from NetApp filer leads to temporarily expired lease, and repeated NFS4 WRITE with NFS4ERR_BAD_STATEID reply
Issue
- A TIBCO process became stuck and then went defunct (a zombie process); it cannot be killed with any command.
- The stuck process does not release its resources: it keeps holding some ports and cannot release its lock on a file that resides on a NetApp NFSv4 share.
- The only way to recover is to reboot this Red Hat Enterprise Linux VM.
- The following is a sample backtrace seen when the issue occurs:
INFO: task tibemsd64:2455 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tibemsd64 D 0000000000000000 0 2455 1 0x00000080
ffff88023ae419c8 0000000000000082 ffff88023ae41948 ffffffffa028ecd0
ffff880237693400 ffff88023ae41978 ffff880238ba9c30 ffff88023a3067e0
ffff88023aa75ab8 ffff88023ae41fd8 000000000000fb88 ffff88023aa75ab8
Call Trace:
[<ffffffffa028ecd0>] ? rpc_execute+0x50/0xa0 [sunrpc]
[<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
[<ffffffff81119e10>] ? sync_page+0x0/0x50
[<ffffffff8150e8c3>] io_schedule+0x73/0xc0
[<ffffffff81119e4d>] sync_page+0x3d/0x50
[<ffffffff8150f12a>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff81119de7>] __lock_page+0x67/0x70
[<ffffffff81096de0>] ? wake_bit_function+0x0/0x50
[<ffffffff81119c1e>] ? find_get_page+0x1e/0xa0
[<ffffffff8111ae90>] find_lock_page+0x50/0x80
[<ffffffff8111af0d>] grab_cache_page_write_begin+0x4d/0xc0
[<ffffffffa0325267>] nfs_write_begin+0x77/0x220 [nfs]
[<ffffffff8111a7b3>] generic_file_buffered_write+0x123/0x2e0
[<ffffffff8111c210>] __generic_file_aio_write+0x260/0x490
[<ffffffff81437b73>] ? sock_recvmsg+0x133/0x160
[<ffffffff8111c4c8>] generic_file_aio_write+0x88/0x100
[<ffffffffa0325f8e>] nfs_file_write+0xde/0x1f0 [nfs]
[<ffffffff8118106a>] do_sync_write+0xfa/0x140
[<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8121bed6>] ? security_file_permission+0x16/0x20
[<ffffffff81181368>] vfs_write+0xb8/0x1a0
[<ffffffff81181c61>] sys_write+0x51/0x90
[<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
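When checking whether a system is hitting this issue, it can help to pull the hung-task warnings out of the kernel log and list which tasks were reported. The following is a minimal Python sketch that assumes the message format shown in the sample above; the function name `find_hung_tasks` and the sample string are illustrative only and not part of any Red Hat tool.

```python
import re

# Matches the kernel hung-task warning in the format shown in the sample
# backtrace, e.g. "INFO: task tibemsd64:2455 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (command, pid, seconds) tuples found in log_text.

    Hypothetical helper for illustration: it only recognises the single
    warning line and does not parse the call trace that follows it.
    """
    results = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            results.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return results


# Sample input copied from the backtrace above.
sample = (
    "INFO: task tibemsd64:2455 blocked for more than 120 seconds.\n"
    '"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.\n'
)

for comm, pid, secs in find_hung_tasks(sample):
    print(comm, pid, secs)
```

In practice the text passed to the function could come from the output of dmesg or from the system log (typically /var/log/messages on Red Hat Enterprise Linux 6); which source applies depends on the local logging configuration.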
Environment
- Red Hat Enterprise Linux 6 (NFS Client)
  - Seen on kernel 2.6.32-358.18.1.el6
- NFS Server
  - NetApp ONTAP 8.1.4P2 7-mode
  - NFS4 with (read and write) delegations enabled
  - NFS4 lease time = 30 seconds
    - NOTE: This is the default for the NetApp option nfs.v4.lease_seconds according to NetApp library ECMM1278346 - Specifying the NFSv4 locking lease period
- Application
  - Seen with TIBCO EMS
