RHEL6.4: web servers utilizing NFSv4 share on Nexsan, hung tasks, OPEN repeatedly completing with NFS4ERR_STALE_CLIENTID
Issue
- We have a couple of web servers, using an NFS share on a Nexsan, that consistently report hung tasks.
- Here is a snippet of the log entries from the kernel:
Mar 11 05:17:58 localhost kernel: INFO: task httpd:4123 blocked for more than 120 seconds.
Mar 11 05:17:58 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 11 05:17:58 localhost kernel: httpd D 0000000000000001 0 4123 2167 0x00000080
Mar 11 05:17:58 localhost kernel: ffff8800a3fdbba8 0000000000000082 ffff8800a3fdbb28 ffff88013acdb300
Mar 11 05:17:58 localhost kernel: ffff8800a3fdbb28 ffffffff8119b02a ffff88013add9c00 ffff88013acdb300
Mar 11 05:17:58 localhost kernel: ffff8800a5841058 ffff8800a3fdbfd8 000000000000fb88 ffff8800a5841058
Mar 11 05:17:58 localhost kernel: Call Trace:
Mar 11 05:17:58 localhost kernel: [<ffffffff8119b02a>] ? dput+0x9a/0x150
Mar 11 05:17:58 localhost kernel: [<ffffffff8150ee1e>] __mutex_lock_slowpath+0x13e/0x180
Mar 11 05:17:58 localhost kernel: [<ffffffff8150ecbb>] mutex_lock+0x2b/0x50
Mar 11 05:17:58 localhost kernel: [<ffffffff8119045b>] do_lookup+0x11b/0x230
Mar 11 05:17:58 localhost kernel: [<ffffffff81190ca4>] __link_path_walk+0x734/0x1030
Mar 11 05:17:58 localhost kernel: [<ffffffff8119182a>] path_walk+0x6a/0xe0
Mar 11 05:17:58 localhost kernel: [<ffffffff811919fb>] do_path_lookup+0x5b/0xa0
Mar 11 05:17:58 localhost kernel: [<ffffffff81182540>] ? get_empty_filp+0xa0/0x180
Mar 11 05:17:58 localhost kernel: [<ffffffff8119293b>] do_filp_open+0xfb/0xdd0
Mar 11 05:17:58 localhost kernel: [<ffffffffa02a66b6>] ? nfs_revalidate_inode+0x26/0x60 [nfs]
Mar 11 05:17:58 localhost kernel: [<ffffffff8119f642>] ? alloc_fd+0x92/0x160
Mar 11 05:17:58 localhost kernel: [<ffffffff8117df59>] do_sys_open+0x69/0x140
Mar 11 05:17:58 localhost kernel: [<ffffffff8117e070>] sys_open+0x20/0x30
Mar 11 05:17:58 localhost kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
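To gauge how often and for which processes these warnings occur, the kernel log can be scanned for the "blocked for more than N seconds" line shown above. The following is a minimal sketch, not part of the original report; the function names `find_hung_tasks` and `count_by_command` are hypothetical, and the pattern assumes the message format seen in this snippet.

```python
import re
from collections import Counter

# Matches the hung-task warning line as it appears in the snippet above, e.g.
# "... kernel: INFO: task httpd:4123 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return one (command, pid, seconds) tuple per hung-task warning line."""
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


def count_by_command(hits):
    """Count warnings per command name (hypothetical helper for a quick summary)."""
    return Counter(comm for comm, _pid, _secs in hits)
```

A possible use, assuming kernel messages are written to /var/log/messages on the client: pass the open file object to `find_hung_tasks`, then print `count_by_command(hits).most_common()` to see which processes block most often.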
Environment
- Red Hat Enterprise Linux 6.4 (2 NFSv4 clients)
    - VMware virtual machines
    - kernel 2.6.32-358.0.1.el6
    - running Apache/nginx and hosting 2000 web servers
    - configured for HA/redundancy, so if one goes down the other should take over
- NFSv4 server
    - Nexsan
    - sharing a carved-out LUN via NFS only
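When confirming that a client matches this environment, it can help to check which mounts are actually using NFSv4. The sketch below is an illustration under stated assumptions rather than a step from the original report: it assumes NFSv4 mounts appear with file system type `nfs4` in /proc/mounts, and the function name `nfs4_mounts` is hypothetical.

```python
def nfs4_mounts(mounts_text):
    """Return (source, mountpoint, options) for each nfs4 entry.

    mounts_text is expected to be in /proc/mounts format, where each line
    holds: source, mountpoint, fstype, options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Assumption: NFSv4 mounts are reported with fstype "nfs4".
        if len(fields) >= 4 and fields[2] == "nfs4":
            found.append((fields[0], fields[1], fields[3].split(",")))
    return found
```

On a client, the input would typically be the contents of /proc/mounts, for example `nfs4_mounts(open("/proc/mounts").read())`.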