RHEL6.4: web servers utilizing NFSv4 share on Nexsan, hung tasks, OPEN repeatedly completing with NFS4ERR_STALE_CLIENTID

Solution Unverified - Updated -

Issue

  • We have a couple of web servers, using an NFS share on a Nexsan, that consistently get hung tasks.
  • Here is a snippet of the log entries from the kernel:
Mar 11 05:17:58 localhost kernel: INFO: task httpd:4123 blocked for more than 120 seconds.
Mar 11 05:17:58 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 11 05:17:58 localhost kernel: httpd         D 0000000000000001     0  4123   2167 0x00000080
Mar 11 05:17:58 localhost kernel: ffff8800a3fdbba8 0000000000000082 ffff8800a3fdbb28 ffff88013acdb300
Mar 11 05:17:58 localhost kernel: ffff8800a3fdbb28 ffffffff8119b02a ffff88013add9c00 ffff88013acdb300
Mar 11 05:17:58 localhost kernel: ffff8800a5841058 ffff8800a3fdbfd8 000000000000fb88 ffff8800a5841058
Mar 11 05:17:58 localhost kernel: Call Trace:
Mar 11 05:17:58 localhost kernel: [<ffffffff8119b02a>] ? dput+0x9a/0x150
Mar 11 05:17:58 localhost kernel: [<ffffffff8150ee1e>] __mutex_lock_slowpath+0x13e/0x180
Mar 11 05:17:58 localhost kernel: [<ffffffff8150ecbb>] mutex_lock+0x2b/0x50
Mar 11 05:17:58 localhost kernel: [<ffffffff8119045b>] do_lookup+0x11b/0x230
Mar 11 05:17:58 localhost kernel: [<ffffffff81190ca4>] __link_path_walk+0x734/0x1030
Mar 11 05:17:58 localhost kernel: [<ffffffff8119182a>] path_walk+0x6a/0xe0
Mar 11 05:17:58 localhost kernel: [<ffffffff811919fb>] do_path_lookup+0x5b/0xa0
Mar 11 05:17:58 localhost kernel: [<ffffffff81182540>] ? get_empty_filp+0xa0/0x180
Mar 11 05:17:58 localhost kernel: [<ffffffff8119293b>] do_filp_open+0xfb/0xdd0
Mar 11 05:17:58 localhost kernel: [<ffffffffa02a66b6>] ? nfs_revalidate_inode+0x26/0x60 [nfs]
Mar 11 05:17:58 localhost kernel: [<ffffffff8119f642>] ? alloc_fd+0x92/0x160
Mar 11 05:17:58 localhost kernel: [<ffffffff8117df59>] do_sys_open+0x69/0x140
Mar 11 05:17:58 localhost kernel: [<ffffffff8117e070>] sys_open+0x20/0x30
Mar 11 05:17:58 localhost kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
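
When many such warnings accumulate in the logs, it can help to summarize which processes are being reported as hung. The following is a minimal sketch, not a Red Hat tool: it assumes syslog lines in the format shown above, and the function names find_hung_tasks and count_by_command are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Matches the kernel hung-task warning as it appears in the excerpt above, e.g.
# "INFO: task httpd:4123 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return one (command, pid, seconds) tuple per hung-task warning found."""
    hits = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


def count_by_command(hits):
    """Count hung-task warnings per command name (hypothetical helper)."""
    return Counter(comm for comm, _pid, _secs in hits)


if __name__ == "__main__":
    # Sample lines copied from the log excerpt above; in practice the lines
    # would be read from the system log file instead.
    sample = [
        "Mar 11 05:17:58 localhost kernel: INFO: task httpd:4123 blocked for more than 120 seconds.",
        "Mar 11 05:17:58 localhost kernel: Call Trace:",
    ]
    hits = find_hung_tasks(sample)
    for comm, pid, secs in hits:
        print(f"{comm} (pid {pid}) blocked for more than {secs} seconds")
    print(dict(count_by_command(hits)))
```

The sketch only looks at the "blocked for more than" line; it does not parse the call trace that follows each warning.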

Environment

  • Red Hat Enterprise Linux 6.4 (2 NFSv4 Clients)
    • VMware virtual machines
    • kernel 2.6.32-358.0.1.el6
    • Run Apache/nginx and host 2000 web servers
    • Configured for HA/redundancy, so if one goes down the other should take over
  • NFSv4 Server
    • Nexsan
    • Shares a carved-out LUN via NFS only
