NFSD threads become deadlocked when one of the threads tries to lock the same i_mutex twice

Solution Verified - Updated -

Issue

  • nfsd server processes log hung_task messages in /var/log/messages, after which nfsd becomes completely unresponsive to NFS clients. The hung threads never leave the uninterruptible sleep ("D") state.

  • Apart from the unavailability of NFS, the operating system remains unaffected.

  • The first hung_task backtrace contains the symbols nfsd_rename and lock_rename:

kernel: INFO: task nfsd:16573 blocked for more than 120 seconds.  
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
kernel:  D ffff810511adc0c0 0 1657316574 16572 (L-TLB)  
kernel: ffff81057ee8dd00 0000000000000046 ffff8104242b9400 ffffffff8854bd72  
kernel: ffff81057ee8dd50 000000000000000a ffff8105f9a71080 ffff810511adc0c0  
kernel: 00080121893de2c4 0000000000002a1f ffff8105f9a71268 0000000c30703f40  
kernel: Call Trace:  
kernel: [ffffffff8854bd72] :nfsd:exp_find_key+0x89/0x9c  
kernel: [ffffffff8009cce3] set_current_groups+0x116/0x164  
kernel: [ffffffff80063c4f] __mutex_lock_slowpath+0x60/0x9b  
kernel: [ffffffff88547c7b] :nfsd:fh_verify+0x450/0x4bd  
kernel: [ffffffff80063c99] .text.lock.mutex+0xf/0x14  
kernel: [ffffffff800487cb] lock_rename+0xb5/0xbd  
kernel: [ffffffff8854a6c5] :nfsd:nfsd_rename+0x116/0x31e  
kernel: [ffffffff8854ff3d] :nfsd:nfsd3_proc_rename+0x141/0x154  
kernel: [ffffffff885451db] :nfsd:nfsd_dispatch+0xd8/0x1d6  
kernel: [ffffffff884c5649] :sunrpc:svc_process+0x44c/0x713  
kernel: [ffffffff80064604] __down_read+0x12/0x92  
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb  
kernel: [ffffffff88545746] :nfsd:nfsd+0x1a5/0x2cb  
kernel: [ffffffff8005dfb1] child_rip+0xa/0x11  
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb  
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
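
To check a log for this signature, a short script can group each hung_task report with its call trace and test whether the trace contains both symbols. The following is a minimal sketch rather than a supported tool: the names `split_hung_task_reports` and `is_rename_deadlock` are hypothetical, and the regular expressions assume the syslog line layout shown in the trace above.

```python
import re

# Matches the header line of a hung_task report, e.g.
# "kernel: INFO: task nfsd:16573 blocked for more than 120 seconds."
HEADER = re.compile(r"INFO: task (\S+):(\d+) blocked for more than \d+ seconds")
# Matches the symbol in a call-trace line, e.g.
# "[ffffffff8854a6c5] :nfsd:nfsd_rename+0x116/0x31e" -> "nfsd_rename"
SYMBOL = re.compile(r"\]\s+(?::\w+:)?([.\w]+)\+0x")


def split_hung_task_reports(lines):
    """Group log lines into (task, pid, symbols) tuples, one per report."""
    reports = []
    current = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            # A new report starts; collect its symbols in a fresh list.
            current = (header.group(1), int(header.group(2)), [])
            reports.append(current)
            continue
        symbol = SYMBOL.search(line)
        if symbol and current is not None:
            current[2].append(symbol.group(1))
    return reports


def is_rename_deadlock(symbols):
    """True when the trace contains both symbols named in this article."""
    return "nfsd_rename" in symbols and "lock_rename" in symbols
```

Passing the lines of /var/log/messages to `split_hung_task_reports` returns the reports in log order, so the first entry is the one to test with `is_rename_deadlock`.
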
  • Subsequent operations (such as rmdir) that try to acquire the i_mutex on that same file go into uninterruptible sleep waiting for the i_mutex to be released, and produce further hung_task messages.

  • The i_mutex is never released, so eventually every nfsd thread that tries to lock the same resource hangs permanently:

kernel: INFO: task nfsd:16586 blocked for more than 120 seconds.  
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
kernel:  D ffffffff80153804 0 16586 1 16584 16580 (L-TLB)  
kernel: ffff81042a003d40 0000000000000046 ffff8105bf148800 ffffffff8854bd72  
kernel: ffff8105b34dd860 000000000000000a ffff8105992df7a0 ffff81010b7e8040  
kernel: 00080121894b0b2f 000000000000276b ffff8105992df988 0000000400000282  
kernel: Call Trace:  
kernel: [ffffffff8854bd72] :nfsd:exp_find_key+0x89/0x9c  
kernel: [ffffffff80063c4f] __mutex_lock_slowpath+0x60/0x9b  
kernel: [ffffffff80063c99] .text.lock.mutex+0xf/0x14  
kernel: [ffffffff8004a63d] vfs_rmdir+0x86/0x11d  
kernel: [ffffffff88548530] :nfsd:nfsd_unlink+0x1ed/0x24b  
kernel: [ffffffff8854fdef] :nfsd:nfsd3_proc_rmdir+0xa8/0xb5  
kernel: [ffffffff885451db] :nfsd:nfsd_dispatch+0xd8/0x1d6  
kernel: [ffffffff884c5649] :sunrpc:svc_process+0x44c/0x713  
kernel: [ffffffff80064604] __down_read+0x12/0x92  
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb  
kernel: [ffffffff88545746] :nfsd:nfsd+0x1a5/0x2cb  
kernel: [ffffffff8005dfb1] child_rip+0xa/0x11  
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb  
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb  
kernel: [ffffffff8005dfa7] child_rip+0x0/0x11  
kernel:
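
The behavior described above, where one thread tries to take a non-recursive lock it already holds and every later waiter then blocks behind it, can be reproduced in user space. The sketch below is only an analogy and is not kernel code: `i_mutex`, `rename_like`, and `rmdir_like` are hypothetical stand-ins, and the timeouts exist only so that the demonstration terminates.

```python
import threading

i_mutex = threading.Lock()  # stand-in for one inode's non-recursive mutex


def rename_like():
    """Take the lock, then try to take it a second time."""
    i_mutex.acquire()
    # A second acquire on a non-recursive lock cannot succeed while the
    # lock is held. The timeout is only here so the demo terminates; a
    # kernel mutex has no timeout, so the thread would sleep forever.
    return i_mutex.acquire(timeout=0.2)


def rmdir_like():
    """A later operation that needs the same lock."""
    return i_mutex.acquire(timeout=0.2)


results = {}
first = threading.Thread(target=lambda: results.update(rename=rename_like()))
first.start()
first.join()
# The lock is still held at this point, so the second thread cannot get it.
second = threading.Thread(target=lambda: results.update(rmdir=rmdir_like()))
second.start()
second.join()
print(results)  # {'rename': False, 'rmdir': False}
```

Both acquisitions fail: the first because the lock is already held by the caller itself, the second because the lock is never released afterwards. This mirrors the two backtraces above, where the rename thread blocks first and the rmdir thread blocks behind it.
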

Environment

  • Red Hat Enterprise Linux 5.6 - 5.8
    • kernel-2.6.18-274.5.1.el5; kernel-2.6.18-274.17.1.el5; kernel-2.6.18-300.el5
  • Red Hat Enterprise Linux 6.3 - 6.4
    • kernel-2.6.32-279.*el6; kernel-2.6.32-358.*el6
  • NFS server (nfsd)
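
To compare a server's running kernel with the builds listed above, the release string can be matched against the list. This is a minimal sketch: `LISTED_KERNELS` and `is_listed_kernel` are hypothetical names, the list is copied from this Environment section only, and a non-match does not by itself show that a kernel is unaffected.

```python
from fnmatch import fnmatchcase

# Kernel builds listed in the Environment section of this article, written
# as shell-style patterns without the "kernel-" package prefix.
LISTED_KERNELS = [
    "2.6.18-274.5.1.el5",
    "2.6.18-274.17.1.el5",
    "2.6.18-300.el5",
    "2.6.32-279.*el6",
    "2.6.32-358.*el6",
]


def is_listed_kernel(release):
    """Return True if a `uname -r` style string matches a listed build."""
    # The trailing "*" allows for a suffix such as ".x86_64".
    return any(fnmatchcase(release, pattern + "*") for pattern in LISTED_KERNELS)
```

On a running system the release string can be obtained with `platform.release()` from the Python standard library, for example `is_listed_kernel(platform.release())`.
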
