NFSD threads become deadlocked when one of the threads tries to lock the same i_mutex twice
Issue
- nfsd server processes show hung_task messages in /var/log/messages. After that, nfsd becomes completely unresponsive to NFS clients and never leaves the uninterruptible sleep ("D") state.
- Apart from NFS being unavailable, the operating system is unaffected.
- The first hung_task backtrace contains the symbols nfsd_rename and lock_rename:
kernel: INFO: task nfsd:16573 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: D ffff810511adc0c0 0 1657316574 16572 (L-TLB)
kernel: ffff81057ee8dd00 0000000000000046 ffff8104242b9400 ffffffff8854bd72
kernel: ffff81057ee8dd50 000000000000000a ffff8105f9a71080 ffff810511adc0c0
kernel: 00080121893de2c4 0000000000002a1f ffff8105f9a71268 0000000c30703f40
kernel: Call Trace:
kernel: [ffffffff8854bd72] :nfsd:exp_find_key+0x89/0x9c
kernel: [ffffffff8009cce3] set_current_groups+0x116/0x164
kernel: [ffffffff80063c4f] __mutex_lock_slowpath+0x60/0x9b
kernel: [ffffffff88547c7b] :nfsd:fh_verify+0x450/0x4bd
kernel: [ffffffff80063c99] .text.lock.mutex+0xf/0x14
kernel: [ffffffff800487cb] lock_rename+0xb5/0xbd
kernel: [ffffffff8854a6c5] :nfsd:nfsd_rename+0x116/0x31e
kernel: [ffffffff8854ff3d] :nfsd:nfsd3_proc_rename+0x141/0x154
kernel: [ffffffff885451db] :nfsd:nfsd_dispatch+0xd8/0x1d6
kernel: [ffffffff884c5649] :sunrpc:svc_process+0x44c/0x713
kernel: [ffffffff80064604] __down_read+0x12/0x92
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
kernel: [ffffffff88545746] :nfsd:nfsd+0x1a5/0x2cb
kernel: [ffffffff8005dfb1] child_rip+0xa/0x11
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
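To check whether a system matches this signature, you can scan the hung_task reports in /var/log/messages for nfsd traces that contain both nfsd_rename and lock_rename. The following minimal Python sketch assumes the report layout shown above: an "INFO: task <comm>:<pid> blocked ..." header followed by the call trace lines. The function names are hypothetical, and this is not a Red Hat-provided tool. If lines from other log sources are interleaved with a report, the sketch attributes them to that report. This is harmless for the check.

```python
import re

# Header the kernel logs for a hung task, e.g.
# "kernel: INFO: task nfsd:16573 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(r"INFO: task (\S+?):(\d+) blocked for more than \d+ seconds")


def split_hung_task_reports(lines):
    """Group log lines into (comm, pid, trace_lines) tuples, one per report.

    Every line after a header is attributed to that header until the next
    header appears; lines before the first header are ignored.
    """
    reports = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            reports.append((match.group(1), int(match.group(2)), []))
        elif reports:
            reports[-1][2].append(line)
    return reports


def rename_deadlock_pids(lines):
    """Return PIDs of hung nfsd tasks whose trace shows nfsd_rename and lock_rename."""
    pids = []
    for comm, pid, trace_lines in split_hung_task_reports(lines):
        trace = "\n".join(trace_lines)
        if (comm == "nfsd"
                and re.search(r"\bnfsd_rename\+", trace)
                and re.search(r"\block_rename\+", trace)):
            pids.append(pid)
    return pids
```

For example, `rename_deadlock_pids(open("/var/log/messages", errors="replace").read().splitlines())` returns the PIDs of the matching nfsd threads. For the two traces in this article, it reports only 16573. The rmdir trace (16586) does not contain nfsd_rename because it is a consequence of the first hang, not its cause.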
- Subsequent nfsd threads handling operations such as rmdir may try to acquire the i_mutex on that same file. These threads go into uninterruptible sleep, waiting for that i_mutex to be released. Because it is never released, every nfsd thread that tries to lock the same resource eventually hangs forever and produces further hung_task messages such as:
kernel: INFO: task nfsd:16586 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: D ffffffff80153804 0 16586 1 16584 16580 (L-TLB)
kernel: ffff81042a003d40 0000000000000046 ffff8105bf148800 ffffffff8854bd72
kernel: ffff8105b34dd860 000000000000000a ffff8105992df7a0 ffff81010b7e8040
kernel: 00080121894b0b2f 000000000000276b ffff8105992df988 0000000400000282
kernel: Call Trace:
kernel: [ffffffff8854bd72] :nfsd:exp_find_key+0x89/0x9c
kernel: [ffffffff80063c4f] __mutex_lock_slowpath+0x60/0x9b
kernel: [ffffffff80063c99] .text.lock.mutex+0xf/0x14
kernel: [ffffffff8004a63d] vfs_rmdir+0x86/0x11d
kernel: [ffffffff88548530] :nfsd:nfsd_unlink+0x1ed/0x24b
kernel: [ffffffff8854fdef] :nfsd:nfsd3_proc_rmdir+0xa8/0xb5
kernel: [ffffffff885451db] :nfsd:nfsd_dispatch+0xd8/0x1d6
kernel: [ffffffff884c5649] :sunrpc:svc_process+0x44c/0x713
kernel: [ffffffff80064604] __down_read+0x12/0x92
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
kernel: [ffffffff88545746] :nfsd:nfsd+0x1a5/0x2cb
kernel: [ffffffff8005dfb1] child_rip+0xa/0x11
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
kernel: [ffffffff8005dfa7] child_rip+0x0/0x11
kernel:
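The underlying failure mode is a self-deadlock. A kernel mutex such as i_mutex is not recursive. A thread that already holds it and tries to take it again therefore sleeps forever, and every other thread that later needs the same lock queues up behind it. The Python sketch below illustrates only that general pattern in user space, using threading.Lock as a stand-in for a non-recursive mutex. It does not reproduce the nfsd code path. lock_pair() is a hypothetical helper that shows the usual guard: lock only once when both objects are the same. This is similar in spirit to how lock_rename() handles identical parent directories.

```python
import threading


def try_relock(lock, timeout=0.5):
    """Acquire `lock`, then try to acquire it again from the same thread.

    threading.Lock is non-recursive, like a kernel mutex, so the second
    acquire can never succeed. The timeout stands in for the hung_task
    watchdog so the demo returns instead of blocking forever.
    Returns True only if the second acquire succeeded.
    """
    lock.acquire()
    try:
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            lock.release()
        return got_it
    finally:
        lock.release()


def lock_pair(a, b):
    """Hypothetical helper: lock two locks, taking a single lock only once.

    If both arguments are the same object, locking it twice would deadlock
    the calling thread, so it is acquired just once.
    """
    if a is b:
        a.acquire()
        return [a]
    # Assumed ordering for this sketch: sort by id() so two threads locking
    # the same pair always take the locks in the same order (no ABBA deadlock).
    first, second = sorted((a, b), key=id)
    first.acquire()
    second.acquire()
    return [first, second]


def unlock_all(held):
    """Release locks returned by lock_pair() in reverse acquisition order."""
    for lk in reversed(held):
        lk.release()
```

With threading.RLock, the second acquire in try_relock() succeeds. Recursive locks like this can hide this class of bug in user space. Kernel mutexes offer no such recursion, so the second lock attempt blocks indefinitely.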
Environment
- Red Hat Enterprise Linux 5.6 - 5.8
- kernel-2.6.18-274.5.1.el5; kernel-2.6.18-274.17.1.el5; kernel-2.6.18-300.el5
- Red Hat Enterprise Linux 6.3 - 6.4
- kernel-2.6.32-279.*el6; kernel-2.6.32-358.*el6
- NFS server (nfsd)
