NFSD threads become deadlocked when one of the threads tries to lock the same i_mutex twice
Issue
- nfsd server processes show hung_task messages in /var/log/messages, after which nfsd becomes totally unresponsive to NFS clients and never leaves the uninterruptible sleep "D" state.
- Apart from the unavailability of NFS, the operating system remains unaffected.
- The first hung_task backtrace will contain the symbols nfsd_rename and lock_rename:
kernel: INFO: task nfsd:16573 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: D ffff810511adc0c0 0 1657316574 16572 (L-TLB)
kernel: ffff81057ee8dd00 0000000000000046 ffff8104242b9400 ffffffff8854bd72
kernel: ffff81057ee8dd50 000000000000000a ffff8105f9a71080 ffff810511adc0c0
kernel: 00080121893de2c4 0000000000002a1f ffff8105f9a71268 0000000c30703f40
kernel: Call Trace:
kernel: [ffffffff8854bd72] :nfsd:exp_find_key+0x89/0x9c
kernel: [ffffffff8009cce3] set_current_groups+0x116/0x164
kernel: [ffffffff80063c4f] __mutex_lock_slowpath+0x60/0x9b
kernel: [ffffffff88547c7b] :nfsd:fh_verify+0x450/0x4bd
kernel: [ffffffff80063c99] .text.lock.mutex+0xf/0x14
kernel: [ffffffff800487cb] lock_rename+0xb5/0xbd
kernel: [ffffffff8854a6c5] :nfsd:nfsd_rename+0x116/0x31e
kernel: [ffffffff8854ff3d] :nfsd:nfsd3_proc_rename+0x141/0x154
kernel: [ffffffff885451db] :nfsd:nfsd_dispatch+0xd8/0x1d6
kernel: [ffffffff884c5649] :sunrpc:svc_process+0x44c/0x713
kernel: [ffffffff80064604] __down_read+0x12/0x92
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
kernel: [ffffffff88545746] :nfsd:nfsd+0x1a5/0x2cb
kernel: [ffffffff8005dfb1] child_rip+0xa/0x11
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
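To check a log for this signature, the symptom above can be sketched as a small scanner. This is a minimal sketch, not a Red Hat tool: the function name find_rename_hang is hypothetical, and it assumes the log lines follow the format shown in the trace above (an "INFO: task name:pid blocked" header followed by "symbol+0xoffset" backtrace entries).

```python
import re

# Header line the hung-task detector writes for each blocked task.
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than \d+ seconds")
# Symbols the first backtrace is expected to contain.
WANTED = ("nfsd_rename", "lock_rename")


def find_rename_hang(lines):
    """Return (task_name, pid) of the first hung nfsd task whose backtrace
    contains both nfsd_rename and lock_rename, or None if there is none."""
    task = None
    seen = set()
    for line in lines:
        header = HUNG_TASK.search(line)
        if header:
            # A new hung_task report starts: reset the per-report state.
            task = (header.group(1), int(header.group(2)))
            seen = set()
            continue
        if task is None or task[0] != "nfsd":
            continue
        for symbol in WANTED:
            # Match "symbol+0x..." exactly, so nfsd3_proc_rename does not count.
            if re.search(r"\b" + re.escape(symbol) + r"\+0x", line):
                seen.add(symbol)
        if len(seen) == len(WANTED):
            return task
    return None
```

The function accepts any iterable of lines, so an open file handle on /var/log/messages can be passed directly. A result of None only means this signature was not found in the lines given.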
- Subsequent operations (such as rmdir) that try to acquire the i_mutex on that same file go into uninterruptible sleep waiting for that i_mutex to be released, and produce further hung_task messages.
- The i_mutex is never released, so eventually every nfsd task that tries to lock the same resource hangs forever:
kernel: INFO: task nfsd:16586 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: D ffffffff80153804 0 16586 1 16584 16580 (L-TLB)
kernel: ffff81042a003d40 0000000000000046 ffff8105bf148800 ffffffff8854bd72
kernel: ffff8105b34dd860 000000000000000a ffff8105992df7a0 ffff81010b7e8040
kernel: 00080121894b0b2f 000000000000276b ffff8105992df988 0000000400000282
kernel: Call Trace:
kernel: [ffffffff8854bd72] :nfsd:exp_find_key+0x89/0x9c
kernel: [ffffffff80063c4f] __mutex_lock_slowpath+0x60/0x9b
kernel: [ffffffff80063c99] .text.lock.mutex+0xf/0x14
kernel: [ffffffff8004a63d] vfs_rmdir+0x86/0x11d
kernel: [ffffffff88548530] :nfsd:nfsd_unlink+0x1ed/0x24b
kernel: [ffffffff8854fdef] :nfsd:nfsd3_proc_rmdir+0xa8/0xb5
kernel: [ffffffff885451db] :nfsd:nfsd_dispatch+0xd8/0x1d6
kernel: [ffffffff884c5649] :sunrpc:svc_process+0x44c/0x713
kernel: [ffffffff80064604] __down_read+0x12/0x92
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
kernel: [ffffffff88545746] :nfsd:nfsd+0x1a5/0x2cb
kernel: [ffffffff8005dfb1] child_rip+0xa/0x11
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
kernel: [ffffffff885455a1] :nfsd:nfsd+0x0/0x2cb
kernel: [ffffffff8005dfa7] child_rip+0x0/0x11
kernel:
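The pattern described in this section, one thread locking the same non-recursive mutex twice and later threads piling up behind it, can be illustrated in user space. This is only an analogy under stated assumptions, not kernel code: threading.Lock stands in for the inode's i_mutex, the function name simulate_double_lock is hypothetical, and timeouts are used purely so the demonstration terminates. The real tasks wait without a timeout.

```python
import threading


def simulate_double_lock(timeout=0.1):
    """Show that a second acquire by the holder, and any later acquire by
    another thread, both fail to obtain a non-reentrant lock."""
    i_mutex = threading.Lock()  # non-reentrant stand-in for i_mutex
    outcome = {}

    # The "rename" path takes the lock once ...
    i_mutex.acquire()
    # ... then tries to take the same lock again. Without the timeout this
    # call would block forever, which is the self-deadlock.
    outcome["second_lock_by_holder"] = i_mutex.acquire(timeout=timeout)

    def later_operation():
        # A later operation (the "rmdir" path) on the same object waits
        # behind the holder, which can never release the lock.
        outcome["lock_by_other_thread"] = i_mutex.acquire(timeout=timeout)

    worker = threading.Thread(target=later_operation)
    worker.start()
    worker.join()

    # Released here only so the demonstration exits cleanly.
    i_mutex.release()
    return outcome
```

Both entries in the returned dictionary are False: neither the holder's second attempt nor the later thread ever obtains the lock while the first acquisition is outstanding.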
Environment
- Red Hat Enterprise Linux 5.6 - 5.8
- kernel-2.6.18-274.5.1.el5; kernel-2.6.18-274.17.1.el5; kernel-2.6.18-300.el5
- Red Hat Enterprise Linux 6.3 - 6.4
- kernel-2.6.32-279.*el6; kernel-2.6.32-358.*el6
- NFS server (nfsd)