kthreadd self-deadlock in the NFS failure recovery path makes the system unresponsive
Environment
- Red Hat Enterprise Linux 7
- kernel 3.10.0-1160.88.1.el7.x86_64
- nfsv4
Issue
- The system stops responding because many processes are stuck in D state (uninterruptible sleep) and never return to the running state.
- The D state processes are waiting on the kthreadd process.
Resolution
- This issue was tracked in Bug 2164219 and closed as WONTFIX because RHEL 7 is currently in the Maintenance Support 2 phase.
Workaround:
- Disable pNFS on RHEL 7. To check whether it is in use, see the knowledge base article "How to check whether pNFS is enabled on NFS client?"
- If pNFS is required, upgrade the system to RHEL 8 or RHEL 9.
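The following is a minimal sketch of one way to check for pNFS use on a client. It is not taken from the referenced article. It assumes that the nfsv4: line for each NFSv4 mount in /proc/self/mountstats carries a pnfs= field, which reads "not configured" when no layout driver is in use. The function name pnfs_layouts and the sample text are illustrative only.

```python
import re


def pnfs_layouts(mountstats_text):
    """Map each NFSv4 mount point to the value of its pnfs= field."""
    result = {}
    mount_point = None
    for line in mountstats_text.splitlines():
        header = re.match(r"device \S+ mounted on (\S+) with fstype nfs4", line)
        if header:
            mount_point = header.group(1)
            continue
        if line.startswith("device "):
            mount_point = None  # a mount of another file system type starts here
            continue
        if mount_point and line.strip().startswith("nfsv4:"):
            field = re.search(r"pnfs=([^,]+)", line)
            if field:
                result[mount_point] = field.group(1).strip()
    return result


# Illustrative input (assumed format, not captured from the affected system).
SAMPLE = """\
device rootfs mounted on / with fstype rootfs
device server:/export mounted on /mnt/data with fstype nfs4 statvers=1.1
\tnfsv4:\tbm0=0xfdffbfff,bm1=0x40f9be3e,bm2=0x803,acl=0x3,sessions,pnfs=LAYOUT_NFSV4_1_FILES
device server:/home mounted on /mnt/home with fstype nfs4 statvers=1.1
\tnfsv4:\tbm0=0xfdffbfff,bm1=0x40f9be3e,bm2=0x803,acl=0x3,sessions,pnfs=not configured
"""

layouts = pnfs_layouts(SAMPLE)
```

On a live client the same function could be fed the contents of /proc/self/mountstats. Any mount whose value is something other than "not configured" would be a candidate for the workaround.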
Diagnostic Steps
- The D state processes are mainly waiting on the kthreadd process.
crash> ps -m | grep UN | sort | tail
[ 1 14:38:03.969] [UN] PID: 21290 TASK: ffff9b8473eb6300 CPU: 8 COMMAND: "kworker/8:0"
[ 1 14:40:01.240] [UN] PID: 114781 TASK: ffff9baad0769080 CPU: 8 COMMAND: "kworker/8:0H"
[ 1 14:41:08.993] [UN] PID: 20209 TASK: ffff9b84c6af0000 CPU: 7 COMMAND: "kworker/7:1"
[ 1 14:42:56.615] [UN] PID: 15312 TASK: ffff9b9d28c0c200 CPU: 9 COMMAND: "kworker/u256:6"
[ 1 14:43:16.124] [UN] PID: 19705 TASK: ffff9b8b35dd5280 CPU: 14 COMMAND: "kworker/14:2"
[ 1 14:44:42.215] [UN] PID: 116703 TASK: ffff9b83db105280 CPU: 2 COMMAND: "kworker/2:1"
[ 1 14:44:42.835] [UN] PID: 16443 TASK: ffff9b9d28c0a100 CPU: 13 COMMAND: "kworker/13:0"
[ 1 14:45:07.128] [UN] PID: 7565 TASK: ffff9b846797b180 CPU: 9 COMMAND: "kworker/9:1"
[ 1 14:45:39.406] [UN] PID: 17446 TASK: ffff9b9c7204d280 CPU: 5 COMMAND: "kworker/5:0"
[ 1 14:45:40.094] [UN] PID: 15113 TASK: ffff9bb279c69080 CPU: 17 COMMAND: "kworker/17:1"
crash> bt
PID: 15113 TASK: ffff9bb279c69080 CPU: 17 COMMAND: "kworker/17:1"
#0 [ffff9b83f86e3b90] __schedule at ffffffffa77b78d8
#1 [ffff9b83f86e3bf8] schedule at ffffffffa77b7ca9
#2 [ffff9b83f86e3c08] schedule_timeout at ffffffffa77b5831
#3 [ffff9b83f86e3cb8] wait_for_completion at ffffffffa77b805d
#4 [ffff9b83f86e3d18] kthread_create_on_node at ffffffffa70cb47a
#5 [ffff9b83f86e3dd0] create_worker at ffffffffa70c3ebb
#6 [ffff9b83f86e3e20] manage_workers at ffffffffa70c415e
#7 [ffff9b83f86e3e68] worker_thread at ffffffffa70c4693
#8 [ffff9b83f86e3ec8] kthread at ffffffffa70cb621
#9 [ffff9b83f86e3f50] ret_from_fork_nospec_begin at ffffffffa77c51dd
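When the `ps -m` listing is long, it can help to rank the blocked tasks programmatically. The sketch below assumes the line layout shown above, that is, `[days hh:mm:ss.mmm] [STATE] PID: ... COMMAND: "..."`. The helper name blocked_tasks is hypothetical and not part of crash.

```python
import re

PS_LINE = re.compile(
    r'\[\s*(\d+)\s+(\d+):(\d+):(\d+)\.(\d{3})\]\s+\[(\w+)\]\s+'
    r'PID:\s*(\d+)\s+TASK:\s*[0-9a-f]+\s+CPU:\s*\d+\s+COMMAND:\s*"([^"]*)"'
)


def blocked_tasks(ps_output, state="UN"):
    """Return (milliseconds_since_last_run, pid, command) tuples, longest first."""
    tasks = []
    for line in ps_output.splitlines():
        m = PS_LINE.search(line)
        if not m or m.group(6) != state:
            continue
        days, hours, minutes, seconds, millis = (int(m.group(i)) for i in range(1, 6))
        elapsed_ms = (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis
        tasks.append((elapsed_ms, int(m.group(7)), m.group(8)))
    return sorted(tasks, reverse=True)


# Three lines copied from the crash output in this article.
PS_SAMPLE = """\
[ 1 14:45:39.406] [UN] PID: 17446 TASK: ffff9b9c7204d280 CPU: 5 COMMAND: "kworker/5:0"
[ 1 14:45:40.094] [UN] PID: 15113 TASK: ffff9bb279c69080 CPU: 17 COMMAND: "kworker/17:1"
[ 0 00:08:57.941] [UN] PID: 2 TASK: ffff9b84c6d11080 CPU: 14 COMMAND: "kthreadd"
"""

ranked = blocked_tasks(PS_SAMPLE)
```

The first entry of the result is the task that has gone longest without running, which is a reasonable place to start taking backtraces.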
- However, kthreadd, the process they are waiting on to acknowledge their requests, is itself in D state, which creates a deadlock.
- The vmcore analysis shows that kthreadd blocked while trying to create an NFS state manager thread, a path it entered while reclaiming memory in order to create another process.
crash> ps -m 2
[ 0 00:08:57.941] [UN] PID: 2 TASK: ffff9b84c6d11080 CPU: 14 COMMAND: "kthreadd"
crash> bt 2
PID: 2 TASK: ffff9b84c6d11080 CPU: 14 COMMAND: "kthreadd"
#0 [ffff9b84c6d3b400] __schedule at ffffffffa77b78d8
#1 [ffff9b84c6d3b468] schedule at ffffffffa77b7ca9
#2 [ffff9b84c6d3b478] schedule_timeout at ffffffffa77b5831
#3 [ffff9b84c6d3b528] wait_for_completion at ffffffffa77b805d
#4 [ffff9b84c6d3b588] kthread_create_on_node at ffffffffa70cb47a
#5 [ffff9b84c6d3b640] nfs4_schedule_state_manager at ffffffffc09aaab1 [nfsv4]
#6 [ffff9b84c6d3b6a0] nfs41_handle_sequence_flag_errors at ffffffffc09ab529 [nfsv4]
#7 [ffff9b84c6d3b6c8] nfs41_sequence_process at ffffffffc098fd3c [nfsv4]
#8 [ffff9b84c6d3b708] nfs4_layoutreturn_done at ffffffffc0991951 [nfsv4]
#9 [ffff9b84c6d3b730] rpc_exit_task at ffffffffc06bb821 [sunrpc]
#10 [ffff9b84c6d3b748] __rpc_execute at ffffffffc06bd969 [sunrpc]
#11 [ffff9b84c6d3b7b0] rpc_execute at ffffffffc06c0268 [sunrpc]
#12 [ffff9b84c6d3b7e0] rpc_run_task at ffffffffc06ae856 [sunrpc]
#13 [ffff9b84c6d3b810] nfs4_proc_layoutreturn at ffffffffc0999e07 [nfsv4]
#14 [ffff9b84c6d3b8b0] pnfs_send_layoutreturn at ffffffffc09beac6 [nfsv4]
#15 [ffff9b84c6d3b8f8] _pnfs_return_layout at ffffffffc09c0b28 [nfsv4]
#16 [ffff9b84c6d3b980] nfs4_evict_inode at ffffffffc09abc44 [nfsv4]
#17 [ffff9b84c6d3b998] evict at ffffffffa727ab34
#18 [ffff9b84c6d3b9c0] dispose_list at ffffffffa727ac3e
#19 [ffff9b84c6d3b9e8] prune_icache_sb at ffffffffa727bd4c
#20 [ffff9b84c6d3ba50] prune_super at ffffffffa725e8ab
#21 [ffff9b84c6d3ba80] shrink_slab at ffffffffa71dd6d5
#22 [ffff9b84c6d3bb28] do_try_to_free_pages at ffffffffa71e09ea
#23 [ffff9b84c6d3bba0] try_to_free_pages at ffffffffa71e0c3c
#24 [ffff9b84c6d3bc38] __alloc_pages_nodemask at ffffffffa71d45a1
#25 [ffff9b84c6d3bd68] copy_process at ffffffffa709cd45
#26 [ffff9b84c6d3bdf8] do_fork at ffffffffa709e791
#27 [ffff9b84c6d3be70] kernel_thread at ffffffffa709ea66
#28 [ffff9b84c6d3be80] kthreadd at ffffffffa70cc051
#29 [ffff9b84c6d3bf50] ret_from_fork_nospec_begin at ffffffffa77c51dd
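The signature of this problem in the backtrace above is that kthread_create_on_node() appears in a more recent frame than kthreadd() on the same stack. As a rough sketch, that check can be automated against saved `bt` output. The function names parse_frames and kthreadd_waits_on_itself are hypothetical, and the sample is an abbreviated copy of the trace above.

```python
import re

FRAME = re.compile(
    r"#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def parse_frames(bt_output):
    """Return [(frame_number, function, module_or_None)] from crash `bt` text."""
    return [(int(m.group(1)), m.group(2), m.group(3)) for m in FRAME.finditer(bt_output)]


def kthreadd_waits_on_itself(frames):
    """True when kthread_create_on_node() was called from within kthreadd()'s stack."""
    numbers = {function: number for number, function, _module in frames}
    return (
        "kthreadd" in numbers
        and "kthread_create_on_node" in numbers
        # Lower frame numbers are more recent calls.
        and numbers["kthread_create_on_node"] < numbers["kthreadd"]
    )


# Abbreviated copies of the two traces in this article.
KTHREADD_BT = """\
PID: 2 TASK: ffff9b84c6d11080 CPU: 14 COMMAND: "kthreadd"
#3 [ffff9b84c6d3b528] wait_for_completion at ffffffffa77b805d
#4 [ffff9b84c6d3b588] kthread_create_on_node at ffffffffa70cb47a
#5 [ffff9b84c6d3b640] nfs4_schedule_state_manager at ffffffffc09aaab1 [nfsv4]
#28 [ffff9b84c6d3be80] kthreadd at ffffffffa70cc051
"""

KWORKER_BT = """\
PID: 15113 TASK: ffff9bb279c69080 CPU: 17 COMMAND: "kworker/17:1"
#4 [ffff9b83f86e3d18] kthread_create_on_node at ffffffffa70cb47a
#5 [ffff9b83f86e3dd0] create_worker at ffffffffa70c3ebb
#8 [ffff9b83f86e3ec8] kthread at ffffffffa70cb621
"""

kthreadd_frames = parse_frames(KTHREADD_BT)
kworker_frames = parse_frames(KWORKER_BT)
```

The kworker trace does not trigger the check because its stack contains kthread(), not kthreadd(). It is a victim of the deadlock rather than its cause.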
- kthread_create_on_node() wakes kthreadd and then waits for it to complete the request:
struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
                                           void *data, int node,
                                           const char namefmt[],
                                           ...)
{
...
        wake_up_process(kthreadd_task);
        wait_for_completion(&create.done);  /* never completes: the caller in the trace above is kthreadd itself, so nothing is left to service the request */
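The self-wait can be illustrated with a small user-space model. This is only a sketch under stated assumptions and is not kernel code. One thread stands in for kthreadd as the sole servicer of creation requests, and servicing a request leads it to submit a second request and wait on it, as happened through the memory reclaim and NFS path above. All names are hypothetical. The nested wait uses a timeout so the demonstration terminates, whereas the kernel's wait_for_completion() has none.

```python
import queue
import threading


def simulate(nested_timeout=0.2):
    requests = queue.Queue()
    outcome = {}

    def create_thread(name, timeout):
        """Model of kthread_create_on_node(): queue a request, wait for the creator."""
        done = threading.Event()
        requests.put((name, done))
        return done.wait(timeout)  # True only if the creator completed the request

    def creator():
        """Model of kthreadd: the only thread that services creation requests."""
        _name, done = requests.get()
        # While servicing the request, the creator itself asks for a new thread.
        # Nobody else can service that request, so the wait can only time out.
        outcome["nested"] = create_thread("state-manager", nested_timeout)
        done.set()  # reached here only because the model's nested wait gave up

    worker = threading.Thread(target=creator)
    worker.start()
    outcome["outer"] = create_thread("kworker", timeout=10 * nested_timeout)
    worker.join()
    outcome["unserviced"] = requests.qsize()
    return outcome


result = simulate()
```

In the model the nested request is never serviced and the outer request completes only after the nested wait times out. Without that timeout, both waiters would block forever, which matches the state captured in the vmcore.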