kthreadd self-deadlock in the NFS failure recovery path makes the system unresponsive

Solution Verified

Environment

  • Red Hat Enterprise Linux 7
  • kernel 3.10.0-1160.88.1.el7.x86_64
  • nfsv4

Issue

  • The system stops responding because many processes are stuck in D state (uninterruptible sleep) and never return to the running state.
  • The D state processes are waiting on the kthreadd process.
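To confirm the symptom on a running system, the tasks in D state can be listed from /proc. The following is a minimal C sketch, assuming the standard /proc/<pid>/stat layout in which the one-letter state follows the parenthesised command name; stat_state() and list_d_state_tasks() are illustrative names, not part of any Red Hat tool.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return the state letter of a /proc/<pid>/stat line, or '?' if the line
 * is malformed. The command name is wrapped in parentheses and may itself
 * contain spaces or ')', so the state is read after the LAST ')'. */
char stat_state(const char *line)
{
    assert(line != NULL);
    const char *p = strrchr(line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Print the stat line of every process whose state is 'D' (uninterruptible
 * sleep) to `out`. Kernel threads such as kthreadd and the kworkers have
 * their own /proc/<pid> entries and are included; threads of multi-threaded
 * user processes (under /proc/<pid>/task) are not. Returns the number of
 * D state entries found, or -1 if /proc cannot be opened. */
int list_d_state_tasks(FILE *out)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int found = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                       /* only numeric entries are PIDs */

        char path[300];
        snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                       /* the task exited meanwhile */

        char line[1024];
        if (fgets(line, sizeof(line), f) != NULL && stat_state(line) == 'D') {
            fputs(line, out);
            found++;
        }
        fclose(f);
    }
    closedir(proc);
    return found;
}
```

A driver only needs to call list_d_state_tasks(stdout); on a system in the state described here, the output would be expected to include kthreadd (PID 2) and the blocked kworker threads.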

Resolution

  • This issue was tracked in Bug 2164219 and closed as WONTFIX because RHEL 7 is currently in the Maintenance Support 2 Phase.

Workaround:

Diagnostic Steps

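  • Most of the D state processes are waiting on kthreadd. The longest-blocked tasks are kworker threads that are trying to create new workqueue workers and are stuck in kthread_create_on_node().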
crash> ps -m | grep UN | sort | tail 
[ 1 14:38:03.969] [UN]  PID: 21290    TASK: ffff9b8473eb6300  CPU: 8    COMMAND: "kworker/8:0"
[ 1 14:40:01.240] [UN]  PID: 114781   TASK: ffff9baad0769080  CPU: 8    COMMAND: "kworker/8:0H"
[ 1 14:41:08.993] [UN]  PID: 20209    TASK: ffff9b84c6af0000  CPU: 7    COMMAND: "kworker/7:1"
[ 1 14:42:56.615] [UN]  PID: 15312    TASK: ffff9b9d28c0c200  CPU: 9    COMMAND: "kworker/u256:6"
[ 1 14:43:16.124] [UN]  PID: 19705    TASK: ffff9b8b35dd5280  CPU: 14   COMMAND: "kworker/14:2"
[ 1 14:44:42.215] [UN]  PID: 116703   TASK: ffff9b83db105280  CPU: 2    COMMAND: "kworker/2:1"
[ 1 14:44:42.835] [UN]  PID: 16443    TASK: ffff9b9d28c0a100  CPU: 13   COMMAND: "kworker/13:0"
[ 1 14:45:07.128] [UN]  PID: 7565     TASK: ffff9b846797b180  CPU: 9    COMMAND: "kworker/9:1"
[ 1 14:45:39.406] [UN]  PID: 17446    TASK: ffff9b9c7204d280  CPU: 5    COMMAND: "kworker/5:0"
[ 1 14:45:40.094] [UN]  PID: 15113    TASK: ffff9bb279c69080  CPU: 17   COMMAND: "kworker/17:1"
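The pipeline above relies on sort ordering the bracketed elapsed-time prefix as text, which only works while every field keeps the same width. A numeric key removes that dependency. The following C sketch converts the "[D HH:MM:SS.mmm]" prefix into milliseconds and sorts lines with qsort(); psm_millis() and cmp_psm() are hypothetical helpers, and the format is assumed from the sample output above rather than taken from the crash documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Convert the "[D HH:MM:SS.mmm]" prefix of a `ps -m` line (format assumed
 * from the sample output) into milliseconds; -1 if the prefix is missing. */
long long psm_millis(const char *line)
{
    int days, h, m, s, ms;

    assert(line != NULL);
    if (sscanf(line, " [%d %d:%d:%d.%d", &days, &h, &m, &s, &ms) != 5)
        return -1;
    return ((((long long)days * 24 + h) * 60 + m) * 60 + s) * 1000 + ms;
}

/* qsort() comparator on const char * elements: smallest value first, so
 * the largest values end up last, as with `sort | tail`. */
int cmp_psm(const void *a, const void *b)
{
    long long x = psm_millis(*(const char *const *)a);
    long long y = psm_millis(*(const char *const *)b);
    return (x > y) - (x < y);
}
```

Sorting an array of captured lines with cmp_psm() and printing the last entries gives the same view as sort | tail without depending on column widths.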


crash> bt
PID: 15113    TASK: ffff9bb279c69080  CPU: 17   COMMAND: "kworker/17:1"
 #0 [ffff9b83f86e3b90] __schedule at ffffffffa77b78d8
 #1 [ffff9b83f86e3bf8] schedule at ffffffffa77b7ca9
 #2 [ffff9b83f86e3c08] schedule_timeout at ffffffffa77b5831
 #3 [ffff9b83f86e3cb8] wait_for_completion at ffffffffa77b805d
 #4 [ffff9b83f86e3d18] kthread_create_on_node at ffffffffa70cb47a
 #5 [ffff9b83f86e3dd0] create_worker at ffffffffa70c3ebb
 #6 [ffff9b83f86e3e20] manage_workers at ffffffffa70c415e
 #7 [ffff9b83f86e3e68] worker_thread at ffffffffa70c4693
 #8 [ffff9b83f86e3ec8] kthread at ffffffffa70cb621
 #9 [ffff9b83f86e3f50] ret_from_fork_nospec_begin at ffffffffa77c51dd
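  • However, kthreadd (PID 2) is itself in D state, waiting for a kernel thread creation request to be completed. Only kthreadd can service that request, so kthreadd is waiting on itself: a self-deadlock.
  • The machine hangs. vmcore analysis shows that kthreadd, while reclaiming memory to create another process (copy_process()), evicted an NFSv4 inode. Returning that inode's pNFS layout led to nfs4_schedule_state_manager(), and kthreadd blocked trying to create the NFS state manager thread.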
crash> ps -m 2
[ 0 00:08:57.941] [UN]  PID: 2        TASK: ffff9b84c6d11080  CPU: 14   COMMAND: "kthreadd"
crash> bt 2
PID: 2        TASK: ffff9b84c6d11080  CPU: 14   COMMAND: "kthreadd"
 #0 [ffff9b84c6d3b400] __schedule at ffffffffa77b78d8
 #1 [ffff9b84c6d3b468] schedule at ffffffffa77b7ca9
 #2 [ffff9b84c6d3b478] schedule_timeout at ffffffffa77b5831
 #3 [ffff9b84c6d3b528] wait_for_completion at ffffffffa77b805d
 #4 [ffff9b84c6d3b588] kthread_create_on_node at ffffffffa70cb47a
 #5 [ffff9b84c6d3b640] nfs4_schedule_state_manager at ffffffffc09aaab1 [nfsv4]
 #6 [ffff9b84c6d3b6a0] nfs41_handle_sequence_flag_errors at ffffffffc09ab529 [nfsv4]
 #7 [ffff9b84c6d3b6c8] nfs41_sequence_process at ffffffffc098fd3c [nfsv4]
 #8 [ffff9b84c6d3b708] nfs4_layoutreturn_done at ffffffffc0991951 [nfsv4]
 #9 [ffff9b84c6d3b730] rpc_exit_task at ffffffffc06bb821 [sunrpc]
#10 [ffff9b84c6d3b748] __rpc_execute at ffffffffc06bd969 [sunrpc]
#11 [ffff9b84c6d3b7b0] rpc_execute at ffffffffc06c0268 [sunrpc]
#12 [ffff9b84c6d3b7e0] rpc_run_task at ffffffffc06ae856 [sunrpc]
#13 [ffff9b84c6d3b810] nfs4_proc_layoutreturn at ffffffffc0999e07 [nfsv4]
#14 [ffff9b84c6d3b8b0] pnfs_send_layoutreturn at ffffffffc09beac6 [nfsv4]
#15 [ffff9b84c6d3b8f8] _pnfs_return_layout at ffffffffc09c0b28 [nfsv4]
#16 [ffff9b84c6d3b980] nfs4_evict_inode at ffffffffc09abc44 [nfsv4]
#17 [ffff9b84c6d3b998] evict at ffffffffa727ab34
#18 [ffff9b84c6d3b9c0] dispose_list at ffffffffa727ac3e
#19 [ffff9b84c6d3b9e8] prune_icache_sb at ffffffffa727bd4c
#20 [ffff9b84c6d3ba50] prune_super at ffffffffa725e8ab
#21 [ffff9b84c6d3ba80] shrink_slab at ffffffffa71dd6d5
#22 [ffff9b84c6d3bb28] do_try_to_free_pages at ffffffffa71e09ea
#23 [ffff9b84c6d3bba0] try_to_free_pages at ffffffffa71e0c3c
#24 [ffff9b84c6d3bc38] __alloc_pages_nodemask at ffffffffa71d45a1
#25 [ffff9b84c6d3bd68] copy_process at ffffffffa709cd45
#26 [ffff9b84c6d3bdf8] do_fork at ffffffffa709e791
#27 [ffff9b84c6d3be70] kernel_thread at ffffffffa709ea66
#28 [ffff9b84c6d3be80] kthreadd at ffffffffa70cc051
#29 [ffff9b84c6d3bf50] ret_from_fork_nospec_begin at ffffffffa77c51dd
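Taken together, the two backtraces form a wait-for graph: every blocked kworker waits in kthread_create_on_node() for kthreadd to service its request, and kthreadd waits for itself. A deadlock is a cycle in such a graph, and here the cycle has length 1. The following C sketch assumes each blocked task waits on exactly one other task and that the graph is built by hand from bt output (crash does not produce it); find_cycle(), cycle_length(), and NO_WAIT are hypothetical names.

```c
#include <assert.h>

#define NO_WAIT (-1)   /* hypothetical marker: the task is not blocked */

/* waits_on[i] is the task that must act before task i can run again, or
 * NO_WAIT. Follow the chain from `start`; return a task on the cycle it
 * reaches, or NO_WAIT if the chain ends at a runnable task. */
int find_cycle(const int *waits_on, int ntasks, int start)
{
    assert(start >= 0 && start < ntasks);
    int cur = start;
    /* A cycle-free chain ends within ntasks - 1 hops, so after ntasks hops
     * without reaching a runnable task, `cur` must lie on a cycle. */
    for (int hop = 0; hop < ntasks; hop++) {
        if (waits_on[cur] == NO_WAIT)
            return NO_WAIT;
        cur = waits_on[cur];
    }
    return cur;
}

/* Number of tasks on the cycle through `node`; 1 means it waits on itself. */
int cycle_length(const int *waits_on, int node)
{
    int len = 1;
    for (int cur = waits_on[node]; cur != node; cur = waits_on[cur])
        len++;
    return len;
}
```

For the vmcore above, index 0 could stand for kthreadd with waits_on[0] = 0 and each kworker with waits_on[i] = 0. Every start point then leads to task 0, and cycle_length() returns 1, the self-deadlock.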



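  • kthread_create_on_node() (kernel/kthread.c) queues a creation request, wakes kthreadd, and then waits for the request to be completed:

    struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
                                               void *data, int node,
                                               const char namefmt[],
                                               ...)
    {
    ...
        wake_up_process(kthreadd_task);
        wait_for_completion(&create.done);  <-- waits forever: completing this request requires kthreadd, but kthreadd is the task waiting here in its own call trace.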
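The same pattern can be reproduced outside the kernel. The following C sketch is a user-space analogue, not kernel code: one creator thread stands in for kthreadd, a linked list stands in for the kernel's list of pending creation requests, and a flag plus condition variable stands in for struct completion. While serving its first request, the creator submits a nested request and waits for it, as kthreadd reached kthread_create_on_node() through reclaim in copy_process(). Because nothing else drains the list, the nested wait can only time out. All names are hypothetical, and the timeout exists only so the program terminates; wait_for_completion() has none, which is why the real system stays hung.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins: the creator thread plays kthreadd, `queue` plays
 * the pending-request list, and done + done_cond play struct completion. */
struct request {
    struct request *next;
    bool done;
    pthread_cond_t done_cond;
};
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
static struct request *queue;
static bool stopping;
static int nested_rc = -1;   /* result of the creator's own nested wait */

/* Like kthread_create_on_node(): queue a request, wake the creator, wait.
 * The wait is bounded (unlike wait_for_completion()) so it can end. */
int submit_and_wait(int timeout_ms)
{
    struct request req = { .next = NULL, .done = false };
    struct timespec deadline;
    bool unlinked = false, done;
    int rc = 0;
    pthread_cond_init(&req.done_cond, NULL);
    clock_gettime(CLOCK_REALTIME, &deadline);
    long long ns = deadline.tv_nsec + (long long)timeout_ms * 1000000LL;
    deadline.tv_sec += ns / 1000000000LL;
    deadline.tv_nsec = ns % 1000000000LL;
    pthread_mutex_lock(&lock);
    req.next = queue;
    queue = &req;
    pthread_cond_signal(&work_cond);             /* like wake_up_process() */
    while (!req.done && rc == 0)
        rc = pthread_cond_timedwait(&req.done_cond, &lock, &deadline);
    if (!req.done) {
        /* Timed out: withdraw the request if the creator has not taken it, */
        struct request **pp = &queue;
        while (*pp != NULL && *pp != &req)
            pp = &(*pp)->next;
        unlinked = (*pp == &req);
        if (unlinked)
            *pp = req.next;
        /* otherwise the creator still uses &req, so wait until it is done. */
        while (!unlinked && !req.done)
            pthread_cond_wait(&req.done_cond, &lock);
    }
    done = req.done;
    pthread_mutex_unlock(&lock);
    pthread_cond_destroy(&req.done_cond);
    return done ? 0 : rc;
}

/* Like kthreadd(): serve requests one at a time. Serving the first one calls
 * submit_and_wait() from this same thread, mirroring kthreadd reaching
 * kthread_create_on_node() through reclaim inside copy_process(). */
static void *creator_main(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (int n = 0;; n++) {
        while (queue == NULL && !stopping)
            pthread_cond_wait(&work_cond, &lock);
        if (queue == NULL)
            break;                                /* asked to stop, and idle */
        struct request *req = queue;
        queue = req->next;
        pthread_mutex_unlock(&lock);
        if (n == 0)                               /* nobody else serves the queue */
            nested_rc = submit_and_wait(100);
        pthread_mutex_lock(&lock);
        req->done = true;                         /* request served: wake the waiter */
        pthread_cond_signal(&req->done_cond);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Submit one outer request (like a kworker in create_worker()). */
int run_demo(void)
{
    pthread_t creator;
    stopping = false;
    nested_rc = -1;
    int rc = pthread_create(&creator, NULL, creator_main, NULL);
    assert(rc == 0);
    rc = submit_and_wait(2000);
    assert(rc == 0);                              /* outer request was served */
    pthread_mutex_lock(&lock);
    stopping = true;
    pthread_cond_signal(&work_cond);
    pthread_mutex_unlock(&lock);
    pthread_join(creator, NULL);
    return nested_rc;
}
```

run_demo() is expected to return ETIMEDOUT: the nested request sits in a queue that only the waiting thread itself could drain. Build with -pthread.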

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
