System hung due to a possible loss of communication with the NFS server
Issue
- The system hung, possibly because it lost communication with the NFS server.
- 5 tasks are in the UN (uninterruptible) state and 293 tasks are in the ZO (zombie) state:
crash> ps -S
RU: 6
UN: 5
IN: 1593
ZO: 293
- Almost all of the ZO-state tasks are sshd processes:
crash> ps -m | grep ZO | less
[ 0 00:03:05.338] [ZO] PID: 5076 TASK: ffff8800414aedd0 CPU: 1 COMMAND: "sshd"
[ 0 00:04:12.527] [ZO] PID: 5071 TASK: ffff880041963ec0 CPU: 1 COMMAND: "sshd"
[ 0 00:09:12.986] [ZO] PID: 5057 TASK: ffff880041811f60 CPU: 2 COMMAND: "sshd"
[ 0 00:14:13.173] [ZO] PID: 5049 TASK: ffff88010fc2ce70 CPU: 0 COMMAND: "sshd"
[ 0 00:19:12.318] [ZO] PID: 5000 TASK: ffff88003c341f60 CPU: 1 COMMAND: "sshd"
[ 0 00:24:12.463] [ZO] PID: 4992 TASK: ffff8800414f0000 CPU: 1 COMMAND: "sshd"
[ 0 00:29:12.588] [ZO] PID: 4985 TASK: ffff880041592f10 CPU: 2 COMMAND: "sshd"
[ 0 00:34:12.803] [ZO] PID: 4970 TASK: ffff88003c340fb0 CPU: 0 COMMAND: "sshd"
[ 0 00:39:12.335] [ZO] PID: 4964 TASK: ffff8800414ade20 CPU: 2 COMMAND: "sshd"
[ 0 00:44:12.518] [ZO] PID: 4955 TASK: ffff8800414a8000 CPU: 1 COMMAND: "sshd"
[ 0 00:49:12.672] [ZO] PID: 4903 TASK: ffff88011146af10 CPU: 1 COMMAND: "sshd"
[ 0 00:54:12.783] [ZO] PID: 4891 TASK: ffff880040a72f10 CPU: 2 COMMAND: "sshd"
[ 0 00:59:12.943] [ZO] PID: 4884 TASK: ffff880040a9de20 CPU: 1 COMMAND: "sshd"
[ 0 01:04:13.065] [ZO] PID: 4874 TASK: ffff88003f9d3ec0 CPU: 2 COMMAND: "sshd"
[ 0 01:09:12.558] [ZO] PID: 4870 TASK: ffff88003f9d2f10 CPU: 0 COMMAND: "sshd"
[ 0 01:14:12.703] [ZO] PID: 4862 TASK: ffff880111c81f60 CPU: 0 COMMAND: "sshd"
[ 0 01:19:12.852] [ZO] PID: 4805 TASK: ffff88003f98af10 CPU: 1 COMMAND: "sshd"
[ 0 01:24:12.972] [ZO] PID: 4797 TASK: ffff88003f98bec0 CPU: 2 COMMAND: "sshd"
[ 0 01:29:13.143] [ZO] PID: 4791 TASK: ffff8801124b5e20 CPU: 0 COMMAND: "sshd"
[ 0 01:34:12.815] [ZO] PID: 4778 TASK: ffff88003c346dd0 CPU: 1 COMMAND: "sshd"
[ 0 01:39:12.405] [ZO] PID: 4770 TASK: ffff88011215af10 CPU: 1 COMMAND: "sshd"
[...]
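The timestamps in the listing above can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis, that parses `ps -m` lines and computes the gap between consecutive last-run times. The names `PS_M_RE`, `parse_ps_m` and `idle_gaps_seconds` are hypothetical, and the regular expression assumes the exact line layout shown above with a three-digit millisecond field.

```python
import re

# Five lines copied verbatim from the `ps -m | grep ZO` listing above.
SAMPLE = """\
[ 0 00:03:05.338] [ZO] PID: 5076 TASK: ffff8800414aedd0 CPU: 1 COMMAND: "sshd"
[ 0 00:04:12.527] [ZO] PID: 5071 TASK: ffff880041963ec0 CPU: 1 COMMAND: "sshd"
[ 0 00:09:12.986] [ZO] PID: 5057 TASK: ffff880041811f60 CPU: 2 COMMAND: "sshd"
[ 0 00:14:13.173] [ZO] PID: 5049 TASK: ffff88010fc2ce70 CPU: 0 COMMAND: "sshd"
[ 0 00:19:12.318] [ZO] PID: 5000 TASK: ffff88003c341f60 CPU: 1 COMMAND: "sshd"
"""

# Assumed layout: [ days hh:mm:ss.mmm] [STATE] PID: n ... COMMAND: "name"
PS_M_RE = re.compile(
    r'\[\s*(\d+)\s+(\d+):(\d+):(\d+)\.(\d{3})\]\s+\[(\w+)\]\s+'
    r'PID:\s+(\d+)\s+.*COMMAND:\s+"([^"]*)"'
)


def parse_ps_m(text):
    """Return one dict per task line; time since last run is in whole ms."""
    tasks = []
    for line in text.splitlines():
        m = PS_M_RE.search(line)
        if m is None:
            continue  # skip prompts, "[...]" markers and blank lines
        days, hours, minutes, seconds, millis = (int(m.group(i)) for i in range(1, 6))
        idle_ms = (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis
        tasks.append({
            "state": m.group(6),
            "pid": int(m.group(7)),
            "command": m.group(8),
            "idle_ms": idle_ms,
        })
    return tasks


def idle_gaps_seconds(tasks):
    """Gaps, in seconds, between consecutive last-run times (shortest idle first)."""
    idle = sorted(t["idle_ms"] for t in tasks)
    return [(later - earlier) / 1000 for earlier, later in zip(idle, idle[1:])]


if __name__ == "__main__":
    for gap in idle_gaps_seconds(parse_ps_m(SAMPLE)):
        print(f"{gap:.3f} s")
```

For the sample lines, every gap after the first is close to 300 seconds, so these sshd tasks last ran roughly five minutes apart. The listing does not show what opened these sessions.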
- UN state tasks:
crash> ps -m | grep UN | tail
[ 1 01:33:20.032] [UN] PID: 1 TASK: ffff880139b20000 CPU: 3 COMMAND: "systemd"
[ 1 02:20:51.015] [UN] PID: 10636 TASK: ffff880137709f60 CPU: 3 COMMAND: "savscand"
[ 1 04:20:51.448] [UN] PID: 13159 TASK: ffff880028054e70 CPU: 2 COMMAND: "WL_ConfSys_pwcp"
[ 1 10:06:13.354] [UN] PID: 44 TASK: ffff88013902ce70 CPU: 1 COMMAND: "fsnotify_mark"
[ 1 10:06:13.335] [UN] PID: 13891 TASK: ffff880137e81f60 CPU: 2 COMMAND: "savscand"
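To compare how long each UN-state task has been off the CPU, the timestamps can be converted to milliseconds and ranked. This is a small Python sketch under the same layout assumption as the listing above; `STAMP_RE` and `rank_by_idle` are hypothetical names used only for illustration.

```python
import re

# The five UN-state lines copied verbatim from the listing above.
UN_TASKS = """\
[ 1 01:33:20.032] [UN] PID: 1 TASK: ffff880139b20000 CPU: 3 COMMAND: "systemd"
[ 1 02:20:51.015] [UN] PID: 10636 TASK: ffff880137709f60 CPU: 3 COMMAND: "savscand"
[ 1 04:20:51.448] [UN] PID: 13159 TASK: ffff880028054e70 CPU: 2 COMMAND: "WL_ConfSys_pwcp"
[ 1 10:06:13.354] [UN] PID: 44 TASK: ffff88013902ce70 CPU: 1 COMMAND: "fsnotify_mark"
[ 1 10:06:13.335] [UN] PID: 13891 TASK: ffff880137e81f60 CPU: 2 COMMAND: "savscand"
"""

# Assumed layout: [ days hh:mm:ss.mmm] ... PID: n ... COMMAND: "name"
STAMP_RE = re.compile(
    r'\[\s*(\d+)\s+(\d+):(\d+):(\d+)\.(\d{3})\].*PID:\s+(\d+)\s.*COMMAND:\s+"([^"]*)"'
)


def rank_by_idle(text):
    """Return (idle_ms, pid, command) tuples, longest time since last run first."""
    ranked = []
    for m in STAMP_RE.finditer(text):
        d, h, mi, s, ms = (int(x) for x in m.group(1, 2, 3, 4, 5))
        idle_ms = (((d * 24 + h) * 60 + mi) * 60 + s) * 1000 + ms
        ranked.append((idle_ms, int(m.group(6)), m.group(7)))
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    for idle_ms, pid, command in rank_by_idle(UN_TASKS):
        print(f"{idle_ms / 3600000:6.2f} h  PID {pid:<6} {command}")
```

On the displayed values, fsnotify_mark (PID 44) and savscand (PID 13891) last ran within 19 ms of each other, a little over 34 hours before the dump was taken.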
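- Backtrace of the oldest UN-state task (PID 13891). It appears to be waiting for NFS I/O to complete: the task is sleeping in rpc_wait_bit_killable(), called from _nfs4_proc_open_confirm() while opening a file over NFSv4: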
crash> bt 13891
PID: 13891 TASK: ffff880137e81f60 CPU: 2 COMMAND: "savscand"
#0 [ffff880089d63930] __schedule at ffffffff8168b225
#1 [ffff880089d63998] schedule at ffffffff8168b879
#2 [ffff880089d639a8] rpc_wait_bit_killable at ffffffffa02cee24 [sunrpc]
#3 [ffff880089d639c8] __wait_on_bit at ffffffff81689425
#4 [ffff880089d63a08] out_of_line_wait_on_bit at ffffffff816894d1
#5 [ffff880089d63a80] __rpc_wait_for_completion_task at ffffffffa02cedfd [sunrpc]
#6 [ffff880089d63a90] _nfs4_proc_open_confirm at ffffffffa065da08 [nfsv4]
#7 [ffff880089d63b18] _nfs4_open_and_get_state at ffffffffa0665e40 [nfsv4]
#8 [ffff880089d63bc0] nfs4_do_open at ffffffffa0666240 [nfsv4]
#9 [ffff880089d63c88] nfs4_atomic_open at ffffffffa0666747 [nfsv4]
#10 [ffff880089d63ce0] nfs4_file_open at ffffffffa067ae90 [nfsv4]
#11 [ffff880089d63d90] do_dentry_open at ffffffff811fbf07
#12 [ffff880089d63dd8] vfs_open at ffffffff811fc0df
#13 [ffff880089d63e00] dentry_open at ffffffff811fc1a9
#14 [ffff880089d63e38] fanotify_read at ffffffff8124652d
#15 [ffff880089d63f00] vfs_read at ffffffff811fe0ee
#16 [ffff880089d63f38] sys_read at ffffffff811fecbf
#17 [ffff880089d63f80] system_call_fastpath at ffffffff816967c9
RIP: 00007f9079eb122d RSP: 00007f90477fdd30 RFLAGS: 00000293
RAX: 0000000000000000 RBX: ffffffff816967c9 RCX: ffffffffffffffff
RDX: 000000000000002f RSI: 00007f90477fde00 RDI: 0000000000000042
RBP: 0000000058a85a69 R8: 0000000000000000 R9: 0000000000003643
R10: 0000000000000012 R11: 0000000000000293 R12: 0000000000003629
R13: 00007f9048000ef0 R14: 00007f9048000f58 R15: 00007f90477fde00
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
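The same reading of the backtrace can be applied to other blocked tasks with a small script. The sketch below is a Python illustration, not a crash feature: `FRAME_RE`, `parse_frames` and `summarize` are hypothetical names, and treating the module names `sunrpc`, `nfs` and `nfsv4` plus the `rpc_wait` function prefix as the signature of an NFS RPC wait is an assumption based only on the trace above.

```python
import re

# The innermost frames copied from the `bt 13891` output above.
BACKTRACE = """\
#0 [ffff880089d63930] __schedule at ffffffff8168b225
#1 [ffff880089d63998] schedule at ffffffff8168b879
#2 [ffff880089d639a8] rpc_wait_bit_killable at ffffffffa02cee24 [sunrpc]
#3 [ffff880089d639c8] __wait_on_bit at ffffffff81689425
#4 [ffff880089d63a08] out_of_line_wait_on_bit at ffffffff816894d1
#5 [ffff880089d63a80] __rpc_wait_for_completion_task at ffffffffa02cedfd [sunrpc]
#6 [ffff880089d63a90] _nfs4_proc_open_confirm at ffffffffa065da08 [nfsv4]
#7 [ffff880089d63b18] _nfs4_open_and_get_state at ffffffffa0665e40 [nfsv4]
"""

# Assumed layout: #N [stack address] function at text address [optional module]
FRAME_RE = re.compile(
    r'#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)(?:\s+\[(\w+)\])?'
)


def parse_frames(text):
    """Return frames in listing order (innermost first); module is None for core kernel."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "index": int(m.group(1)),
                "function": m.group(3),
                "module": m.group(5),
            })
    return frames


def summarize(frames):
    """Heuristic: flag an RPC wait and report the innermost NFS client function."""
    waiting = any(
        f["module"] == "sunrpc" and f["function"].startswith("rpc_wait")
        for f in frames
    )
    nfs_calls = [f["function"] for f in frames if f["module"] in ("nfs", "nfsv4")]
    return {
        "waiting_on_rpc": waiting,
        "innermost_nfs_call": nfs_calls[0] if nfs_calls else None,
    }


if __name__ == "__main__":
    print(summarize(parse_frames(BACKTRACE)))
```

A positive result only says that the task is sleeping on an RPC reply. It does not by itself show why the server is not answering.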
Environment
- Red Hat Enterprise Linux 7.3 (kernel-3.10.0-514.6.1.el7)