System hung, possibly due to loss of communication with the NFS server

Solution Unverified - Updated -

Issue

  • The system hung, possibly because it lost communication with the NFS server.

  • Five tasks are in the UN (uninterruptible) state and 293 tasks are in the ZO (zombie) state:

crash> ps -S
  RU: 6
  UN: 5
  IN: 1593
  ZO: 293
  • Almost all of the ZO state tasks are sshd processes:
crash> ps -m | grep ZO | less

[ 0 00:03:05.338] [ZO]  PID: 5076   TASK: ffff8800414aedd0  CPU: 1   COMMAND: "sshd"
[ 0 00:04:12.527] [ZO]  PID: 5071   TASK: ffff880041963ec0  CPU: 1   COMMAND: "sshd"
[ 0 00:09:12.986] [ZO]  PID: 5057   TASK: ffff880041811f60  CPU: 2   COMMAND: "sshd"
[ 0 00:14:13.173] [ZO]  PID: 5049   TASK: ffff88010fc2ce70  CPU: 0   COMMAND: "sshd"
[ 0 00:19:12.318] [ZO]  PID: 5000   TASK: ffff88003c341f60  CPU: 1   COMMAND: "sshd"
[ 0 00:24:12.463] [ZO]  PID: 4992   TASK: ffff8800414f0000  CPU: 1   COMMAND: "sshd"
[ 0 00:29:12.588] [ZO]  PID: 4985   TASK: ffff880041592f10  CPU: 2   COMMAND: "sshd"
[ 0 00:34:12.803] [ZO]  PID: 4970   TASK: ffff88003c340fb0  CPU: 0   COMMAND: "sshd"
[ 0 00:39:12.335] [ZO]  PID: 4964   TASK: ffff8800414ade20  CPU: 2   COMMAND: "sshd"
[ 0 00:44:12.518] [ZO]  PID: 4955   TASK: ffff8800414a8000  CPU: 1   COMMAND: "sshd"
[ 0 00:49:12.672] [ZO]  PID: 4903   TASK: ffff88011146af10  CPU: 1   COMMAND: "sshd"
[ 0 00:54:12.783] [ZO]  PID: 4891   TASK: ffff880040a72f10  CPU: 2   COMMAND: "sshd"
[ 0 00:59:12.943] [ZO]  PID: 4884   TASK: ffff880040a9de20  CPU: 1   COMMAND: "sshd"
[ 0 01:04:13.065] [ZO]  PID: 4874   TASK: ffff88003f9d3ec0  CPU: 2   COMMAND: "sshd"
[ 0 01:09:12.558] [ZO]  PID: 4870   TASK: ffff88003f9d2f10  CPU: 0   COMMAND: "sshd"
[ 0 01:14:12.703] [ZO]  PID: 4862   TASK: ffff880111c81f60  CPU: 0   COMMAND: "sshd"
[ 0 01:19:12.852] [ZO]  PID: 4805   TASK: ffff88003f98af10  CPU: 1   COMMAND: "sshd"
[ 0 01:24:12.972] [ZO]  PID: 4797   TASK: ffff88003f98bec0  CPU: 2   COMMAND: "sshd"
[ 0 01:29:13.143] [ZO]  PID: 4791   TASK: ffff8801124b5e20  CPU: 0   COMMAND: "sshd"
[ 0 01:34:12.815] [ZO]  PID: 4778   TASK: ffff88003c346dd0  CPU: 1   COMMAND: "sshd"
[ 0 01:39:12.405] [ZO]  PID: 4770   TASK: ffff88011215af10  CPU: 1   COMMAND: "sshd"

[...]
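
The per-command breakdown above can also be tallied from saved `ps -m` output instead of paging through it. The following is a minimal sketch, assuming the output was copied to text in the line layout shown above; the names `PS_M_LINE`, `tally_by_state_and_command`, and `SAMPLE` are illustrative and not part of the crash utility.

```python
import re
from collections import Counter

# Matches one task line of crash `ps -m` output, for example:
# [ 0 00:03:05.338] [ZO]  PID: 5076   TASK: ffff8800414aedd0  CPU: 1   COMMAND: "sshd"
PS_M_LINE = re.compile(
    r'\[\s*\d+ \d+:\d+:\d+\.\d+\]\s+'
    r'\[(?P<state>[A-Z]{2})\]\s+PID:\s+\d+\s+TASK:\s+[0-9a-f]+\s+'
    r'CPU:\s+\d+\s+COMMAND:\s+"(?P<comm>[^"]*)"'
)

def tally_by_state_and_command(text):
    """Count tasks per (state, command) pair in saved `ps -m` output."""
    counts = Counter()
    for line in text.splitlines():
        match = PS_M_LINE.search(line)
        if match:  # prompts, blank lines, and "[...]" elisions are skipped
            counts[(match.group("state"), match.group("comm"))] += 1
    return counts

# A few lines taken from the listings in this article (hypothetical capture file content).
SAMPLE = '''\
[ 0 00:03:05.338] [ZO]  PID: 5076   TASK: ffff8800414aedd0  CPU: 1   COMMAND: "sshd"
[ 0 00:04:12.527] [ZO]  PID: 5071   TASK: ffff880041963ec0  CPU: 1   COMMAND: "sshd"
[ 1 01:33:20.032] [UN]  PID: 1      TASK: ffff880139b20000  CPU: 3   COMMAND: "systemd"
[...]
'''

counts = tally_by_state_and_command(SAMPLE)
for (state, comm), n in counts.most_common():
    print(state, comm, n)
```

Run against the full capture, a tally like this would show at a glance whether the zombie tasks are concentrated in one command, as the sshd listing suggests.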

  • UN state tasks:
crash> ps -m | grep UN | tail
[ 1 01:33:20.032] [UN]  PID: 1      TASK: ffff880139b20000  CPU: 3   COMMAND: "systemd"
[ 1 02:20:51.015] [UN]  PID: 10636  TASK: ffff880137709f60  CPU: 3   COMMAND: "savscand"
[ 1 04:20:51.448] [UN]  PID: 13159  TASK: ffff880028054e70  CPU: 2   COMMAND: "WL_ConfSys_pwcp"
[ 1 10:06:13.354] [UN]  PID: 44     TASK: ffff88013902ce70  CPU: 1   COMMAND: "fsnotify_mark"
[ 1 10:06:13.335] [UN]  PID: 13891  TASK: ffff880137e81f60  CPU: 2   COMMAND: "savscand"
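
To compare how long each UN state task has been off the CPU, the bracketed `ps -m` timestamp can be converted to a single number. The sketch below assumes the field layout is days, then hours:minutes:seconds.milliseconds since the task last ran; `STAMP`, `blocked_ms`, and `UN_TASKS` are hypothetical names used only for illustration.

```python
import re

# Captures the timestamp fields, the PID, and the command of one `ps -m` line.
STAMP = re.compile(
    r'\[\s*(\d+) (\d+):(\d+):(\d+)\.(\d+)\].*?PID:\s+(\d+).*?COMMAND:\s+"([^"]*)"'
)

def blocked_ms(line):
    """Return (milliseconds since last run, pid, command), or None if no match."""
    match = STAMP.search(line)
    if match is None:
        return None
    days, hours, minutes, seconds, millis = (int(g) for g in match.groups()[:5])
    total = (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis
    return total, int(match.group(6)), match.group(7)

# The UN state lines from the listing above.
UN_TASKS = '''\
[ 1 01:33:20.032] [UN]  PID: 1      TASK: ffff880139b20000  CPU: 3   COMMAND: "systemd"
[ 1 02:20:51.015] [UN]  PID: 10636  TASK: ffff880137709f60  CPU: 3   COMMAND: "savscand"
[ 1 04:20:51.448] [UN]  PID: 13159  TASK: ffff880028054e70  CPU: 2   COMMAND: "WL_ConfSys_pwcp"
[ 1 10:06:13.354] [UN]  PID: 44     TASK: ffff88013902ce70  CPU: 1   COMMAND: "fsnotify_mark"
[ 1 10:06:13.335] [UN]  PID: 13891  TASK: ffff880137e81f60  CPU: 2   COMMAND: "savscand"
'''

parsed = [blocked_ms(line) for line in UN_TASKS.splitlines()]
ranked = sorted((p for p in parsed if p is not None), reverse=True)
for total, pid, comm in ranked:
    print(f"{total / 1000:12.3f} s  PID {pid:<6} {comm}")
```

For the five lines above, the two tasks at the tail of the listing (PID 44 and PID 13891) come out within 19 ms of each other, so both stopped running at essentially the same moment.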
  • Backtrace of one of the longest-blocked UN state tasks. It appears to be waiting in the sunrpc layer for an NFSv4 OPEN_CONFIRM request (_nfs4_proc_open_confirm) to complete:
crash> bt 13891
PID: 13891  TASK: ffff880137e81f60  CPU: 2   COMMAND: "savscand"
 #0 [ffff880089d63930] __schedule at ffffffff8168b225
 #1 [ffff880089d63998] schedule at ffffffff8168b879
 #2 [ffff880089d639a8] rpc_wait_bit_killable at ffffffffa02cee24 [sunrpc]
 #3 [ffff880089d639c8] __wait_on_bit at ffffffff81689425
 #4 [ffff880089d63a08] out_of_line_wait_on_bit at ffffffff816894d1
 #5 [ffff880089d63a80] __rpc_wait_for_completion_task at ffffffffa02cedfd [sunrpc]
 #6 [ffff880089d63a90] _nfs4_proc_open_confirm at ffffffffa065da08 [nfsv4]
 #7 [ffff880089d63b18] _nfs4_open_and_get_state at ffffffffa0665e40 [nfsv4]
 #8 [ffff880089d63bc0] nfs4_do_open at ffffffffa0666240 [nfsv4]
 #9 [ffff880089d63c88] nfs4_atomic_open at ffffffffa0666747 [nfsv4]
#10 [ffff880089d63ce0] nfs4_file_open at ffffffffa067ae90 [nfsv4]
#11 [ffff880089d63d90] do_dentry_open at ffffffff811fbf07
#12 [ffff880089d63dd8] vfs_open at ffffffff811fc0df
#13 [ffff880089d63e00] dentry_open at ffffffff811fc1a9
#14 [ffff880089d63e38] fanotify_read at ffffffff8124652d
#15 [ffff880089d63f00] vfs_read at ffffffff811fe0ee
#16 [ffff880089d63f38] sys_read at ffffffff811fecbf
#17 [ffff880089d63f80] system_call_fastpath at ffffffff816967c9
    RIP: 00007f9079eb122d  RSP: 00007f90477fdd30  RFLAGS: 00000293
    RAX: 0000000000000000  RBX: ffffffff816967c9  RCX: ffffffffffffffff
    RDX: 000000000000002f  RSI: 00007f90477fde00  RDI: 0000000000000042
    RBP: 0000000058a85a69   R8: 0000000000000000   R9: 0000000000003643
    R10: 0000000000000012  R11: 0000000000000293  R12: 0000000000003629
    R13: 00007f9048000ef0  R14: 00007f9048000f58  R15: 00007f90477fde00
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
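
When several blocked tasks need to be checked, the module tags in `bt` output (such as `[sunrpc]` and `[nfsv4]`) give a quick way to see which subsystem each task is waiting in. The following is a rough sketch, assuming frames are formatted as in the backtrace above; `FRAME`, `parse_frames`, and `first_frame_in` are hypothetical helper names, not crash commands.

```python
import re

# Matches a crash `bt` frame line, for example:
#  #2 [ffff880089d639a8] rpc_wait_bit_killable at ffffffffa02cee24 [sunrpc]
FRAME = re.compile(
    r'^\s*#(?P<num>\d+) \[[0-9a-f]+\] (?P<symbol>\S+) at [0-9a-f]+(?: \[(?P<module>\w+)\])?'
)

def parse_frames(text):
    """Return a list of (frame number, symbol, module or None) tuples."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line)
        if match:  # header and register lines do not match and are skipped
            frames.append((int(match.group("num")), match.group("symbol"), match.group("module")))
    return frames

def first_frame_in(frames, module):
    """Return the innermost symbol belonging to `module`, or None if absent."""
    for _, symbol, mod in frames:
        if mod == module:
            return symbol
    return None

# The first eight frames of the backtrace above.
BT = '''\
PID: 13891  TASK: ffff880137e81f60  CPU: 2   COMMAND: "savscand"
 #0 [ffff880089d63930] __schedule at ffffffff8168b225
 #1 [ffff880089d63998] schedule at ffffffff8168b879
 #2 [ffff880089d639a8] rpc_wait_bit_killable at ffffffffa02cee24 [sunrpc]
 #3 [ffff880089d639c8] __wait_on_bit at ffffffff81689425
 #4 [ffff880089d63a08] out_of_line_wait_on_bit at ffffffff816894d1
 #5 [ffff880089d63a80] __rpc_wait_for_completion_task at ffffffffa02cedfd [sunrpc]
 #6 [ffff880089d63a90] _nfs4_proc_open_confirm at ffffffffa065da08 [nfsv4]
 #7 [ffff880089d63b18] _nfs4_open_and_get_state at ffffffffa0665e40 [nfsv4]
'''

frames = parse_frames(BT)
print("sunrpc:", first_frame_in(frames, "sunrpc"))
print("nfsv4: ", first_frame_in(frames, "nfsv4"))
```

A task whose innermost sunrpc frame is an RPC wait and whose innermost nfsv4 frame is an open-path function, as here, is consistent with a client waiting on an unresponsive NFS server, although the backtrace alone does not prove the server is at fault.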

Environment

  • Red Hat Enterprise Linux 7.3 (kernel-3.10.0-514.6.1.el7)
