NFS client hangs on an IP over InfiniBand (IPoIB) setup
Issue
- NFS client hangs on an IPoIB setup running Mellanox OFED 4.4, 4.5, or 4.6
- NFS tasks hang while the TCP socket is in the CLOSE_WAIT state
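On a live client, the CLOSE_WAIT symptom can be looked for without a vmcore. The following is a minimal sketch, not part of the original analysis. It assumes an IPv4 NFS-over-TCP mount on the default port 2049 and a little-endian host such as x86_64. The function name `find_close_wait` and the sample table are hypothetical; the state code `08` is the CLOSE_WAIT value used in `/proc/net/tcp`.

```python
import socket

CLOSE_WAIT = "08"   # TCP_CLOSE_WAIT as printed in the "st" column of /proc/net/tcp
NFS_PORT = 2049     # default NFS port; adjust if the mount uses port=...

# Hypothetical sample in /proc/net/tcp layout (addresses are made up).
SAMPLE_PROC_NET_TCP = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 1401A8C0:0320 0A01A8C0:0801 08 00000000:00000000 00:00000000 00000000     0        0 12345
   1: 0100007F:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 23456
"""

def decode_endpoint(field):
    """Convert 'HEXADDR:HEXPORT' into 'a.b.c.d:port'.

    Assumes a little-endian host, where the IPv4 address bytes are
    stored in reverse order; the port is plain hexadecimal.
    """
    addr_hex, port_hex = field.split(":")
    addr = socket.inet_ntoa(bytes.fromhex(addr_hex)[::-1])
    return "%s:%d" % (addr, int(port_hex, 16))

def find_close_wait(table_text, remote_port=NFS_PORT):
    """Return (local, remote) pairs in CLOSE_WAIT towards remote_port."""
    hits = []
    for line in table_text.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        if state != CLOSE_WAIT:
            continue
        if int(remote.split(":")[1], 16) != remote_port:
            continue
        hits.append((decode_endpoint(local), decode_endpoint(remote)))
    return hits
```

On a real client the same function could be fed the live table, for example `find_close_wait(open("/proc/net/tcp").read())`. An entry that stays in the result across repeated checks while NFS commands are blocked would be consistent with the symptom described above.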
crash> ps -S
RU: 29
IN: 1343
WA: 1
UN: 11
All blocked tasks on the system were blocked on NFS (iperf tests over TCP completed successfully without hanging).
crash> ps -m | grep UN
[0 00:00:00.000] [UN] PID: 147171 TASK: ffff898304098010 CPU: 18 COMMAND: "ssh"
[0 00:00:03.077] [UN] PID: 148379 TASK: ffff89bcb54757e0 CPU: 4 COMMAND: "df"
[0 00:00:26.490] [UN] PID: 148336 TASK: ffff89a174f00010 CPU: 3 COMMAND: "bash"
[0 00:00:51.093] [UN] PID: 148196 TASK: ffff8995363f0010 CPU: 2 COMMAND: "ls"
[0 00:08:27.989] [UN] PID: 146168 TASK: ffff89bb48da34c0 CPU: 2 COMMAND: "dd"
[0 00:08:42.673] [UN] PID: 145899 TASK: ffff89bb48da4650 CPU: 6 COMMAND: "dd"
[0 00:08:42.959] [UN] PID: 147211 TASK: ffff89a442e857e0 CPU: 5 COMMAND: "df"
[0 00:08:44.780] [UN] PID: 147075 TASK: ffff899e57af91a0 CPU: 27 COMMAND: "agetit"
[0 00:08:45.143] [UN] PID: 146620 TASK: ffff89c1b7ecb4c0 CPU: 22 COMMAND: "agetit"
[0 00:08:45.107] [UN] PID: 147112 TASK: ffff89c0838dc650 CPU: 13 COMMAND: "chk_file"
[0 00:08:46.742] [UN] PID: 147007 TASK: ffff89c27fed34c0 CPU: 1 COMMAND: "dd" << Oldest D state task
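When the list of blocked tasks is long, picking the oldest one by eye is error-prone. The sketch below is one possible way to parse saved `ps -m` output and select the task that has been in the UN (uninterruptible) state longest. The helper names `parse_ps_m` and `oldest_blocked` are hypothetical, and the sample lines are a subset of the output above.

```python
import re

# A few lines copied from the `ps -m | grep UN` output.
SAMPLE_PS_M = '''\
[0 00:00:03.077] [UN] PID: 148379 TASK: ffff89bcb54757e0 CPU: 4 COMMAND: "df"
[0 00:08:45.107] [UN] PID: 147112 TASK: ffff89c0838dc650 CPU: 13 COMMAND: "chk_file"
[0 00:08:46.742] [UN] PID: 147007 TASK: ffff89c27fed34c0 CPU: 1 COMMAND: "dd"
'''

# [days hh:mm:ss.msec] [state] PID: n TASK: addr CPU: n COMMAND: "name"
LINE_RE = re.compile(
    r'\[(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\]\s+\[(\w+)\]\s+'
    r'PID:\s+(\d+)\s+TASK:\s+([0-9a-f]+)\s+CPU:\s+(\d+)\s+COMMAND:\s+"([^"]*)"'
)

def parse_ps_m(text):
    """Parse `ps -m` lines into dicts; lines that do not match are skipped."""
    tasks = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        days, hh, mm, ss, msec = (int(m.group(i)) for i in range(1, 6))
        tasks.append({
            # time since the task last ran, in seconds
            "seconds": ((days * 24 + hh) * 60 + mm) * 60 + ss + msec / 1000.0,
            "state": m.group(6),
            "pid": int(m.group(7)),
            "task": m.group(8),
            "cpu": int(m.group(9)),
            "command": m.group(10),
        })
    return tasks

def oldest_blocked(tasks):
    """Return the UN task that has waited longest, or None if there is none."""
    blocked = [t for t in tasks if t["state"] == "UN"]
    return max(blocked, key=lambda t: t["seconds"]) if blocked else None
```

For the sample above, `oldest_blocked(parse_ps_m(SAMPLE_PS_M))` selects PID 147007, the same `dd` task that is marked as the oldest D state task in the listing.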
Backtrace of the oldest hung task (PID 147007):
crash> bt
PID: 147007 TASK: ffff89c27fed34c0 CPU: 1 COMMAND: "dd"
#0 [ffff89c1bcf4fb78] __schedule at ffffffffa8968972
#1 [ffff89c1bcf4fc00] schedule at ffffffffa8968e19
#2 [ffff89c1bcf4fc10] rpc_wait_bit_killable at ffffffffc0510f14 [sunrpc]
#3 [ffff89c1bcf4fc30] __wait_on_bit at ffffffffa8966a97
#4 [ffff89c1bcf4fc70] out_of_line_wait_on_bit at ffffffffa8966c01
#5 [ffff89c1bcf4fce8] __rpc_wait_for_completion_task at ffffffffc0510eed [sunrpc]
#6 [ffff89c1bcf4fcf8] nfs4_do_close at ffffffffc1470527 [nfsv4]
#7 [ffff89c1bcf4fda8] __nfs4_close at ffffffffc148123d [nfsv4]
#8 [ffff89c1bcf4fde8] nfs4_close_sync at ffffffffc14822f8 [nfsv4]
#9 [ffff89c1bcf4fdf8] nfs4_close_context at ffffffffc1463e7d [nfsv4]
#10 [ffff89c1bcf4fe08] __put_nfs_open_context at ffffffffc06632df [nfs]
#11 [ffff89c1bcf4fe48] nfs_file_clear_open_context at ffffffffc0665514 [nfs]
#12 [ffff89c1bcf4fe78] nfs_file_release at ffffffffc066101b [nfs]
#13 [ffff89c1bcf4fe98] __fput at ffffffffa8443b4c
#14 [ffff89c1bcf4fee0] ____fput at ffffffffa8443dae
#15 [ffff89c1bcf4fef0] task_work_run at ffffffffa82be88b
#16 [ffff89c1bcf4ff30] do_notify_resume at ffffffffa822bc65
#17 [ffff89c1bcf4ff50] int_signal at ffffffffa8976134
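The backtrace shows `dd` releasing an NFSv4 file (`nfs_file_release` through `nfs4_do_close`) and then sleeping in `rpc_wait_bit_killable` until the RPC task completes. To check whether the other blocked tasks follow the same pattern, saved `bt` output could be classified with a small script. This is a rough sketch: the helper names are hypothetical, and the sample frames are a subset of the backtrace above.

```python
import re

# A few frames copied from the backtrace of PID 147007.
SAMPLE_BT = '''\
#0 [ffff89c1bcf4fb78] __schedule at ffffffffa8968972
#2 [ffff89c1bcf4fc10] rpc_wait_bit_killable at ffffffffc0510f14 [sunrpc]
#6 [ffff89c1bcf4fcf8] nfs4_do_close at ffffffffc1470527 [nfsv4]
#12 [ffff89c1bcf4fe78] nfs_file_release at ffffffffc066101b [nfs]
#13 [ffff89c1bcf4fe98] __fput at ffffffffa8443b4c
'''

# "#N [stack address] function at text address [optional module]"
FRAME_RE = re.compile(
    r'#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)(?:\s+\[(\w+)\])?'
)

def parse_bt(text):
    """Return a list of frames as dicts; 'module' is None for core kernel code."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "index": int(m.group(1)),
                "function": m.group(3),
                "module": m.group(5),
            })
    return frames

def modules_in_stack(frames):
    """Kernel modules seen in the stack, in order of first appearance."""
    seen = []
    for frame in frames:
        if frame["module"] and frame["module"] not in seen:
            seen.append(frame["module"])
    return seen

def waits_on_rpc(frames):
    """True if the task is sleeping in the sunrpc wait helper."""
    return any(f["function"] == "rpc_wait_bit_killable" for f in frames)
```

Running `waits_on_rpc(parse_bt(text))` over each blocked task's backtrace gives a quick count of how many tasks are waiting for an RPC to complete, as opposed to being blocked somewhere else.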
Environment
- RHEL 7.4
- NFSv3, NFSv4
- MLNX_OFED_LINUX-4.4-2.0.7.0 (OFED-4.4-2.0.7)
- MLNX_OFED_LINUX-4.5-1.0.1.0 (OFED-4.5-1.0.1)
- MLNX_OFED_LINUX-4.6-1.0.1.1 (OFED-4.6-1.0.1)
- RDMA is also present in the setup but is not directly related to this issue.
- GPFS