Processes reading the same file on a CIFS share block indefinitely on the cifsInodeInfo.lock_sem semaphore after a network disruption or server restart
Issue
One or more processes block indefinitely on a CIFS share after a network disruption or a restart of the CIFS server. The workload involves many processes reading the same file.
* The kernel log reports the blocked task with messages such as the following:
[9011350.022859] CIFS VFS: Free previous auth_key.response = ffff8e4df93eec00
[9011360.019214] CIFS VFS: Free previous auth_key.response = ffff8e4e14ee43c0
[9011360.030211] CIFS VFS: Free previous auth_key.response = ffff8e4cddede3c0
[9011370.022694] CIFS VFS: Free previous auth_key.response = ffff8e4f70f20840
[9011370.023729] CIFS VFS: Send error in SessSetup = -11
[9011374.884548] CIFS VFS: Free previous auth_key.response = ffff8e4f70f20600
[9320401.323395] INFO: task fglrun:983 blocked for more than 120 seconds.
[9320401.324326] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[9320401.324659] fglrun D ffff8e4bd7e4c100 0 983 1 0x00000082
[9320401.324963] Call Trace:
[9320401.325263] [<ffffffffb8167c49>] schedule+0x29/0x70
[9320401.325591] [<ffffffffb8169535>] rwsem_down_write_failed+0x225/0x3a0
[9320401.325934] [<ffffffffc0556280>] ? free_rsp_buf+0x30/0x40 [cifs]
[9320401.326267] [<ffffffffb7d86c47>] call_rwsem_down_write_failed+0x17/0x30
[9320401.326625] [<ffffffffb8166f4d>] down_write+0x2d/0x3d
[9320401.326981] [<ffffffffc0549ec6>] cifsFileInfo_put+0x196/0x3a0 [cifs]
[9320401.327363] [<ffffffffc054a0ee>] cifs_close+0x1e/0x30 [cifs]
[9320401.327728] [<ffffffffb7c4364c>] __fput+0xec/0x260
[9320401.328096] [<ffffffffb7c438ae>] ____fput+0xe/0x10
[9320401.328541] [<ffffffffb7abe79b>] task_work_run+0xbb/0xe0
[9320401.328929] [<ffffffffb7a9dc61>] do_exit+0x2d1/0xa40
[9320401.329329] [<ffffffffb816f608>] ? __do_page_fault+0x228/0x500
[9320401.329722] [<ffffffffb8174d21>] ? system_call_after_swapgs+0xae/0x146
[9320401.330123] [<ffffffffb7a9e44f>] do_group_exit+0x3f/0xa0
[9320401.330548] [<ffffffffb7a9e4c4>] SyS_exit_group+0x14/0x20
[9320401.330963] [<ffffffffb8174ddb>] system_call_fastpath+0x22/0x27
[9320401.331404] [<ffffffffb8174d21>] ? system_call_after_swapgs+0xae/0x146
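To check whether a saved kernel log contains this signature, the hung-task reports can be scanned for a trace that includes both down_write and cifsFileInfo_put. The following is a minimal sketch, not a supported tool: the function name find_cifs_lock_sem_hangs is hypothetical, and the regular expressions assume the bracketed-address frame format shown in the trace above. Frames marked with "?" are skipped because the kernel prints them as unreliable.

```python
import re

# Hung-task header, e.g.
# "INFO: task fglrun:983 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Stack frame, e.g. "[<ffffffffc0549ec6>] cifsFileInfo_put+0x196/0x3a0 [cifs]".
# Group 1 is set when the frame carries the "?" (unreliable) marker.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x")


def find_cifs_lock_sem_hangs(log_text):
    """Return (command, pid) for each hung task blocked in down_write
    called from cifsFileInfo_put (assumed log format, see lead-in)."""
    reports = []
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            # Start a new report; following frames belong to it.
            reports.append(
                {"comm": header.group(1), "pid": int(header.group(2)), "frames": []}
            )
            continue
        frame = FRAME_RE.search(line)
        if frame and reports and frame.group(1) is None:
            reports[-1]["frames"].append(frame.group(2))
    return [
        (r["comm"], r["pid"])
        for r in reports
        if "down_write" in r["frames"] and "cifsFileInfo_put" in r["frames"]
    ]
```

The function takes the log as a string, so it can be fed the captured output of dmesg or the contents of a saved log file. A match only shows that a task is waiting in down_write under cifsFileInfo_put; it does not by itself confirm the root cause.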
Environment
- Red Hat Enterprise Linux 7, 8
- Seen on 4.18.0-147.el8 plus the fix for https://access.redhat.com/solutions/3446331
- Seen on 3.10.0-957.1.3.el7
- CIFS
- Workload of many processes reading the same file
- Network or server issues that cause the CIFS client to reconnect
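The shape of the workload described above can be sketched as follows. This is an illustration under stated assumptions, not a confirmed reproducer: the function name read_in_children, the worker count, and the chunk size are all hypothetical. Each child exits while its file descriptor is still open, so the kernel closes the file during process exit, which is the path seen in the trace (do_exit, then __fput, then cifs_close).

```python
import os


def read_in_children(path, workers=8, chunk=65536):
    """Fork `workers` children that each read `path` to EOF.

    Returns the list of child exit codes (0 means the read completed).
    Linux/Unix only, because it relies on os.fork().
    """
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: read the whole file with raw descriptors.
            status = 0
            try:
                fd = os.open(path, os.O_RDONLY)
                while os.read(fd, chunk):
                    pass
                # fd is deliberately left open; the kernel closes it
                # when the process exits.
            except OSError:
                status = 1
            # Exit immediately without running the parent's cleanup handlers.
            os._exit(status)
        pids.append(pid)
    # Parent: wait for every child and collect its exit code.
    return [os.waitstatus_to_exitcode(os.waitpid(p, 0)[1]) for p in pids]
```

For example, read_in_children("/mnt/share/data.bin", workers=32) would start 32 readers of one file, where /mnt/share/data.bin is a placeholder path on a CIFS mount. On an affected system a child blocked in the kernel would never be reaped, so the call in the parent would not return.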