RHEL7.3: PIDs blocked due to fscache hangs and kworker messages "CacheFiles: Error: Overlong wait for old active object to go away"
Issue
- The following CacheFiles messages appear repeatedly in the messages file:
[9165279.370604] CacheFiles: Error: Overlong wait for old active object to go away
[9165279.371138] CacheFiles: object: OBJc8052
[9165279.371753] CacheFiles: objstate=LOOK_UP_OBJECT fl=8 wbusy=2 ev=0[0]
[9165279.372330] CacheFiles: ops=0 inp=0 exc=0
[9165279.372939] CacheFiles: parent=ffff881ffe500480
[9165279.373567] CacheFiles: cookie=ffff881efb3f6210 [pr=ffff881fddb18000 nd=ffff883ffab06800 fl=22]
[9165279.374149] CacheFiles: key=[12] '030002000000000081498967'
[9165279.374782] CacheFiles: xobject: OBJbeeb6
[9165279.375397] CacheFiles: xobjstate=WAIT_FOR_CLEARANCE fl=30 wbusy=0 ev=0[10]
[9165279.375947] CacheFiles: xops=0 inp=0 exc=0
[9165279.376525] CacheFiles: xparent=ffff881ffe500480
[9165279.377133] CacheFiles: xcookie=ffff883d403c6d68 [pr=ffff881fddb18000 nd= (null) fl=18]
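Because the dump above repeats many times, it can help to summarize how often the error occurs and which object states are involved. The following is a minimal sketch, not part of the original report; the function name `summarize_cachefiles` is hypothetical, and it assumes the log text (for example the output of dmesg or the contents of the messages file) has already been read into a string.

```python
import re
from collections import Counter

# Matches the state lines of the dump, e.g.
# "CacheFiles: objstate=LOOK_UP_OBJECT fl=8 ..." and
# "CacheFiles: xobjstate=WAIT_FOR_CLEARANCE fl=30 ..."
STATE_RE = re.compile(r"CacheFiles: (x?objstate)=(\w+)")
ERROR_TEXT = "CacheFiles: Error: Overlong wait for old active object to go away"


def summarize_cachefiles(log_text):
    """Return (error_count, Counter keyed by (field, state)).

    Hypothetical helper: counts the 'Overlong wait' errors and tallies
    the objstate / xobjstate values seen in the accompanying dumps.
    """
    errors = 0
    states = Counter()
    for line in log_text.splitlines():
        if ERROR_TEXT in line:
            errors += 1
        match = STATE_RE.search(line)
        if match:
            states[match.groups()] += 1
    return errors, states
```

A possible use is `errors, states = summarize_cachefiles(open("/var/log/messages").read())`, then printing `states.most_common()` to see which state pairs dominate. The path is an assumption about where the kernel messages are logged on the affected client.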
HPC grid jobs are blocked from finishing, which prevents new jobs from being scheduled. One task that has been blocked for a very long time is stuck in __fscache_wait_on_invalidate:
# cat /proc/31723/stack
[<ffffffffa07f7e8e>] __fscache_wait_on_invalidate+0x2e/0x30 [fscache]
[<ffffffffa0881d81>] nfs_invalidate_mapping+0x61/0x100 [nfs]
[<ffffffffa088249a>] __nfs_revalidate_mapping+0xfa/0x280 [nfs]
[<ffffffffa0882a73>] nfs_revalidate_mapping_protected+0x13/0x20 [nfs]
[<ffffffffa087efa4>] nfs_file_read+0x44/0xf0 [nfs]
[<ffffffff811fe0bd>] do_sync_read+0x8d/0xd0
[<ffffffff811fe86e>] vfs_read+0x9e/0x170
[<ffffffff811ff43f>] SyS_read+0x7f/0xe0
[<ffffffff81697709>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
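To find every task blocked in the same place, rather than checking one PID at a time, the stack files under /proc can be scanned for the symbol. This is a rough sketch under stated assumptions: the helper names `stack_contains` and `find_blocked_pids` are hypothetical, reading /proc/<pid>/stack normally requires root, and only the main thread of each process is inspected (per-thread stacks under /proc/<pid>/task are not scanned).

```python
import glob
import os


def stack_contains(stack_text, symbol):
    """True if any frame in a /proc/<pid>/stack dump names the symbol."""
    for line in stack_text.splitlines():
        # Frame format: "[<address>] symbol+0xoff/0xlen [module]"
        parts = line.split()
        if len(parts) >= 2 and parts[1].split("+")[0] == symbol:
            return True
    return False


def find_blocked_pids(symbol="__fscache_wait_on_invalidate", proc_root="/proc"):
    """Return sorted PIDs whose kernel stack currently includes the symbol."""
    pids = []
    for path in glob.glob(os.path.join(proc_root, "[0-9]*", "stack")):
        try:
            with open(path) as handle:
                text = handle.read()
        except OSError:
            continue  # process exited, or permission was denied
        if stack_contains(text, symbol):
            pids.append(int(os.path.basename(os.path.dirname(path))))
    return sorted(pids)
```

Running `print(find_blocked_pids())` as root lists the candidate PIDs. The result is a point-in-time snapshot, so a task that appears in repeated runs is more likely to be genuinely hung than one that appears once.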
Environment
- Red Hat Enterprise Linux 7.3 (NFS client)
- kernel-3.10.0-514.26.2.el7
- cachefilesd-0.10.9-1.el7
- NFSv3 with fscache enabled
- autofs used to mount the NFSv3 share with 'fsc'
- /etc/cachefilesd.conf contains default settings
- An XFS filesystem on a linear LVM volume is used for /var/cache/fscache, mounted with some non-default options (noatime,nodiratime):
/dev/mapper/vg-cachevol /var/cache/fscache xfs rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota 0 0
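To confirm which NFS mounts on a client match this environment, the mount table can be checked for the 'fsc' option. The sketch below is illustrative only: the function name `nfs_mounts_with_fsc` is hypothetical, and it assumes the text passed in has the /proc/mounts layout (device, mount point, type, options) and that the kernel lists 'fsc' among the options of cache-enabled NFS mounts.

```python
def nfs_mounts_with_fsc(mounts_text):
    """Return mount points of NFS filesystems whose options include 'fsc'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Split on commas so that 'fsc' must be a whole option,
        # not a substring of some other option.
        if fs_type in ("nfs", "nfs4") and "fsc" in options.split(","):
            result.append(mount_point)
    return result
```

For example, `nfs_mounts_with_fsc(open("/proc/mounts").read())` returns the mount points that use the cache. With autofs, a share only appears once it has actually been mounted, so an empty result does not by itself mean 'fsc' is unconfigured.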