The server hangs. Many tasks are blocked in fanotify_handle_event(), waiting for a permission event response from userspace. The server panics if hung_task_panic is enabled; otherwise a hard reboot is the only way to recover.


Environment

  • Red Hat Enterprise Linux 7.9
  • Red Hat Enterprise Linux 8.5 to 8.9

  • VMware Endpoint Security Solution (vsep/pool), formerly vShield Endpoint, which works in conjunction with various security software (e.g. Trend Micro, McAfee, Symantec)

  • F-Secure (fsavd/fsaccd)
  • Microsoft Defender for Linux (wdavdaemon)
  • Kaspersky (kesl)
  • Twistlock Defender (a certified cloud-native security operator for OpenShift)
  • Cybereason Sensor for Linux
  • TrendMicro Deep Security Agent for Linux
  • Airlock (airlock-client.service)

Issue

  • The server hangs.

  • The server panics if hung_task_panic is enabled. Otherwise, a hard reboot is the only way to recover once the server hangs completely.

  • Blocked task message observed in kernel ring buffer:

[13919.504321] INFO: task fsbspamd:19047 blocked for more than 120 seconds.
[13919.504396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13919.504433] fsbspamd        D ffff999c7fc1acc0     0 19047   1246 0x00000080
[13919.504458] Call Trace:
[13919.504595]  [<ffffffffb0b85da9>] schedule+0x29/0x70
[13919.504668]  [<ffffffffb0699d55>] fanotify_handle_event+0x265/0x3a0
[13919.504704]  [<ffffffffb0acd65c>] ? udp_send_skb+0xac/0x2b0
[13919.504749]  [<ffffffffb04c7780>] ? wake_up_atomic_t+0x30/0x30
[13919.504754]  [<ffffffffb0696128>] fsnotify+0x388/0x460
[13919.504786]  [<ffffffffb0707f3e>] security_file_open+0x6e/0x70
[13919.504822]  [<ffffffffb064a7b9>] do_dentry_open+0xc9/0x2d0
[13919.504829]  [<ffffffffb0707992>] ? security_inode_permission+0x22/0x30
[13919.504834]  [<ffffffffb064aa5a>] vfs_open+0x5a/0xb0
[13919.504853]  [<ffffffffb06590da>] ? may_open+0x5a/0x120
[13919.504860]  [<ffffffffb065d006>] do_last+0x1f6/0x1340
[13919.504866]  [<ffffffffb065e21d>] path_openat+0xcd/0x5a0
[13919.504872]  [<ffffffffb066046d>] do_filp_open+0x4d/0xb0
[13919.504884]  [<ffffffffb066e512>] ? __alloc_fd+0xc2/0x170
[13919.504890]  [<ffffffffb064c044>] do_sys_open+0x124/0x220
[13919.504896]  [<ffffffffb064c15e>] SyS_open+0x1e/0x20
[13919.504910]  [<ffffffffb0b92ed2>] system_call_fastpath+0x25/0x2a
  • Many tasks are blocked in fanotify_handle_event(), waiting for a permission event response from userspace and stuck indefinitely in TASK_UNINTERRUPTIBLE sleep.

  • Meanwhile, the task that is responsible for sending the permission event responses back to those tasks is itself waiting for a permission event response from userspace, which is a deadlock caused by a userspace bug.

Resolution

  • Please note that Red Hat cannot fix userspace bugs in code that we do not maintain. What we can do is try to mitigate such bugs so they are less severe.

  • On private BZ#1834767, our engineering team looked into backporting the following patches from upstream, not to fix the buggy userspace, but to make the bug less severe by allowing the deadlocked processes to be killed:

af6a51130626 fanotify: Fold dequeue_event() into process_access_response()
8c5544666c9d fanotify: Move locking inside get_one_event()
f7db89accc9c fsnotify: Create function to remove event from notification list
ca6f86998d81 fanotify: Simplify cleaning of access_list
40873284d710 fanotify: Track permission event state
fabf7f29b3e2 fanotify: Use interruptible wait when waiting for permission events
  • The BZ was closed without an erratum because the bug did not meet the inclusion criteria for RHEL7 Maintenance Phase 2.
  • A possible workaround is to use NFSv3 instead of NFSv4, because the security software task that is responsible for responding to the hung tasks with permission event responses is calling call_usermodehelper_exec() to have a kworker run /usr/sbin/request-key, and that upcall only happens with NFSv4.
  • The NFSv4 protocol represents the local system's UID and GID values on the wire as strings of the form user@domain. The process of translating from UID to string and string to UID is referred to as "ID mapping." The /usr/sbin/nfsidmap program performs translations on behalf of the kernel. The kernel uses the request-key mechanism to perform an upcall: /usr/sbin/nfsidmap is invoked by /sbin/request-key, performs the translation, and initializes a key with the resulting information.
  • Please note that using NFSv3 is only a viable workaround when the buggy userspace task, the one responsible for responding to the other tasks waiting for fanotify permission event responses, is stuck on an NFSv4 mount.
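For illustration, the NFS protocol version can be pinned to 3 with the vers mount option, so no ID-mapping upcall (request-key/nfsidmap) is needed. The server name, export, and mount point below are placeholders, not the ones from this case:

```
# one-off remount with NFSv3 (run as root); placeholders only
mount -t nfs -o vers=3 nfsserver:/export /mnt/export

# or persistently via /etc/fstab
nfsserver:/export  /mnt/export  nfs  vers=3  0 0
```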

  • On BZ#1897416, fanotify patches that mitigate this sort of issue were backported to RHEL8.6 and later. Please note that those patches were not meant to fix any userspace bug; all they can do is make this sort of issue less severe.

  • If you are facing this sort of issue (caused by a userspace bug) on RHEL7.x or on RHEL8.x older than 8.6, please upgrade to RHEL8.6 or later (including RHEL9) and check whether the issue is mitigated.
  • We also strongly recommend engaging the vendor or support of the security software you are running to check for a bug and for any available fix.

Important Notice

  • Please keep in mind that the patches backported to RHEL8.6 and later do not fix any userspace bug. They were backported in the hope that this sort of issue, caused by a userspace bug, becomes less severe.

Root Cause

  • The tasks that are requesting to open files are waiting for permission event responses that should come from the userspace task responsible for answering them.

  • That task is, however, itself trying to open a file sitting on an NFS mount and is thus waiting for a kworker to execute request-key.

  • The kworker is, in turn, waiting for a permission event response from userspace before it can open and execute request-key.

  • Very likely, this is a userspace bug in the security software using the kernel's fanotify API/subsystem.

Diagnostic Steps

Analysis of a vmcore from an RHEL8.5 guest (running on VMware ESXi) in a case where TrendMicro Deep Security Agent for Linux is the culprit:

The kernel (RHEL8.5) hung with lots of blocked task messages:

      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/4.18.0-348.20.1.el8_5.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/105439176/crash/vmcore.vmsn
LOAD AVERAGE: 8579.33, 8577.50, 8573.11
     RELEASE: 4.18.0-348.20.1.el8_5.x86_64

        DMI_BIOS_VENDOR: VMware, Inc.
       DMI_BIOS_VERSION: VMW71.00V.16460286.B64.2006250725
          DMI_BIOS_DATE: 06/25/2020
         DMI_SYS_VENDOR: VMware, Inc.
       DMI_PRODUCT_NAME: VMware7,1

[661576.769543] INFO: task systemd:1 blocked for more than 120 seconds.
[661576.770733] INFO: task in:imjournal:1233 blocked for more than 120 seconds.
[661576.787841] INFO: task dsp.fwdpi.servi:1606 blocked for more than 120 seconds.
[661576.800733] INFO: task CSSLCertThread:1617 blocked for more than 120 seconds.
[661576.812642] INFO: task CInstalledSoftw:206870 blocked for more than 120 seconds.
[661576.823313] INFO: task pool-ds_am:1943 blocked for more than 120 seconds.
[661576.834765] INFO: task amqzmuc0:7142 blocked for more than 120 seconds.
[661576.845956] INFO: task amqzmuc0:7143 blocked for more than 120 seconds.
[661576.857811] INFO: task amqzmuc0:7144 blocked for more than 120 seconds.
[661699.647550] INFO: task systemd:1 blocked for more than 120 seconds.

crash> foreach UN ps -m | tail
[13 14:29:13.847] [UN]  PID: 7270     TASK: ffff890885419e40  CPU: 1    COMMAND: "JIT Sampler"
[13 14:29:44.947] [UN]  PID: 7144     TASK: ffff8908a03dbc80  CPU: 0    COMMAND: "amqzmuc0"
[13 14:29:44.937] [UN]  PID: 7143     TASK: ffff8908bfc59e40  CPU: 1    COMMAND: "amqzmuc0"
[13 14:29:44.937] [UN]  PID: 7142     TASK: ffff890846e5bc80  CPU: 1    COMMAND: "amqzmuc0"
[13 14:30:10.094] [UN]  PID: 1617     TASK: ffff890844f4bc80  CPU: 0    COMMAND: "CSSLCertThread"
[13 14:30:15.013] [UN]  PID: 206870   TASK: ffff890bfd541e40  CPU: 1    COMMAND: "CInstalledSoftw"
[13 14:30:16.659] [UN]  PID: 1606     TASK: ffff890846391e40  CPU: 0    COMMAND: "dsp.fwdpi.servi"
[13 14:30:18.683] [UN]  PID: 1233     TASK: ffff89084836dac0  CPU: 1    COMMAND: "in:imjournal"
[13 14:30:18.755] [UN]  PID: 1943     TASK: ffff8908a02d5ac0  CPU: 1    COMMAND: "pool-ds_am"
[13 14:30:18.842] [UN]  PID: 1        TASK: ffff8908436c3c80  CPU: 0    COMMAND: "systemd"

Taking the longest one:

PID: 1        TASK: ffff8908436c3c80  CPU: 0    COMMAND: "systemd"
 #0 [ffffa5158313fb30] __schedule at ffffffff9eb7b38d
 #1 [ffffa5158313fbc0] schedule at ffffffff9eb7b867
 #2 [ffffa5158313fbd0] fanotify_handle_event at ffffffff9e581476
 #3 [ffffa5158313fc38] fsnotify at ffffffff9e57d5f3
 #4 [ffffa5158313fcd0] do_dentry_open at ffffffff9e52c44e
 #5 [ffffa5158313fd00] path_openat at ffffffff9e540dfe
 #6 [ffffa5158313fdd8] do_filp_open at ffffffff9e543113
 #7 [ffffa5158313fee0] do_sys_open at ffffffff9e52dd24
 #8 [ffffa5158313ff38] do_syscall_64 at ffffffff9e2042bb
 #9 [ffffa5158313ff50] entry_SYSCALL_64_after_hwframe at ffffffff9ec000ad
    RIP: 00007f2f497ce301  RSP: 00007ffcba703870  RFLAGS: 00000293
    RAX: ffffffffffffffda  RBX: 00005648aefacbb0  RCX: 00007f2f497ce301
    RDX: 0000000000080000  RSI: 00005648aed96d40  RDI: 00000000ffffff9c
    RBP: 0000000000000008   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000293  R12: 00007f2f4af3bee3
    R13: 00007f2f4af3bee3  R14: 0000000000000001  R15: 00005648ad8b956e
    ORIG_RAX: 0000000000000101  CS: 0033  SS: 002b

This task is trying to open the following file and is stuck waiting for a permission event response from userspace:

crash> filename ffff8908a01db000
struct filename {
  name = 0xffff8908a01db020 "/sys/fs/cgroup/systemd/user.slice/user-0.slice/user@0.service/dev-mapper-rhel\\x2dswap.swap/cgroup.procs",
  uptr = 0x5648aed96d40 <error: Cannot access memory at address 0x5648aed96d40>,
  refcnt = 2,
  aname = 0xffff89084000f860,
  iname = 0xffff8908a01db020 "/sys/fs/cgroup/systemd/user.slice/user-0.slice/user@0.service/dev-mapper-rhel\\x2dswap.swap/cgroup.procs"
}

8580 tasks are waiting for the permission event response from userspace at this time:

crash> px &((struct fsnotify_group *)0xffff8908a0380b00)->fanotify_data.access_waitq
$17 = (wait_queue_head_t *) 0xffff8908a0380ba8

crash> waitq 0xffff8908a0380ba8 | wc -l
8580

Tasks wait in this wait queue until event->response becomes non-zero.
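On these pre-patch kernels the wait is uninterruptible; roughly, paraphrased from the RHEL7/8.5-era fanotify_get_response() (details may differ from the exact shipped source):

```
/* pre-patch fanotify_get_response(), paraphrased */
wait_event(group->fanotify_data.access_waitq, event->response != 0);
/* wait_event() sleeps in TASK_UNINTERRUPTIBLE: not even SIGKILL can
 * wake the task until userspace writes a response for this event */
```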

crash> fanotify_perm_event_info ffff8908469934c0
struct fanotify_perm_event_info {
  fae = {
    fse = {
      list = {
        next = 0xffff890846e4c880,
        prev = 0xffff8908a0380b10
      },
      inode = 0xffff890849f88000,
      mask = 65536
    },
    path = {
      mnt = 0xffff890aefbfd3a0,
      dentry = 0xffff890b939a6600
    },
    pid = 0xffff890af0c5a200
  },
  response = 0, <<-------
  fd = 13
}

At this time, the fanotify file handles are all held by threads of TrendMicro Deep Security Agent for Linux, as shown below:

crash> foreach files -R fanotify | awk '$1~/PID/{print "px ((struct file *)((struct task_struct *)0x" $4 ")->mm->exe_file)->f_path->dentry"}' >fanotify_tsk_exe.txt
crash> < fanotify_tsk_exe.txt | awk '$4~/dentry/{print $NF}' > fanotify_tsk_exe_dents.txt
crash> files -d < fanotify_tsk_exe_dents.txt | paste - - | awk '{print $NF}' | sort | uniq
/opt/ds_agent/ds_am

The filp's ->private_data points to the fsnotify_group whose wait queue contains the 8580 tasks stuck waiting for a permission event response from userspace.

crash> foreach files -R fanotify | awk '$NF~/fanotify/{print $2}' | sort | uniq
ffff8908462e3100

crash> px ((struct file *)0xffff8908462e3100)->private_data
$159 = (void *) 0xffff8908a0380b00

This is very likely a userspace bug in Deep Security Agent.

vmcore analysis in a case where VMware Endpoint Security (vsep/pool) contributed to the issue

Blocked task messages in kernel ring buffer:

crash> log | grep blocked
[73438.768248] INFO: task crond:1384 blocked for more than 120 seconds.
[73438.768542] INFO: task vsep:1440 blocked for more than 120 seconds.
[73438.769020] INFO: task vsep:1444 blocked for more than 120 seconds.
[73438.769421] INFO: task crond:9420 blocked for more than 120 seconds.
[73438.769785] INFO: task TaniumClient:13542 blocked for more than 120 seconds.
[73438.770193] INFO: task kworker/u4:3:19509 blocked for more than 120 seconds.
[73558.767852] INFO: task crond:1384 blocked for more than 120 seconds.
[73558.768881] INFO: task vsep:1444 blocked for more than 120 seconds.
[73558.768999] INFO: task crond:9420 blocked for more than 120 seconds.
[73558.769114] INFO: task TaniumClient:13542 blocked for more than 120 seconds.

Lots of UN-state tasks are calling fanotify_handle_event() and waiting:

crash> ps -S | awk 'NR==3'
  UN: 159

crash> foreach UN bt | awk '/#2 /{print $3,$5}' | sort | uniq -c | sort -nr
    158 fanotify_handle_event ffffffff97c99cf5
      1 schedule_timeout ffffffff98183891

The stack backtrace of one of the hung tasks looks like this:

PID: 9420   TASK: ffff9020719c9070  CPU: 0   COMMAND: "crond"
 #0 [ffff9021393a3a68] __schedule at ffffffff981858da
 #1 [ffff9021393a3af8] schedule at ffffffff98185d89
 #2 [ffff9021393a3b08] fanotify_handle_event at ffffffff97c99cf5
 #3 [ffff9021393a3b80] fsnotify at ffffffff97c960c8
 #4 [ffff9021393a3c30] security_file_open at ffffffff97d07ede
 #5 [ffff9021393a3c50] do_dentry_open at ffffffff97c4a759
 #6 [ffff9021393a3c98] vfs_open at ffffffff97c4a9fa
 #7 [ffff9021393a3cc0] do_last at ffffffff97c5cfa6
 #8 [ffff9021393a3d70] path_openat at ffffffff97c5e1bd
 #9 [ffff9021393a3e08] do_filp_open at ffffffff97c6040d
#10 [ffff9021393a3ee0] do_sys_open at ffffffff97c4bfe4
#11 [ffff9021393a3f40] sys_open at ffffffff97c4c0fe
#12 [ffff9021393a3f50] system_call_fastpath at ffffffff98192ed2

The hung tasks are in the fsnotify group's access wait queue:

crash> px &((struct fsnotify_group *)0xffff9020b8c6fd00)->fanotify_data.access_waitq
$13 = (wait_queue_head_t *) 0xffff9020b8c6fdb8

crash> waitq 0xffff9020b8c6fdb8
PID: 1397   TASK: ffff9021358ce2a0  CPU: 0   COMMAND: "agetty"
PID: 9126   TASK: ffff9020bb9d8000  CPU: 1   COMMAND: "sshd"
PID: 8594   TASK: ffff902003eaa0e0  CPU: 0   COMMAND: "vsftpd"
PID: 8581   TASK: ffff9020a235d230  CPU: 0   COMMAND: "vsftpd"
PID: 8163   TASK: ffff90200361e2a0  CPU: 0   COMMAND: "vsftpd"
    ... skipping ...

crash> waitq 0xffff9020b8c6fdb8 | wc -l
158

The access list is empty. The events are still on the notification list because userspace has not retrieved them yet.

crash> px &((struct fsnotify_group *)0xffff9020b8c6fd00)->fanotify_data.access_list
$83 = (struct list_head *) 0xffff9020b8c6fda8

crash> list -H 0xffff9020b8c6fda8
(empty)

crash> px &((struct fsnotify_group *)0xffff9020b8c6fd00)->notification_list
$16 = (struct list_head *) 0xffff9020b8c6fd18

crash> list -o fsnotify_event.list -s fsnotify_event.inode -H 0xffff9020b8c6fd18 | paste - - | awk '{print $NF}' | sort | uniq -c
      1 0xffff9020671f7d40
      1 0xffff902067a966c0
      1 0xffff90206b78ee40
      1 0xffff9020860d1400
     16 0xffff9020b20817c0
    131 0xffff9020b2081b80
      1 0xffff9020b21dd040
      1 0xffff9020b22e75c0
      1 0xffff9020b7639f40
      1 0xffff9020ba1a8c80
      1 0xffff9020ba224500
      1 0xffff9021361f1b80
      1 0xffff9021361f3980

At this time, a task named vsep (/usr/sbin/vsep) holds a fanotify file handle and is trying to open a file sitting on an NFSv4 mount; it has called call_usermodehelper_exec() to have a kworker run request-key and is waiting.

PID: 1440   TASK: ffff9021370ce2a0  CPU: 1   COMMAND: "vsep"
 #0 [ffff902136c475d8] __schedule at ffffffff981858da
 #1 [ffff902136c47668] schedule at ffffffff98185d89
 #2 [ffff902136c47678] schedule_timeout at ffffffff98183891
 #3 [ffff902136c47728] wait_for_completion at ffffffff9818613d
 #4 [ffff902136c47788] call_usermodehelper_exec at ffffffff97aba949
 #5 [ffff902136c477d0] call_sbin_request_key at ffffffff97cfefec
 #6 [ffff902136c478e8] request_key_and_link at ffffffff97cff55a
 #7 [ffff902136c479b8] request_key at ffffffff97cff7b7
 #8 [ffff902136c479f0] nfs_idmap_get_key at ffffffffc083d88f [nfsv4]
 #9 [ffff902136c47a50] nfs_map_name_to_uid at ffffffffc083e307 [nfsv4]
#10 [ffff902136c47ab0] nfs_fattr_map_and_free_names at ffffffffc083e54f [nfsv4]
#11 [ffff902136c47ae0] _nfs4_open_and_get_state at ffffffffc0826043 [nfsv4]
#12 [ffff902136c47b88] nfs4_do_open at ffffffffc082657c [nfsv4]
#13 [ffff902136c47c60] nfs4_atomic_open at ffffffffc0826af7 [nfsv4]
#14 [ffff902136c47cb8] nfs4_file_open at ffffffffc083bb96 [nfsv4]
#15 [ffff902136c47d68] do_dentry_open at ffffffff97c4a872
#16 [ffff902136c47db0] vfs_open at ffffffff97c4a9fa
#17 [ffff902136c47dd8] dentry_open at ffffffff97c4aa99
#18 [ffff902136c47e10] fanotify_read at ffffffff97c9a605
#19 [ffff902136c47ed8] vfs_read at ffffffff97c4cfbf
#20 [ffff902136c47f08] sys_read at ffffffff97c4de2f
#21 [ffff902136c47f50] system_call_fastpath at ffffffff98192ed2

crash> px ((struct file *)((struct task_struct *)0xffff9021370ce2a0)->mm->exe_file)->f_path.dentry
$5 = (struct dentry *) 0xffff902136061900

crash> files -d 0xffff902136061900
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff902136061900 ffff9020b22888c0 ffff902036730000 REG  /usr/sbin/vsep <<----- the task's exe file

crash> files 1440 | awk 'NR<4 || /fanotify/'
PID: 1440   TASK: ffff9021370ce2a0  CPU: 1   COMMAND: "vsep"
ROOT: /    CWD: /
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff9020b8c6fe00 ffff9020b2283d80 ffff902138c7f970 UNKN [fanotify] <<----- fanotify type file handle

crash> struct file.private_data ffff9020b8c6fe00
  private_data = 0xffff9020b8c6fd00 <<----- fanotify type file handle's filp->private_data matches the fsnotify group.

crash> files -d 0xffff9020b7465540
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff9020b7465540 ffff9020b2244e10 ffff9020b4ec0800 REG  /SANbkup/new06/os_backup_20200506.tgz <<----- vsep is trying to open this file.

crash> mount | awk 'NR==1 || /ffff9020b4ec0800/'
     MOUNT           SUPERBLK     TYPE   DEVNAME   DIRNAME
ffff9020b3eee900 ffff9020b4ec0800 nfs4   10.95.1.30:/SANbkup/new06 /SANbkup  <<----- the file is sitting on this NFSv4 mount.

At this time, the kworker responsible for running request-key is itself blocked in fanotify_handle_event(), waiting for a permission event response from userspace just like the other hung tasks.

PID: 19509  TASK: ffff9020a227e2a0  CPU: 0   COMMAND: "kworker/u4:3"
 #0 [ffff902002e0b9e8] __schedule at ffffffff981858da
 #1 [ffff902002e0ba78] schedule at ffffffff98185d89
 #2 [ffff902002e0ba88] fanotify_handle_event at ffffffff97c99cf5
 #3 [ffff902002e0bb00] fsnotify at ffffffff97c960c8
 #4 [ffff902002e0bbb0] security_file_open at ffffffff97d07ede
 #5 [ffff902002e0bbd0] do_dentry_open at ffffffff97c4a759
 #6 [ffff902002e0bc18] vfs_open at ffffffff97c4a9fa
 #7 [ffff902002e0bc40] do_last at ffffffff97c5cfa6
 #8 [ffff902002e0bcf0] path_openat at ffffffff97c5e1bd
 #9 [ffff902002e0bd88] do_filp_open at ffffffff97c6040d
#10 [ffff902002e0be58] do_open_exec at ffffffff97c54824
#11 [ffff902002e0be88] do_execve_common at ffffffff97c55792
#12 [ffff902002e0bf10] do_execve at ffffffff97c55e28
#13 [ffff902002e0bf20] ____call_usermodehelper at ffffffff97abafff

Normally, vsep receives a request to open a file and asks a scanner in the backend, probably the task named fpscand in this case, to scan it. The scanner performs a virus scan on the file and lets vsep know whether the open is allowed or denied. If the scanner allows the open, vsep opens the file and responds back to the requesting task, which is the permission event response in this case.

crash> bt 1343
PID: 1343   TASK: ffff9020b73be2a0  CPU: 0   COMMAND: "fpscand"
 #0 [ffff9020b3ed7c68] __schedule at ffffffff981858da
 #1 [ffff9020b3ed7cf8] schedule at ffffffff98185d89
 #2 [ffff9020b3ed7d08] schedule_timeout at ffffffff98183891
 #3 [ffff9020b3ed7db0] inet_csk_accept at ffffffff980aaf90
 #4 [ffff9020b3ed7e20] inet_accept at ffffffff980db442
 #5 [ffff9020b3ed7e58] SYSC_accept4 at ffffffff98032fc9
 #6 [ffff9020b3ed7f30] sys_accept4 at ffffffff9803558e
 #7 [ffff9020b3ed7f40] sys_accept at ffffffff980355a0
 #8 [ffff9020b3ed7f50] system_call_fastpath at ffffffff98192ed2
    RIP: 000000000052d210  RSP: 00007ffd4472fa48  RFLAGS: 00010246
    RAX: 000000000000002b  RBX: 00000000000027d8  RCX: ffffffff98192e15
    RDX: 00007ffd4472fc5c  RSI: 00007ffd4472fcc0  RDI: 0000000000000003
    RBP: 00007ffd4472fc5c   R8: 000000000161f850   R9: 00000000ffffffff
    R10: 000000000161f8e0  R11: 0000000000000246  R12: 0000000000000003
    R13: 00007ffd4472fc60  R14: 00007ffd4472fc78  R15: 0000000000000000
    ORIG_RAX: 000000000000002b  CS: 0033  SS: 002b

vsep, however, is trying to open a file sitting on the NFSv4 mount and is therefore waiting for request-key to complete, which should be done by the kworker. The kworker, however, is waiting for a permission event response from userspace before it can open and execute request-key. Very likely, this is a deadlock caused by the userspace bug.

vmcore analysis in a case where F-Secure fsavd/fsaccd contributed to the issue

Blocked task messages in the kernel ring buffer. A kernel panic occurred because hung_task_panic is enabled in this case:

crash> log | grep locked
[13919.504321] INFO: task fsbspamd:19047 blocked for more than 120 seconds.
[13919.506308] Kernel panic - not syncing: hung_task: blocked tasks

crash> p sysctl_hung_task_panic
sysctl_hung_task_panic = $1 = 1

Lots of UN-state tasks are calling fanotify_handle_event() and waiting:

crash> ps -S | awk 'NR==3'
  UN: 22

crash> foreach UN bt | awk '/#2 /{print $3,$5}' | sort | uniq -c | sort -nr
     20 fanotify_handle_event ffffffffb0699d55
      2 schedule_timeout ffffffffb0b838b1

The stack backtrace of one of the hung tasks looks like this:

PID: 19050  TASK: ffff999c183d1070  CPU: 3   COMMAND: "fsbspamd"
 #0 [ffff999c18383a68] __schedule at ffffffffb0b858fa
 #1 [ffff999c18383af8] schedule at ffffffffb0b85da9
 #2 [ffff999c18383b08] fanotify_handle_event at ffffffffb0699d55
 #3 [ffff999c18383b80] fsnotify at ffffffffb0696128
 #4 [ffff999c18383c30] security_file_open at ffffffffb0707f3e
 #5 [ffff999c18383c50] do_dentry_open at ffffffffb064a7b9
 #6 [ffff999c18383c98] vfs_open at ffffffffb064aa5a
 #7 [ffff999c18383cc0] do_last at ffffffffb065d006
 #8 [ffff999c18383d70] path_openat at ffffffffb065e21d
 #9 [ffff999c18383e08] do_filp_open at ffffffffb066046d
#10 [ffff999c18383ee0] do_sys_open at ffffffffb064c044
#11 [ffff999c18383f40] sys_open at ffffffffb064c15e
#12 [ffff999c18383f50] system_call_fastpath at ffffffffb0b92ed2

The hung tasks are in the fsnotify group's access wait queue:

crash> px &((struct fsnotify_group *)0xffff999c6f857800)->fanotify_data.access_waitq
$17 = (wait_queue_head_t *) 0xffff999c6f8578b8

crash> waitq 0xffff999c6f8578b8
PID: 1319   TASK: ffff999c725320e0  CPU: 2   COMMAND: "in:imjournal"
PID: 3722   TASK: ffff999c6efc9070  CPU: 2   COMMAND: "mjnetcln"
PID: 3909   TASK: ffff999c70afe2a0  CPU: 3   COMMAND: "tskussrv"
PID: 19050  TASK: ffff999c183d1070  CPU: 3   COMMAND: "fsbspamd"
PID: 19049  TASK: ffff999976570000  CPU: 0   COMMAND: "fsbspamd"
PID: 19048  TASK: ffff9999fb0a5230  CPU: 1   COMMAND: "fsbspamd"
PID: 19047  TASK: ffff999c717462a0  CPU: 0   COMMAND: "fsbspamd"
PID: 19046  TASK: ffff999c28d98000  CPU: 0   COMMAND: "fsbspamd"
PID: 19045  TASK: ffff999c183d20e0  CPU: 3   COMMAND: "fsbspamd"
PID: 19044  TASK: ffff999c183d5230  CPU: 3   COMMAND: "fsbspamd"
PID: 19043  TASK: ffff999c28d99070  CPU: 0   COMMAND: "fsbspamd"
PID: 19042  TASK: ffff999c183d0000  CPU: 3   COMMAND: "fsbspamd"
PID: 19041  TASK: ffff999c183d3150  CPU: 3   COMMAND: "fsbspamd"
PID: 19017  TASK: ffff999c18170000  CPU: 2   COMMAND: "java"
PID: 19012  TASK: ffff999975fc3150  CPU: 1   COMMAND: "python2"
PID: 19005  TASK: ffff999c181741c0  CPU: 2   COMMAND: "mppmon"
PID: 4211   TASK: ffff999c45d020e0  CPU: 0   COMMAND: "java"
PID: 1392   TASK: ffff9999fb1f41c0  CPU: 3   COMMAND: "fsicapd"
PID: 18988  TASK: ffff999c181762a0  CPU: 2   COMMAND: "kworker/u8:2"
PID: 18965  TASK: ffff999c183d41c0  CPU: 1   COMMAND: "csh"

The access list is empty. The events are still on the notification list because userspace has not retrieved them yet.

crash> px &((struct fsnotify_group *)0xffff999c6f857800)->fanotify_data.access_list
$102 = (struct list_head *) 0xffff999c6f8578a8

crash> list -o fanotify_perm_event_info.fae -s fanotify_perm_event_info -H 0xffff999c6f8578a8
(empty)

crash> px &((struct fsnotify_group *)0xffff999c6f857800)->notification_list
$100 = (struct list_head *) 0xffff999c6f857818

crash> list -o fsnotify_event.list -s fsnotify_event.inode -H 0xffff999c6f857818 | paste - - 
ffff999c182ab880      inode = 0xffff999c6f3f1400
ffff9999f66b2400      inode = 0xffff9999dccaa6c0
ffff9999fb131400      inode = 0xffff999976364500
ffff999c182ab700      inode = 0xffff99997611f200
ffff9999f654d280      inode = 0xffff999c6f16cc80
ffff999c182ab2c0      inode = 0xffff999c45adb5c0
ffff9999f66b2b80      inode = 0xffff999976169b80
ffff9999f66b2080      inode = 0xffff999976169b80
ffff999c18015bc0      inode = 0xffff999976169b80
ffff9999f66b2fc0      inode = 0xffff999976169b80
ffff9999f66b2e40      inode = 0xffff999976169b80
ffff999c18015000      inode = 0xffff999976169b80
ffff999c18015d40      inode = 0xffff999976169b80
ffff9999f654df80      inode = 0xffff999976169b80
ffff999c18015940      inode = 0xffff999976169b80
ffff9999f66b2000      inode = 0xffff999976169b80
ffff9999f66b2700      inode = 0xffff999c358335c0
ffff999c182ab740      inode = 0xffff999c6f308140
ffff999c182ab840      inode = 0xffff9999dcca9040

At this time, one of the hung tasks is trying to open a file sitting on an NFSv4 mount, blocked in fanotify_handle_event() waiting for a permission event response from userspace:

PID: 18965  TASK: ffff999c183d41c0  CPU: 1   COMMAND: "csh"
 #0 [ffff999c51b37a68] __schedule at ffffffffb0b858fa
 #1 [ffff999c51b37af8] schedule at ffffffffb0b85da9
 #2 [ffff999c51b37b08] fanotify_handle_event at ffffffffb0699d55
 #3 [ffff999c51b37b80] fsnotify at ffffffffb0696128
 #4 [ffff999c51b37c30] security_file_open at ffffffffb0707f3e
 #5 [ffff999c51b37c50] do_dentry_open at ffffffffb064a7b9
 #6 [ffff999c51b37c98] vfs_open at ffffffffb064aa5a
 #7 [ffff999c51b37cc0] do_last at ffffffffb065d006
 #8 [ffff999c51b37d70] path_openat at ffffffffb065e21d
 #9 [ffff999c51b37e08] do_filp_open at ffffffffb066046d
#10 [ffff999c51b37ee0] do_sys_open at ffffffffb064c044
#11 [ffff999c51b37f40] sys_open at ffffffffb064c15e
#12 [ffff999c51b37f50] system_call_fastpath at ffffffffb0b92ed2
    ...

crash> files -d 0xffff999c41548e40
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff999c41548e40 ffff999c70d405e0 ffff9999fb06f800 REG  /usr2/ft21/.tcshrc

crash> struct file.f_path ffff999c71de9000
  f_path = {
    mnt = 0xffff999c71cb1b20, <<-----
    dentry = 0xffff999c41548e40
  }

crash> px 0xffff999c71cb1b20-0x20
$12 = 0xffff999c71cb1b00 <<-----

crash> mount | awk 'NR==1 || /ffff999c71cb1b00/'
     MOUNT           SUPERBLK     TYPE   DEVNAME   DIRNAME
ffff999c71cb1b00 ffff9999fb06f800 nfs4   10.10.10.10:/usr2 /usr2 <<----- the file is on this NFSv4 mount.

F-Secure security software is running on this system. At this time, fsaccd, which is responsible for sending the permission event response back to the task, is trying to open the file on the NFSv4 mount and is therefore waiting for request-key to complete.

PID: 1141   TASK: ffff999c72d241c0  CPU: 2   COMMAND: "fsaccd"
 #0 [ffff9999fb09f5d8] __schedule at ffffffffb0b858fa
 #1 [ffff9999fb09f668] schedule at ffffffffb0b85da9
 #2 [ffff9999fb09f678] schedule_timeout at ffffffffb0b838b1
 #3 [ffff9999fb09f728] wait_for_completion at ffffffffb0b8615d
 #4 [ffff9999fb09f788] call_usermodehelper_exec at ffffffffb04ba949
 #5 [ffff9999fb09f7d0] call_sbin_request_key at ffffffffb06ff04c
 #6 [ffff9999fb09f8e8] request_key_and_link at ffffffffb06ff5ba
 #7 [ffff9999fb09f9b8] request_key at ffffffffb06ff817
 #8 [ffff9999fb09f9f0] nfs_idmap_get_key at ffffffffc065a88f [nfsv4]
 #9 [ffff9999fb09fa50] nfs_map_name_to_uid at ffffffffc065b307 [nfsv4]
#10 [ffff9999fb09fab0] nfs_fattr_map_and_free_names at ffffffffc065b54f [nfsv4]
#11 [ffff9999fb09fae0] _nfs4_open_and_get_state at ffffffffc0643043 [nfsv4]
#12 [ffff9999fb09fb88] nfs4_do_open at ffffffffc064357c [nfsv4]
#13 [ffff9999fb09fc60] nfs4_atomic_open at ffffffffc0643af7 [nfsv4]
#14 [ffff9999fb09fcb8] nfs4_file_open at ffffffffc0658b96 [nfsv4]
#15 [ffff9999fb09fd68] do_dentry_open at ffffffffb064a8d2
#16 [ffff9999fb09fdb0] vfs_open at ffffffffb064aa5a
#17 [ffff9999fb09fdd8] dentry_open at ffffffffb064aaf9
#18 [ffff9999fb09fe10] fanotify_read at ffffffffb069a665
#19 [ffff9999fb09fed8] vfs_read at ffffffffb064d01f
#20 [ffff9999fb09ff08] sys_read at ffffffffb064de8f
#21 [ffff9999fb09ff50] system_call_fastpath at ffffffffb0b92ed2
    ...

fsaccd holds a fanotify type file handle for the fsnotify group 0xffff999c6f857800 at this time:

crash> files 1141 | grep fanotify
  7 ffff999c6f857600 ffff999c6f24ea80 ffff999c7305e060 UNKN [fanotify]

crash> struct file.private_data ffff999c6f857600
  private_data = 0xffff999c6f857800 <<----- the fsnotify group

It is this kworker that should complete request-key:

PID: 18988  TASK: ffff999c181762a0  CPU: 2   COMMAND: "kworker/u8:2"
 #0 [ffff9999dc9cb9e8] __schedule at ffffffffb0b858fa
 #1 [ffff9999dc9cba78] schedule at ffffffffb0b85da9
 #2 [ffff9999dc9cba88] fanotify_handle_event at ffffffffb0699d55
 #3 [ffff9999dc9cbb00] fsnotify at ffffffffb0696128
 #4 [ffff9999dc9cbbb0] security_file_open at ffffffffb0707f3e
 #5 [ffff9999dc9cbbd0] do_dentry_open at ffffffffb064a7b9
 #6 [ffff9999dc9cbc18] vfs_open at ffffffffb064aa5a
 #7 [ffff9999dc9cbc40] do_last at ffffffffb065d006
 #8 [ffff9999dc9cbcf0] path_openat at ffffffffb065e21d
 #9 [ffff9999dc9cbd88] do_filp_open at ffffffffb066046d
#10 [ffff9999dc9cbe58] do_open_exec at ffffffffb0654884
#11 [ffff9999dc9cbe88] do_execve_common at ffffffffb06557f2
#12 [ffff9999dc9cbf10] do_execve at ffffffffb0655e88
#13 [ffff9999dc9cbf20] ____call_usermodehelper at ffffffffb04bafff

The kworker, however, is trying to open /usr/sbin/request-key for execution and is waiting for a permission event response from userspace. That response should come from fsaccd, but fsaccd is waiting for the kworker to finish executing request-key. Very likely, this is a deadlock caused by a userspace bug.

When fsaccd receives a request for a file open, it asks fsavd to scan the file. fsavd runs a virus scan accordingly and lets fsaccd know whether the open is allowed or denied. If fsavd allows the open, fsaccd opens the file and responds with the file handle to the requesting task, which is the permission event response in this case.

PID: 1237   TASK: ffff9999fb0a1070  CPU: 2   COMMAND: "fsavd"
 #0 [ffff999c7286fe98] __schedule at ffffffffb0b858fa
 #1 [ffff999c7286ff28] schedule at ffffffffb0b85da9
 #2 [ffff999c7286ff38] sys_pause at ffffffffb04b57fc
 #3 [ffff999c7286ff50] system_call_fastpath at ffffffffb0b92ed2
    RIP: 00007f0e78930f40  RSP: 00007ffc25df0b98  RFLAGS: 00000246
    RAX: 0000000000000022  RBX: 0000000000000000  RCX: ffffffffffffffff
    RDX: 0000000000000004  RSI: 00007ffc25df0bac  RDI: 0000000000000004
    RBP: 00007ffc25df2f06   R8: 0000000000000001   R9: 00007f0e79458740
    R10: 00007ffc25df0620  R11: 0000000000000246  R12: 00007ffc25df0d08
    R13: 00007ffc25df0ce0  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000022  CS: 0033  SS: 002b
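The blocking relationship described above can be modeled in a few lines. The sketch below is plain Python with illustrative names only (PermEvent, FanotifyGroup); it is not kernel or fsaccd code, but it mimics how fanotify_get_response() queues a permission event and puts the opener to sleep until userspace fills in a response:

```python
# Toy model of the fanotify permission handshake: the opener blocks until a
# userspace responder answers its queued event. All names here are
# illustrative; this is not kernel or fsaccd code.
import threading

FAN_ALLOW, FAN_DENY = 0x01, 0x02

class PermEvent:
    def __init__(self, path):
        self.path = path
        self.response = 0                  # 0 == no answer from userspace yet
        self.answered = threading.Event()

class FanotifyGroup:
    def __init__(self):
        self.access_list = []              # events handed out to userspace
        self.lock = threading.Lock()

    def open_with_permission(self, path, timeout=None):
        """Roughly what fanotify_get_response() does: queue, then sleep."""
        ev = PermEvent(path)
        with self.lock:
            self.access_list.append(ev)
        # The real kernel uses wait_event() with no timeout: if userspace
        # never answers, the opener sleeps in D state forever.
        if not ev.answered.wait(timeout):
            return "hung"
        return "opened" if ev.response == FAN_ALLOW else "denied"

    def respond_all(self, verdict):
        """What a healthy responder (e.g. fsaccd) must do for every event."""
        with self.lock:
            events, self.access_list = self.access_list[:], []
        for ev in events:
            ev.response = verdict
            ev.answered.set()
```

If the responder itself ends up waiting on one of the blocked openers, as fsaccd did on the kworker here, respond_all() is never reached and every opener stays blocked forever.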
  • With the upstream patches fabf7f29b3e2 and 40873284d710, a task requesting to open a file waits for the permission event response interruptibly via wait_event_killable() and returns -ERESTARTSYS (-512) if a fatal signal arrives. If event->state is then FAN_EVENT_REPORTED, meaning the event was reported to userspace but not yet answered, the state is changed to FAN_EVENT_CANCELED and fanotify_get_response() returns. Tasks waiting for permission event responses no longer block forever in TASK_UNINTERRUPTIBLE sleep, and the server no longer hangs.
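Translated to Python for illustration, the post-patch response path behaves roughly like this. The constants mirror the kernel's names, but the code is only a sketch of the logic described above, not kernel code:

```python
# Sketch of the waiting logic after the upstream fixes: a killed waiter
# returns -ERESTARTSYS and flips a reported-but-unanswered event to
# FAN_EVENT_CANCELED so a late userspace answer is simply ignored.
FAN_EVENT_INIT, FAN_EVENT_REPORTED, FAN_EVENT_ANSWERED, FAN_EVENT_CANCELED = range(4)
ERESTARTSYS = 512

class Event:
    def __init__(self):
        self.state = FAN_EVENT_INIT

def get_response(event, wait_killable):
    """wait_killable() models wait_event_killable(): it returns True when a
    response arrived and False when a fatal signal interrupted the sleep."""
    if wait_killable():
        event.state = FAN_EVENT_ANSWERED
        return 0
    if event.state == FAN_EVENT_REPORTED:   # reported to userspace, no answer
        event.state = FAN_EVENT_CANCELED    # late answers will be dropped
    return -ERESTARTSYS
```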

Another vmcore analysis

The data shows that the kernel crashed due to a blocked task, with the hung_task_panic sysctl knob enabled.

      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-1160.2.1.el7.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/634345889/crash/vmcore  [PARTIAL DUMP]
     RELEASE: 3.10.0-1160.2.1.el7.x86_64
       PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"

        DMI_BIOS_VENDOR: Phoenix Technologies LTD
       DMI_BIOS_VERSION: 6.00
          DMI_BIOS_DATE: 09/19/2018
         DMI_SYS_VENDOR: VMware, Inc.
       DMI_PRODUCT_NAME: VMware Virtual Platform
       DMI_BOARD_VENDOR: Intel Corporation
         DMI_BOARD_NAME: 440BX Desktop Reference Platform
       DMI_CHASSIS_TYPE: 1

crash> log | grep blocked
[1382263.605883] INFO: task pool:1584 blocked for more than 120 seconds.
[1382263.607562] Kernel panic - not syncing: hung_task: blocked tasks

crash> p sysctl_hung_task_panic
sysctl_hung_task_panic = $1 = 1

crash> p sysctl_hung_task_timeout_secs
sysctl_hung_task_timeout_secs = $2 = 120

One task has been stuck in TASK_UNINTERRUPTIBLE sleep for more than 120 seconds:

crash> foreach UN ps -m
[ 0 00:00:00.000] [UN]  PID: 1330   TASK: ffff8dfce0bee300  CPU: 3   COMMAND: "in:imjournal"
[ 0 00:00:06.375] [UN]  PID: 44285  TASK: ffff8df97b7e9080  CPU: 1   COMMAND: "java"
[ 0 00:02:16.275] [UN]  PID: 1584   TASK: ffff8dfcea9d3180  CPU: 3   COMMAND: "pool"

PID: 1584   TASK: ffff8dfcea9d3180  CPU: 3   COMMAND: "pool"
 #0 [ffff8dfa3b7cbb68] __schedule at ffffffffbc786cc7
 #1 [ffff8dfa3b7cbbf8] schedule_preempt_disabled at ffffffffbc788109
 #2 [ffff8dfa3b7cbc08] __mutex_lock_slowpath at ffffffffbc786077
 #3 [ffff8dfa3b7cbc68] mutex_lock at ffffffffbc78544f
 #4 [ffff8dfa3b7cbc80] lookup_slow at ffffffffbc77df5d
 #5 [ffff8dfa3b7cbcb8] path_lookupat at ffffffffbc25cffe
 #6 [ffff8dfa3b7cbd50] filename_lookup at ffffffffbc25d05b
    ...

The task is looking up a file named 18029-lms.log, which sits on the following NFSv4 mount:

crash> nameidata.path.dentry,last.name ffff8dfa3b7cbd90
  path.dentry = 0xffff8dfc3b799080
  last.name = 0xffff8dfc2920c03b "18029-lms.log"

crash> files -d 0xffff8dfc3b799080
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff8dfc3b799080 ffff8dfce11a2288 ffff8df97a653000 DIR  /talend/tmp/jobservers/dev/18029-lms/logs

crash> mount | awk 'NR==1 || /ffff8df97a653000/'
     MOUNT           SUPERBLK     TYPE   DEVNAME   DIRNAME
ffff8dfa22b93480 ffff8df97a653000 nfs4   nfssrv:/talend-share/jobservers/dev /talend/tmp

1502 static int lookup_slow(struct nameidata *nd, struct path *path)
1503 {
1504         struct dentry *dentry, *parent;
1505         int err;
1506 
1507         parent = nd->path.dentry;
1508         BUG_ON(nd->inode != parent->d_inode);
1509 
1510         mutex_lock(&parent->d_inode->i_mutex); <<-----
....

lookup_slow() must take the parent dentry's d_inode->i_mutex before it can look up the file's dentry:

crash> px &((struct nameidata *)0xffff8dfa3b7cbd90)->path.dentry->d_inode->i_mutex
$5 = (struct mutex *) 0xffff8dfce11a2330

At this time, the following task has taken that inode mutex and is waiting for the fanotify permission event response from userspace in order to open the file.

crash> px ((struct nameidata *)0xffff8dfa3b7cbd90)->path.dentry->d_inode->i_mutex.owner
$6 = (struct task_struct *) 0xffff8df97b7e9080

PID: 44285  TASK: ffff8df97b7e9080  CPU: 1   COMMAND: "java"
 #0 [ffff8dfc0b09f9c8] __schedule at ffffffffbc786cc7
 #1 [ffff8dfc0b09fa58] schedule at ffffffffbc7871e9
 #2 [ffff8dfc0b09fa68] fanotify_handle_event at ffffffffbc29a855
 #3 [ffff8dfc0b09fae0] fsnotify at ffffffffbc296c28
 #4 [ffff8dfc0b09fb98] security_file_open at ffffffffbc308bee
 #5 [ffff8dfc0b09fbb8] do_dentry_open at ffffffffbc24b159
 #6 [ffff8dfc0b09fc00] finish_open at ffffffffbc24b38f
 #7 [ffff8dfc0b09fc18] nfs_atomic_open at ffffffffc080e35e [nfs]
 #8 [ffff8dfc0b09fcc0] do_last at ffffffffbc25e293
 #9 [ffff8dfc0b09fd70] path_openat at ffffffffbc25ec4d
#10 [ffff8dfc0b09fe08] do_filp_open at ffffffffbc260e9d
    ...

crash> dis -lr ffffffffbc25e293
    ...
/usr/src/debug/kernel-3.10.0-1160.2.1.el7/linux-3.10.0-1160.2.1.el7.x86_64/fs/namei.c: 3144
0xffffffffbc25dabe <do_last+638>:   mov    0x48(%rsp),%rax
0xffffffffbc25dac3 <do_last+643>:   mov    0x30(%rax),%rax
0xffffffffbc25dac7 <do_last+647>:   lea    0xa8(%rax),%rdi
0xffffffffbc25dace <do_last+654>:   mov    %rax,0x30(%rsp)
0xffffffffbc25dad3 <do_last+659>:   callq  0xffffffffbc785430 <mutex_lock>
    ...

3058 static int do_last(struct nameidata *nd, struct path *path,
3059                    struct file *file, const struct open_flags *op,
3060                    int *opened, struct filename *name)
3061 {
3062         struct dentry *dir = nd->path.dentry;
....
3133 retry_lookup:
....
3144         mutex_lock(&dir->d_inode->i_mutex); <<----- The task has taken the inode mutex here.
3145         error = lookup_open(nd, path, file, op, got_write, opened);
....

The filp that this task is trying to open is:

crash> struct file.f_path.dentry ffff8dfc4bed7100
  f_path.dentry = 0xffff8dfc1aba3c80

crash> files -d 0xffff8dfc1aba3c80
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff8dfc1aba3c80 ffff8dfce1189e70 ffff8df97a653000 REG  /talend/tmp/jobservers/dev/18029-lms/logs/18029-lms.log

The inode awaiting the fanotify permission event response from userspace is:

crash> px  ((struct file *)0xffff8dfc4bed7100)->f_path.dentry->d_inode
$25 = (struct inode *) 0xffff8dfce1189e70

325 int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
326              const unsigned char *file_name, u32 cookie)
327 {
...

    Args passed to fsnotify() are:

    crash> px  ((struct file *)0xffff8dfc4bed7100)->f_path.dentry->d_inode
    $25 = (struct inode *) 0xffff8dfce1189e70 <<----- to_tell

    FS_OPEN_PERM <<----- mask

    crash> px  &((struct file *)0xffff8dfc4bed7100)->f_path
    $26 = (struct path *) 0xffff8dfc4bed7110 <<----- data

    FSNOTIFY_EVENT_PATH <<----- data_is
    0 <<----- cookie 

    NULL <<----- file_name

...
377         while (fsnotify_iter_select_report_types(&iter_info)) {
378                 ret = send_to_group(to_tell, mask, data, data_is, cookie,
379                                     file_name, &iter_info);
...
    193 static int send_to_group(struct inode *to_tell,
    194                          __u32 mask, const void *data,
    195                          int data_is, u32 cookie,
    196                          const unsigned char *file_name,
    197                          struct fsnotify_iter_info *iter_info)
    198 {
    ...
    241         return group->ops->handle_event(group, to_tell, mask, data, data_is,
    ...

    crash> px ((struct fsnotify_group *)0xffff8dfcdf451b00)->ops->handle_event
    $34 = (int (*)(struct fsnotify_group *, struct inode *, u32, const void *, int, const unsigned char *, u32, struct fsnotify_iter_info *)) 0xffffffffbc29a5f0

    crash> sym 0xffffffffbc29a5f0
    ffffffffbc29a5f0 (t) fanotify_handle_event /usr/src/debug/kernel-3.10.0-1160.2.1.el7/linux-3.10.0-1160.2.1.el7.x86_64/fs/notify/fanotify/fanotify.c: 180

        175 static int fanotify_handle_event(struct fsnotify_group *group,
        176                                  struct inode *inode,
        177                                  u32 mask, const void *data, int data_type,
        178                                  const unsigned char *file_name, u32 cookie,
        179                                  struct fsnotify_iter_info *iter_info)
        180 {
        ...
        230 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
        231         if (mask & FAN_ALL_PERM_EVENTS) {
        232                 ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event),
        233                                             iter_info);
        ...
             59 static int fanotify_get_response(struct fsnotify_group *group,
             60                                  struct fanotify_perm_event_info *event,
             61                                  struct fsnotify_iter_info *iter_info)
             62 {
             63         int ret;
             64 
             65         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
             66 
             67         wait_event(group->fanotify_data.access_waitq, event->response); <<----- we are waiting here.
             ..

The fanotify group's notification list is empty:

crash> px &((struct fsnotify_group *)0xffff8dfcdf451b00)->notification_list
$45 = (struct list_head *) 0xffff8dfcdf451b18

crash> list -H 0xffff8dfcdf451b18 -o fanotify_perm_event_info.fae -s fanotify_perm_event_info
(empty)

The fanotify group's notification list is empty because the events have been retrieved by userspace and moved to the access_list:

crash> px &((struct fsnotify_group *)0xffff8dfcdf451b00)->fanotify_data.access_list
$44 = (struct list_head *) 0xffff8dfcdf451ba8

crash> list -o fanotify_perm_event_info.fae -s fanotify_perm_event_info -H 0xffff8dfcdf451ba8
ffff8df9770d0bc0
struct fanotify_perm_event_info {
  fae = {
    fse = {
      list = {
        next = 0xffff8dfc78505ac0, 
        prev = 0xffff8dfcdf451ba8
      }, 
      inode = 0xffff8dfce1189e70, 
      mask = 0x10000
    }, 
    path = {
      mnt = 0xffff8dfa22b934a0, 
      dentry = 0xffff8dfc1aba3c80
    }, 
    tgid = 0xffff8dfc404e7d80
  }, 
  response = 0x0, <<-----
  fd = 0x8
}
ffff8dfc78505ac0
struct fanotify_perm_event_info {
  fae = {
    fse = {
      list = {
        next = 0xffff8dfcdf451ba8, 
        prev = 0xffff8df9770d0bc0
      }, 
      inode = 0xffff8dfc3cac08c8, 
      mask = 0x10000
    }, 
    path = {
      mnt = 0xffff8dfa217be2a0, 
      dentry = 0xffff8dfc1a935500
    }, 
    tgid = 0xffff8df97bb39680
  }, 
  response = 0x0, <<-----
  fd = 0x7
}

As shown above, two events in the fanotify group's access_list are still waiting for a permission event response from userspace. They refer to the following files:

crash> files -d 0xffff8dfc1aba3c80
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff8dfc1aba3c80 ffff8dfce1189e70 ffff8df97a653000 REG  /talend/tmp/jobservers/dev/18029-lms/logs/18029-lms.log

crash> files -d 0xffff8dfc1a935500
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff8dfc1a935500 ffff8dfc3cac08c8 ffff8dfa283cc000 REG  /var/lib/rsyslog/imjournal.state.tmp

Therefore, the following tasks wait on the access wait queue until the permission event responses arrive from userspace:

crash> px &((struct fsnotify_group *)0xffff8dfcdf451b00)->fanotify_data.access_waitq
$36 = (wait_queue_head_t *) 0xffff8dfcdf451bb8

crash> waitq 0xffff8dfcdf451bb8
PID: 1330   TASK: ffff8dfce0bee300  CPU: 3   COMMAND: "in:imjournal"
PID: 44285  TASK: ffff8df97b7e9080  CPU: 1   COMMAND: "java"

At this time there are many tasks named vsep or pool belonging to the VMware Endpoint Security Solution, as shown below. They all hold a fanotify file descriptor whose private_data points to the fsnotify group examined above.

crash> foreach files | grep fanotify -B5 | grep -e PID -e FD -e fanotify
PID: 1579   TASK: ffff8dfa283c4200  CPU: 2   COMMAND: "vsep"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1584   TASK: ffff8dfcea9d3180  CPU: 3   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1585   TASK: ffff8dfcea9d0000  CPU: 1   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1586   TASK: ffff8dfa2fbc2100  CPU: 3   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1587   TASK: ffff8dfce0bc1080  CPU: 3   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1588   TASK: ffff8dfce0bc0000  CPU: 2   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1589   TASK: ffff8dfce0bc4200  CPU: 1   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1590   TASK: ffff8dfa39ba4200  CPU: 2   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1591   TASK: ffff8dfa2bbe0000  CPU: 3   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1592   TASK: ffff8dfa2bbe1080  CPU: 0   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1593   TASK: ffff8dfa2bbe2100  CPU: 0   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1594   TASK: ffff8dfa2bbe3180  CPU: 0   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1595   TASK: ffff8dfa2bbe4200  CPU: 1   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1596   TASK: ffff8dfa2bbe5280  CPU: 0   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1597   TASK: ffff8dfa2bbe6300  CPU: 1   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1598   TASK: ffff8dfa2bbd8000  CPU: 0   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1599   TASK: ffff8dfa2bbd9080  CPU: 2   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1600   TASK: ffff8dfa2bbda100  CPU: 3   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1601   TASK: ffff8dfa2bbdb180  CPU: 1   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1602   TASK: ffff8dfa2bbdc200  CPU: 0   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1603   TASK: ffff8dfa2bbdd280  CPU: 3   COMMAND: "pool"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1604   TASK: ffff8dfa2bbde300  CPU: 1   COMMAND: "vsep"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1605   TASK: ffff8dfa2bbf0000  CPU: 0   COMMAND: "vsep"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1606   TASK: ffff8dfa2bbf1080  CPU: 3   COMMAND: "vsep"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1607   TASK: ffff8dfa2bbf2100  CPU: 1   COMMAND: "vsep"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1608   TASK: ffff8dfa2bbf3180  CPU: 0   COMMAND: "vsep"
 FD       FILE            DENTRY           INODE       TYPE PATH
  2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]

crash> struct file.private_data ffff8dfcdf451a00
  private_data = 0xffff8dfcdf451b00

The latest issue happened with Twistlock Defender

      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-1160.36.2.el7.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/224240356/crash/vmcore.vmss
LOAD AVERAGE: 1106.86, 1106.33, 1104.16
     RELEASE: 3.10.0-1160.36.2.el7.x86_64

        DMI_BIOS_VENDOR: Phoenix Technologies LTD
       DMI_BIOS_VERSION: 6.00
          DMI_BIOS_DATE: 05/28/2020
         DMI_SYS_VENDOR: VMware, Inc.
       DMI_PRODUCT_NAME: VMware Virtual Platform
crash> foreach UN ps -m | tail
[3 16:44:00.903] [UN]  PID: 4093   TASK: ffff92470f726300  CPU: 2   COMMAND: "PatrolAgent"
[3 16:44:01.884] [UN]  PID: 4091   TASK: ffff92470f725280  CPU: 3   COMMAND: "PatrolAgent"
[3 16:44:01.911] [UN]  PID: 4090   TASK: ffff92470f724200  CPU: 2   COMMAND: "PatrolAgent"
[3 16:44:02.898] [UN]  PID: 4089   TASK: ffff92470f723180  CPU: 1   COMMAND: "PatrolAgent"
[3 16:44:06.527] [UN]  PID: 3918   TASK: ffff9245ba51a100  CPU: 0   COMMAND: "yum"
[3 16:44:06.943] [UN]  PID: 4084   TASK: ffff92470f722100  CPU: 2   COMMAND: "PatrolAgent"
[3 16:44:06.925] [UN]  PID: 4076   TASK: ffff9249214cd280  CPU: 3   COMMAND: "grep"
[3 16:44:06.951] [UN]  PID: 3090   TASK: ffff92484eae1080  CPU: 2   COMMAND: "Broker KM Evnt"
[3 16:44:06.951] [UN]  PID: 4083   TASK: ffff92483ab2a100  CPU: 2   COMMAND: "PatrolAgent"
[3 16:44:06.951] [UN]  PID: 3751   TASK: ffff9247e4202100  CPU: 2   COMMAND: "yum"

A huge number of tasks are stuck calling fanotify_handle_event():

crash> foreach UN bt | awk '/#[2-3] /{print $3,$5}' | paste - - | sort | uniq -c | sort -nr
   1106 fanotify_handle_event ffffffffa8a9b1b5  fsnotify ffffffffa8a97588
      1 schedule_timeout ffffffffa8f86da8   mfe_fileaccess_set_new_entry_scan_state ffffffffc065e695

with backtraces like this one:

PID: 3751   TASK: ffff9247e4202100  CPU: 2   COMMAND: "yum"
 #0 [ffff92477c4bf8b0] __schedule at ffffffffa8f88dc8
 #1 [ffff92477c4bf918] schedule at ffffffffa8f89199
 #2 [ffff92477c4bf928] fanotify_handle_event at ffffffffa8a9b1b5
 #3 [ffff92477c4bf9a0] fsnotify at ffffffffa8a97588
 #4 [ffff92477c4bfa50] security_file_open at ffffffffa8b0954e
 #5 [ffff92477c4bfa70] do_dentry_open at ffffffffa8a4bb49
 #6 [ffff92477c4bfab8] vfs_open at ffffffffa8a4bdea
 #7 [ffff92477c4bfae0] do_last at ffffffffa8a5e1d6
 #8 [ffff92477c4bfb90] path_openat at ffffffffa8a5f3ed
 #9 [ffff92477c4bfc28] do_filp_open at ffffffffa8a6163d
#10 [ffff92477c4bfd00] do_sys_open at ffffffffa8a4d3d4
#11 [ffff92477c4bfd60] sys_openat at ffffffffa8a4d504
#12 [ffff92477c4bfd70] mfe_aac_sys_openat_64_bit at ffffffffc058c1a2 [mfe_aac_100705153]
#13 [ffff92477c4bff08] mfe_aac_sys_open_64_bit at ffffffffc058c32a [mfe_aac_100705153]
#14 [ffff92477c4bff18] mfe_fileaccess_sys_open_64_bit at ffffffffc065c043 [mfe_fileaccess_100705153]
#15 [ffff92477c4bff50] tracesys at ffffffffa8f96226 (via system_call)
    RIP: 00007f9d52be4efd  RSP: 00007fff8b085650  RFLAGS: 00000293
    RAX: ffffffffffffffda  RBX: 0000000002cb7cc0  RCX: ffffffffffffffff
    RDX: 00000000000001b6  RSI: 0000000000000241  RDI: 0000000002cb7cc0
    RBP: 0000000000000241   R8: 0000000000000000   R9: 0000000000000010
    R10: 00007f9d5223f390  R11: 0000000000000293  R12: 0000000002cb7cc0
    R13: 00000000000001b6  R14: 0000000000000012  R15: 0000000002cd8590
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

The third-party security module mfe_aac_100705153 appears in the stack
because the module modified the original sys_call_table for syscall hooking.

crash> sys -c |awk '$1==2||$1==257'
  2  mfe_fileaccess_sys_open_64_bit 
257  mfe_fileaccess_sys_openat_64_bit 

crash> px ((struct file *)0xffff924893a48d00)->f_path
$5 = {
  mnt = 0xffff92467599e5a0, 
  dentry = 0xffff924785808900
}

The above task is trying to open this file and is waiting for the fanotify
permission event response from userspace:

crash> files -d 0xffff924785808900
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff924785808900 ffff92476f7eb5c8 ffff9245366b3000 REG  /usr/lib64/gconv/IBM280.so;6190359f

crash> mount | awk 'NR==1||$2~/ffff9245366b3000/'
     MOUNT           SUPERBLK     TYPE   DEVNAME   DIRNAME
ffff92467599e580 ffff9245366b3000 xfs    /dev/mapper/rootvg-lv_usr /usr      

crash> px ((struct file *)0xffff924893a48d00)->f_path.dentry->d_inode
$6 = (struct inode *) 0xffff92476f7eb5c8

The notification_list:

crash> px &((struct fsnotify_group *)0xffff92492b3c7900)->notification_list
$8 = (struct list_head *) 0xffff92492b3c7918

crash> list -o fanotify_perm_event_info.fae -H 0xffff92492b3c7918 | wc -l
1104

The access_list:

crash>  px &((struct fsnotify_group *)0xffff92492b3c7900)->fanotify_data.access_list
$9 = (struct list_head *) 0xffff92492b3c79a8

crash>  list -o fanotify_perm_event_info.fae -s fanotify_perm_event_info -H 0xffff92492b3c79a8
ffff924920c425c0
struct fanotify_perm_event_info {
  fae = {
    fse = {
      list = {
    next = 0xffff924920c42640, 
    prev = 0xffff92492b3c79a8
      }, 
      inode = 0xffff924785828148, 
      mask = 65536
    }, 
    path = {
      mnt = 0xffff92467599e5a0, 
      dentry = 0xffff924785833c80
    }, 
    tgid = 0xffff92492319d100
  }, 
  response = 0, 
  fd = 10
}
ffff924920c42640
struct fanotify_perm_event_info {
  fae = {
    fse = {
      list = {
    next = 0xffff92492b3c79a8, 
    prev = 0xffff924920c425c0
      }, 
      inode = 0xffff92476f7eb5c8, 
      mask = 65536
    }, 
    path = {
      mnt = 0xffff92467599e5a0, 
      dentry = 0xffff924785808900
    }, 
    tgid = 0xffff92484ea60180
  }, 
  response = 0, 
  fd = 12
}

crash> files -d 0xffff924785833c80
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff924785833c80 ffff924785828148 ffff9245366b3000 REG  /usr/share/xxxx/node_modules/handlebars/dist/cjs/handlebars/compiler/whitespace-control.js

crash> files -d 0xffff924785808900
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff924785808900 ffff92476f7eb5c8 ffff9245366b3000 REG  /usr/lib64/gconv/IBM280.so;6190359f

All 1106 tasks are waiting for the fanotify permission event response from
userspace:

crash> px &((struct fsnotify_group *)0xffff92492b3c7900)->fanotify_data.access_waitq
$10 = (wait_queue_head_t *) 0xffff92492b3c79b8

crash> waitq 0xffff92492b3c79b8
PID: 8506   TASK: ffff924761202100  CPU: 2   COMMAND: "(journald)"
PID: 8130   TASK: ffff92475ef7d280  CPU: 0   COMMAND: "(journald)"
PID: 7801   TASK: ffff92475ef7e300  CPU: 1   COMMAND: "sshd"
PID: 6708   TASK: ffff92476d7c6300  CPU: 3   COMMAND: "containerd-shim"
PID: 7750   TASK: ffff92475ef79080  CPU: 0   COMMAND: "(journald)"
PID: 7374   TASK: ffff92475ef7b180  CPU: 0   COMMAND: "(journald)"
PID: 6997   TASK: ffff92476d7c3180  CPU: 2   COMMAND: "(journald)"
PID: 3448   TASK: ffff924783013180  CPU: 3   COMMAND: "containerd-shim"
PID: 6618   TASK: ffff92476d7c2100  CPU: 1   COMMAND: "(journald)"
PID: 6459   TASK: ffff92476d7c5280  CPU: 1   COMMAND: "rscd"
PID: 6240   TASK: ffff9247715f6300  CPU: 2   COMMAND: "(journald)"
PID: 5864   TASK: ffff92454daed280  CPU: 0   COMMAND: "(journald)"
PID: 121245  TASK: ffff92454c3fa100  CPU: 1   COMMAND: "containerd-shim"
    ... Too long. Skipping ...

crash> waitq 0xffff92492b3c79b8 | wc -l
1106

Just one task holds a fanotify file descriptor at this time:

crash> files 2743
PID: 2743   TASK: ffff9248938e4200  CPU: 3   COMMAND: "fsmon"
ROOT: /    CWD: /
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff924921282800 ffff9245b96cf200 ffff92488f450000 FIFO 
  1 ffff92489b459800 ffff92488f4226c0 ffff92488f459e10 FIFO 
  2 ffff92489b458100 ffff9248fcefa000 ffff92488f45a2b0 FIFO 
  3 ffff92489b45be00 ffff92488f422600 ffff92488f45a060 FIFO 
  4 ffff92492b3c7700 ffff9248fcfb09c0 ffff92488f41d990 REG  
  5 ffff92492b3c7e00 ffff9248fcfb0900 ffff92492d11f150 UNKN [fanotify] <<------------
  6 ffff92491a02a900 ffff9248fcfb0a80 ffff92492d11f150 UNKN inotify
 10 ffff9249232dd000 ffff924785833c80 ffff924785828148 REG  /usr/share/xxxx/node_modules/handlebars/dist/cjs/handlebars/compiler/whitespace-control.js
 11 ffff92491c2e7200 ffff92476f75c3c0 ffff924785838508 REG  /usr/lib64/gconv/IBM278.so
 12 ffff9249217fbf00 ffff924785808900 ffff92476f7eb5c8 REG  /usr/lib64/gconv/IBM280.so;6190359f

The fanotify type file handle matches the fsnotify_group:

crash> struct file.private_data ffff92492b3c7e00
  private_data = 0xffff92492b3c7900

The aforementioned 1106 tasks are waiting for the fanotify permission event
response from this task, fsmon, which is a child process of Twistlock
Defender:

crash> ps -p 2743
PID: 0      TASK: ffffffffa9418480  CPU: 0   COMMAND: "swapper/0"
 PID: 1      TASK: ffff9246763a8000  CPU: 0   COMMAND: "systemd"
  PID: 1818   TASK: ffff92490aad3180  CPU: 0   COMMAND: "containerd"
   PID: 2372   TASK: ffff9248f1d4c200  CPU: 0   COMMAND: "containerd-shim"
    PID: 2672   TASK: ffff9248e543b180  CPU: 3   COMMAND: "defender"
     PID: 2743   TASK: ffff9248938e4200  CPU: 3   COMMAND: "fsmon" <<------------

fsmon is stuck in write(1, 0x1e58c60, 174) (syscall arguments taken from RDI, RSI, and RDX):

PID: 2743   TASK: ffff9248938e4200  CPU: 3   COMMAND: "fsmon"
 #0 [ffff92489396fca0] __schedule at ffffffffa8f88dc8
 #1 [ffff92489396fd08] schedule at ffffffffa8f89199
 #2 [ffff92489396fd18] pipe_wait at ffffffffa8a57950
 #3 [ffff92489396fd68] pipe_write at ffffffffa8a57bf9
 #4 [ffff92489396fdf0] do_sync_write at ffffffffa8a4da53
 #5 [ffff92489396fec8] vfs_write at ffffffffa8a4e540
 #6 [ffff92489396ff08] sys_write at ffffffffa8a4f31f
 #7 [ffff92489396ff50] tracesys at ffffffffa8f96226 (via system_call)
    RIP: 00000000004580c1  RSP: 00007ffc9ba0e508  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00000000000000ae  RCX: ffffffffffffffff
    RDX: 00000000000000ae  RSI: 0000000001e58c60  RDI: 0000000000000001
    RBP: 0000000001e58c60   R8: 0000000000abadf0   R9: 0000000001e57880
    R10: 00007ffc9ba0fad8  R11: 0000000000000246  R12: 00000000006cb4a0
    R13: 00000000000000ae  R14: 00000000006ccd60  R15: 00000000006cd1e0
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

crash> files 2743
PID: 2743   TASK: ffff9248938e4200  CPU: 3   COMMAND: "fsmon"
ROOT: /    CWD: /
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff924921282800 ffff9245b96cf200 ffff92488f450000 FIFO 
  1 ffff92489b459800 ffff92488f4226c0 ffff92488f459e10 FIFO 
  2 ffff92489b458100 ffff9248fcefa000 ffff92488f45a2b0 FIFO 
  3 ffff92489b45be00 ffff92488f422600 ffff92488f45a060 FIFO 
  4 ffff92492b3c7700 ffff9248fcfb09c0 ffff92488f41d990 REG  
  5 ffff92492b3c7e00 ffff9248fcfb0900 ffff92492d11f150 UNKN [fanotify]
  6 ffff92491a02a900 ffff9248fcfb0a80 ffff92492d11f150 UNKN inotify
 10 ffff9249232dd000 ffff924785833c80 ffff924785828148 REG  /usr/share/xxxx/node_modules/handlebars/dist/cjs/handlebars/compiler/whitespace-control.js
 11 ffff92491c2e7200 ffff92476f75c3c0 ffff924785838508 REG  /usr/lib64/gconv/IBM278.so
 12 ffff9249217fbf00 ffff924785808900 ffff92476f7eb5c8 REG  /usr/lib64/gconv/IBM280.so;6190359f

Looking at fd=1.

crash> struct file.private_data ffff92489b459800
  private_data = 0xffff92489b5e5500

crash> struct pipe_inode_info.buffers,nrbufs,curbuf,bufs 0xffff92489b5e5500
  buffers = 16
  nrbufs = 16
  curbuf = 3
  bufs = 0xffff924929395c00

crash> p *((struct pipe_buffer *) 0xffff924929395c00 + 2)
$16 = {
  page = 0xffffd84dcaa48c80, 
  offset = 0, 
  len = 3985, 
  ops = 0xffffffffa902ebc0, 
  flags = 0, 
  private = 0
}

crash> p 3985+174
$18 = 4159

All 16 pipe buffers are in use (nrbufs == buffers), and the last filled buffer (len 3985) cannot take the pending 174 bytes, since 3985 + 174 = 4159 exceeds the 4096-byte page. The pipe is full, so fsmon is stuck writing to it.
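The arithmetic is easy to confirm from userspace: a default Linux pipe holds 16 one-page buffers, so 16 * 4096 = 65536 bytes fit before a writer blocks. A small demo follows; the write end is made non-blocking here so the demo observes the limit instead of hanging the way fsmon did:

```python
# Fill a fresh pipe until the kernel refuses more data. With the default
# 16 one-page buffers this stops at 65536 bytes; a *blocking* writer would
# instead sleep in pipe_wait(), exactly like fsmon in the backtrace above.
import fcntl
import os

def fill_pipe(chunk=4096):
    """Return how many bytes a fresh pipe accepts before it would block."""
    r, w = os.pipe()
    flags = fcntl.fcntl(w, fcntl.F_GETFL)
    fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    total = 0
    try:
        while True:
            total += os.write(w, b"x" * chunk)
    except BlockingIOError:   # pipe is full; a blocking write would sleep here
        pass
    finally:
        os.close(r)
        os.close(w)
    return total
```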

Who is supposed to be reading the pipe's other end?  Since the inode is common to both ends, look for it:

crash> foreach files -R ffff92488f459e10 
PID: 2418   TASK: ffff9248f1e20000  CPU: 1   COMMAND: "defender"
ROOT: /    CWD: / 
 FD       FILE            DENTRY           INODE       TYPE PATH
 41 ffff92489b458f00 ffff92488f4226c0 ffff92488f459e10 FIFO 

[repeated for each thread of 2418]

PID: 2743   TASK: ffff9248938e4200  CPU: 3   COMMAND: "fsmon"
ROOT: /    CWD: / 
 FD       FILE            DENTRY           INODE       TYPE PATH
  1 ffff92489b459800 ffff92488f4226c0 ffff92488f459e10 FIFO 

[...]

crash> foreach defender bt

[cutting details, summary below]

Waiting on a futex:

2418 2472 2474 2475 2476 2487 2757 2765 28937

Waiting on epoll:

2473 2494 2742

Blocked in fanotify_handle_event (i.e. the defender's own threads triggered permission events, recursing into the very mechanism they are supposed to service):

2672 2764

Look for the pipe's read side among the epoll instances (omitting 2473 and
2742 since their epoll sets do not contain the pipe fd):

crash> bt 2494
PID: 2494   TASK: ffff9248e543e300  CPU: 2   COMMAND: "defender"
 #0 [ffff9245b6bd3cd8] __schedule at ffffffffa8f88dc8
 #1 [ffff9245b6bd3d40] schedule at ffffffffa8f89199
 #2 [ffff9245b6bd3d50] schedule_hrtimeout_range_clock at ffffffffa8f88562
 #3 [ffff9245b6bd3de8] schedule_hrtimeout_range at ffffffffa8f88613
 #4 [ffff9245b6bd3df8] ep_poll at ffffffffa8a9d74e
 #5 [ffff9245b6bd3eb0] sys_epoll_wait at ffffffffa8a9ec2d
 #6 [ffff9245b6bd3f00] sys_epoll_pwait at ffffffffa8a9ed75
 #7 [ffff9245b6bd3f50] tracesys at ffffffffa8f96226 (via system_call)
    RIP: 0000000000472360  RSP: 00007f14118b2658  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000039  RCX: ffffffffffffffff
    RDX: 0000000000000080  RSI: 00007f14118b26a8  RDI: 0000000000000003
    RBP: 00007f14118b2ca8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000039  R11: 0000000000000246  R12: 0000000000000003
    R13: 000000c000102600  R14: 0000000000000000  R15: 000000c0011caa20
    ORIG_RAX: 0000000000000119  CS: 0033  SS: 002b

crash> files 2494 | grep "^ *3 "
  3 ffff9249077a4c00 ffff9248fcebee40 ffff92492d11f150 UNKN [eventpoll]

crash> struct file.private_data ffff9249077a4c00
  private_data = 0xffff9249077b9500

crash> eventpoll.rbr 0xffff9249077b9500
  rbr = {
    rb_node = 0xffff924599985400
  }

crash> tree -t rbtree -o epitem.rbn -s epitem.ffd.fd -N 0xffff924599985400 | grep -B1 "= 41"
ffff924599985c80
  ffd.fd = 41

crash> struct -x epitem.event.events ffff924599985c80
  event.events = 0x8000201d,

0x8000201d == POLLIN | POLLOUT | POLLERR | POLLHUP | POLLRDHUP | EPOLLET
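The flag arithmetic checks out against the epoll constants exposed by Python's select module, which carry the same values as the kernel's:

```python
# Verify the decoded event mask 0x8000201d by OR-ing the corresponding
# epoll flags from Python's select module.
import select

MASK = (select.EPOLLIN | select.EPOLLOUT | select.EPOLLERR
        | select.EPOLLHUP | select.EPOLLRDHUP | select.EPOLLET)
```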

So the defender task does poll the read end of the pipe, but only edge-triggered, and that is why it blocks despite the pipe having data available to read: no new edge occurs while the already-reported data sits unread, and the full pipe blocks any further writes that would generate one.

The kernel is seemingly doing everything fine; the bug is in userspace.
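The pitfall reproduces in a few lines: with EPOLLET, data that was already reported but never drained produces no further events, so a waiter sleeps even though read() would succeed. This is a Linux-only illustration of the mechanism, not the defender's code:

```python
# Edge-triggered epoll reports a pipe's readability once per edge. If the
# consumer does not drain the pipe and no new write arrives (here: because
# the writer is itself blocked on the full pipe), later epoll_wait() calls
# report nothing even though data is still there to read.
import os
import select

def demo():
    r, w = os.pipe()
    ep = select.epoll()
    ep.register(r, select.EPOLLIN | select.EPOLLET)
    os.write(w, b"hello")
    first = ep.poll(timeout=1)       # the write is an edge: one event
    second = ep.poll(timeout=0.2)    # no new edge: no event, despite data
    pending = os.read(r, 5)          # the data was there all along
    ep.close()
    os.close(r)
    os.close(w)
    return len(first), len(second), pending
```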
  • The case below was caused by McAfee, with a different process name: this system had mfetpd instead of fsmon.
crash> bt
PID: 1        TASK: ffff905003712840  CPU: 2    COMMAND: "systemd"
 #0 [ffffb57ac3147b30] __schedule at ffffffff9cda0731
 #1 [ffffb57ac3147bc0] schedule at ffffffff9cda0cc5
 #2 [ffffb57ac3147bd0] fanotify_handle_event at ffffffff9c790008
 #3 [ffffb57ac3147c38] fsnotify at ffffffff9c78c213
 #4 [ffffb57ac3147cd0] do_dentry_open at ffffffff9c73ad0e
 #5 [ffffb57ac3147d00] path_openat at ffffffff9c74f72e
 #6 [ffffb57ac3147dd8] do_filp_open at ffffffff9c751a43
 #7 [ffffb57ac3147ee0] do_sys_open at ffffffff9c73c5e4
 #8 [ffffb57ac3147f38] do_syscall_64 at ffffffff9c40430b
 #9 [ffffb57ac3147f50] entry_SYSCALL_64_after_hwframe at ffffffff9ce000ad
    RIP: 00007f488efe7301  RSP: 00007fff8db0ec10  RFLAGS: 00000293
    RAX: ffffffffffffffda  RBX: 0000560b858fe820  RCX: 00007f488efe7301
    RDX: 0000000000080000  RSI: 00007fff8db0ed60  RDI: 00000000ffffff9c
    RBP: 0000000000000008   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000293  R12: 00007f4890754ee3
    R13: 00007f4890754ee3  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: 0000000000000101  CS: 0033  SS: 002b

crash> fsnotify_group 0xffff90513a03da00 -ox
struct fsnotify_group {
  [ffff90513a03da00] const struct fsnotify_ops *ops;
  [ffff90513a03da08] refcount_t refcnt;
  [ffff90513a03da0c] spinlock_t notification_lock;
  [ffff90513a03da10] struct list_head notification_list;
  [ffff90513a03da20] wait_queue_head_t notification_waitq;
  [ffff90513a03da38] unsigned int q_len;
  [ffff90513a03da3c] unsigned int max_events;
  [ffff90513a03da40] unsigned int priority;
  [ffff90513a03da44] bool shutdown;
  [ffff90513a03da48] struct mutex mark_mutex;
  [ffff90513a03da68] atomic_t num_marks;
  [ffff90513a03da6c] atomic_t user_waits;
  [ffff90513a03da70] struct list_head marks_list;
  [ffff90513a03da80] struct fasync_struct *fsn_fa;
  [ffff90513a03da88] struct fsnotify_event *overflow_event;
  [ffff90513a03da90] struct mem_cgroup *memcg;
         union {
  [ffff90513a03da98]     void *private;
  [ffff90513a03da98]     struct inotify_group_private_data inotify_data;
  [ffff90513a03da98]     struct fanotify_group_private_data fanotify_data;
         };
}
SIZE: 0xd8
  • This fanotify group was owned by, and supposed to be handled by, the mfetpd process, as we can see below.
crash> epstree -p -t 2089
mfetpd(2089) -+- Scan Factory(2938) -+- ScanAction Mgr(3239968) 
                                     `- OAS Manager(3240000) 


crash> files | grep fanotify
 34 ffff9051217a4500 ffff90505ce13000 ffff905004d2ccd0 UNKN [fanotify]

crash> struct file.private_data ffff9051217a4500
  private_data = 0xffff90513a03da00,
  • This mfetpd binary and its related processes belong to McAfee.
crash> fsinfo -f 0xffff905005a24dc0
== File Info ==
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff90504093a3e8 ffff905040946730 ffff905007b37000 REG  /opt/McAfee/ens/tp/bin/mfetpd

file operations = ffffffffc0854b00 (B) xfs_file_operations [xfs]
file open mode = Read-Only (0x8020)
file size = 2136272 bytes, ino = 4214918, link count = 1
    uid = 0, gid = 0

== Mount Info ==
ffff9050386d7a40 ffff905007b37000 xfs    /dev/mapper/vg00-lv_opt /opt  
  • Among the threads in this mfetpd process, the one below was trying to communicate over a local UNIX socket.
crash> bt
PID: 3813     TASK: ffff90513a20d080  CPU: 0    COMMAND: "OAS Req Br->Mgr"
 #0 [ffffb57ac7a23a78] __schedule at ffffffff9cda0731
 #1 [ffffb57ac7a23b08] schedule at ffffffff9cda0cc5
 #2 [ffffb57ac7a23b18] schedule_timeout at ffffffff9cda5424
 #3 [ffffb57ac7a23bb0] sock_alloc_send_pskb at ffffffff9cbbd18f
 #4 [ffffb57ac7a23c30] unix_dgram_sendmsg at ffffffff9cd0086d
 #5 [ffffb57ac7a23cc8] sock_sendmsg at ffffffff9cbb753c
 #6 [ffffb57ac7a23ce0] ____sys_sendmsg at ffffffff9cbb785b
 #7 [ffffb57ac7a23d58] ___sys_sendmsg at ffffffff9cbb9b4c
 #8 [ffffb57ac7a23eb0] __sys_sendmsg at ffffffff9cbb9c17
 #9 [ffffb57ac7a23f38] do_syscall_64 at ffffffff9c40430b
#10 [ffffb57ac7a23f50] entry_SYSCALL_64_after_hwframe at ffffffff9ce000ad
    RIP: 00007f363529a20c  RSP: 00007f35f8ff1a50  RFLAGS: 00000293
    RAX: ffffffffffffffda  RBX: 00007f35f8ff1af0  RCX: 00007f363529a20c
    RDX: 0000000000000000  RSI: 00007f35f8ff1af0  RDI: 0000000000000004
    RBP: 00007f35f8ff1acc   R8: 0000000000000000   R9: 00007f35e0287728
    R10: 0000000000000000  R11: 0000000000000293  R12: 0000000000000004
    R13: 00007f35f8ff1af0  R14: 0000000000000000  R15: 0000000010358940
    ORIG_RAX: 000000000000002e  CS: 0033  SS: 002b

crash> unix_sock.peer,addr,path 0xffff90500fbda140 <-- The socket from this process
  peer = 0xffff90500fbddf00,
  addr = 0x0,
  path = {
    mnt = 0x0,
    dentry = 0x0
  },
crash> unix_sock.peer,addr,path 0xffff90500fbddf00  <-- peer socket
  peer = 0xffff90500fbda140,
  addr = 0x0,
  path = {
    mnt = 0x0,
    dentry = 0x0
  },

crash> sock.sk_wmem_alloc,sk_sndbuf,sk_sndtimeo 0xffff90500fbda140
  sk_wmem_alloc = {
    refs = {
      counter = 0x1c701
    }
  },
  sk_sndbuf = 0x34000,
  sk_sndtimeo = 0x7fffffffffffffff,
crash> pd 0x7fffffffffffffff/1000
$14 = 9223372036854775 <-- seconds
  • It looks like once this thread went to sleep, it could only be woken by an explicit wakeup, because the send timeout (sk_sndtimeo = MAX_SCHEDULE_TIMEOUT) is effectively infinite.

vmcore analysis in the case of Airlock (airlock-client.service) contributing to the issue

Lots of UN state tasks calling fanotify_handle_event() and waiting.

crash> foreach UN bt | grep '#2 ' | sort -k3 | uniq -c -f2 | sort -nr | head
      5  #2 [ffff935d33013a98] fanotify_handle_event at ffffffff9beaa8a5

Processes in uninterruptible (UN) state:

crash> ps -m|grep UN
[0 00:00:30.777] [UN]  PID: 127882   TASK: ffff935d022b3180  CPU: 3    COMMAND: "rhsmcertd-worke"
[0 00:00:30.778] [UN]  PID: 100106   TASK: ffff935e18b1c200  CPU: 1    COMMAND: "systemd-journal"
[0 00:00:30.778] [UN]  PID: 99304    TASK: ffff935df1e5e300  CPU: 1    COMMAND: "db2fm"
[0 00:00:30.778] [UN]  PID: 52402    TASK: ffff935d67e06300  CPU: 1    COMMAND: "db2sysc"
[0 00:00:30.778] [UN]  PID: 3104     TASK: ffff935efa353180  CPU: 1    COMMAND: "db2sysc"
crash> set 3104
    PID: 3104
COMMAND: "db2sysc"
   TASK: ffff935efa353180  [THREAD_INFO: ffff935f2f5bc000]
    CPU: 1
  STATE: TASK_UNINTERRUPTIBLE 

Let's check what is holding the fanotify file handle associated with the fsnotify_group. The holders are all in the "airlock" thread group:

crash> foreach files -R fanotify  |head
PID: 1287     TASK: ffff935f1b518000  CPU: 1    COMMAND: "airlock"
ROOT: /    CWD: / 
 FD       FILE            DENTRY           INODE       TYPE PATH
 15 ffff935f30d97600 ffff935f0b7afe00 ffff935f3111f150 UNKN [fanotify]

PID: 1303     TASK: ffff935f1e1be300  CPU: 0    COMMAND: "airlock"
ROOT: /    CWD: / 
 FD       FILE            DENTRY           INODE       TYPE PATH
 15 ffff935f30d97600 ffff935f0b7afe00 ffff935f3111f150 UNKN [fanotify]
crash> foreach files -R fanotify | grep COMMAND | awk {'print $8'} | sort | uniq -c
     13 "airlock"

Let's verify the 'airlock' task (1287):

crash> ps 1287
      PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
     1287       1   1  ffff935f1b518000  IN   0.5  1563812    70372  airlock
crash> set 1287
    PID: 1287
COMMAND: "airlock"
   TASK: ffff935f1b518000  [THREAD_INFO: ffff935f1b52c000]
    CPU: 1
  STATE: TASK_INTERRUPTIBLE 
crash> bt 1287
PID: 1287     TASK: ffff935f1b518000  CPU: 1    COMMAND: "airlock"
 #0 [ffff935f1b52fc48] __schedule at ffffffff9c3b78d8
 #1 [ffff935f1b52fcb0] schedule at ffffffff9c3b7ca9
 #2 [ffff935f1b52fcc0] futex_wait_queue_me at ffffffff9bd19856
 #3 [ffff935f1b52fd00] futex_wait at ffffffff9bd1a65b
 #4 [ffff935f1b52fe48] do_futex at ffffffff9bd1c356
 #5 [ffff935f1b52fed8] sys_futex at ffffffff9bd1c7a0
 #6 [ffff935f1b52ff50] system_call_fastpath at ffffffff9c3c539a
    RIP: 000055fafc966383  RSP: 00007ffee31cb1d0  RFLAGS: 00000293
    RAX: 00000000000000ca  RBX: 0000000000000000  RCX: 0000000000000004
    RDX: 0000000000000000  RSI: 0000000000000080  RDI: 000055fafd89e2a8
    RBP: 00007ffee31cc438   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000286  R12: 00007ffee31cc448
    R13: 0000000000000001  R14: 000055fafd89da20  R15: 00007f97506a2b10
    ORIG_RAX: 00000000000000ca  CS: 0033  SS: 002b

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
