The server is getting hung up: lots of tasks are calling fanotify_handle_event() and waiting for a permission event response from userspace. The server crashes if hung_task_panic is enabled; otherwise a hard reboot is the only resolution.
Environment
- Red Hat Enterprise Linux 7.9
- Red Hat Enterprise Linux 8.5 ~ 8.9
- VMware Endpoint Security Solution (vsep/pool), formerly vShield Endpoint, which works in conjunction with various security software (e.g. Trend Micro, McAfee, Symantec)
- F-Secure (fsavd/fsaccd)
- Microsoft Defender for Linux (wdavdaemon)
- Kaspersky (kesl)
- Twistlock Defender (a certified cloud-native security operator for OpenShift)
- Cybereason Sensor for Linux
- TrendMicro Deep Security Agent for Linux
- Airlock (airlock-client.service)
Issue
- The server is getting hung up.
- The server crashes if hung_task_panic is enabled. Otherwise a hard reboot is the only resolution once the server gets hung up completely.
- Blocked task message observed in the kernel ring buffer:
[13919.504321] INFO: task fsbspamd:19047 blocked for more than 120 seconds.
[13919.504396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13919.504433] fsbspamd D ffff999c7fc1acc0 0 19047 1246 0x00000080
[13919.504458] Call Trace:
[13919.504595] [<ffffffffb0b85da9>] schedule+0x29/0x70
[13919.504668] [<ffffffffb0699d55>] fanotify_handle_event+0x265/0x3a0
[13919.504704] [<ffffffffb0acd65c>] ? udp_send_skb+0xac/0x2b0
[13919.504749] [<ffffffffb04c7780>] ? wake_up_atomic_t+0x30/0x30
[13919.504754] [<ffffffffb0696128>] fsnotify+0x388/0x460
[13919.504786] [<ffffffffb0707f3e>] security_file_open+0x6e/0x70
[13919.504822] [<ffffffffb064a7b9>] do_dentry_open+0xc9/0x2d0
[13919.504829] [<ffffffffb0707992>] ? security_inode_permission+0x22/0x30
[13919.504834] [<ffffffffb064aa5a>] vfs_open+0x5a/0xb0
[13919.504853] [<ffffffffb06590da>] ? may_open+0x5a/0x120
[13919.504860] [<ffffffffb065d006>] do_last+0x1f6/0x1340
[13919.504866] [<ffffffffb065e21d>] path_openat+0xcd/0x5a0
[13919.504872] [<ffffffffb066046d>] do_filp_open+0x4d/0xb0
[13919.504884] [<ffffffffb066e512>] ? __alloc_fd+0xc2/0x170
[13919.504890] [<ffffffffb064c044>] do_sys_open+0x124/0x220
[13919.504896] [<ffffffffb064c15e>] SyS_open+0x1e/0x20
[13919.504910] [<ffffffffb0b92ed2>] system_call_fastpath+0x25/0x2a
- Lots of tasks are calling fanotify_handle_event(), waiting for a permission event response from userspace, and getting stuck in TASK_UNINTERRUPTIBLE sleep indefinitely.
- Meanwhile, the task that is responsible for sending the permission event response back to those tasks is itself waiting for a permission event response from userspace, which is a deadlock condition caused by a userspace bug.
Resolution
- Please note that Red Hat cannot fix a userspace bug in code that we do not maintain. All we can do is mitigate the userspace bug so that it is less severe.
- On private BZ#1834767, our engineering team worked on back-porting the following patches from upstream, not to fix the buggy userspace but to make the condition less severe by allowing the deadlocked processes to be killed:
af6a51130626 fanotify: Fold dequeue_event() into process_access_response()
8c5544666c9d fanotify: Move locking inside get_one_event()
f7db89accc9c fsnotify: Create function to remove event from notification list
ca6f86998d81 fanotify: Simplify cleaning of access_list
40873284d710 fanotify: Track permission event state
fabf7f29b3e2 fanotify: Use interruptible wait when waiting for permission events
- The BZ was closed without an erratum because the bug did not meet the inclusion criteria for RHEL7 Maintenance Phase 2.
- A possible workaround is to use NFSv3 instead of NFSv4, because the task from the security software that is responsible for responding to the hung tasks with the permission event response is calling call_usermodehelper_exec() to ask a kworker to run /usr/sbin/request-key, and that upcall is only needed for NFSv4 ID mapping.
- The NFSv4 protocol represents the local system's UID and GID values on the wire as strings of the form user@domain. The process of translating from UID to string and string to UID is referred to as "ID mapping." The /usr/sbin/nfsidmap program performs translations on behalf of the kernel. The kernel uses the request-key mechanism to perform an upcall. /usr/sbin/nfsidmap is invoked by /sbin/request-key, performs the translation, and initializes a key with the resulting information.
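On RHEL, the wiring between the kernel's request-key upcall and /usr/sbin/nfsidmap lives in a request-key configuration file shipped by nfs-utils; it looks similar to the following (exact options may differ by release):

```
# /etc/request-key.d/id_resolver.conf
# OP     TYPE         DESC  CALLOUT-INFO  PROGRAM
create   id_resolver  *     *             /usr/sbin/nfsidmap %k %d
```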
- Please note that using NFSv3 is only a viable workaround when the buggy userspace task, i.e. the one responsible for responding to the other tasks that are waiting for the fanotify permission event response, is stuck on an NFSv4 mount.
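As an illustration only (server and path borrowed from the vmcore analysis in the Diagnostic Steps; adjust for your environment), switching such a mount to NFSv3 would look like:

```shell
# Example only: remount the export with NFSv3 so the open path never
# needs the request-key/nfsidmap upcall (no NFSv4 ID mapping).
umount /SANbkup
mount -t nfs -o vers=3 10.95.1.30:/SANbkup/new06 /SANbkup

# Or persistently, in /etc/fstab:
# 10.95.1.30:/SANbkup/new06  /SANbkup  nfs  vers=3  0 0
```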
- On BZ#1897416, some fanotify patches that help mitigate this sort of issue were backported to RHEL8.6 and onwards. Please note that those patches do not fix any userspace bug; they only make this sort of issue less severe.
- If you are facing this sort of issue (caused by a userspace bug) on RHEL7.x or on RHEL8.x older than RHEL8.6, please upgrade to RHEL8.6 or later (RHEL9 included) and check whether the issue is mitigated.
- We also strongly recommend engaging the provider/support of the security software you are running to check for a bug and a fix (if any).
Important Notice
- Please keep in mind that the patches backported to RHEL8.6 and onwards do not fix any userspace bug. They were backported in the hope that this sort of issue, caused by a userspace bug, becomes less severe.
Root Cause
- The tasks that are requesting to open files are waiting for a permission event response from userspace, which should be sent back by the task responsible for answering those events.
- That task is, however, trying to open a file sitting on an NFS mount and is thus waiting for a kworker to execute request-key.
- The kworker is, however, itself waiting for a permission event response from userspace in order to open and execute request-key.
- Very likely, this is a userspace bug in the security software that uses the kernel's fanotify API/subsystem.
Diagnostic Steps
Analysis of the vmcore from a RHEL8.5 guest (running on VMware ESXi) in a case where TrendMicro Deep Security Agent for Linux is the culprit:
The kernel (RHEL8.5) got hung up with lots of blocked task messages:
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/4.18.0-348.20.1.el8_5.x86_64/vmlinux
DUMPFILE: /cores/retrace/tasks/105439176/crash/vmcore.vmsn
LOAD AVERAGE: 8579.33, 8577.50, 8573.11
RELEASE: 4.18.0-348.20.1.el8_5.x86_64
DMI_BIOS_VENDOR: VMware, Inc.
DMI_BIOS_VERSION: VMW71.00V.16460286.B64.2006250725
DMI_BIOS_DATE: 06/25/2020
DMI_SYS_VENDOR: VMware, Inc.
DMI_PRODUCT_NAME: VMware7,1
[661576.769543] INFO: task systemd:1 blocked for more than 120 seconds.
[661576.770733] INFO: task in:imjournal:1233 blocked for more than 120 seconds.
[661576.787841] INFO: task dsp.fwdpi.servi:1606 blocked for more than 120 seconds.
[661576.800733] INFO: task CSSLCertThread:1617 blocked for more than 120 seconds.
[661576.812642] INFO: task CInstalledSoftw:206870 blocked for more than 120 seconds.
[661576.823313] INFO: task pool-ds_am:1943 blocked for more than 120 seconds.
[661576.834765] INFO: task amqzmuc0:7142 blocked for more than 120 seconds.
[661576.845956] INFO: task amqzmuc0:7143 blocked for more than 120 seconds.
[661576.857811] INFO: task amqzmuc0:7144 blocked for more than 120 seconds.
[661699.647550] INFO: task systemd:1 blocked for more than 120 seconds.
crash> foreach UN ps -m | tail
[13 14:29:13.847] [UN] PID: 7270 TASK: ffff890885419e40 CPU: 1 COMMAND: "JIT Sampler"
[13 14:29:44.947] [UN] PID: 7144 TASK: ffff8908a03dbc80 CPU: 0 COMMAND: "amqzmuc0"
[13 14:29:44.937] [UN] PID: 7143 TASK: ffff8908bfc59e40 CPU: 1 COMMAND: "amqzmuc0"
[13 14:29:44.937] [UN] PID: 7142 TASK: ffff890846e5bc80 CPU: 1 COMMAND: "amqzmuc0"
[13 14:30:10.094] [UN] PID: 1617 TASK: ffff890844f4bc80 CPU: 0 COMMAND: "CSSLCertThread"
[13 14:30:15.013] [UN] PID: 206870 TASK: ffff890bfd541e40 CPU: 1 COMMAND: "CInstalledSoftw"
[13 14:30:16.659] [UN] PID: 1606 TASK: ffff890846391e40 CPU: 0 COMMAND: "dsp.fwdpi.servi"
[13 14:30:18.683] [UN] PID: 1233 TASK: ffff89084836dac0 CPU: 1 COMMAND: "in:imjournal"
[13 14:30:18.755] [UN] PID: 1943 TASK: ffff8908a02d5ac0 CPU: 1 COMMAND: "pool-ds_am"
[13 14:30:18.842] [UN] PID: 1 TASK: ffff8908436c3c80 CPU: 0 COMMAND: "systemd"
Taking the longest one:
PID: 1 TASK: ffff8908436c3c80 CPU: 0 COMMAND: "systemd"
#0 [ffffa5158313fb30] __schedule at ffffffff9eb7b38d
#1 [ffffa5158313fbc0] schedule at ffffffff9eb7b867
#2 [ffffa5158313fbd0] fanotify_handle_event at ffffffff9e581476
#3 [ffffa5158313fc38] fsnotify at ffffffff9e57d5f3
#4 [ffffa5158313fcd0] do_dentry_open at ffffffff9e52c44e
#5 [ffffa5158313fd00] path_openat at ffffffff9e540dfe
#6 [ffffa5158313fdd8] do_filp_open at ffffffff9e543113
#7 [ffffa5158313fee0] do_sys_open at ffffffff9e52dd24
#8 [ffffa5158313ff38] do_syscall_64 at ffffffff9e2042bb
#9 [ffffa5158313ff50] entry_SYSCALL_64_after_hwframe at ffffffff9ec000ad
RIP: 00007f2f497ce301 RSP: 00007ffcba703870 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 00005648aefacbb0 RCX: 00007f2f497ce301
RDX: 0000000000080000 RSI: 00005648aed96d40 RDI: 00000000ffffff9c
RBP: 0000000000000008 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f2f4af3bee3
R13: 00007f2f4af3bee3 R14: 0000000000000001 R15: 00005648ad8b956e
ORIG_RAX: 0000000000000101 CS: 0033 SS: 002b
This one is trying to open the following file and is stuck waiting for a permission event response from userspace:
crash> filename ffff8908a01db000
struct filename {
name = 0xffff8908a01db020 "/sys/fs/cgroup/systemd/user.slice/user-0.slice/user@0.service/dev-mapper-rhel\\x2dswap.swap/cgroup.procs",
uptr = 0x5648aed96d40 <error: Cannot access memory at address 0x5648aed96d40>,
refcnt = 2,
aname = 0xffff89084000f860,
iname = 0xffff8908a01db020 "/sys/fs/cgroup/systemd/user.slice/user-0.slice/user@0.service/dev-mapper-rhel\\x2dswap.swap/cgroup.procs"
}
8580 tasks are waiting for the permission event response from userspace at this time:
crash> px &((struct fsnotify_group *)0xffff8908a0380b00)->fanotify_data.access_waitq
$17 = (wait_queue_head_t *) 0xffff8908a0380ba8
crash> waitq 0xffff8908a0380ba8 | wc -l
8580
We are waiting in this wait queue until the event->response is no longer 0.
crash> fanotify_perm_event_info ffff8908469934c0
struct fanotify_perm_event_info {
fae = {
fse = {
list = {
next = 0xffff890846e4c880,
prev = 0xffff8908a0380b10
},
inode = 0xffff890849f88000,
mask = 65536
},
path = {
mnt = 0xffff890aefbfd3a0,
dentry = 0xffff890b939a6600
},
pid = 0xffff890af0c5a200
},
response = 0, <<-------
fd = 13
}
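The wait that blocks these tasks is, in a simplified sketch of the kernel code of this era (not verbatim RHEL source), an uninterruptible one:

```
/* fanotify_get_response(), simplified: the opener sleeps in
 * TASK_UNINTERRUPTIBLE until userspace writes a response, i.e. until
 * event->response becomes non-zero -- with no timeout and, before the
 * RHEL8.6 backports, no way to be woken even by a fatal signal. */
wait_event(group->fanotify_data.access_waitq, event->response);
```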
At this time, the fanotify type file handles are all held by threads from TrendMicro Deep Security Agent for Linux, as shown below:
crash> foreach files -R fanotify | awk '$1~/PID/{print "px ((struct file *)((struct task_struct *)0x" $4 ")->mm->exe_file)->f_path->dentry"}' >fanotify_tsk_exe.txt
crash> < fanotify_tsk_exe.txt | awk '$4~/dentry/{print $NF}' > fanotify_tsk_exe_dents.txt
crash> files -d < fanotify_tsk_exe_dents.txt | paste - - | awk '{print $NF}' | sort | uniq
/opt/ds_agent/ds_am
The filp's ->private_data is equal to the fsnotify_group whose wait queue contains the 8580 tasks that are stuck waiting for the permission event response from userspace.
crash> foreach files -R fanotify | awk '$NF~/fanotify/{print $2}' | sort | uniq
ffff8908462e3100
crash> px ((struct file *)0xffff8908462e3100)->private_data
$159 = (void *) 0xffff8908a0380b00
This is very likely a userspace bug in Deep Security Agent.
vmcore analysis in a case where VMware Endpoint Security (vsep/pool) contributes to the issue
Blocked task messages in kernel ring buffer:
crash> log | grep blocked
[73438.768248] INFO: task crond:1384 blocked for more than 120 seconds.
[73438.768542] INFO: task vsep:1440 blocked for more than 120 seconds.
[73438.769020] INFO: task vsep:1444 blocked for more than 120 seconds.
[73438.769421] INFO: task crond:9420 blocked for more than 120 seconds.
[73438.769785] INFO: task TaniumClient:13542 blocked for more than 120 seconds.
[73438.770193] INFO: task kworker/u4:3:19509 blocked for more than 120 seconds.
[73558.767852] INFO: task crond:1384 blocked for more than 120 seconds.
[73558.768881] INFO: task vsep:1444 blocked for more than 120 seconds.
[73558.768999] INFO: task crond:9420 blocked for more than 120 seconds.
[73558.769114] INFO: task TaniumClient:13542 blocked for more than 120 seconds.
Lots of UN state tasks are calling fanotify_handle_event() and waiting:
crash> ps -S | awk 'NR==3'
UN: 159
crash> foreach UN bt | awk '/#2 /{print $3,$5}' | sort | uniq -c | sort -nr
158 fanotify_handle_event ffffffff97c99cf5
1 schedule_timeout ffffffff98183891
The stack backtrace of the hung task is like this:
PID: 9420 TASK: ffff9020719c9070 CPU: 0 COMMAND: "crond"
#0 [ffff9021393a3a68] __schedule at ffffffff981858da
#1 [ffff9021393a3af8] schedule at ffffffff98185d89
#2 [ffff9021393a3b08] fanotify_handle_event at ffffffff97c99cf5
#3 [ffff9021393a3b80] fsnotify at ffffffff97c960c8
#4 [ffff9021393a3c30] security_file_open at ffffffff97d07ede
#5 [ffff9021393a3c50] do_dentry_open at ffffffff97c4a759
#6 [ffff9021393a3c98] vfs_open at ffffffff97c4a9fa
#7 [ffff9021393a3cc0] do_last at ffffffff97c5cfa6
#8 [ffff9021393a3d70] path_openat at ffffffff97c5e1bd
#9 [ffff9021393a3e08] do_filp_open at ffffffff97c6040d
#10 [ffff9021393a3ee0] do_sys_open at ffffffff97c4bfe4
#11 [ffff9021393a3f40] sys_open at ffffffff97c4c0fe
#12 [ffff9021393a3f50] system_call_fastpath at ffffffff98192ed2
The hung tasks are in the fsnotify group's access wait queue:
crash> px &((struct fsnotify_group *)0xffff9020b8c6fd00)->fanotify_data.access_waitq
$13 = (wait_queue_head_t *) 0xffff9020b8c6fdb8
crash> waitq 0xffff9020b8c6fdb8
PID: 1397 TASK: ffff9021358ce2a0 CPU: 0 COMMAND: "agetty"
PID: 9126 TASK: ffff9020bb9d8000 CPU: 1 COMMAND: "sshd"
PID: 8594 TASK: ffff902003eaa0e0 CPU: 0 COMMAND: "vsftpd"
PID: 8581 TASK: ffff9020a235d230 CPU: 0 COMMAND: "vsftpd"
PID: 8163 TASK: ffff90200361e2a0 CPU: 0 COMMAND: "vsftpd"
... skipping ...
crash> waitq 0xffff9020b8c6fdb8 | wc -l
158
The access list is empty. The events are still in the notification list because they have not been retrieved by userspace yet.
crash> px &((struct fsnotify_group *)0xffff9020b8c6fd00)->fanotify_data.access_list
$83 = (struct list_head *) 0xffff9020b8c6fda8
crash> list -H 0xffff9020b8c6fda8
(empty)
crash> px &((struct fsnotify_group *)0xffff9020b8c6fd00)->notification_list
$16 = (struct list_head *) 0xffff9020b8c6fd18
crash> list -o fsnotify_event.list -s fsnotify_event.inode -H 0xffff9020b8c6fd18 | paste - - | awk '{print $NF}' | sort | uniq -c
1 0xffff9020671f7d40
1 0xffff902067a966c0
1 0xffff90206b78ee40
1 0xffff9020860d1400
16 0xffff9020b20817c0
131 0xffff9020b2081b80
1 0xffff9020b21dd040
1 0xffff9020b22e75c0
1 0xffff9020b7639f40
1 0xffff9020ba1a8c80
1 0xffff9020ba224500
1 0xffff9021361f1b80
1 0xffff9021361f3980
At this time, a task named vsep (/usr/sbin/vsep) holds a fanotify type file handle and is trying to open a file sitting on an NFSv4 mount; it is thus calling call_usermodehelper_exec() to run request-key and is waiting for its completion.
PID: 1440 TASK: ffff9021370ce2a0 CPU: 1 COMMAND: "vsep"
#0 [ffff902136c475d8] __schedule at ffffffff981858da
#1 [ffff902136c47668] schedule at ffffffff98185d89
#2 [ffff902136c47678] schedule_timeout at ffffffff98183891
#3 [ffff902136c47728] wait_for_completion at ffffffff9818613d
#4 [ffff902136c47788] call_usermodehelper_exec at ffffffff97aba949
#5 [ffff902136c477d0] call_sbin_request_key at ffffffff97cfefec
#6 [ffff902136c478e8] request_key_and_link at ffffffff97cff55a
#7 [ffff902136c479b8] request_key at ffffffff97cff7b7
#8 [ffff902136c479f0] nfs_idmap_get_key at ffffffffc083d88f [nfsv4]
#9 [ffff902136c47a50] nfs_map_name_to_uid at ffffffffc083e307 [nfsv4]
#10 [ffff902136c47ab0] nfs_fattr_map_and_free_names at ffffffffc083e54f [nfsv4]
#11 [ffff902136c47ae0] _nfs4_open_and_get_state at ffffffffc0826043 [nfsv4]
#12 [ffff902136c47b88] nfs4_do_open at ffffffffc082657c [nfsv4]
#13 [ffff902136c47c60] nfs4_atomic_open at ffffffffc0826af7 [nfsv4]
#14 [ffff902136c47cb8] nfs4_file_open at ffffffffc083bb96 [nfsv4]
#15 [ffff902136c47d68] do_dentry_open at ffffffff97c4a872
#16 [ffff902136c47db0] vfs_open at ffffffff97c4a9fa
#17 [ffff902136c47dd8] dentry_open at ffffffff97c4aa99
#18 [ffff902136c47e10] fanotify_read at ffffffff97c9a605
#19 [ffff902136c47ed8] vfs_read at ffffffff97c4cfbf
#20 [ffff902136c47f08] sys_read at ffffffff97c4de2f
#21 [ffff902136c47f50] system_call_fastpath at ffffffff98192ed2
crash> px ((struct file *)((struct task_struct *)0xffff9021370ce2a0)->mm->exe_file)->f_path.dentry
$5 = (struct dentry *) 0xffff902136061900
crash> files -d 0xffff902136061900
DENTRY INODE SUPERBLK TYPE PATH
ffff902136061900 ffff9020b22888c0 ffff902036730000 REG /usr/sbin/vsep <<----- the task's exe file
crash> files 1440 | awk 'NR<4 || /fanotify/'
PID: 1440 TASK: ffff9021370ce2a0 CPU: 1 COMMAND: "vsep"
ROOT: / CWD: /
FD FILE DENTRY INODE TYPE PATH
2 ffff9020b8c6fe00 ffff9020b2283d80 ffff902138c7f970 UNKN [fanotify] <<----- fanotify type file handle
crash> struct file.private_data ffff9020b8c6fe00
private_data = 0xffff9020b8c6fd00 <<----- fanotify type file handle's filp->private_data matches the fsnotify group.
crash> files -d 0xffff9020b7465540
DENTRY INODE SUPERBLK TYPE PATH
ffff9020b7465540 ffff9020b2244e10 ffff9020b4ec0800 REG /SANbkup/new06/os_backup_20200506.tgz <<----- vsep is trying to open this file.
crash> mount | awk 'NR==1 || /ffff9020b4ec0800/'
MOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff9020b3eee900 ffff9020b4ec0800 nfs4 10.95.1.30:/SANbkup/new06 /SANbkup <<----- the file is sitting on this NFSv4 mount.
At this time, a kworker, which is responsible for running request-key, is calling fanotify_handle_event() and waiting for a permission event response from userspace, just like the other hung tasks.
PID: 19509 TASK: ffff9020a227e2a0 CPU: 0 COMMAND: "kworker/u4:3"
#0 [ffff902002e0b9e8] __schedule at ffffffff981858da
#1 [ffff902002e0ba78] schedule at ffffffff98185d89
#2 [ffff902002e0ba88] fanotify_handle_event at ffffffff97c99cf5
#3 [ffff902002e0bb00] fsnotify at ffffffff97c960c8
#4 [ffff902002e0bbb0] security_file_open at ffffffff97d07ede
#5 [ffff902002e0bbd0] do_dentry_open at ffffffff97c4a759
#6 [ffff902002e0bc18] vfs_open at ffffffff97c4a9fa
#7 [ffff902002e0bc40] do_last at ffffffff97c5cfa6
#8 [ffff902002e0bcf0] path_openat at ffffffff97c5e1bd
#9 [ffff902002e0bd88] do_filp_open at ffffffff97c6040d
#10 [ffff902002e0be58] do_open_exec at ffffffff97c54824
#11 [ffff902002e0be88] do_execve_common at ffffffff97c55792
#12 [ffff902002e0bf10] do_execve at ffffffff97c55e28
#13 [ffff902002e0bf20] ____call_usermodehelper at ffffffff97abafff
Usually, vsep receives a request to open a file and asks a scanner in the backend (probably the task named fpscand in this case) to check it. The scanner performs a virus scan on the file and lets vsep know whether the open is allowed or denied. If the scanner allows the open, vsep opens the file and responds back to the task requesting the open, which is the permission event response in this case.
crash> bt 1343
PID: 1343 TASK: ffff9020b73be2a0 CPU: 0 COMMAND: "fpscand"
#0 [ffff9020b3ed7c68] __schedule at ffffffff981858da
#1 [ffff9020b3ed7cf8] schedule at ffffffff98185d89
#2 [ffff9020b3ed7d08] schedule_timeout at ffffffff98183891
#3 [ffff9020b3ed7db0] inet_csk_accept at ffffffff980aaf90
#4 [ffff9020b3ed7e20] inet_accept at ffffffff980db442
#5 [ffff9020b3ed7e58] SYSC_accept4 at ffffffff98032fc9
#6 [ffff9020b3ed7f30] sys_accept4 at ffffffff9803558e
#7 [ffff9020b3ed7f40] sys_accept at ffffffff980355a0
#8 [ffff9020b3ed7f50] system_call_fastpath at ffffffff98192ed2
RIP: 000000000052d210 RSP: 00007ffd4472fa48 RFLAGS: 00010246
RAX: 000000000000002b RBX: 00000000000027d8 RCX: ffffffff98192e15
RDX: 00007ffd4472fc5c RSI: 00007ffd4472fcc0 RDI: 0000000000000003
RBP: 00007ffd4472fc5c R8: 000000000161f850 R9: 00000000ffffffff
R10: 000000000161f8e0 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffd4472fc60 R14: 00007ffd4472fc78 R15: 0000000000000000
ORIG_RAX: 000000000000002b CS: 0033 SS: 002b
vsep is, however, trying to open a file sitting on the NFSv4 mount and is thus waiting for the completion of request-key, which should be handled by a kworker. The kworker is, however, waiting for a permission event response from userspace in order to open and execute request-key. Very likely, this is a deadlock condition caused by the userspace bug.
vmcore analysis in a case where F-Secure fsavd/fsaccd contributes to the issue
Blocked task messages in the kernel ring buffer. A kernel panic occurred since hung_task_panic is enabled in this case:
crash> log | grep locked
[13919.504321] INFO: task fsbspamd:19047 blocked for more than 120 seconds.
[13919.506308] Kernel panic - not syncing: hung_task: blocked tasks
crash> p sysctl_hung_task_panic
sysctl_hung_task_panic = $1 = 1
Lots of UN state tasks are calling fanotify_handle_event() and waiting:
crash> ps -S | awk 'NR==3'
UN: 22
crash> foreach UN bt | awk '/#2 /{print $3,$5}' | sort | uniq -c | sort -nr
20 fanotify_handle_event ffffffffb0699d55
2 schedule_timeout ffffffffb0b838b1
The stack backtrace of the hung task is like this:
PID: 19050 TASK: ffff999c183d1070 CPU: 3 COMMAND: "fsbspamd"
#0 [ffff999c18383a68] __schedule at ffffffffb0b858fa
#1 [ffff999c18383af8] schedule at ffffffffb0b85da9
#2 [ffff999c18383b08] fanotify_handle_event at ffffffffb0699d55
#3 [ffff999c18383b80] fsnotify at ffffffffb0696128
#4 [ffff999c18383c30] security_file_open at ffffffffb0707f3e
#5 [ffff999c18383c50] do_dentry_open at ffffffffb064a7b9
#6 [ffff999c18383c98] vfs_open at ffffffffb064aa5a
#7 [ffff999c18383cc0] do_last at ffffffffb065d006
#8 [ffff999c18383d70] path_openat at ffffffffb065e21d
#9 [ffff999c18383e08] do_filp_open at ffffffffb066046d
#10 [ffff999c18383ee0] do_sys_open at ffffffffb064c044
#11 [ffff999c18383f40] sys_open at ffffffffb064c15e
#12 [ffff999c18383f50] system_call_fastpath at ffffffffb0b92ed2
The hung tasks are in the fsnotify group's access wait queue:
crash> px &((struct fsnotify_group *)0xffff999c6f857800)->fanotify_data.access_waitq
$17 = (wait_queue_head_t *) 0xffff999c6f8578b8
crash> waitq 0xffff999c6f8578b8
PID: 1319 TASK: ffff999c725320e0 CPU: 2 COMMAND: "in:imjournal"
PID: 3722 TASK: ffff999c6efc9070 CPU: 2 COMMAND: "mjnetcln"
PID: 3909 TASK: ffff999c70afe2a0 CPU: 3 COMMAND: "tskussrv"
PID: 19050 TASK: ffff999c183d1070 CPU: 3 COMMAND: "fsbspamd"
PID: 19049 TASK: ffff999976570000 CPU: 0 COMMAND: "fsbspamd"
PID: 19048 TASK: ffff9999fb0a5230 CPU: 1 COMMAND: "fsbspamd"
PID: 19047 TASK: ffff999c717462a0 CPU: 0 COMMAND: "fsbspamd"
PID: 19046 TASK: ffff999c28d98000 CPU: 0 COMMAND: "fsbspamd"
PID: 19045 TASK: ffff999c183d20e0 CPU: 3 COMMAND: "fsbspamd"
PID: 19044 TASK: ffff999c183d5230 CPU: 3 COMMAND: "fsbspamd"
PID: 19043 TASK: ffff999c28d99070 CPU: 0 COMMAND: "fsbspamd"
PID: 19042 TASK: ffff999c183d0000 CPU: 3 COMMAND: "fsbspamd"
PID: 19041 TASK: ffff999c183d3150 CPU: 3 COMMAND: "fsbspamd"
PID: 19017 TASK: ffff999c18170000 CPU: 2 COMMAND: "java"
PID: 19012 TASK: ffff999975fc3150 CPU: 1 COMMAND: "python2"
PID: 19005 TASK: ffff999c181741c0 CPU: 2 COMMAND: "mppmon"
PID: 4211 TASK: ffff999c45d020e0 CPU: 0 COMMAND: "java"
PID: 1392 TASK: ffff9999fb1f41c0 CPU: 3 COMMAND: "fsicapd"
PID: 18988 TASK: ffff999c181762a0 CPU: 2 COMMAND: "kworker/u8:2"
PID: 18965 TASK: ffff999c183d41c0 CPU: 1 COMMAND: "csh"
The access list is empty. The events are still in the notification list because they have not been retrieved by userspace yet.
crash> px &((struct fsnotify_group *)0xffff999c6f857800)->fanotify_data.access_list
$102 = (struct list_head *) 0xffff999c6f8578a8
crash> list -o fanotify_perm_event_info.fae -s fanotify_perm_event_info -H 0xffff999c6f8578a8
(empty)
crash> px &((struct fsnotify_group *)0xffff999c6f857800)->notification_list
$100 = (struct list_head *) 0xffff999c6f857818
crash> list -o fsnotify_event.list -s fsnotify_event.inode -H 0xffff999c6f857818 | paste - -
ffff999c182ab880 inode = 0xffff999c6f3f1400
ffff9999f66b2400 inode = 0xffff9999dccaa6c0
ffff9999fb131400 inode = 0xffff999976364500
ffff999c182ab700 inode = 0xffff99997611f200
ffff9999f654d280 inode = 0xffff999c6f16cc80
ffff999c182ab2c0 inode = 0xffff999c45adb5c0
ffff9999f66b2b80 inode = 0xffff999976169b80
ffff9999f66b2080 inode = 0xffff999976169b80
ffff999c18015bc0 inode = 0xffff999976169b80
ffff9999f66b2fc0 inode = 0xffff999976169b80
ffff9999f66b2e40 inode = 0xffff999976169b80
ffff999c18015000 inode = 0xffff999976169b80
ffff999c18015d40 inode = 0xffff999976169b80
ffff9999f654df80 inode = 0xffff999976169b80
ffff999c18015940 inode = 0xffff999976169b80
ffff9999f66b2000 inode = 0xffff999976169b80
ffff9999f66b2700 inode = 0xffff999c358335c0
ffff999c182ab740 inode = 0xffff999c6f308140
ffff999c182ab840 inode = 0xffff9999dcca9040
At this time, one of the hung tasks is trying to open a file sitting on an NFSv4 mount, calling fanotify_handle_event() and waiting for a permission event response from userspace:
PID: 18965 TASK: ffff999c183d41c0 CPU: 1 COMMAND: "csh"
#0 [ffff999c51b37a68] __schedule at ffffffffb0b858fa
#1 [ffff999c51b37af8] schedule at ffffffffb0b85da9
#2 [ffff999c51b37b08] fanotify_handle_event at ffffffffb0699d55
#3 [ffff999c51b37b80] fsnotify at ffffffffb0696128
#4 [ffff999c51b37c30] security_file_open at ffffffffb0707f3e
#5 [ffff999c51b37c50] do_dentry_open at ffffffffb064a7b9
#6 [ffff999c51b37c98] vfs_open at ffffffffb064aa5a
#7 [ffff999c51b37cc0] do_last at ffffffffb065d006
#8 [ffff999c51b37d70] path_openat at ffffffffb065e21d
#9 [ffff999c51b37e08] do_filp_open at ffffffffb066046d
#10 [ffff999c51b37ee0] do_sys_open at ffffffffb064c044
#11 [ffff999c51b37f40] sys_open at ffffffffb064c15e
#12 [ffff999c51b37f50] system_call_fastpath at ffffffffb0b92ed2
...
crash> files -d 0xffff999c41548e40
DENTRY INODE SUPERBLK TYPE PATH
ffff999c41548e40 ffff999c70d405e0 ffff9999fb06f800 REG /usr2/ft21/.tcshrc
crash> struct file.f_path ffff999c71de9000
f_path = {
mnt = 0xffff999c71cb1b20, <<-----
dentry = 0xffff999c41548e40
}
crash> px 0xffff999c71cb1b20-0x20
$12 = 0xffff999c71cb1b00 <<-----
crash> mount | awk 'NR==1 || /ffff999c71cb1b00/'
MOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff999c71cb1b00 ffff9999fb06f800 nfs4 10.10.10.10:/usr2 /usr2 <<----- the file is on this NFSv4 mount.
F-Secure security software is running on this system. At this time, fsaccd, which is responsible for responding with the permission event response back to the tasks, is trying to open the file sitting on the NFSv4 mount and is thus waiting for the completion of request-key.
PID: 1141 TASK: ffff999c72d241c0 CPU: 2 COMMAND: "fsaccd"
#0 [ffff9999fb09f5d8] __schedule at ffffffffb0b858fa
#1 [ffff9999fb09f668] schedule at ffffffffb0b85da9
#2 [ffff9999fb09f678] schedule_timeout at ffffffffb0b838b1
#3 [ffff9999fb09f728] wait_for_completion at ffffffffb0b8615d
#4 [ffff9999fb09f788] call_usermodehelper_exec at ffffffffb04ba949
#5 [ffff9999fb09f7d0] call_sbin_request_key at ffffffffb06ff04c
#6 [ffff9999fb09f8e8] request_key_and_link at ffffffffb06ff5ba
#7 [ffff9999fb09f9b8] request_key at ffffffffb06ff817
#8 [ffff9999fb09f9f0] nfs_idmap_get_key at ffffffffc065a88f [nfsv4]
#9 [ffff9999fb09fa50] nfs_map_name_to_uid at ffffffffc065b307 [nfsv4]
#10 [ffff9999fb09fab0] nfs_fattr_map_and_free_names at ffffffffc065b54f [nfsv4]
#11 [ffff9999fb09fae0] _nfs4_open_and_get_state at ffffffffc0643043 [nfsv4]
#12 [ffff9999fb09fb88] nfs4_do_open at ffffffffc064357c [nfsv4]
#13 [ffff9999fb09fc60] nfs4_atomic_open at ffffffffc0643af7 [nfsv4]
#14 [ffff9999fb09fcb8] nfs4_file_open at ffffffffc0658b96 [nfsv4]
#15 [ffff9999fb09fd68] do_dentry_open at ffffffffb064a8d2
#16 [ffff9999fb09fdb0] vfs_open at ffffffffb064aa5a
#17 [ffff9999fb09fdd8] dentry_open at ffffffffb064aaf9
#18 [ffff9999fb09fe10] fanotify_read at ffffffffb069a665
#19 [ffff9999fb09fed8] vfs_read at ffffffffb064d01f
#20 [ffff9999fb09ff08] sys_read at ffffffffb064de8f
#21 [ffff9999fb09ff50] system_call_fastpath at ffffffffb0b92ed2
...
fsaccd holds a fanotify type file handle for the fsnotify group 0xffff999c6f857800 at this time:
crash> files 1141 | grep fanotify
7 ffff999c6f857600 ffff999c6f24ea80 ffff999c7305e060 UNKN [fanotify]
crash> struct file.private_data ffff999c6f857600
private_data = 0xffff999c6f857800 <<----- the fsnotify group
It's this kworker that should complete request-key.
PID: 18988 TASK: ffff999c181762a0 CPU: 2 COMMAND: "kworker/u8:2"
#0 [ffff9999dc9cb9e8] __schedule at ffffffffb0b858fa
#1 [ffff9999dc9cba78] schedule at ffffffffb0b85da9
#2 [ffff9999dc9cba88] fanotify_handle_event at ffffffffb0699d55
#3 [ffff9999dc9cbb00] fsnotify at ffffffffb0696128
#4 [ffff9999dc9cbbb0] security_file_open at ffffffffb0707f3e
#5 [ffff9999dc9cbbd0] do_dentry_open at ffffffffb064a7b9
#6 [ffff9999dc9cbc18] vfs_open at ffffffffb064aa5a
#7 [ffff9999dc9cbc40] do_last at ffffffffb065d006
#8 [ffff9999dc9cbcf0] path_openat at ffffffffb065e21d
#9 [ffff9999dc9cbd88] do_filp_open at ffffffffb066046d
#10 [ffff9999dc9cbe58] do_open_exec at ffffffffb0654884
#11 [ffff9999dc9cbe88] do_execve_common at ffffffffb06557f2
#12 [ffff9999dc9cbf10] do_execve at ffffffffb0655e88
#13 [ffff9999dc9cbf20] ____call_usermodehelper at ffffffffb04bafff
The kworker is, however, trying to open /usr/sbin/request-key in order to execute it and is thus waiting for a permission event response from userspace. That response should come from fsaccd, but fsaccd is waiting for the kworker to complete the execution of request-key. Very likely, this is a deadlock condition caused by a userspace bug.
When fsaccd receives a request for a file open, it asks fsavd for a virus scan on the file. fsavd scans the file accordingly and lets fsaccd know whether the open is allowed or denied. If fsavd allows the open, fsaccd opens the file and responds back to the task requesting the open, which is the permission event response in this case.
PID: 1237 TASK: ffff9999fb0a1070 CPU: 2 COMMAND: "fsavd"
#0 [ffff999c7286fe98] __schedule at ffffffffb0b858fa
#1 [ffff999c7286ff28] schedule at ffffffffb0b85da9
#2 [ffff999c7286ff38] sys_pause at ffffffffb04b57fc
#3 [ffff999c7286ff50] system_call_fastpath at ffffffffb0b92ed2
RIP: 00007f0e78930f40 RSP: 00007ffc25df0b98 RFLAGS: 00000246
RAX: 0000000000000022 RBX: 0000000000000000 RCX: ffffffffffffffff
RDX: 0000000000000004 RSI: 00007ffc25df0bac RDI: 0000000000000004
RBP: 00007ffc25df2f06 R8: 0000000000000001 R9: 00007f0e79458740
R10: 00007ffc25df0620 R11: 0000000000000246 R12: 00007ffc25df0d08
R13: 00007ffc25df0ce0 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000022 CS: 0033 SS: 002b
- With upstream patches fabf7f29b3e2 and 40873284d710, a task that is requesting to open a file waits for the permission event response interruptibly with wait_event_killable() and returns -ERESTARTSYS (-512) when a fatal signal is received. If event->state is FAN_EVENT_REPORTED at that point, meaning the event was reported to userspace but not answered yet, event->state is replaced with FAN_EVENT_CANCELED, and fanotify_get_response() returns. The tasks waiting for permission event responses no longer block in TASK_UNINTERRUPTIBLE sleep forever, and the server no longer gets hung up.
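In a simplified sketch (based on the upstream commits, not verbatim RHEL source), the fixed wait looks roughly like this:

```
/* fanotify_get_response() after 40873284d710 + fabf7f29b3e2 (sketch):
 * the wait is killable, and an event that was reported to userspace
 * but never answered is marked canceled so the dying task can leave. */
ret = wait_event_killable(group->fanotify_data.access_waitq,
                          event->state == FAN_EVENT_ANSWERED);
if (ret) {                      /* fatal signal: ret == -ERESTARTSYS */
        spin_lock(&group->notification_lock);
        if (event->state == FAN_EVENT_REPORTED)
                event->state = FAN_EVENT_CANCELED;
        spin_unlock(&group->notification_lock);
}
```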
Another vmcore analysis
The data shows that the kernel crashed due to a blocked task, with the hung_task_panic sysctl knob enabled.
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-1160.2.1.el7.x86_64/vmlinux
DUMPFILE: /cores/retrace/tasks/634345889/crash/vmcore [PARTIAL DUMP]
RELEASE: 3.10.0-1160.2.1.el7.x86_64
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
DMI_BIOS_VENDOR: Phoenix Technologies LTD
DMI_BIOS_VERSION: 6.00
DMI_BIOS_DATE: 09/19/2018
DMI_SYS_VENDOR: VMware, Inc.
DMI_PRODUCT_NAME: VMware Virtual Platform
DMI_BOARD_VENDOR: Intel Corporation
DMI_BOARD_NAME: 440BX Desktop Reference Platform
DMI_CHASSIS_TYPE: 1
crash> log | grep blocked
[1382263.605883] INFO: task pool:1584 blocked for more than 120 seconds.
[1382263.607562] Kernel panic - not syncing: hung_task: blocked tasks
crash> p sysctl_hung_task_panic
sysctl_hung_task_panic = $1 = 1
crash> p sysctl_hung_task_timeout_secs
sysctl_hung_task_timeout_secs = $2 = 120
One task has been stuck in TASK_UNINTERRUPTIBLE sleep for more than 120 seconds:
crash> foreach UN ps -m
[ 0 00:00:00.000] [UN] PID: 1330 TASK: ffff8dfce0bee300 CPU: 3 COMMAND: "in:imjournal"
[ 0 00:00:06.375] [UN] PID: 44285 TASK: ffff8df97b7e9080 CPU: 1 COMMAND: "java"
[ 0 00:02:16.275] [UN] PID: 1584 TASK: ffff8dfcea9d3180 CPU: 3 COMMAND: "pool"
PID: 1584 TASK: ffff8dfcea9d3180 CPU: 3 COMMAND: "pool"
#0 [ffff8dfa3b7cbb68] __schedule at ffffffffbc786cc7
#1 [ffff8dfa3b7cbbf8] schedule_preempt_disabled at ffffffffbc788109
#2 [ffff8dfa3b7cbc08] __mutex_lock_slowpath at ffffffffbc786077
#3 [ffff8dfa3b7cbc68] mutex_lock at ffffffffbc78544f
#4 [ffff8dfa3b7cbc80] lookup_slow at ffffffffbc77df5d
#5 [ffff8dfa3b7cbcb8] path_lookupat at ffffffffbc25cffe
#6 [ffff8dfa3b7cbd50] filename_lookup at ffffffffbc25d05b
...
The task is looking up a file named 18029-lms.log on the following NFSv4 mount to obtain information about the file:
crash> nameidata.path.dentry,last.name ffff8dfa3b7cbd90
path.dentry = 0xffff8dfc3b799080
last.name = 0xffff8dfc2920c03b "18029-lms.log"
crash> files -d 0xffff8dfc3b799080
DENTRY INODE SUPERBLK TYPE PATH
ffff8dfc3b799080 ffff8dfce11a2288 ffff8df97a653000 DIR /talend/tmp/jobservers/dev/18029-lms/logs
crash> mount | awk 'NR==1 || /ffff8df97a653000/'
MOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff8dfa22b93480 ffff8df97a653000 nfs4 nfssrv:/talend-share/jobservers/dev /talend/tmp
1502 static int lookup_slow(struct nameidata *nd, struct path *path)
1503 {
1504 struct dentry *dentry, *parent;
1505 int err;
1506
1507 parent = nd->path.dentry;
1508 BUG_ON(nd->inode != parent->d_inode);
1509
1510 mutex_lock(&parent->d_inode->i_mutex); <<-----
....
lookup_slow() must take the parent dentry's d_inode->i_mutex before looking up the dentry of the file:
crash> px &((struct nameidata *)0xffff8dfa3b7cbd90)->path.dentry->d_inode->i_mutex
$5 = (struct mutex *) 0xffff8dfce11a2330
At this time, the following task holds that inode mutex and is waiting for the fanotify permission event response from userspace in order to open a file.
crash> px ((struct nameidata *)0xffff8dfa3b7cbd90)->path.dentry->d_inode->i_mutex.owner
$6 = (struct task_struct *) 0xffff8df97b7e9080
PID: 44285 TASK: ffff8df97b7e9080 CPU: 1 COMMAND: "java"
#0 [ffff8dfc0b09f9c8] __schedule at ffffffffbc786cc7
#1 [ffff8dfc0b09fa58] schedule at ffffffffbc7871e9
#2 [ffff8dfc0b09fa68] fanotify_handle_event at ffffffffbc29a855
#3 [ffff8dfc0b09fae0] fsnotify at ffffffffbc296c28
#4 [ffff8dfc0b09fb98] security_file_open at ffffffffbc308bee
#5 [ffff8dfc0b09fbb8] do_dentry_open at ffffffffbc24b159
#6 [ffff8dfc0b09fc00] finish_open at ffffffffbc24b38f
#7 [ffff8dfc0b09fc18] nfs_atomic_open at ffffffffc080e35e [nfs]
#8 [ffff8dfc0b09fcc0] do_last at ffffffffbc25e293
#9 [ffff8dfc0b09fd70] path_openat at ffffffffbc25ec4d
#10 [ffff8dfc0b09fe08] do_filp_open at ffffffffbc260e9d
...
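This dependency chain (one task holding i_mutex while sleeping indefinitely, another blocked on the same mutex) can be reproduced in miniature. The Lock plays the role of i_mutex and the Event plays the role of the never-arriving fanotify response; all names are illustrative:

```python
import threading, time

i_mutex = threading.Lock()        # stands in for dir->d_inode->i_mutex
response = threading.Event()      # stands in for the fanotify response
opener_done = threading.Event()

def opener():
    # Like the "java" task: takes i_mutex in do_last(), then sleeps in
    # fanotify_get_response() until userspace answers.
    with i_mutex:
        response.wait()
    opener_done.set()

t = threading.Thread(target=opener, daemon=True)
t.start()
while not i_mutex.locked():       # wait until opener has grabbed the mutex
    time.sleep(0.01)

# Like the "pool" task: lookup_slow() blocks on the same i_mutex.
# With no response it would block forever; bound the wait to observe it.
blocked = not i_mutex.acquire(timeout=0.2)

# Once userspace finally answers, the whole chain unwinds.
response.set()
opener_done.wait(timeout=2)
unblocked = i_mutex.acquire(timeout=2)
i_mutex.release()
```

This is why a single unanswered permission event can stall unrelated tasks: anything that needs the directory's i_mutex queues up behind the opener.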
crash> dis -lr ffffffffbc25e293
...
/usr/src/debug/kernel-3.10.0-1160.2.1.el7/linux-3.10.0-1160.2.1.el7.x86_64/fs/namei.c: 3144
0xffffffffbc25dabe <do_last+638>: mov 0x48(%rsp),%rax
0xffffffffbc25dac3 <do_last+643>: mov 0x30(%rax),%rax
0xffffffffbc25dac7 <do_last+647>: lea 0xa8(%rax),%rdi
0xffffffffbc25dace <do_last+654>: mov %rax,0x30(%rsp)
0xffffffffbc25dad3 <do_last+659>: callq 0xffffffffbc785430 <mutex_lock>
...
3058 static int do_last(struct nameidata *nd, struct path *path,
3059 struct file *file, const struct open_flags *op,
3060 int *opened, struct filename *name)
3061 {
3062 struct dentry *dir = nd->path.dentry;
....
3133 retry_lookup:
....
3144 mutex_lock(&dir->d_inode->i_mutex); <<----- The task has taken the inode mutex here.
3145 error = lookup_open(nd, path, file, op, got_write, opened);
....
The filp that this task is trying to open is:
crash> struct file.f_path.dentry ffff8dfc4bed7100
f_path.dentry = 0xffff8dfc1aba3c80
crash> files -d 0xffff8dfc1aba3c80
DENTRY INODE SUPERBLK TYPE PATH
ffff8dfc1aba3c80 ffff8dfce1189e70 ffff8df97a653000 REG /talend/tmp/jobservers/dev/18029-lms/logs/18029-lms.log
The inode for which the open is waiting on a fanotify permission event response from userspace is:
crash> px ((struct file *)0xffff8dfc4bed7100)->f_path.dentry->d_inode
$25 = (struct inode *) 0xffff8dfce1189e70
325 int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
326 const unsigned char *file_name, u32 cookie)
327 {
...
Args passed to fsnotify() are:
crash> px ((struct file *)0xffff8dfc4bed7100)->f_path.dentry->d_inode
$25 = (struct inode *) 0xffff8dfce1189e70 <<----- to_tell
FS_OPEN_PERM <<----- mask
crash> px &((struct file *)0xffff8dfc4bed7100)->f_path
$26 = (struct path *) 0xffff8dfc4bed7110 <<----- data
FSNOTIFY_EVENT_PATH <<----- data_is
0 <<----- cookie
NULL <<----- file_name
...
377 while (fsnotify_iter_select_report_types(&iter_info)) {
378 ret = send_to_group(to_tell, mask, data, data_is, cookie,
379 file_name, &iter_info);
...
193 static int send_to_group(struct inode *to_tell,
194 __u32 mask, const void *data,
195 int data_is, u32 cookie,
196 const unsigned char *file_name,
197 struct fsnotify_iter_info *iter_info)
198 {
...
241 return group->ops->handle_event(group, to_tell, mask, data, data_is,
...
crash> px ((struct fsnotify_group *)0xffff8dfcdf451b00)->ops->handle_event
$34 = (int (*)(struct fsnotify_group *, struct inode *, u32, const void *, int, const unsigned char *, u32, struct fsnotify_iter_info *)) 0xffffffffbc29a5f0
crash> sym 0xffffffffbc29a5f0
ffffffffbc29a5f0 (t) fanotify_handle_event /usr/src/debug/kernel-3.10.0-1160.2.1.el7/linux-3.10.0-1160.2.1.el7.x86_64/fs/notify/fanotify/fanotify.c: 180
175 static int fanotify_handle_event(struct fsnotify_group *group,
176 struct inode *inode,
177 u32 mask, const void *data, int data_type,
178 const unsigned char *file_name, u32 cookie,
179 struct fsnotify_iter_info *iter_info)
180 {
...
230 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
231 if (mask & FAN_ALL_PERM_EVENTS) {
232 ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event),
233 iter_info);
...
59 static int fanotify_get_response(struct fsnotify_group *group,
60 struct fanotify_perm_event_info *event,
61 struct fsnotify_iter_info *iter_info)
62 {
63 int ret;
64
65 pr_debug("%s: group=%p event=%p\n", __func__, group, event);
66
67 wait_event(group->fanotify_data.access_waitq, event->response); <<----- we are waiting here.
..
The fanotify group's notification list is empty:
crash> px &((struct fsnotify_group *)0xffff8dfcdf451b00)->notification_list
$45 = (struct list_head *) 0xffff8dfcdf451b18
crash> list -H 0xffff8dfcdf451b18 -o fanotify_perm_event_info.fae -s fanotify_perm_event_info
(empty)
The fanotify group's notification list is empty because the events have been retrieved by userspace and moved to the access_list:
crash> px &((struct fsnotify_group *)0xffff8dfcdf451b00)->fanotify_data.access_list
$44 = (struct list_head *) 0xffff8dfcdf451ba8
crash> list -o fanotify_perm_event_info.fae -s fanotify_perm_event_info -H 0xffff8dfcdf451ba8
ffff8df9770d0bc0
struct fanotify_perm_event_info {
fae = {
fse = {
list = {
next = 0xffff8dfc78505ac0,
prev = 0xffff8dfcdf451ba8
},
inode = 0xffff8dfce1189e70,
mask = 0x10000
},
path = {
mnt = 0xffff8dfa22b934a0,
dentry = 0xffff8dfc1aba3c80
},
tgid = 0xffff8dfc404e7d80
},
response = 0x0, <<-----
fd = 0x8
}
ffff8dfc78505ac0
struct fanotify_perm_event_info {
fae = {
fse = {
list = {
next = 0xffff8dfcdf451ba8,
prev = 0xffff8df9770d0bc0
},
inode = 0xffff8dfc3cac08c8,
mask = 0x10000
},
path = {
mnt = 0xffff8dfa217be2a0,
dentry = 0xffff8dfc1a935500
},
tgid = 0xffff8df97bb39680
},
response = 0x0, <<-----
fd = 0x7
}
As shown above, two events in the fanotify group's access_list are still waiting for a fanotify permission event response from userspace (response = 0).
crash> files -d 0xffff8dfc1aba3c80
DENTRY INODE SUPERBLK TYPE PATH
ffff8dfc1aba3c80 ffff8dfce1189e70 ffff8df97a653000 REG /talend/tmp/jobservers/dev/18029-lms/logs/18029-lms.log
crash> files -d 0xffff8dfc1a935500
DENTRY INODE SUPERBLK TYPE PATH
ffff8dfc1a935500 ffff8dfc3cac08c8 ffff8dfa283cc000 REG /var/lib/rsyslog/imjournal.state.tmp
Therefore, the following tasks wait in the access wait queue until a fanotify permission event response arrives from userspace:
crash> px &((struct fsnotify_group *)0xffff8dfcdf451b00)->fanotify_data.access_waitq
$36 = (wait_queue_head_t *) 0xffff8dfcdf451bb8
crash> waitq 0xffff8dfcdf451bb8
PID: 1330 TASK: ffff8dfce0bee300 CPU: 3 COMMAND: "in:imjournal"
PID: 44285 TASK: ffff8df97b7e9080 CPU: 1 COMMAND: "java"
At this time there are many tasks named vsep or pool, which belong to the VMware Endpoint Security Solution, as shown below. All of these tasks hold a fanotify file handle whose private_data matches the fsnotify group above.
crash> foreach files | grep fanotify -B5 | grep -e PID -e FD -e fanotify
PID: 1579 TASK: ffff8dfa283c4200 CPU: 2 COMMAND: "vsep"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1584 TASK: ffff8dfcea9d3180 CPU: 3 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1585 TASK: ffff8dfcea9d0000 CPU: 1 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1586 TASK: ffff8dfa2fbc2100 CPU: 3 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1587 TASK: ffff8dfce0bc1080 CPU: 3 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1588 TASK: ffff8dfce0bc0000 CPU: 2 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1589 TASK: ffff8dfce0bc4200 CPU: 1 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1590 TASK: ffff8dfa39ba4200 CPU: 2 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1591 TASK: ffff8dfa2bbe0000 CPU: 3 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1592 TASK: ffff8dfa2bbe1080 CPU: 0 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1593 TASK: ffff8dfa2bbe2100 CPU: 0 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1594 TASK: ffff8dfa2bbe3180 CPU: 0 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1595 TASK: ffff8dfa2bbe4200 CPU: 1 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1596 TASK: ffff8dfa2bbe5280 CPU: 0 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1597 TASK: ffff8dfa2bbe6300 CPU: 1 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1598 TASK: ffff8dfa2bbd8000 CPU: 0 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1599 TASK: ffff8dfa2bbd9080 CPU: 2 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1600 TASK: ffff8dfa2bbda100 CPU: 3 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1601 TASK: ffff8dfa2bbdb180 CPU: 1 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1602 TASK: ffff8dfa2bbdc200 CPU: 0 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1603 TASK: ffff8dfa2bbdd280 CPU: 3 COMMAND: "pool"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1604 TASK: ffff8dfa2bbde300 CPU: 1 COMMAND: "vsep"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1605 TASK: ffff8dfa2bbf0000 CPU: 0 COMMAND: "vsep"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1606 TASK: ffff8dfa2bbf1080 CPU: 3 COMMAND: "vsep"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1607 TASK: ffff8dfa2bbf2100 CPU: 1 COMMAND: "vsep"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
PID: 1608 TASK: ffff8dfa2bbf3180 CPU: 0 COMMAND: "vsep"
FD FILE DENTRY INODE TYPE PATH
2 ffff8dfcdf451a00 ffff8dfce1231ec0 ffff8dfced125c80 UNKN [fanotify]
crash> struct file.private_data ffff8dfcdf451a00
private_data = 0xffff8dfcdf451b00
The latest issue happened with Twistlock Defender
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-1160.36.2.el7.x86_64/vmlinux
DUMPFILE: /cores/retrace/tasks/224240356/crash/vmcore.vmss
LOAD AVERAGE: 1106.86, 1106.33, 1104.16
RELEASE: 3.10.0-1160.36.2.el7.x86_64
DMI_BIOS_VENDOR: Phoenix Technologies LTD
DMI_BIOS_VERSION: 6.00
DMI_BIOS_DATE: 05/28/2020
DMI_SYS_VENDOR: VMware, Inc.
DMI_PRODUCT_NAME: VMware Virtual Platform
crash> foreach UN ps -m | tail
[3 16:44:00.903] [UN] PID: 4093 TASK: ffff92470f726300 CPU: 2 COMMAND: "PatrolAgent"
[3 16:44:01.884] [UN] PID: 4091 TASK: ffff92470f725280 CPU: 3 COMMAND: "PatrolAgent"
[3 16:44:01.911] [UN] PID: 4090 TASK: ffff92470f724200 CPU: 2 COMMAND: "PatrolAgent"
[3 16:44:02.898] [UN] PID: 4089 TASK: ffff92470f723180 CPU: 1 COMMAND: "PatrolAgent"
[3 16:44:06.527] [UN] PID: 3918 TASK: ffff9245ba51a100 CPU: 0 COMMAND: "yum"
[3 16:44:06.943] [UN] PID: 4084 TASK: ffff92470f722100 CPU: 2 COMMAND: "PatrolAgent"
[3 16:44:06.925] [UN] PID: 4076 TASK: ffff9249214cd280 CPU: 3 COMMAND: "grep"
[3 16:44:06.951] [UN] PID: 3090 TASK: ffff92484eae1080 CPU: 2 COMMAND: "Broker KM Evnt"
[3 16:44:06.951] [UN] PID: 4083 TASK: ffff92483ab2a100 CPU: 2 COMMAND: "PatrolAgent"
[3 16:44:06.951] [UN] PID: 3751 TASK: ffff9247e4202100 CPU: 2 COMMAND: "yum"
A huge number of tasks are stuck calling fanotify_handle_event():
crash> foreach UN bt | awk '/#[2-3] /{print $3,$5}' | paste - - | sort | uniq -c | sort -nr
1106 fanotify_handle_event ffffffffa8a9b1b5 fsnotify ffffffffa8a97588
1 schedule_timeout ffffffffa8f86da8 mfe_fileaccess_set_new_entry_scan_state ffffffffc065e695
with the backtraces just like this:
PID: 3751 TASK: ffff9247e4202100 CPU: 2 COMMAND: "yum"
#0 [ffff92477c4bf8b0] __schedule at ffffffffa8f88dc8
#1 [ffff92477c4bf918] schedule at ffffffffa8f89199
#2 [ffff92477c4bf928] fanotify_handle_event at ffffffffa8a9b1b5
#3 [ffff92477c4bf9a0] fsnotify at ffffffffa8a97588
#4 [ffff92477c4bfa50] security_file_open at ffffffffa8b0954e
#5 [ffff92477c4bfa70] do_dentry_open at ffffffffa8a4bb49
#6 [ffff92477c4bfab8] vfs_open at ffffffffa8a4bdea
#7 [ffff92477c4bfae0] do_last at ffffffffa8a5e1d6
#8 [ffff92477c4bfb90] path_openat at ffffffffa8a5f3ed
#9 [ffff92477c4bfc28] do_filp_open at ffffffffa8a6163d
#10 [ffff92477c4bfd00] do_sys_open at ffffffffa8a4d3d4
#11 [ffff92477c4bfd60] sys_openat at ffffffffa8a4d504
#12 [ffff92477c4bfd70] mfe_aac_sys_openat_64_bit at ffffffffc058c1a2 [mfe_aac_100705153]
#13 [ffff92477c4bff08] mfe_aac_sys_open_64_bit at ffffffffc058c32a [mfe_aac_100705153]
#14 [ffff92477c4bff18] mfe_fileaccess_sys_open_64_bit at ffffffffc065c043 [mfe_fileaccess_100705153]
#15 [ffff92477c4bff50] tracesys at ffffffffa8f96226 (via system_call)
RIP: 00007f9d52be4efd RSP: 00007fff8b085650 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000000002cb7cc0 RCX: ffffffffffffffff
RDX: 00000000000001b6 RSI: 0000000000000241 RDI: 0000000002cb7cc0
RBP: 0000000000000241 R8: 0000000000000000 R9: 0000000000000010
R10: 00007f9d5223f390 R11: 0000000000000293 R12: 0000000002cb7cc0
R13: 00000000000001b6 R14: 0000000000000012 R15: 0000000002cd8590
ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
The third-party security module mfe_aac_100705153 appears in the stack because the module modified the original sys_call_table for syscall hooking.
crash> sys -c |awk '$1==2||$1==257'
2 mfe_fileaccess_sys_open_64_bit
257 mfe_fileaccess_sys_openat_64_bit
crash> px ((struct file *)0xffff924893a48d00)->f_path
$5 = {
mnt = 0xffff92467599e5a0,
dentry = 0xffff924785808900
}
The above task is trying to open this file while waiting for the fanotify permission event response from userspace:
crash> files -d 0xffff924785808900
DENTRY INODE SUPERBLK TYPE PATH
ffff924785808900 ffff92476f7eb5c8 ffff9245366b3000 REG /usr/lib64/gconv/IBM280.so;6190359f
crash> mount | awk 'NR==1||$2~/ffff9245366b3000/'
MOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff92467599e580 ffff9245366b3000 xfs /dev/mapper/rootvg-lv_usr /usr
crash> px ((struct file *)0xffff924893a48d00)->f_path.dentry->d_inode
$6 = (struct inode *) 0xffff92476f7eb5c8
The notification_list:
crash> px &((struct fsnotify_group *)0xffff92492b3c7900)->notification_list
$8 = (struct list_head *) 0xffff92492b3c7918
crash> list -o fanotify_perm_event_info.fae -H 0xffff92492b3c7918 | wc -l
1104
The access_list:
crash> px &((struct fsnotify_group *)0xffff92492b3c7900)->fanotify_data.access_list
$9 = (struct list_head *) 0xffff92492b3c79a8
crash> list -o fanotify_perm_event_info.fae -s fanotify_perm_event_info -H 0xffff92492b3c79a8
ffff924920c425c0
struct fanotify_perm_event_info {
fae = {
fse = {
list = {
next = 0xffff924920c42640,
prev = 0xffff92492b3c79a8
},
inode = 0xffff924785828148,
mask = 65536
},
path = {
mnt = 0xffff92467599e5a0,
dentry = 0xffff924785833c80
},
tgid = 0xffff92492319d100
},
response = 0,
fd = 10
}
ffff924920c42640
struct fanotify_perm_event_info {
fae = {
fse = {
list = {
next = 0xffff92492b3c79a8,
prev = 0xffff924920c425c0
},
inode = 0xffff92476f7eb5c8,
mask = 65536
},
path = {
mnt = 0xffff92467599e5a0,
dentry = 0xffff924785808900
},
tgid = 0xffff92484ea60180
},
response = 0,
fd = 12
}
crash> files -d 0xffff924785833c80
DENTRY INODE SUPERBLK TYPE PATH
ffff924785833c80 ffff924785828148 ffff9245366b3000 REG /usr/share/xxxx/node_modules/handlebars/dist/cjs/handlebars/compiler/whitespace-control.js
crash> files -d 0xffff924785808900
DENTRY INODE SUPERBLK TYPE PATH
ffff924785808900 ffff92476f7eb5c8 ffff9245366b3000 REG /usr/lib64/gconv/IBM280.so;6190359f
All 1106 tasks are waiting for the fanotify permission event response from userspace:
crash> px &((struct fsnotify_group *)0xffff92492b3c7900)->fanotify_data.access_waitq
$10 = (wait_queue_head_t *) 0xffff92492b3c79b8
crash> waitq 0xffff92492b3c79b8
PID: 8506 TASK: ffff924761202100 CPU: 2 COMMAND: "(journald)"
PID: 8130 TASK: ffff92475ef7d280 CPU: 0 COMMAND: "(journald)"
PID: 7801 TASK: ffff92475ef7e300 CPU: 1 COMMAND: "sshd"
PID: 6708 TASK: ffff92476d7c6300 CPU: 3 COMMAND: "containerd-shim"
PID: 7750 TASK: ffff92475ef79080 CPU: 0 COMMAND: "(journald)"
PID: 7374 TASK: ffff92475ef7b180 CPU: 0 COMMAND: "(journald)"
PID: 6997 TASK: ffff92476d7c3180 CPU: 2 COMMAND: "(journald)"
PID: 3448 TASK: ffff924783013180 CPU: 3 COMMAND: "containerd-shim"
PID: 6618 TASK: ffff92476d7c2100 CPU: 1 COMMAND: "(journald)"
PID: 6459 TASK: ffff92476d7c5280 CPU: 1 COMMAND: "rscd"
PID: 6240 TASK: ffff9247715f6300 CPU: 2 COMMAND: "(journald)"
PID: 5864 TASK: ffff92454daed280 CPU: 0 COMMAND: "(journald)"
PID: 121245 TASK: ffff92454c3fa100 CPU: 1 COMMAND: "containerd-shim"
... Too long. Skipping ...
crash> waitq 0xffff92492b3c79b8 | wc -l
1106
Only one task holds the fanotify file handle at this time:
crash> files 2743
PID: 2743 TASK: ffff9248938e4200 CPU: 3 COMMAND: "fsmon"
ROOT: / CWD: /
FD FILE DENTRY INODE TYPE PATH
0 ffff924921282800 ffff9245b96cf200 ffff92488f450000 FIFO
1 ffff92489b459800 ffff92488f4226c0 ffff92488f459e10 FIFO
2 ffff92489b458100 ffff9248fcefa000 ffff92488f45a2b0 FIFO
3 ffff92489b45be00 ffff92488f422600 ffff92488f45a060 FIFO
4 ffff92492b3c7700 ffff9248fcfb09c0 ffff92488f41d990 REG
5 ffff92492b3c7e00 ffff9248fcfb0900 ffff92492d11f150 UNKN [fanotify] <<------------
6 ffff92491a02a900 ffff9248fcfb0a80 ffff92492d11f150 UNKN inotify
10 ffff9249232dd000 ffff924785833c80 ffff924785828148 REG /usr/share/xxxx/node_modules/handlebars/dist/cjs/handlebars/compiler/whitespace-control.js
11 ffff92491c2e7200 ffff92476f75c3c0 ffff924785838508 REG /usr/lib64/gconv/IBM278.so
12 ffff9249217fbf00 ffff924785808900 ffff92476f7eb5c8 REG /usr/lib64/gconv/IBM280.so;6190359f
The fanotify type file handle matches the fsnotify_group:
crash> struct file.private_data ffff92492b3c7e00
private_data = 0xffff92492b3c7900
The aforementioned 1106 tasks are waiting for the fanotify permission event response from this task, fsmon, which is a child process of Twistlock Defender:
crash> ps -p 2743
PID: 0 TASK: ffffffffa9418480 CPU: 0 COMMAND: "swapper/0"
PID: 1 TASK: ffff9246763a8000 CPU: 0 COMMAND: "systemd"
PID: 1818 TASK: ffff92490aad3180 CPU: 0 COMMAND: "containerd"
PID: 2372 TASK: ffff9248f1d4c200 CPU: 0 COMMAND: "containerd-shim"
PID: 2672 TASK: ffff9248e543b180 CPU: 3 COMMAND: "defender"
PID: 2743 TASK: ffff9248938e4200 CPU: 3 COMMAND: "fsmon" <<------------
fsmon is stuck in write(1, 0x1e58c60, 174) (syscall arguments are RDI, RSI, RDX, ...)
PID: 2743 TASK: ffff9248938e4200 CPU: 3 COMMAND: "fsmon"
#0 [ffff92489396fca0] __schedule at ffffffffa8f88dc8
#1 [ffff92489396fd08] schedule at ffffffffa8f89199
#2 [ffff92489396fd18] pipe_wait at ffffffffa8a57950
#3 [ffff92489396fd68] pipe_write at ffffffffa8a57bf9
#4 [ffff92489396fdf0] do_sync_write at ffffffffa8a4da53
#5 [ffff92489396fec8] vfs_write at ffffffffa8a4e540
#6 [ffff92489396ff08] sys_write at ffffffffa8a4f31f
#7 [ffff92489396ff50] tracesys at ffffffffa8f96226 (via system_call)
RIP: 00000000004580c1 RSP: 00007ffc9ba0e508 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00000000000000ae RCX: ffffffffffffffff
RDX: 00000000000000ae RSI: 0000000001e58c60 RDI: 0000000000000001
RBP: 0000000001e58c60 R8: 0000000000abadf0 R9: 0000000001e57880
R10: 00007ffc9ba0fad8 R11: 0000000000000246 R12: 00000000006cb4a0
R13: 00000000000000ae R14: 00000000006ccd60 R15: 00000000006cd1e0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash> files 2743
PID: 2743 TASK: ffff9248938e4200 CPU: 3 COMMAND: "fsmon"
ROOT: / CWD: /
FD FILE DENTRY INODE TYPE PATH
0 ffff924921282800 ffff9245b96cf200 ffff92488f450000 FIFO
1 ffff92489b459800 ffff92488f4226c0 ffff92488f459e10 FIFO
2 ffff92489b458100 ffff9248fcefa000 ffff92488f45a2b0 FIFO
3 ffff92489b45be00 ffff92488f422600 ffff92488f45a060 FIFO
4 ffff92492b3c7700 ffff9248fcfb09c0 ffff92488f41d990 REG
5 ffff92492b3c7e00 ffff9248fcfb0900 ffff92492d11f150 UNKN [fanotify]
6 ffff92491a02a900 ffff9248fcfb0a80 ffff92492d11f150 UNKN inotify
10 ffff9249232dd000 ffff924785833c80 ffff924785828148 REG /usr/share/xxxx/node_modules/handlebars/dist/cjs/handlebars/compiler/whitespace-control.js
11 ffff92491c2e7200 ffff92476f75c3c0 ffff924785838508 REG /usr/lib64/gconv/IBM278.so
12 ffff9249217fbf00 ffff924785808900 ffff92476f7eb5c8 REG /usr/lib64/gconv/IBM280.so;6190359f
Looking at fd 1, the pipe fsmon is writing to:
crash> struct file.private_data ffff92489b459800
private_data = 0xffff92489b5e5500
crash> struct pipe_inode_info.buffers,nrbufs,curbuf,bufs 0xffff92489b5e5500
buffers = 16
nrbufs = 16
curbuf = 3
bufs = 0xffff924929395c00
crash> p *((struct pipe_buffer *) 0xffff924929395c00 + 2)
$16 = {
page = 0xffffd84dcaa48c80,
offset = 0,
len = 3985,
ops = 0xffffffffa902ebc0,
flags = 0,
private = 0
}
crash> p 3985+174
$18 = 4159
The pipe is full: all 16 buffers are in use (nrbufs == buffers), and the pending 174-byte write does not fit in the last buffer's page either (3985 + 174 = 4159 > 4096), so fsmon is stuck writing to the pipe.
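The same condition is easy to reproduce from userspace. Here a non-blocking write lets us observe the full pipe as EAGAIN instead of sleeping in pipe_wait() the way fsmon did (the exact pipe capacity is kernel-dependent and is deliberately not asserted):

```python
import os

r, w = os.pipe()
os.set_blocking(w, False)   # report a full pipe as BlockingIOError (EAGAIN)

total = 0
full = False
try:
    while True:
        total += os.write(w, b"x" * 4096)   # fill one page at a time
except BlockingIOError:
    full = True             # a blocking writer would sleep in pipe_wait() here

# Draining the read end makes room and lets the writer continue,
# which is exactly what fsmon's stalled reader failed to do.
os.read(r, 4096)
resumed = os.write(w, b"y" * 174)   # the 174-byte write from the vmcore
```

Because writes of at most PIPE_BUF (4096) bytes are atomic, each 4096-byte write either succeeds in full or fails with EAGAIN, so the fill loop stops exactly at the pipe's capacity.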
Who is supposed to be reading the other end of the pipe? Since the inode is common to both ends, search for it:
crash> foreach files -R ffff92488f459e10
PID: 2418 TASK: ffff9248f1e20000 CPU: 1 COMMAND: "defender"
ROOT: / CWD: /
FD FILE DENTRY INODE TYPE PATH
41 ffff92489b458f00 ffff92488f4226c0 ffff92488f459e10 FIFO
[repeated for each thread of 2418]
PID: 2743 TASK: ffff9248938e4200 CPU: 3 COMMAND: "fsmon"
ROOT: / CWD: /
FD FILE DENTRY INODE TYPE PATH
1 ffff92489b459800 ffff92488f4226c0 ffff92488f459e10 FIFO
[...]
crash> foreach defender bt
[cutting details, summary below]
Waiting on a futex:
2418 2472 2474 2475 2476 2487 2757 2765 28937
Waiting on epoll:
2473 2494 2742
Blocked in fanotify_handle_event (i.e. recursion: defender blocked on events its own fanotify group must answer):
2672 2764
Look for the read side of the pipe in the epoll instances (omitting 2473 and 2742 since they do not contain the pipe fd):
crash> bt 2494
PID: 2494 TASK: ffff9248e543e300 CPU: 2 COMMAND: "defender"
#0 [ffff9245b6bd3cd8] __schedule at ffffffffa8f88dc8
#1 [ffff9245b6bd3d40] schedule at ffffffffa8f89199
#2 [ffff9245b6bd3d50] schedule_hrtimeout_range_clock at ffffffffa8f88562
#3 [ffff9245b6bd3de8] schedule_hrtimeout_range at ffffffffa8f88613
#4 [ffff9245b6bd3df8] ep_poll at ffffffffa8a9d74e
#5 [ffff9245b6bd3eb0] sys_epoll_wait at ffffffffa8a9ec2d
#6 [ffff9245b6bd3f00] sys_epoll_pwait at ffffffffa8a9ed75
#7 [ffff9245b6bd3f50] tracesys at ffffffffa8f96226 (via system_call)
RIP: 0000000000472360 RSP: 00007f14118b2658 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000039 RCX: ffffffffffffffff
RDX: 0000000000000080 RSI: 00007f14118b26a8 RDI: 0000000000000003
RBP: 00007f14118b2ca8 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000039 R11: 0000000000000246 R12: 0000000000000003
R13: 000000c000102600 R14: 0000000000000000 R15: 000000c0011caa20
ORIG_RAX: 0000000000000119 CS: 0033 SS: 002b
crash> files 2494 | grep "^ *3 "
3 ffff9249077a4c00 ffff9248fcebee40 ffff92492d11f150 UNKN [eventpoll]
crash> struct file.private_data ffff9249077a4c00
private_data = 0xffff9249077b9500
crash> eventpoll.rbr 0xffff9249077b9500
rbr = {
rb_node = 0xffff924599985400
}
crash> tree -t rbtree -o epitem.rbn -s epitem.ffd.fd -N 0xffff924599985400 | grep -B1 "= 41"
ffff924599985c80
ffd.fd = 41
crash> struct -x epitem.event.events ffff924599985c80
event.events = 0x8000201d,
0x8000201d == POLLIN | POLLOUT | POLLERR | POLLHUP | POLLRDHUP | EPOLLET
So defender does poll the read end of the pipe, but only edge-triggered (EPOLLET), and that is why it blocks despite the pipe having data available to read.
The kernel is seemingly doing everything fine; the bug is in userspace.
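The edge-triggered pitfall is easy to demonstrate: once an EPOLLET event has been reported, epoll stays silent until new data arrives, even though unread data is still sitting in the pipe. A level-triggered registration would keep reporting it:

```python
import os, select

r, w = os.pipe()
os.set_blocking(r, False)

ep = select.epoll()
ep.register(r, select.EPOLLIN | select.EPOLLET)   # edge-triggered, like defender's fd

os.write(w, b"hello")
first = ep.poll(timeout=1)       # the write produced an edge: event reported

os.read(r, 2)                    # consume only part of the data
second = ep.poll(timeout=0.2)    # no new edge, so no event, data still pending

os.write(w, b"!")                # only fresh data produces another edge
third = ep.poll(timeout=1)
leftover = os.read(r, 64)        # "llo!" was readable the whole time
```

The second poll returning nothing while data is pending is precisely defender's situation: it armed the fd edge-triggered, missed (or failed to fully drain) an edge, and then waited forever for one that never came.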
- The case below was caused by McAfee, with a different process name: this system had mfetpd instead of fsmon.
crash> bt
PID: 1 TASK: ffff905003712840 CPU: 2 COMMAND: "systemd"
#0 [ffffb57ac3147b30] __schedule at ffffffff9cda0731
#1 [ffffb57ac3147bc0] schedule at ffffffff9cda0cc5
#2 [ffffb57ac3147bd0] fanotify_handle_event at ffffffff9c790008
#3 [ffffb57ac3147c38] fsnotify at ffffffff9c78c213
#4 [ffffb57ac3147cd0] do_dentry_open at ffffffff9c73ad0e
#5 [ffffb57ac3147d00] path_openat at ffffffff9c74f72e
#6 [ffffb57ac3147dd8] do_filp_open at ffffffff9c751a43
#7 [ffffb57ac3147ee0] do_sys_open at ffffffff9c73c5e4
#8 [ffffb57ac3147f38] do_syscall_64 at ffffffff9c40430b
#9 [ffffb57ac3147f50] entry_SYSCALL_64_after_hwframe at ffffffff9ce000ad
RIP: 00007f488efe7301 RSP: 00007fff8db0ec10 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000560b858fe820 RCX: 00007f488efe7301
RDX: 0000000000080000 RSI: 00007fff8db0ed60 RDI: 00000000ffffff9c
RBP: 0000000000000008 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f4890754ee3
R13: 00007f4890754ee3 R14: 0000000000000001 R15: 0000000000000000
ORIG_RAX: 0000000000000101 CS: 0033 SS: 002b
crash> fsnotify_group 0xffff90513a03da00 -ox
struct fsnotify_group {
[ffff90513a03da00] const struct fsnotify_ops *ops;
[ffff90513a03da08] refcount_t refcnt;
[ffff90513a03da0c] spinlock_t notification_lock;
[ffff90513a03da10] struct list_head notification_list;
[ffff90513a03da20] wait_queue_head_t notification_waitq;
[ffff90513a03da38] unsigned int q_len;
[ffff90513a03da3c] unsigned int max_events;
[ffff90513a03da40] unsigned int priority;
[ffff90513a03da44] bool shutdown;
[ffff90513a03da48] struct mutex mark_mutex;
[ffff90513a03da68] atomic_t num_marks;
[ffff90513a03da6c] atomic_t user_waits;
[ffff90513a03da70] struct list_head marks_list;
[ffff90513a03da80] struct fasync_struct *fsn_fa;
[ffff90513a03da88] struct fsnotify_event *overflow_event;
[ffff90513a03da90] struct mem_cgroup *memcg;
union {
[ffff90513a03da98] void *private;
[ffff90513a03da98] struct inotify_group_private_data inotify_data;
[ffff90513a03da98] struct fanotify_group_private_data fanotify_data;
};
}
SIZE: 0xd8
- This fanotify group was owned by, and supposed to be handled by, the mfetpd process, as we can see below.
crash> epstree -p -t 2089
mfetpd(2089) -+- Scan Factory(2938) -+- ScanAction Mgr(3239968)
`- OAS Manager(3240000)
crash> files | grep fanotify
34 ffff9051217a4500 ffff90505ce13000 ffff905004d2ccd0 UNKN [fanotify]
crash> struct file.private_data ffff9051217a4500
private_data = 0xffff90513a03da00,
- This mfetpd and related processes were from McAfee.
crash> fsinfo -f 0xffff905005a24dc0
== File Info ==
DENTRY INODE SUPERBLK TYPE PATH
ffff90504093a3e8 ffff905040946730 ffff905007b37000 REG /opt/McAfee/ens/tp/bin/mfetpd
file operations = ffffffffc0854b00 (B) xfs_file_operations [xfs]
file open mode = Read-Only (0x8020)
file size = 2136272 bytes, ino = 4214918, link count = 1
uid = 0, gid = 0
== Mount Info ==
ffff9050386d7a40 ffff905007b37000 xfs /dev/mapper/vg00-lv_opt /opt
- Among the threads in this mfetpd process, the one below was trying to communicate over a local unix socket.
crash> bt
PID: 3813 TASK: ffff90513a20d080 CPU: 0 COMMAND: "OAS Req Br->Mgr"
#0 [ffffb57ac7a23a78] __schedule at ffffffff9cda0731
#1 [ffffb57ac7a23b08] schedule at ffffffff9cda0cc5
#2 [ffffb57ac7a23b18] schedule_timeout at ffffffff9cda5424
#3 [ffffb57ac7a23bb0] sock_alloc_send_pskb at ffffffff9cbbd18f
#4 [ffffb57ac7a23c30] unix_dgram_sendmsg at ffffffff9cd0086d
#5 [ffffb57ac7a23cc8] sock_sendmsg at ffffffff9cbb753c
#6 [ffffb57ac7a23ce0] ____sys_sendmsg at ffffffff9cbb785b
#7 [ffffb57ac7a23d58] ___sys_sendmsg at ffffffff9cbb9b4c
#8 [ffffb57ac7a23eb0] __sys_sendmsg at ffffffff9cbb9c17
#9 [ffffb57ac7a23f38] do_syscall_64 at ffffffff9c40430b
#10 [ffffb57ac7a23f50] entry_SYSCALL_64_after_hwframe at ffffffff9ce000ad
RIP: 00007f363529a20c RSP: 00007f35f8ff1a50 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 00007f35f8ff1af0 RCX: 00007f363529a20c
RDX: 0000000000000000 RSI: 00007f35f8ff1af0 RDI: 0000000000000004
RBP: 00007f35f8ff1acc R8: 0000000000000000 R9: 00007f35e0287728
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000004
R13: 00007f35f8ff1af0 R14: 0000000000000000 R15: 0000000010358940
ORIG_RAX: 000000000000002e CS: 0033 SS: 002b
crash> unix_sock.peer,addr,path 0xffff90500fbda140 <-- The socket from this process
peer = 0xffff90500fbddf00,
addr = 0x0,
path = {
mnt = 0x0,
dentry = 0x0
},
crash> unix_sock.peer,addr,path 0xffff90500fbddf00 <-- peer socket
peer = 0xffff90500fbda140,
addr = 0x0,
path = {
mnt = 0x0,
dentry = 0x0
},
crash> sock.sk_wmem_alloc,sk_sndbuf,sk_sndtimeo 0xffff90500fbda140
sk_wmem_alloc = {
refs = {
counter = 0x1c701
}
},
sk_sndbuf = 0x34000,
sk_sndtimeo = 0x7fffffffffffffff,
crash> pd 0x7fffffffffffffff/1000
$14 = 9223372036854775 <-- seconds
- It looks like once this thread went to sleep it never woke up on its own, because the send timeout (sk_sndtimeo = LONG_MAX, i.e. MAX_SCHEDULE_TIMEOUT) is effectively infinite and nothing drained the peer socket.
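A bounded send timeout avoids exactly this failure mode. The sketch below fills a unix datagram socketpair that nobody reads, the same situation this mfetpd thread was in; Python's settimeout() bounds the wait at the library level rather than via sk_sndtimeo, but the effect is the same: the sender gets a timeout error instead of sleeping forever (the buffer sizes are illustrative):

```python
import socket

s1, s2 = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
# Keep the buffers small so the pair fills up quickly (illustrative sizes;
# the kernel clamps them to its minimums)
s1.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
s2.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)
s1.settimeout(0.2)   # bounded wait, instead of sk_sndtimeo = LONG_MAX

timed_out = False
sent = 0
try:
    # s2 never reads, like the stalled peer of the "OAS Req Br->Mgr"
    # thread, so the sender eventually has to wait for buffer space.
    for _ in range(1000):
        s1.send(b"x" * 1024)
        sent += 1
except socket.timeout:
    timed_out = True     # with an infinite timeout this send would hang forever
```

With the timeout, the caller gets control back and can log, retry, or fail; with sk_sndtimeo left at its default infinite value, the thread stays in schedule_timeout() inside sock_alloc_send_pskb() indefinitely.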
vmcore analysis of a case where Airlock (airlock-client.service) contributed to the issue
Lots of tasks in UN state are calling fanotify_handle_event() and waiting:
crash> foreach UN bt | grep '#2 ' | sort -k3 | uniq -c -f2 | sort -nr | head
5 #2 [ffff935d33013a98] fanotify_handle_event at ffffffff9beaa8a5
Processes in uninterruptible (UN) state:
crash> ps -m|grep UN
[0 00:00:30.777] [UN] PID: 127882 TASK: ffff935d022b3180 CPU: 3 COMMAND: "rhsmcertd-worke"
[0 00:00:30.778] [UN] PID: 100106 TASK: ffff935e18b1c200 CPU: 1 COMMAND: "systemd-journal"
[0 00:00:30.778] [UN] PID: 99304 TASK: ffff935df1e5e300 CPU: 1 COMMAND: "db2fm"
[0 00:00:30.778] [UN] PID: 52402 TASK: ffff935d67e06300 CPU: 1 COMMAND: "db2sysc"
[0 00:00:30.778] [UN] PID: 3104 TASK: ffff935efa353180 CPU: 1 COMMAND: "db2sysc"
crash> set 3104
PID: 3104
COMMAND: "db2sysc"
TASK: ffff935efa353180 [THREAD_INFO: ffff935f2f5bc000]
CPU: 1
STATE: TASK_UNINTERRUPTIBLE
Let's check which tasks hold the fanotify file handle associated with the fsnotify_group; they all belong to the "airlock" thread group:
crash> foreach files -R fanotify |head
PID: 1287 TASK: ffff935f1b518000 CPU: 1 COMMAND: "airlock"
ROOT: / CWD: /
FD FILE DENTRY INODE TYPE PATH
15 ffff935f30d97600 ffff935f0b7afe00 ffff935f3111f150 UNKN [fanotify]
PID: 1303 TASK: ffff935f1e1be300 CPU: 0 COMMAND: "airlock"
ROOT: / CWD: /
FD FILE DENTRY INODE TYPE PATH
15 ffff935f30d97600 ffff935f0b7afe00 ffff935f3111f150 UNKN [fanotify]
crash> foreach files -R fanotify | grep COMMAND | awk '{print $8}' | sort | uniq -c
13 "airlock"
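The same tally as the grep/awk pipeline can be reproduced on a saved crash log. A small sketch; the regular expression assumes the `PID: ... COMMAND: "..."` line format shown in the `files` output above:

```python
import re
from collections import Counter

def count_fanotify_holders(crash_output: str) -> Counter:
    """Count fanotify fd holders per COMMAND in 'foreach files -R fanotify' output."""
    counts = Counter()
    for line in crash_output.splitlines():
        # Matches lines like: PID: 1287 TASK: ffff... CPU: 1 COMMAND: "airlock"
        m = re.search(r'COMMAND:\s+"([^"]+)"', line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Sample trimmed from the transcript above; the full vmcore showed 13 holders.
sample = '''PID: 1287 TASK: ffff935f1b518000 CPU: 1 COMMAND: "airlock"
FD FILE DENTRY INODE TYPE PATH
15 ffff935f30d97600 ffff935f0b7afe00 ffff935f3111f150 UNKN [fanotify]
PID: 1303 TASK: ffff935f1e1be300 CPU: 0 COMMAND: "airlock"
'''
print(count_fanotify_holders(sample))  # Counter({'airlock': 2})
```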
Let's verify the 'airlock' task (1287):
crash> ps 1287
PID PPID CPU TASK ST %MEM VSZ RSS COMM
1287 1 1 ffff935f1b518000 IN 0.5 1563812 70372 airlock
crash> set 1287
PID: 1287
COMMAND: "airlock"
TASK: ffff935f1b518000 [THREAD_INFO: ffff935f1b52c000]
CPU: 1
STATE: TASK_INTERRUPTIBLE
crash> bt 1287
PID: 1287 TASK: ffff935f1b518000 CPU: 1 COMMAND: "airlock"
#0 [ffff935f1b52fc48] __schedule at ffffffff9c3b78d8
#1 [ffff935f1b52fcb0] schedule at ffffffff9c3b7ca9
#2 [ffff935f1b52fcc0] futex_wait_queue_me at ffffffff9bd19856
#3 [ffff935f1b52fd00] futex_wait at ffffffff9bd1a65b
#4 [ffff935f1b52fe48] do_futex at ffffffff9bd1c356
#5 [ffff935f1b52fed8] sys_futex at ffffffff9bd1c7a0
#6 [ffff935f1b52ff50] system_call_fastpath at ffffffff9c3c539a
RIP: 000055fafc966383 RSP: 00007ffee31cb1d0 RFLAGS: 00000293
RAX: 00000000000000ca RBX: 0000000000000000 RCX: 0000000000000004
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000055fafd89e2a8
RBP: 00007ffee31cc438 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000286 R12: 00007ffee31cc448
R13: 0000000000000001 R14: 000055fafd89da20 R15: 00007f97506a2b10
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b
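For context: each task blocked in fanotify_handle_event() is waiting for the listener (here, the airlock threads, which are parked on a futex instead of servicing events) to write a `struct fanotify_response` back to the fanotify file descriptor. A minimal sketch of that 8-byte response record, with the layout and flag values taken from linux/fanotify.h; the event fd value is hypothetical:

```python
import struct

# struct fanotify_response { __s32 fd; __u32 response; }  (linux/fanotify.h)
FAN_ALLOW = 0x01
FAN_DENY = 0x02

def build_response(event_fd: int, allow: bool = True) -> bytes:
    """Pack the response a fanotify listener writes back to its fanotify fd."""
    return struct.pack("=iI", event_fd, FAN_ALLOW if allow else FAN_DENY)

resp = build_response(7)  # 7 is a hypothetical event fd
print(len(resp), struct.unpack("=iI", resp))  # 8 (7, 1)
```

If the listener never writes this record, every task that triggered a permission event stays asleep in fanotify_handle_event(), which is exactly the pile-up of UN tasks seen above.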
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.