How to track who/what is sending SIGKILL to a process?
Environment
o Red Hat Enterprise Linux 4
o Red Hat Enterprise Linux 5
o Red Hat Enterprise Linux 6
o Red Hat Enterprise Linux 7
o Red Hat Enterprise Linux 8
o Red Hat Enterprise Linux 9
Issue
We are investigating an issue with processes that are suddenly dying, and we have determined that the process receives a SIGKILL signal.
However, no log message explains the reason for the kill. How can we find out who sends the kill signal to the process?
# less /var/log/messages
Jul 5 11:44:45 RHEL9 systemd[1]: httpd.service: Main process exited, code=killed, status=9/KILL
Jul 5 11:44:45 RHEL9 systemd[1]: httpd.service: Killing process 15201 (httpd) with signal SIGKILL.
Jul 5 11:44:45 RHEL9 systemd[1]: httpd.service: Killing process 15202 (httpd) with signal SIGKILL.
Jul 5 11:44:45 RHEL9 systemd[1]: httpd.service: Killing process 15203 (httpd) with signal SIGKILL.
[..]
Jul 5 11:44:45 RHEL9 systemd[1]: httpd.service: Killing process 15413 (n/a) with signal SIGKILL.
Jul 5 11:44:45 RHEL9 systemd[1]: httpd.service: Killing process 15414 (n/a) with signal SIGKILL.
Jul 5 11:44:45 RHEL9 systemd[1]: httpd.service: Killing process 15415 (n/a) with signal SIGKILL.
Jul 5 11:44:45 RHEL9 systemd[1]: httpd.service: Failed with result 'signal'.
Resolution
Using audit
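You can configure audit rules that capture kill signals. These rules help identify which user and which process sent the signal.
The rules should look similar to the following. One set covers 64-bit (arch=b64) system calls and the other covers 32-bit (arch=b32) system calls. For kill() and tkill(), the signal is argument a1. For tgkill(), it is argument a2. The value 9 is SIGKILL: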
# auditctl -a exit,always -F arch=b64 -F a1=9 -S kill
# auditctl -a exit,always -F arch=b64 -F a1=9 -S tkill
# auditctl -a exit,always -F arch=b64 -F a2=9 -S tgkill
# auditctl -a exit,always -F arch=b32 -F a1=9 -S kill
# auditctl -a exit,always -F arch=b32 -F a1=9 -S tkill
# auditctl -a exit,always -F arch=b32 -F a2=9 -S tgkill
For more details, refer to: How to use audit to monitor a specific SYSCALL?
Using SystemTap
What is SystemTap?
SystemTap is a tracing and probing tool that allows users to study and monitor the activities of the operating system (particularly, the kernel) in fine detail. It provides information similar to the output of tools like netstat, ps, top, and iostat; however, SystemTap is designed to provide more filtering and analysis options for collected information.
For an example, refer to: How to monitor kill signal 2, 9 and 15 using systemtap script?
Using bcc-tools
The bcc-tools package is available on RHEL 7.6 and later, RHEL 8, and RHEL 9. It contains many pre-built tracing scripts. One of them is killsnoop, which traces the processes that call the kill() syscall.
Diagnostic Steps
The HTTPD service is running with PID 15200.
# systemctl status httpd.service
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2022-07-05 11:32:43 IST; 10min ago
Docs: man:httpd.service(8)
Main PID: 15200 (httpd)
Status: "Total requests: 0; Idle/Busy workers 100/0;Requests/sec: 0; Bytes served/sec: 0 B/sec"
Tasks: 213 (limit: 11106)
Memory: 28.9M
CPU: 596ms
CGroup: /system.slice/httpd.service
├─15200 /usr/sbin/httpd -DFOREGROUND
├─15201 /usr/sbin/httpd -DFOREGROUND
├─15202 /usr/sbin/httpd -DFOREGROUND
├─15203 /usr/sbin/httpd -DFOREGROUND
└─15204 /usr/sbin/httpd -DFOREGROUND
Jul 05 11:32:42 RHEL9.example.com systemd[1]: Starting The Apache HTTP Server...
Jul 05 11:32:43 RHEL9.example.com httpd[15200]: Server configured, listening on: port 80
Jul 05 11:32:43 RHEL9.example.com systemd[1]: Started The Apache HTTP Server.
[root@RHEL9 ~]#
[root@RHEL9 ~]#
Now kill the process with the killall command:
# killall -9 <procname>
[root@RHEL9 ~]# killall -9 httpd
[root@RHEL9 ~]#
[root@RHEL9 ~]# systemctl status httpd.service
× httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: failed (Result: signal) since Tue 2022-07-05 11:44:45 IST; 2s ago
Docs: man:httpd.service(8)
Process: 15200 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=killed, signal=KILL)
Main PID: 15200 (code=killed, signal=KILL)
Status: "Total requests: 0; Idle/Busy workers 100/0;Requests/sec: 0; Bytes served/sec: 0 B/sec"
CPU: 655ms
Jul 05 11:44:45 RHEL9.example.com systemd[1]: httpd.service: Killing process 15407 (n/a) with signal SIGKILL.
Jul 05 11:44:45 RHEL9.example.com systemd[1]: httpd.service: Killing process 15408 (n/a) with signal SIGKILL.
Jul 05 11:44:45 RHEL9.example.com systemd[1]: httpd.service: Killing process 15409 (n/a) with signal SIGKILL.
Jul 05 11:44:45 RHEL9.example.com systemd[1]: httpd.service: Killing process 15410 (n/a) with signal SIGKILL.
Jul 05 11:44:45 RHEL9.example.com systemd[1]: httpd.service: Killing process 15411 (n/a) with signal SIGKILL.
Jul 05 11:44:45 RHEL9.example.com systemd[1]: httpd.service: Killing process 15412 (n/a) with signal SIGKILL.
Jul 05 11:44:45 RHEL9.example.com systemd[1]: httpd.service: Killing process 15413 (n/a) with signal SIGKILL.
Jul 05 11:44:45 RHEL9.example.com systemd[1]: httpd.service: Killing process 15414 (n/a) with signal SIGKILL.
Jul 05 11:44:45 RHEL9.example.com systemd[1]: httpd.service: Killing process 15415 (n/a) with signal SIGKILL.
Jul 05 11:44:45 RHEL9.example.com systemd[1]: httpd.service: Failed with result 'signal'.
[root@RHEL9 ~]#
This produces records in /var/log/audit/audit.log. When displayed with ausearch (for example, ausearch -sc kill), they look like the following:
time->Tue Jul 5 11:44:45 2022
type=PROCTITLE msg=audit(1657001685.727:558): proctitle=6B696C6C616C6C002D39006874747064
type=OBJ_PID msg=audit(1657001685.727:558): opid=15200 oauid=-1 ouid=0 oses=-1 obj=system_u:system_r:httpd_t:s0 ocomm="httpd"
type=SYSCALL msg=audit(1657001685.727:558): arch=c000003e syscall=62 success=yes exit=0 a0=3b60 a1=9 a2=0 a3=a items=0 ppid=1264 pid=15588 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="killall" exe="/usr/bin/killall" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
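Where:
o syscall=62 is the number of the intercepted syscall. On x86_64 (arch=c000003e), this is kill().
o a0=3b60 is the first argument of kill(), the target PID in hexadecimal (0x3b60 = 15200, the main httpd process). The OBJ_PID record (opid=15200, ocomm="httpd") identifies the same target.
o a1=9 is the second argument, the signal number (9 = SIGKILL).
o pid=15588 is the PID of the process that called kill() to kill the HTTPD service. comm="killall" and exe="/usr/bin/killall" identify the program.
o auid=1000 is the audit (login) UID of the user who started the session, and uid=0 is the UID the process was running as.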
Source code
The following kernel source excerpts show what happens inside the kernel when a process calls kill().
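o The SYSCALL_DEFINE2(kill, ...) definition receives the target PID and the signal number.
o It calls prepare_kill_siginfo() to fill in a kernel_siginfo structure. That structure holds the signal number, the si_code SI_USER, and the PID and UID of the sending process.
o It then returns the result of kill_something_info(). For a positive PID, kill_something_info() calls kill_proc_info().
o kill_proc_info() looks up the target with find_vpid() and delivers the signal through kill_pid_info().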
static inline void prepare_kill_siginfo(int sig, struct kernel_siginfo *info)
{
    clear_siginfo(info);
    info->si_signo = sig;                      /* signal number to deliver */
    info->si_errno = 0;
    info->si_code = SI_USER;
    info->si_pid = task_tgid_vnr(current);     /* PID of the sending process */
    info->si_uid = from_kuid_munged(current_user_ns(), current_uid());
}
/**
 * sys_kill - send a signal to a process
 * @pid: the PID of the process
 * @sig: signal to be sent
 */
SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
{
    struct kernel_siginfo info;

    prepare_kill_siginfo(sig, &info);              /* fill in siginfo from the arguments */

    return kill_something_info(sig, &info, pid);   /* hand off to kill_something_info() */
}
static int kill_something_info(int sig, struct kernel_siginfo *info, pid_t pid)
{
    int ret;

    if (pid > 0)                                   /* a single, positive PID */
        return kill_proc_info(sig, info, pid);     /* hand off to kill_proc_info() */

    /* -INT_MIN is undefined. Exclude this case to avoid a UBSAN warning */
    if (pid == INT_MIN)
        return -ESRCH;

    read_lock(&tasklist_lock);
    [..]
}
static int kill_proc_info(int sig, struct kernel_siginfo *info, pid_t pid)
{
    int error;
    rcu_read_lock();
    error = kill_pid_info(sig, info, find_vpid(pid));
    rcu_read_unlock();
    return error;
}
On RHEL 4, the available syscalls and their numbers are listed in the following files:
o /usr/src/kernels/<version>/include/asm-i386/unistd.h for i386 systems
o /usr/src/kernels/<version>/include/asm-x86_64/unistd.h for x86_64 systems
These files are included in the kernel-devel RPM package.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.