auditd service fails to start with "Error setting audit daemon pid (File exists)"


Environment

  • Red Hat Enterprise Linux 7
  • audit

Issue

  • The auditd service fails to start with the following error:
$ systemctl status auditd.service
● auditd.service - Security Auditing Service
   Loaded: loaded (/etc/systemd/system/auditd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2020-09-11 13:21:51 CEST; 4s ago
     Docs: man:auditd(8)
           https://github.com/linux-audit/audit-documentation
  Process: 16697 ExecStart=/sbin/auditd (code=exited, status=1/FAILURE)

Sep 11 13:21:51 srv systemd[1]: Starting Security Auditing Service...
Sep 11 13:21:51 srv auditd[16698]: Started dispatcher: /sbin/audispd pid: 16700
Sep 11 13:21:51 srv auditd[16698]: Error setting audit daemon pid (File exists)
Sep 11 13:21:51 srv auditd[16698]: Unable to set audit pid, exiting
Sep 11 13:21:51 srv auditd[16698]: The audit daemon is exiting.
Sep 11 13:21:51 srv systemd[1]: auditd.service: control process exited, code=exited status=1
Sep 11 13:21:51 srv systemd[1]: Failed to start Security Auditing Service.
Sep 11 13:21:51 srv systemd[1]: Unit auditd.service entered failed state.
Sep 11 13:21:51 srv systemd[1]: auditd.service failed.
  • This happens even though no PID file /var/run/auditd.pid exists.

Resolution

  • A tool other than auditd has registered itself with the kernel audit subsystem.

  • The registered PID can be found with auditctl -s:

    # auditctl -s
        enabled 1
        failure 0
        => pid 9307
        rate_limit 10000
        backlog_limit 4096
        lost 12071
        backlog 0
        loginuid_immutable 0 unlocked
    
  • PID 9307 is registered on the NETLINK_AUDIT socket and does not belong to auditd, so /var/run/auditd.pid is empty. The process list shows that PID 9307 is osqueryd:

    root       9307  2.2  0.4 462256 80052 ?        SNl  Aug26 874:02 /usr/bin/osqueryd
    
  • This tool is not provided by Red Hat, but its documentation (external link) explains the conflict: https://osquery.readthedocs.io/en/stable/deployment/process-auditing/

    Linux process auditing
    
    osquery uses the Linux Audit System to collect and process audit events from the kernel. It accomplishes this by monitoring the execve() syscall. Auditd should not be running when using osquery's process auditing, as it will conflict with osqueryd over access to the audit netlink socket. You should also ensure auditd is not configured to start at boot.
    

    Therefore, auditd cannot be used while osquery's process auditing is enabled.

  • Disable osquery so that auditd can be started.
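
The lookup described in this section can be scripted. The following is a minimal sketch rather than a Red Hat-provided tool: the function name audit_pid_from_status is hypothetical, it assumes the "key value" output format of auditctl -s shown above, and the unit name osqueryd in the comments is an assumption that should be checked on the affected system.

```shell
#!/bin/sh
# Hypothetical helper: print the PID currently registered with the kernel
# audit subsystem, read from `auditctl -s` output on standard input.
# Assumes one "pid N" line, as in the output shown in this article.
audit_pid_from_status() {
    awk '$1 == "pid" { print $2; exit }'
}

# Typical use as root (left as comments so the sketch has no side effects):
#   pid=$(auditctl -s | audit_pid_from_status)
#   ps -o pid,user,args -p "$pid"     # shows which program owns that PID
#
# If the owner is osqueryd, stop it before starting auditd. The unit name
# "osqueryd" is an assumption; verify it first, for example with:
#   systemctl list-units | grep -i osquery
#   systemctl stop osqueryd
#   systemctl start auditd
```

A registered PID of 0 means that no daemon currently holds the audit netlink socket, in which case this article does not apply.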

Root Cause

  • auditd cannot register its PID with the kernel because a PID is already registered by another tool (see the audit_replace(requesting_pid) check in the kernel code).
  • The message "File exists" refers to the PID already registered in the kernel, not to a PID file on disk.
  • The corresponding source code in the auditd daemon is:
/*
 * This function returns -1 on error and 1 on success.
 */
int audit_set_pid(int fd, uint32_t pid, rep_wait_t wmode)
{
        struct audit_status s;
        struct audit_reply rep;
        struct pollfd pfd[1];
        int rc;

        memset(&s, 0, sizeof(s));
        s.mask    = AUDIT_STATUS_PID;
        s.pid     = pid;
        /* Send struct s with AUDIT_STATUS_PID and the pid of auditd. */
        rc = audit_send(fd, AUDIT_SET, &s, sizeof(s));
        if (rc < 0) {
                /* The kernel refused the request; here the error is "File exists". */
                audit_msg(audit_priority(errno),
                        "Error setting audit daemon pid (%s)",
                        strerror(-rc));
                return rc;
        }
  • On the kernel side:
                if (s.mask & AUDIT_STATUS_PID) {
                        /* NOTE: we are using task_tgid_vnr() below because
                         *       the s.pid value is relative to the namespace
                         *       of the caller; at present this doesn't matter
                         *       much since you can really only run auditd
                         *       from the initial pid namespace, but something
                         *       to keep in mind if this changes */
                        int new_pid = s.pid;
                        pid_t requesting_pid = task_tgid_vnr(current);

                        if ((!new_pid) && (requesting_pid != audit_pid))
                                return -EACCES;
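                        /* A pid is already registered and a new one is requested:
                         * unless the old daemon is unreachable, refuse with EEXIST. */
                        if (audit_pid && new_pid &&
                            audit_replace(requesting_pid) != -ECONNREFUSED)
                                return -EEXIST;  /* seen by auditd as "File exists" */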
                        if (audit_enabled != AUDIT_OFF)
                                audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
                        audit_pid = new_pid;
                        audit_nlk_portid = NETLINK_CB(skb).portid;
                }
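
The kernel branch quoted above has three outcomes. As an illustration only, they can be modeled with a small shell function. This is a simplified sketch and not kernel code: the function name audit_set_pid_model and its fourth argument are hypothetical, and "reachable" stands in for the case where audit_replace() returns anything other than -ECONNREFUSED.

```shell
#!/bin/sh
# Simplified model of the kernel check quoted above (not kernel code).
# Arguments:
#   $1 audit_pid       PID currently registered with the kernel (0 = none)
#   $2 new_pid         PID the caller wants to register (0 = unregister)
#   $3 requesting_pid  PID of the calling process
#   $4 reachable       "yes" if the registered daemon still answers on its
#                      netlink socket; any other value means it is gone
audit_set_pid_model() {
    if [ "$2" -eq 0 ] && [ "$3" -ne "$1" ]; then
        echo EACCES     # only the registered daemon may unregister itself
        return
    fi
    if [ "$1" -ne 0 ] && [ "$2" -ne 0 ] && [ "$4" = yes ]; then
        echo EEXIST     # "File exists": another daemon holds the registration
        return
    fi
    echo OK             # the kernel would set audit_pid = new_pid
}

# Situation from this article: osqueryd (9307) is registered and running,
# and auditd (76036 in the strace run) asks to register itself.
audit_set_pid_model 9307 76036 76036 yes    # prints EEXIST
```

In this model, the same request succeeds once the previously registered process has exited and no longer answers on its socket, which matches the resolution of stopping osquery before starting auditd.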

Diagnostic Steps

  • In the audit log (/var/log/audit/audit.log), the set-pid operation is reported as failed:
type=DAEMON_START msg=audit(1600773483.093:8108): op=start ver=2.8.5 format=raw kernel=3.10.0-862.6.3.el7.x86_64 auid=4294967295 pid=73492 uid=0 ses=4294967295 res=success
type=DAEMON_ABORT msg=audit(1600773483.093:8109): op=set-pid auid=4294967295 pid=73492 uid=0 ses=4294967295 res=failed
  • Running auditd under strace shows that it is able to open and write the PID file:
76036 1600774118.371816 socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT) = 3<socket:[681464412]> <0.000027>
76036 1600774118.371889 fcntl(3<socket:[681464412]>, F_SETFD, FD_CLOEXEC) = 0 <0.000030>
76036 1600774118.371970 open("/var/run/auditd.pid", O_WRONLY|O_CREAT|O_TRUNC|O_NOFOLLOW, 0644) = 4</run/auditd.pid> <0.000040>
76036 1600774118.372062 write(4</run/auditd.pid>, "76036\n", 6) = 6 <0.000032>
76036 1600774118.372140 close(4</run/auditd.pid>) = 0 <0.000024>
  • But when it tries to send the info to the kernel through the socket, it gets an error:
76036 1600774118.374116 sendto(3<socket:[681464412]>, "\20\0\0\0\350\3\5\0\1\0\0\0\0\0\0\0", 16, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 16 <0.000088>
76036 1600774118.374263 poll([{fd=3<socket:[681464412]>, events=POLLIN}], 1, 500) = 1 ([{fd=3, revents=POLLIN}]) <0.000026>
76036 1600774118.374352 recvfrom(3<socket:[681464412]>, "$\0\0\0\2\0\0\0\1\0\0\0\4)\1\0\0\0\0\0\20\0\0\0\350\3\5\0\1\0\0\0\0\0\0\0", 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 <0.000027>
76036 1600774118.374432 recvfrom(3<socket:[681464412]>, "$\0\0\0\2\0\0\0\1\0\0\0\4)\1\0\0\0\0\0\20\0\0\0\350\3\5\0\1\0\0\0\0\0\0\0", 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 <0.000027>
76036 1600774118.374523 poll([{fd=3<socket:[681464412]>, events=POLLIN}], 1, 100) = 1 ([{fd=3, revents=POLLIN}]) <0.000028>
76036 1600774118.374602 recvfrom(3<socket:[681464412]>, "4\0\0\0\350\3\0\0\1\0\0\0\4)\1\0\0\0\0\0\1\0\0\0\0\0\0\0[$\0\0\20'\0\0\0\20\0\0'/\0\0\0\0\0\0=\0\0\0", 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 52 <0.000023>
76036 1600774118.374684 sendto(3<socket:[681464412]>, "4\0\0\0\351\3\5\0\2\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 52, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 52 <0.000041>
76036 1600774118.374779 poll([{fd=3<socket:[681464412]>, events=POLLIN}], 1, 500) = 1 ([{fd=3, revents=POLLIN}]) <0.000025>
76036 1600774118.374852 recvfrom(3<socket:[681464412]>, "$\0\0\0\2\0\0\0\2\0\0\0\4)\1\0\0\0\0\0004\0\0\0\351\3\5\0\2\0\0\0\0\0\0\0", 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 <0.000025>
76036 1600774118.374928 recvfrom(3<socket:[681464412]>, "$\0\0\0\2\0\0\0\2\0\0\0\4)\1\0\0\0\0\0004\0\0\0\351\3\5\0\2\0\0\0\0\0\0\0", 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 <0.000020>
76036 1600774118.375006 sendto(3<socket:[681464412]>, "4\0\0\0\351\3\5\0\3\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\4)\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 52, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 52 <0.000069>
76036 1600774118.375131 poll([{fd=3<socket:[681464412]>, events=POLLIN}], 1, 500) = 1 ([{fd=3, revents=POLLIN}]) <0.000023>
76036 1600774118.375216 recvfrom(3<socket:[681464412]>, "H\0\0\0\2\0\0\0\3\0\0\0\4)\1\0\357\377\377\3774\0\0\0\351\3\5\0\3\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\4)\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 72 <0.000026>
76036 1600774118.375302 recvfrom(3<socket:[681464412]>, "H\0\0\0\2\0\0\0\3\0\0\0\4)\1\0\357\377\377\3774\0\0\0\351\3\5\0\3\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\4)\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 72 <0.000028>
76036 1600774118.375424 write(2</dev/pts/0>, "Error setting audit daemon pid (File exists)", 44) = 44 <0.000029>
76036 1600774118.375503 write(2</dev/pts/0>, "\n", 1) = 1 <0.000040>
  • The error is reported even though there was no pre-existing PID file /var/run/auditd.pid.
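
The diagnostic steps above can be reproduced with a few shell commands. The sketch below rests on assumptions: the exact strace invocation used for this article is not recorded, so the flags in the comment are only a guess that yields the same line layout (PID prefix, epoch timestamps, file descriptor paths, call durations). The helper names set_pid_failures and decode_nl_reply are hypothetical, and the decoder assumes a little-endian host such as x86_64.

```shell
#!/bin/sh
# Possible capture command (run as root; the flags are an assumption):
#   strace -f -ttt -T -y -s 256 -o /tmp/auditd.strace /sbin/auditd -f

# Hypothetical helper: list failed set-pid attempts in an audit log file.
# Example: set_pid_failures /var/log/audit/audit.log
set_pid_failures() {
    grep 'type=DAEMON_ABORT' "$1" | grep 'op=set-pid'
}

# Hypothetical helper: decode the first five 32-bit words of a netlink reply
# given as an escape string copied from strace output.
# Output columns: length, type (2 = NLMSG_ERROR), sequence, port ID, error
decode_nl_reply() {
    # Unquoted $words is split on whitespace, which drops od's padding.
    words=$(printf "$1" | od -An -td4 -N20)
    echo $words
}

# First 20 bytes of the last recvfrom() in the trace above.
decode_nl_reply 'H\0\0\0\2\0\0\0\3\0\0\0\4)\1\0\357\377\377\377'
# prints: 72 2 3 76036 -17
```

The last column is the negated errno returned by the kernel: 17 is EEXIST on Linux, which strerror() renders as "File exists". The fourth column is the port ID, which here equals the PID of the traced auditd process (76036).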


4 Comments

auditctl needs flags to give the output listed in the kb. You're looking for:

auditctl -s

It would be helpful to include the commands to find the logs that are mentioned in the article. Specifically, the log outputs in the diagnostic steps.

Thank you for your feedback, the log file is /var/log/audit/audit.log.

Actually, I was talking about the strace commands to find the information, but that might be out of the scope of this article.