pam_tally2 process consuming high cpu on RHEL5
Environment
Red Hat Enterprise Linux 5.9
pam-0.99.6.2-12.el5
Issue
The pam_tally2 process is using 80% cpu.
Resolution
The tallylog file format is used by a few utilities, and changing it will result in wide spread breakage. Bugs have been filed about this issue, but the consensus is that this file format is not going to change in RHEL6 or RHEL7 either.
Having said that, it is possible to workaround this issue as the pam_tally2 command is capable of seeking to any individual user's data without having to read the whole file, using pam_tally2's -u option. A simple shell script that avoids this problem, yet emulates the default pam_tally2 behaviour, using the pam_tally2's -u option is as follows:
#!/bin/bash
# root gets special treatment - just to print the header
pam_tally2 -u root
for i in `getent passwd|cut -f 1 -d :|grep -v root && # local users
ipa user-find|grep login | cut -f 5 -d " " ` # IPA users
do
pam_tally2 -u $i | grep -v Login
done
Root Cause
The tallylog file, used by pam_tally2 is a sparse file. It uses an offset of the uid of the user which makes it possible for the file to get very ‘large’:
root@nykpsr3752 PROD # ls -lh /var/log/tallylog
-rw------- 1 root root 5.4G Oct 4 12:29 /var/log/tallylog
root@host # du -sh /var/log/tallylog
40K /var/log/tallylog
A strace shows the amount of time spent in reading the tallylog:
root@host # strace -c pam_tally2
Login Failures Latest failure From
user1 1 07/21/12 22:54:36 unknown
user2 1 10/03/12 15:01:15 unknown
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.013377 0 1406423 read
0.00 0.000000 0 3 write
0.00 0.000000 0 12 open
0.00 0.000000 0 14 close
0.00 0.000000 0 6 stat
0.00 0.000000 0 14 fstat
0.00 0.000000 0 1 lseek
0.00 0.000000 0 23 mmap
0.00 0.000000 0 9 mprotect
0.00 0.000000 0 7 munmap
0.00 0.000000 0 3 brk
0.00 0.000000 0 1 1 access
0.00 0.000000 0 2 socket
0.00 0.000000 0 2 2 connect
0.00 0.000000 0 1 execve
0.00 0.000000 0 6 fcntl
0.00 0.000000 0 1 umask
0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.013377 1406529 3 total
This stems from the struct that faillog uses to update /var/log/faillog with failed user attempts:
The file is fixed length record, indexed by numerical UID.
The problem is that because IPA has large uidnumbers, a few failed logins for someone with uid 23983838 builds a large file which takes a significant amount of time to parse:
$ ls -lh /var/log/faillog
-rw------- 1 root root 15G Mar 12 17:28 /var/log/faillog
$ du -sh /var/log/faillog
36K /var/log/faillog
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
Welcome! Check out the Getting Started with Red Hat page for quick tours and guides for common tasks.
