Cron jobs are not running according to their schedule and produce no error messages
Environment
- Red Hat Enterprise Linux (RHEL) 6
  - cronie-1.4.4-16.el6_8.2
- Dell Authentication Services (a.k.a. QAS or VAS)
  - vasclnt-4.1.0-21267.x86_64
  - vasgp-4.1.0-21267.x86_64
- or SMB domain authentication
Issue
- Production system shows random inconsistency in cron job execution, leaving no traces in the system logs.
- vasd-related errors appear in /var/log/messages.
- Also observed when executing jobs for SMB accounts.
Resolution
Fix the authentication backend issue, e.g.:
- Stop the vasd daemon;
- Monitor cron job execution;
- Optionally start the vasd daemon.
Bugzilla 1630855 has been filed against cronie, requesting a backport of the patch that fixes its sensitivity to SIGPIPE caught during authentication.
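The monitoring step can be approximated by counting job starts in the cron log before and after the authentication daemon is stopped. The following is a minimal sketch, not a vendor procedure: the helper name cron_runs is hypothetical, the log path /var/log/cron and the "CROND[pid]: (user) CMD" line format are assumed to be the RHEL 6 defaults, and the service name vasd is an assumption.

```shell
#!/bin/sh
# Count how many times crond started a job for a given user.
# Assumed log line format (RHEL 6 default):
#   "<date> <host> CROND[<pid>]: (<user>) CMD (<command>)"
# Arguments: $1 = user name, $2 = log file (default: /var/log/cron)
cron_runs() {
    user="$1"
    log="${2:-/var/log/cron}"
    # -c prints the number of matching lines instead of the lines
    grep -c "CROND\[[0-9]*]: ($user) CMD" "$log"
}

# Example (run as root; the service name "vasd" is an assumption):
#   service vasd stop
#   cron_runs root     # note the count, then wait for a few scheduled runs
#   cron_runs root     # the count should grow by the number of expected runs
```

If the count grows as scheduled only while the daemon is stopped, that points at the authentication backend rather than at the crontab entries.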
Root Cause
Before version 1.4.7, cronie was known to be sensitive to SIGPIPE. This was fixed by upstream commit ee4cbe7.
- System is experiencing failures in the PAM stack in interaction with
  - the VASd (third party) caching daemon. For further analysis of the VASd issue, please engage the Authentication Services vendor support team.
  - or the SMB domain authentication backend
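To check whether an installed cronie predates the 1.4.7 threshold named in this section, the version strings can be compared. This is a rough sketch: the helper name version_lt is hypothetical and it relies on GNU sort -V. Distribution packages may carry backported fixes, so the upstream version number is only a hint.

```shell
#!/bin/sh
# Succeed (exit 0) when version $1 is strictly older than version $2.
# Relies on GNU sort -V (version sort) to order the two strings.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example: compare the installed upstream version with 1.4.7.
#   installed=$(rpm -q --qf '%{VERSION}' cronie)
#   version_lt "$installed" 1.4.7 && echo "cronie $installed predates 1.4.7"
```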
Diagnostic Steps
- Review the /var/log/messages system log file for VASd errors:
Sep 6 03:50:04 server .vgptool[61551]: [ERROR vashostaccess.cpp:107] Result: process wait_for failed: Error: No child processes#012 failed to wait for process: /opt/quest/libexec/vas/vasd/vasac_helper
Sep 6 06:50:19 server .vgptool[22573]: [ERROR vashostaccess.cpp:107] Result: process wait_for failed: Error: No child processes#012 failed to wait for process: /opt/quest/libexec/vas/vasd/vasac_helper
Sep 6 11:20:43 server .vgptool[43715]: [ERROR vashostaccess.cpp:107] Result: process wait_for failed: Error: No child processes#012 failed to wait for process: /opt/quest/libexec/vas/vasd/vasac_helper
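Entries like the ones above can be pulled out of the log with a simple filter, which makes it easier to compare their timestamps against the cron schedule. A minimal sketch, assuming the error lines always carry the "vgptool[pid]: [ERROR" prefix seen in the sample; the helper name vas_errors is hypothetical.

```shell
#!/bin/sh
# Print vgptool ERROR lines from a syslog-format file.
# $1 = log file (default: /var/log/messages)
vas_errors() {
    log="${1:-/var/log/messages}"
    grep -E 'vgptool\[[0-9]+]: \[ERROR' "$log"
}

# Example: show only the timestamps (first three syslog fields)
#   vas_errors | awk '{print $1, $2, $3}'
```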
- Run crond in debug mode:
# service crond stop
# strace -ttffo /tmp/crond.strace /usr/sbin/crond -x ext,sch,proc,pars,load,misc,test,bit
- Review the files generated by strace:
. . .
16:50:01.965611 open("/opt/quest/lib64/tls/x86_64/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
16:50:01.965661 stat("/opt/quest/lib64/tls/x86_64", 0x7fffbb237e90) = -1 ENOENT (No such file or directory)
16:50:01.965719 open("/opt/quest/lib64/tls/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
16:50:01.965768 stat("/opt/quest/lib64/tls", 0x7fffbb237e90) = -1 ENOENT (No such file or directory)
16:50:01.965811 open("/opt/quest/lib64/x86_64/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
16:50:01.965853 stat("/opt/quest/lib64/x86_64", 0x7fffbb237e90) = -1 ENOENT (No such file or directory)
16:50:01.965909 open("/opt/quest/lib64/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
16:50:01.965949 stat("/opt/quest/lib64", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
. . .
16:50:01.973671 connect(3, {sa_family=AF_FILE, path="/var/opt/quest/vas/vasd/.vasd40_ipc_sock"}, 110) = 0
16:50:01.973762 fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
16:50:01.973795 fcntl(3, F_SETFL, O_RDWR) = 0
16:50:01.973825 umask(0) = 022
16:50:01.973862 pipe([6, 7]) = 0
16:50:01.973896 poll([{fd=3, events=POLLPRI|POLLOUT}], 1, 4294967295) = 1 ([{fd=3, revents=POLLOUT}])
16:50:01.973935 sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"\1", 1}], msg_controllen=24, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {7}}, msg_flags=0}, MSG_NOSIGNAL) = 1
16:50:01.974020 poll([{fd=3, events=POLLOUT|POLLWRBAND}], 1, 4294967295) = 1 ([{fd=3, revents=POLLOUT|POLLWRBAND|POLLHUP}])
16:50:01.974103 write(3, "\3\0\0\0 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0VIPC\1\0\0\0\f\0\0\0", 32) = -1 EPIPE (Broken pipe)
16:50:01.974202 --- SIGPIPE (Broken pipe) @ 0 (0) ---
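Because strace was started with -ff, it writes one output file per process, named with the given prefix plus the PID. The sketch below lists the files that recorded a SIGPIPE delivery like the one at the end of the trace above; the helper name find_sigpipe is hypothetical and the default prefix matches the strace command used earlier.

```shell
#!/bin/sh
# List the per-process strace files that contain a SIGPIPE delivery.
# $1 = output prefix passed to "strace -o" (default: /tmp/crond.strace)
find_sigpipe() {
    prefix="${1:-/tmp/crond.strace}"
    # -l prints only file names; -e is needed because the pattern starts with "-"
    grep -l -e '--- SIGPIPE' "$prefix".* 2>/dev/null
}

# Example: show the three system calls preceding each SIGPIPE
#   for f in $(find_sigpipe); do grep -B 3 -e '--- SIGPIPE' "$f"; done
```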