'su -c' executed from a shell script enters the 'T' (stopped) state
Environment
- Red Hat Enterprise Linux 5.5
- coreutils-5.97-23.el5_4.2.x86_64
Issue
- A 'su -c' command run from a shell script that is launched by third-party software enters the 'T' (stopped) state, and the script then stops processing.
2013-12-24 17:01:05 PID PPID USER TTY PR NI nFLT VIRT RES SHR nDRT WCHAN Flags S %CPU %MEM TIME P COMMAND
2013-12-24 17:01:05 22797 1 root ? 15 0 0 6316 2220 1520 0 stext ..4.214. S 0.0 0.0 0:00 1 /opt/jp1ajs2/bin/jajs_spmd
2013-12-24 17:01:05 22801 22797 root ? 18 0 0 11208 4032 3456 0 stext ..4..... S 0.0 0.0 0:00 1 jpqmon
2013-12-24 17:01:05 22802 22801 root ? 25 0 3 112m 5204 4000 0 184467440 ..4..1.. S 0.0 0.0 0:01 1 jpqagt 22801
2013-12-24 17:01:05 22804 22802 root ? 15 0 0 8088 3856 3284 0 stext ..4.2... S 0.0 0.0 0:00 0 jpqagtdmn 11 12 19 22 22802 10 1
2013-12-24 17:01:05 25246 22804 root ? 16 0 0 9296 4628 3752 0 wait ..4..... S 0.0 0.0 0:00 1 jpqagtchild 11 12 5 8 4947991 6087 10
2013-12-24 17:01:05 5013 25246 root ? 35 19 0 63856 1156 960 0 wait ..4.2... S 0.0 0.0 0:00 1 /bin/bash /crash/work/20131224/ZD_K_SWITCHDB.sh
2013-12-24 17:01:05 12196 5013 root ? 37 19 0 102m 1412 1096 0 finish_st ..4.21.. T 0.0 0.0 0:00 1 /bin/su -c date > /crash/work/20131224/out.log oracle1
- The problem occurs only when the script is run by the third-party software; it does not occur when the script is run directly.
- We tried to collect strace output, but could not, because the command works without any problems while it is being traced with strace.
- Does Red Hat know of any possible cause? Also, could you please tell us what we should do to narrow down the cause?
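As one way to confirm the 'T' state from a program rather than from top output, the state letter can be read from /proc/<pid>/stat. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names proc_state_of and state_of_stopped_child are hypothetical and are not part of su or of the third-party software.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the one-letter state from /proc/<pid>/stat
 * (for example 'R', 'S' or 'T'), or '\0' if it cannot be read. */
char proc_state_of(pid_t pid)
{
    char path[64], line[512];
    const char *close_paren;
    FILE *fp;

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return '\0';
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return '\0';
    }
    fclose(fp);

    /* The line looks like "pid (comm) state ...". The command name may
     * itself contain ')', so search for the last one. */
    close_paren = strrchr(line, ')');
    if (close_paren == NULL || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '\0';
    return close_paren[2];
}

/* Hypothetical demo: forks a child, stops it with SIGSTOP, and returns
 * the state letter that /proc reports for it. */
char state_of_stopped_child(void)
{
    int status = 0;
    char state = '\0';
    pid_t child = fork();

    if (child < 0)
        return '\0';
    if (child == 0) {
        for (;;)
            pause();                    /* child idles until it is killed */
    }

    kill(child, SIGSTOP);
    /* WUNTRACED makes waitpid() report the stop instead of blocking. */
    if (waitpid(child, &status, WUNTRACED) == child && WIFSTOPPED(status))
        state = proc_state_of(child);

    kill(child, SIGKILL);               /* clean up the demo child */
    waitpid(child, &status, 0);
    return state;
}
```

Calling proc_state_of() with the PID of the su process (12196 in the output above) would be expected to return 'T' while the process is stopped.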
Resolution
- Update the coreutils package to 5.97-23.el5_6.4 or later.
- An erratum has been released: RHBA-2011:0188-1
* The "su" utility, which switches the user, does not return exit code of the
child process command, if the child process is terminated by a signal. Returned
exit code 0 - which means exit success - could be confusing for scripts. With
this updated package, correct exit code is returned, thus resolving the issue.
(BZ#672863)
Root Cause
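- waitpid() at line 640 of su.c returned -1, but the code did not check the return value and still evaluated WIFSTOPPED(status). As a result, kill(getpid(), SIGSTOP) at line 643 was called and su stopped itself even though pid was -1.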
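The difference between an unchecked and a checked waitpid() result can be sketched in isolation. The code below is not taken from su.c; the function names and the STALE_STOPPED_STATUS macro are hypothetical. It assumes that the calling process has no children (so waitpid() fails and returns -1), that a failed waitpid() leaves status unmodified as on Linux, and that the glibc wait-status encoding applies. It does not claim to show how status came to read as "stopped" in the reported case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Assumption: Linux/glibc encoding of "stopped by SIGSTOP" in a wait
 * status. It stands in for whatever value status held before the call. */
#define STALE_STOPPED_STATUS ((SIGSTOP << 8) | 0x7f)

/* Hypothetical: the old-style test, which ignores waitpid()'s return value.
 * Returns 1 if the caller would go on to stop itself. */
int old_check_would_stop(void)
{
    int status = STALE_STOPPED_STATUS;
    pid_t pid;

    assert(WIFSTOPPED(status));         /* sanity check on the encoding */
    pid = waitpid(-1, &status, WUNTRACED);
    (void)pid;                          /* never inspected, as in the old loop */
    return WIFSTOPPED(status) != 0;
}

/* Hypothetical: the same test guarded the way the updated su.c guards it. */
int new_check_would_stop(void)
{
    int status = STALE_STOPPED_STATUS;
    pid_t pid = waitpid(-1, &status, WUNTRACED);

    return ((pid_t)-1 != pid) && (0 != WIFSTOPPED(status));
}
```

Under those assumptions the unguarded test still reports "stopped" after waitpid() has failed, while the guarded test does not.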
Diagnostic Steps
- core file analysis
(gdb) bt
#0 0x00002b7fc0ccc6f7 in kill () from /lib64/libc.so.6
#1 0x00002b7fc002f5c8 in run_shell (shell=0x2b7fd16deb40 "/bin/bash",
command=<value optimized out>, additional_args=0x7fff18620fd8,
n_additional_args=<value optimized out>, pw=<value optimized out>)
at su.c:643
#2 0x00002b7fc002fec1 in main (argc=4, argv=0x7fff18620fb8) at su.c:919
(gdb) frame 1
#1 0x00002b7fc002f5c8 in run_shell (shell=0x2b7fd16deb40 "/bin/bash",
command=<value optimized out>, additional_args=0x7fff18620fd8,
n_additional_args=<value optimized out>, pw=<value optimized out>)
at su.c:643
643 kill(getpid(), SIGSTOP);
(gdb) list
638 int pid;
639
640 pid = waitpid(-1, &status, WUNTRACED);
641
642 if (WIFSTOPPED(status)) {
643 kill(getpid(), SIGSTOP);
644 /* once we get here, we must have resumed */
645 kill(pid, SIGCONT);
646 }
647 } while (WIFSTOPPED(status));
(gdb) p pid
$1 = -1
- The latest source code, from coreutils-5.97-34.el5_8.1:
640 pid = waitpid(-1, &status, WUNTRACED);
641
642 if (((pid_t)-1 != pid) && (0 != WIFSTOPPED (status))) {
643 /* tcsh sends SIGTSTP to the process group, and so is already pending */
644 kill(getpid(), WSTOPSIG(status));
645 if (WSTOPSIG(status) != SIGSTOP) {
646 sigemptyset(&blockset);
647 sigaddset(&blockset, WSTOPSIG(status));
648 sigprocmask(SIG_UNBLOCK, &blockset, &ourset);
649 /* signal taken here */
650 sigprocmask(SIG_SETMASK, &ourset, NULL);
651 }
652 /* once we get here, we must have resumed */
653 kill(pid, SIGCONT);
654 }
655 } while (0 != WIFSTOPPED(status));
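The "signal taken here" comment in the updated code relies on a pending signal being delivered as soon as it is unblocked. The sketch below shows that pattern on its own. It is an illustration, not the su.c implementation: SIGUSR1 with a handler is swapped in for the stop signal so that the demo does not suspend itself, and the names take_pending_signal, note_signal and origset are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t signal_taken = 0;

static void note_signal(int signo)
{
    (void)signo;
    signal_taken = 1;
}

/* Hypothetical helper. Returns 1 if the signal stayed pending while blocked
 * and was delivered inside the unblock window, 0 if not, -1 on setup error. */
int take_pending_signal(int signo)
{
    struct sigaction sa, old_sa;
    sigset_t blockset, ourset, origset;
    int pending_while_blocked;

    assert(signo != SIGKILL && signo != SIGSTOP);  /* cannot be caught or blocked */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old_sa) != 0)
        return -1;

    /* Start from a state where the signal is blocked. */
    sigemptyset(&blockset);
    sigaddset(&blockset, signo);
    if (sigprocmask(SIG_BLOCK, &blockset, &origset) != 0)
        return -1;

    signal_taken = 0;
    if (kill(getpid(), signo) != 0)     /* queued, not delivered: it is blocked */
        return -1;
    pending_while_blocked = (signal_taken == 0);

    /* Same three steps as the updated loop: unblock, take, restore. */
    sigprocmask(SIG_UNBLOCK, &blockset, &ourset);
    /* signal taken here */
    sigprocmask(SIG_SETMASK, &ourset, NULL);

    /* Put the original mask and handler back for the caller. */
    sigprocmask(SIG_SETMASK, &origset, NULL);
    sigaction(signo, &old_sa, NULL);

    return pending_while_blocked && signal_taken == 1;
}
```

In a single-threaded process, take_pending_signal(SIGUSR1) should return 1, because POSIX requires at least one pending unblocked signal to be delivered before sigprocmask() returns.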