'su -c' executed from a shell script falls into the 'T' (stopped) state.

Solution Unverified

Environment

  • Red Hat Enterprise Linux 5.5
  • coreutils-5.97-23.el5_4.2.x86_64

Issue

  • A 'su -c' command run from a shell script that is launched by third-party software falls into the 'T' (stopped) state, and the script then stops processing. The last line of the following process listing shows the stopped su process:
2013-12-24 17:01:05   PID  PPID USER     TTY       PR  NI nFLT  VIRT  RES  SHR nDRT WCHAN     Flags    S %CPU %MEM   TIME P COMMAND
2013-12-24 17:01:05 22797     1 root     ?         15   0    0  6316 2220 1520    0 stext     ..4.214. S  0.0  0.0   0:00 1 /opt/jp1ajs2/bin/jajs_spmd
2013-12-24 17:01:05 22801 22797 root     ?         18   0    0 11208 4032 3456    0 stext     ..4..... S  0.0  0.0   0:00 1 jpqmon
2013-12-24 17:01:05 22802 22801 root     ?         25   0    3  112m 5204 4000    0 184467440 ..4..1.. S  0.0  0.0   0:01 1 jpqagt 22801
2013-12-24 17:01:05 22804 22802 root     ?         15   0    0  8088 3856 3284    0 stext     ..4.2... S  0.0  0.0   0:00 0 jpqagtdmn 11 12 19 22 22802 10 1
2013-12-24 17:01:05 25246 22804 root     ?         16   0    0  9296 4628 3752    0 wait      ..4..... S  0.0  0.0   0:00 1 jpqagtchild 11 12 5 8 4947991 6087 10  
2013-12-24 17:01:05  5013 25246 root     ?         35  19    0 63856 1156  960    0 wait      ..4.2... S  0.0  0.0   0:00 1 /bin/bash /crash/work/20131224/ZD_K_SWITCHDB.sh
2013-12-24 17:01:05 12196  5013 root     ?         37  19    0  102m 1412 1096    0 finish_st ..4.21.. T  0.0  0.0   0:00 1 /bin/su -c date > /crash/work/20131224/out.log oracle1   
  • The problem occurs only when the script is launched by the third-party software. It does not occur when the script is run directly.
  • We tried to collect strace output, but could not capture the failure because the command works without any problem while it is being traced with strace.

  • Does Red Hat know of any possible cause? Also, what should we do to narrow down the cause?
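
When strace changes the timing enough to hide the problem, the process state can still be sampled without attaching to the process. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper name proc_state is hypothetical and is not part of coreutils or of the third-party software. It returns the state letter from /proc/<pid>/stat, which is 'T' for a process stopped by a signal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the state letter of a process
 * (for example 'R', 'S' or 'T'), or '?' if it cannot be read. */
char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];
    char *p;
    FILE *fp;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return '?';
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return '?';
    }
    fclose(fp);

    /* The command name is enclosed in parentheses and may itself contain
     * spaces or ')', so locate the last ')' and read the field after it. */
    p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

A monitoring loop could call proc_state() on the PID of the su process and record the time at which the letter changes to 'T'.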

Resolution

  • Update the coreutils package to 5.97-23.el5_6.4 or later.
  • An erratum has been released: RHBA-2011:0188-1
* The "su" utility, which switches the user, does not return exit code of the
child process command, if the child process is terminated by a signal. Returned
exit code 0 - which means exit success - could be confusing for scripts. With
this updated package, correct exit code is returned, thus resolving the issue.
(BZ#672863)

Root Cause

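  • waitpid() at su.c line 640 failed and returned -1, but su did not check the return value. It went on to test WIFSTOPPED(status) on a status value that a failed waitpid() does not reliably set, took the branch, and called kill(getpid(), SIGSTOP) at line 643. That call stopped su itself, which is why it appears in the 'T' state.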
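
The failure mode can be illustrated with a minimal sketch. The helper name child_stopped is hypothetical, and calling waitpid() in a process that has no children (errno ECHILD) is only one assumed way to make it fail; the article does not state why waitpid() failed in the reported case. The point shown is that the status value is inspected only after the return value has been checked.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: wait for any child and report whether one stopped.
 * Returns 1 only if waitpid() succeeded AND the child is stopped.
 * The waitpid() result and errno are passed back for inspection. */
int child_stopped(pid_t *pid_out, int *errno_out)
{
    int status = 0;
    pid_t pid = waitpid(-1, &status, WUNTRACED);

    *pid_out = pid;
    *errno_out = (pid == (pid_t)-1) ? errno : 0;

    /* A failed waitpid() does not provide a meaningful status,
     * so check the return value before using WIFSTOPPED(). */
    return pid != (pid_t)-1 && WIFSTOPPED(status);
}
```

In a process without children, child_stopped() reports "not stopped" and passes back -1 with ECHILD, whereas the unguarded check in the affected su.c could still treat the call as a stopped child.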

Diagnostic Steps

  • Core file analysis of the stopped su process shows that it is inside kill() at su.c line 643 while pid is -1:
(gdb) bt
#0  0x00002b7fc0ccc6f7 in kill () from /lib64/libc.so.6
#1  0x00002b7fc002f5c8 in run_shell (shell=0x2b7fd16deb40 "/bin/bash", 
    command=<value optimized out>, additional_args=0x7fff18620fd8, 
    n_additional_args=<value optimized out>, pw=<value optimized out>)
    at su.c:643
#2  0x00002b7fc002fec1 in main (argc=4, argv=0x7fff18620fb8) at su.c:919
(gdb) frame 1
#1  0x00002b7fc002f5c8 in run_shell (shell=0x2b7fd16deb40 "/bin/bash", 
    command=<value optimized out>, additional_args=0x7fff18620fd8, 
    n_additional_args=<value optimized out>, pw=<value optimized out>)
    at su.c:643
643               kill(getpid(), SIGSTOP);
(gdb) list
638           int pid;
639
640           pid = waitpid(-1, &status, WUNTRACED);
641
642           if (WIFSTOPPED(status)) {
643               kill(getpid(), SIGSTOP);
644               /* once we get here, we must have resumed */
645               kill(pid, SIGCONT);
646           }
647         } while (WIFSTOPPED(status));
(gdb) p pid
$1 = -1
  • The latest source code, coreutils-5.97-34.el5_8.1, verifies that pid is not -1 before it tests WIFSTOPPED(status):
640       pid = waitpid(-1, &status, WUNTRACED);
641 
642       if (((pid_t)-1 != pid) && (0 != WIFSTOPPED (status))) {
643       /* tcsh sends SIGTSTP to the process group, and so is already pending */
644           kill(getpid(),  WSTOPSIG(status));
645           if (WSTOPSIG(status) != SIGSTOP) {
646             sigemptyset(&blockset);
647             sigaddset(&blockset, WSTOPSIG(status));
648             sigprocmask(SIG_UNBLOCK, &blockset, &ourset);
649             /* signal taken here */
650             sigprocmask(SIG_SETMASK, &ourset, NULL);
651           }
652           /* once we get here, we must have resumed */
653           kill(pid, SIGCONT);
654       }
655     } while (0 != WIFSTOPPED(status));
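
The guarded wait loop can be tried outside su with a minimal sketch. The function name run_and_resume and its parameters are hypothetical. Unlike su, this version does not stop the parent itself (that step is left as a comment) so that it can run non-interactively. It forks a child that stops itself once, resumes it with SIGCONT, and returns the child's exit code.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the child's exit code, or -1 on error.
 * *stops_seen receives the number of times the child was seen stopped. */
int run_and_resume(int child_exit_code, int *stops_seen)
{
    int status = 0;
    pid_t pid;
    pid_t child = fork();

    *stops_seen = 0;
    if (child == (pid_t)-1)
        return -1;
    if (child == 0) {
        raise(SIGSTOP);            /* child stops itself once */
        _exit(child_exit_code);    /* runs after SIGCONT */
    }

    do {
        pid = waitpid(child, &status, WUNTRACED);
        if (pid == (pid_t)-1)
            return -1;             /* status is not valid; do not test it */
        if (WIFSTOPPED(status)) {
            (*stops_seen)++;
            /* su would stop itself here with WSTOPSIG(status)
             * and continue once it has been resumed. */
            kill(pid, SIGCONT);
        }
    } while (WIFSTOPPED(status));

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Returning from the loop as soon as waitpid() fails is a design choice of this sketch; the coreutils code above instead skips the stop handling when pid is -1.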
