Ksh scripts segfault in job list code
Environment
- Red Hat Enterprise Linux (RHEL) 6
- ksh 20100621-16.el6
- ksh 20100621-12.el6_2.1
- ksh 20100621-6.el6
Issue
- We have seen a number of segmentation faults hit by ksh scripts. The relevant entries in /var/log/messages are similar to the following:
  kernel: scriptname[12476]: segfault at 38 ip 0000000000422520 sp 00007fff12e5e4d0 error 4 in ksh93[400000+13a000]
- Backtraces of the crashed ksh differ from crash to crash:
  #0 0x00000000004238cb in job_subrestore (ptr=0x1787d10) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1820
  #1 0x0000000000449746 in sh_subshell (t=0x17a04a0, flags=5, comsub=1) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/subshell.c:634
  ...

  #0 bestfree (vm=0x74a5a0, data=0x2848000) at /usr/src/debug/ksh-20100621/src/lib/libast/vmalloc/vmbest.c:883
  #1 0x00000000004225bc in job_chksave (pid=<value optimized out>) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1793
  ...

  #0 job_chksave (pid=12555) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1763
  #1 0x00000000004225f4 in jobsave_create (pid=12555) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:236
  ...
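To check whether a system has logged the same kind of crash, the log can be searched for kernel segfault records that name ksh93. The following is a minimal sketch; the function name find_ksh_segfaults is made up for illustration, and the pattern assumes the record format shown in the sample line above.

```shell
# Hypothetical helper: list kernel segfault records that name the ksh93 binary.
find_ksh_segfaults() {
    # $1: log file to search (defaults to /var/log/messages)
    grep -E 'segfault at .* in ksh93\[' "${1:-/var/log/messages}"
}
```

Calling find_ksh_segfaults with no argument searches /var/log/messages, which normally requires root privileges to read.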
Resolution
An update to ksh that fixes this issue was released with the errata below:
- RHEL 6.6.0: Upgrade to ksh-20120801-21.el6 from Errata RHBA-2014:1381
- RHEL 6.5.z: Upgrade to ksh-20120801-10.el6_5.5 from Errata RHBA-2014:0521
- RHEL 6.7: Upgrade to ksh-20120801-28.el6 from Errata RHBA-2015:1450
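Before planning the upgrade, it can help to confirm which ksh build is installed. The sketch below compares the installed version-release string against the three builds listed under Environment. The function names are hypothetical, and the check only recognizes those three builds; other builds older than the fixed packages may also be affected.

```shell
# Hypothetical helper: succeed (exit 0) if the given version-release string
# is one of the ksh builds listed under Environment, fail (exit 1) otherwise.
is_listed_affected_ksh() {
    case "$1" in
        20100621-16.el6|20100621-12.el6_2.1|20100621-6.el6) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the installed ksh version-release, for example 20100621-16.el6.
installed_ksh() {
    rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' ksh
}

# Usage on an RPM-based system:
#   if is_listed_affected_ksh "$(installed_ksh)"; then
#       echo "installed ksh matches a build listed under Environment"
#   fi
```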
Workarounds
- Try running the script(s) with the following taskset/nice configuration (NOTE: the command below restricts the process to CPU 3):
  # taskset -c 3 nice -n19 SCRIPTS_AND_ARGUMENTS
- A customer reported that the segfaults were no longer seen when the scripts were run with an increased nice value (that is, a lower scheduling priority), such as nice -n19, and with their CPU affinity controlled through cgroups (cgrulesengd in this case).
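If several scripts need the taskset/nice workaround, the command can be wrapped in a small shell function. This is only a sketch: run_pinned and the DRY_RUN variable are names invented here, and the CPU number should be one that exists on the system in question.

```shell
# Hypothetical wrapper around the taskset/nice workaround.
# $1 is the CPU to pin to; the remaining arguments are the script and its arguments.
run_pinned() {
    cpu=$1
    shift
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        printf '%s\n' "taskset -c $cpu nice -n19 $*"
    else
        taskset -c "$cpu" nice -n19 "$@"
    fi
}
```

For example, run_pinned 3 SCRIPTS_AND_ARGUMENTS runs the same command as the workaround, and setting DRY_RUN=1 prints the command line without executing it.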
Root Cause
Due to a race condition in the job list code, the ksh shell could terminate unexpectedly with a segmentation fault when the user had run custom scripts on their system. With this update, the race condition has been fixed, and segmentation faults in ksh no longer occur.
Diagnostic Steps
In this case at least three different stack traces were observed.
(gdb) bt
#0 0x00000000004238cb in job_subrestore (ptr=0x1787d10) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1820
#1 0x0000000000449746 in sh_subshell (t=0x17a04a0, flags=5, comsub=1)
at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/subshell.c:634
#2 0x00000000004303b5 in comsubst (mp=0x1725110, t=<value optimized out>, type=<value optimized out>)
at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/macro.c:2007
(gdb) bt
#0 bestfree (vm=0x74a5a0, data=0x2848000) at /usr/src/debug/ksh-20100621/src/lib/libast/vmalloc/vmbest.c:883
#1 0x00000000004225bc in job_chksave (pid=<value optimized out>) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1793
#2 0x0000000000424b7f in job_post (pid=16538, join=<value optimized out>)
at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1225
(gdb) bt
#0 job_chksave (pid=12555) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1763
#1 0x00000000004225f4 in jobsave_create (pid=12555) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:236
#2 0x00000000004235b2 in job_reap (sig=17) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:330
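When comparing many core files, it can be convenient to reduce each backtrace to the function in frame #0. The sketch below assumes gdb "bt" output in the format shown above; top_frame is a hypothetical name, and the ksh binary and core file paths in the usage comment are placeholders.

```shell
# Hypothetical helper: read gdb "bt" output on stdin and print the function
# name of frame #0. Handles both "#0 0xADDR in func (...)" and "#0 func (...)".
top_frame() {
    awk '$1 == "#0" { if ($3 == "in") print $4; else print $2 }'
}

# Usage (paths are placeholders):
#   gdb -batch -ex bt /bin/ksh /path/to/core 2>/dev/null | top_frame
```

A top frame of job_subrestore, job_chksave, or bestfree called from job_chksave matches the traces seen in this case.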