Ksh scripts segfault in job list code

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux (RHEL) 6
  • ksh 20100621-16.el6
  • ksh 20100621-12.el6_2.1
  • ksh 20100621-6.el6

Issue

  • ksh scripts intermittently terminate with segmentation faults. The corresponding entries in /var/log/messages look similar to the following:

    kernel: scriptname[12476]: segfault at 38 ip 0000000000422520 sp 00007fff12e5e4d0 error 4 in ksh93[400000+13a000]
    
  • Backtraces of the crashed ksh differ from crash to crash:

    #0  0x00000000004238cb in job_subrestore (ptr=0x1787d10) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1820
    #1  0x0000000000449746 in sh_subshell (t=0x17a04a0, flags=5, comsub=1)
    at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/subshell.c:634
    ...
    
    #0  bestfree (vm=0x74a5a0, data=0x2848000) at /usr/src/debug/ksh-20100621/src/lib/libast/vmalloc/vmbest.c:883
    #1  0x00000000004225bc in job_chksave (pid=<value optimized out>) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1793
    ...
    
    #0  job_chksave (pid=12555) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1763
    #1  0x00000000004225f4 in jobsave_create (pid=12555) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:236
    ...
    
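To check whether a system is logging the same kind of entries, the log can be filtered for kernel segfault lines that name ksh93. The following is a minimal sketch, assuming a syslog-style file at /var/log/messages (reading it usually requires root). The function names find_ksh_segfaults and count_ksh_segfaults are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ksh or RHEL): locate kernel segfault
# entries that mention ksh93 in a syslog-style log file.

# Print every matching line. The log path defaults to /var/log/messages.
find_ksh_segfaults() {
    logfile="${1:-/var/log/messages}"
    grep -E 'segfault at .* in ksh93\[' "$logfile"
}

# Print only the number of matching lines (prints 0 when there are none).
count_ksh_segfaults() {
    logfile="${1:-/var/log/messages}"
    grep -cE 'segfault at .* in ksh93\[' "$logfile"
}
```

For example, `find_ksh_segfaults /var/log/messages` lists the entries, and `rpm -q ksh` shows the installed package version for comparison with the Environment list above.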

Resolution

Workarounds

  • Try running the script(s) under the following taskset/nice configuration. NOTE: The command below restricts the process to CPU 3 and runs it at the lowest scheduling priority:

    # taskset -c 3 nice -n19 SCRIPTS_AND_ARGUMENTS
    
  • A customer reported that the segfaults were no longer seen when the scripts were run at reduced scheduling priority (for example, nice -n19) and with their CPU affinity constrained through cgroups (cgrulesengd in this case).
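
The taskset/nice workaround can be wrapped in a small function so that the same settings are applied consistently. This is only a sketch: run_pinned is a hypothetical helper name, and the CPU number must be one that exists and is permitted on the host. The article's example uses CPU 3.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command pinned to a single CPU at the
# lowest scheduling priority (nice value 19), as in the workaround above.
#   $1      CPU number to pin to (the article's example uses 3)
#   $2...   the script and its arguments
run_pinned() {
    cpu="$1"
    shift
    taskset -c "$cpu" nice -n 19 "$@"
}
```

For example, `run_pinned 3 /path/to/script.ksh arg1 arg2`, where the script path is a placeholder. This mitigates the symptom only and does not fix the underlying race.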

Root Cause

Due to a race condition in the job list code, the ksh shell could terminate
unexpectedly with a segmentation fault when the user had run custom scripts on
their system. With this update, the race condition has been fixed, and
segmentation faults in ksh no longer occur.

Diagnostic Steps

In this case at least three different stack traces were observed.

(gdb) bt
#0  0x00000000004238cb in job_subrestore (ptr=0x1787d10) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1820
#1  0x0000000000449746 in sh_subshell (t=0x17a04a0, flags=5, comsub=1)
    at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/subshell.c:634
#2  0x00000000004303b5 in comsubst (mp=0x1725110, t=<value optimized out>, type=<value optimized out>)
    at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/macro.c:2007

(gdb) bt
#0  bestfree (vm=0x74a5a0, data=0x2848000) at /usr/src/debug/ksh-20100621/src/lib/libast/vmalloc/vmbest.c:883
#1  0x00000000004225bc in job_chksave (pid=<value optimized out>) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1793
#2  0x0000000000424b7f in job_post (pid=16538, join=<value optimized out>)
    at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1225

(gdb) bt
#0  job_chksave (pid=12555) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:1763
#1  0x00000000004225f4 in jobsave_create (pid=12555) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:236
#2  0x00000000004235b2 in job_reap (sig=17) at /usr/src/debug/ksh-20100621/src/cmd/ksh93/sh/jobs.c:330
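
Backtraces like the ones above can be collected non-interactively from a core file. The sketch below assumes that core dumps are enabled, that the ksh debuginfo package is installed, and that the crashing binary is /bin/ksh93. None of these details are stated in this article. ksh_backtrace is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace from a ksh core file using gdb
# in batch mode.
#   $1  path to the core file
#   $2  path to the ksh binary (assumed default: /bin/ksh93)
ksh_backtrace() {
    core="$1"
    binary="${2:-/bin/ksh93}"
    # Fail early with a clear message if the core file cannot be read.
    if [ ! -r "$core" ]; then
        echo "core file not readable: $core" >&2
        return 1
    fi
    # -batch exits after running the commands given with -ex.
    gdb -batch -ex 'bt' "$binary" "$core"
}
```

Comparing the top frames across several cores shows whether the crashes fall into the job list functions (job_subrestore, job_chksave) seen here.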

Component

  • ksh
