How do I use strace to trace system calls made by a command?

Updated -

Table of Contents

Introduction

The command strace is used to trace system calls and signals. It is a great tool to use when debugging an application and is often requested by Red Hat support engineers to examine certain issues which may arise within the operating system and its programs.

Usage

  • To record the system calls issued by running command and output trace information to a file called /tmp/strace.txt, run the following:

    • RHEL 6.7+ and RHEL 7 with strace 4.7 or above:
    # strace -fvttTyy -s 4096 -o /tmp/strace.txt command
    
    • Older RHEL releases with strace below version 4.7. This includes all releases of RHEL 4, RHEL 5, and RHEL 6.6 or below:
    # strace -fvttT -s 4096 -o /tmp/strace.txt command
    
  • You may also attach to an already-running process with the -p option and the PID of the process as follows:

    • RHEL 6.7+ and RHEL 7 with strace 4.7 or above:
    # strace -fvttTyy -s 4096 -o /tmp/strace.txt -p PID
    
    • Older RHEL releases with strace below version 4.7. This includes all releases of RHEL 4, RHEL 5, and RHEL 6.6 or below:
    # strace -fvttT -s 4096 -o /tmp/strace.txt -p PID
    

    (where PID is replaced with the PID number of the process)

Options

The options used above are added to provide more accuracy and completeness to traces.

The options explained as follows:

  • -f causes child processes to be captured into the same file, separate processes are prepended with their PID number in the trace output
  • -v prints unabbreviated versions of environment, stat, and similar calls
  • -tt prints the start time of each line, including microseconds
  • -T shows the time between the beginning and end of the system call
  • -yy prints paths associated with file descriptors and sockets (available in strace 4.7 and above)
  • -s 4096 prints the first 4096 characters of any strings, the default value of 32 often causes useful information to be missed
  • -o /tmp/strace.txt outputs the trace to a text file for retrospective analysis, as the output is often large with long lines which makes live analysis difficult to impossible
  • -e trace=set allows for the tracing of specific system calls, i.e. -e trace=open,close,read,write, instead of the default of -e trace=all

Performance Analysis

  • strace can also be used to provide a count and call-time summary of all system calls over the data collection period.

  • This mode is enabled with the -c flag.

  • Output follows the format of this example:

# strace -c -p PID
Process PID attached - interrupt to quit
^CProcess PID detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.010116          66       153           select
  0.00    0.000000           0       165           gettimeofday
------ ----------- ----------- --------- --------- ----------------
100.00    0.010116                   318           total

Compressing Output

  • In the case of very verbose output, it may be desirable to compress the trace output on the fly.

  • This can be done with the following command in the bash shell.

# strace -fvttTyy -s 4096 command 3>&1 1>&2 2>&3 3>&- | gzip -9c > strace.txt.gz
  • If this redirection does not work in other shells like ksh, zsh, fish, or csh, this alternate command using named pipes also works:
# mkfifo /tmp/catcher; cat /tmp/catcher | gzip -9c > /tmp/strace.txt.gz & strace -fvttTyy -s 4096 -o /tmp/catcher command
  • Remove named pipe when finished.
# rm -f /tmp/catcher
  • In both of these examples, an already-running process can be traced by replacing command with -p PID.

Looping Collection

strace itself does not have the capability to "roll over" into multiple log files, however a script can do this, as described at:

Note that depending on the issue being investigated, rolling over the strace file may remove a file descriptor creation and make the issue more difficult or impossible to troubleshoot. The linked script should only be used in situations where an FD is open for a shorter amount of time than the loop run.

1 Comments

We can gather the information but this is not giving us any hope in a resolution. Can yall provide any insite that would be useful to report back to the team or ideas on a cause? T

Since we getting the message after 5pm our time obviously we will gather and send tomorrow.

Thanks

Donny