3.9. Using the ftrace Utility for Tracing Latencies

One of the diagnostic facilities provided with the Red Hat Enterprise Linux for Real Time kernel is ftrace, which developers use to analyze and debug latency and performance issues that occur outside user space. The ftrace utility has a variety of options that let you use it in a number of different ways. It can trace context switches, measure how long a high-priority task takes to wake up, measure how long interrupts are disabled, or list all the kernel functions executed during a given period.
Some tracers, such as the function tracer, produce exceedingly large amounts of data, which can turn trace log analysis into a time-consuming task. However, you can instruct the tracer to start and stop only when the application reaches critical code paths.
The ftrace utility can be set up once the trace variant of the Red Hat Enterprise Linux for Real Time kernel is installed and in use.

Procedure 3.3. Using the ftrace Utility

  1. In the /sys/kernel/debug/tracing/ directory, there is a file named available_tracers. This file contains all the available tracers for ftrace. To see the list of available tracers, use the cat command to view the contents of the file:
    ~]# cat /sys/kernel/debug/tracing/available_tracers
    function_graph wakeup_rt wakeup preemptirqsoff preemptoff irqsoff function nop
    
    The user interface for ftrace is a series of files within debugfs, located in the /sys/kernel/debug/tracing/ directory. Change to that directory:
    ~]# cd /sys/kernel/debug/tracing
    The files in this directory can only be modified by the root user, as enabling tracing can have an impact on the performance of the system.
    Ftrace Files

    The main files within this directory are:

    trace
    The file that shows the output of an ftrace trace. This is a snapshot of the trace at a point in time: tracing stops while this file is read, and reading does not consume the events. That is, if tracing is disabled and you read this file, it reports the same content every time it is read.
    trace_pipe
    Like trace, but used to read the trace live. It is a producer/consumer interface: each read consumes the events that are read. Unlike trace, it lets you watch an active trace without stopping it.
    available_tracers
    A list of ftrace tracers that have been compiled into the kernel.
    current_tracer
    Enables or disables an ftrace tracer.
    events
    A directory that contains events to trace and can be used to enable or disable events as well as set filters for the events.
    tracing_on
    Disables and enables recording to the ftrace buffer. Disabling tracing via the tracing_on file does not disable the actual tracing that is happening inside the kernel. It only disables writing to the buffer. The work to do the trace still happens, but the data does not go anywhere.
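Because tracing_on only gates writes to the buffer, it is a convenient way to limit a trace to the part of a workload you care about. The following is a minimal sketch rather than part of the documented procedure: the function name trace_window, the TRACING variable, and the ./my_latency_test command are hypothetical, and the directory defaults to the path used in this section.

```shell
# Hypothetical helper: record into the ftrace buffer only while a command runs.
# TRACING is an assumed variable; it defaults to the directory used in this section.
TRACING=${TRACING:-/sys/kernel/debug/tracing}

trace_window() {
    echo 1 > "$TRACING/tracing_on" || return 1   # start writing to the buffer
    "$@"                                          # run the workload of interest
    status=$?
    echo 0 > "$TRACING/tracing_on"               # stop writing to the buffer
    return "$status"                              # pass the workload's exit status through
}

# Example (as root, with a tracer already selected):
#   trace_window ./my_latency_test
```

The helper returns the exit status of the wrapped command, so it can be dropped into an existing test script without changing how failures are detected.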
    Tracers

    Depending on how the kernel was configured, not all tracers may be available for a given kernel. For the Red Hat Enterprise Linux for Real Time kernels, the trace and debug kernels have different tracers than the production kernel does. This is because some of the tracers have a noticeable overhead when the tracer is configured into the kernel but not active. Those tracers are only enabled for the trace and debug kernels.

    function
    One of the most widely applicable tracers. Traces the function calls within the kernel. Can cause noticeable overhead depending on the quantity of functions traced. Creates little overhead when not active.
    function_graph
    The function_graph tracer is designed to present results in a more visually appealing format. This tracer also traces the exit of the function, displaying a flow of function calls in the kernel.
    Note that this tracer has more overhead than the function tracer when enabled, but the same low overhead when disabled.
    wakeup
    A full CPU tracer that reports the activity happening across all CPUs. Records the time that it takes to wake up the highest-priority task in the system, whether or not that task is a real-time task. Because the maximum wake-up time of a non-real-time task is recorded too, it can hide the wake-up times of real-time tasks.
    wakeup_rt
    A full CPU tracer that reports the activity happening across all CPUs. Records the time from when the current highest-priority task wakes up to the time it is scheduled. Only records the time for real-time tasks.
    preemptirqsoff
    Traces the areas that disable pre-emption or interrupts, and records the maximum amount of time for which pre-emption or interrupts were disabled.
    preemptoff
    Similar to the preemptirqsoff tracer but traces only the maximum interval for which pre-emption was disabled.
    irqsoff
    Similar to the preemptirqsoff tracer but traces only the maximum interval for which interrupts were disabled.
    nop
    The default tracer. It does not provide any tracing facility itself, but because events can interleave into any tracer, the nop tracer is used when only trace events are of interest.
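Since the set of tracers depends on the kernel variant, a script may want to check for a tracer before trying to use it. This is a small sketch under that assumption; the function name tracer_available and the TRACING variable are hypothetical.

```shell
# Hypothetical helper: succeed if the named tracer is listed in available_tracers.
TRACING=${TRACING:-/sys/kernel/debug/tracing}

tracer_available() {
    for t in $(cat "$TRACING/available_tracers"); do
        # Compare whole names so that "wakeup" does not match "wakeup_rt".
        if [ "$t" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Example:
#   tracer_available wakeup_rt || echo "wakeup_rt is not available in this kernel"
```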
  2. To manually start a tracing session, first select the tracer you wish to use from the list in available_tracers and then use the echo command to insert the name of the tracer into /sys/kernel/debug/tracing/current_tracer:
    ~]# echo preemptoff > /sys/kernel/debug/tracing/current_tracer
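When scripting this step, it can be useful to read current_tracer back and confirm the selection took effect. The sketch below assumes a hypothetical function name, set_tracer, and a hypothetical TRACING variable that defaults to the directory used in this section.

```shell
TRACING=${TRACING:-/sys/kernel/debug/tracing}

# Hypothetical helper: select a tracer, then read current_tracer back to confirm it.
set_tracer() {
    echo "$1" > "$TRACING/current_tracer" || return 1
    if [ "$(cat "$TRACING/current_tracer")" != "$1" ]; then
        echo "current_tracer is not $1" >&2
        return 1
    fi
}

# Example (as root):
#   set_tracer preemptoff     # start using the preemptoff tracer
#   set_tracer nop            # switch back to the default tracer when finished
```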
  3. To check if function and function_graph tracing is enabled, use the cat command to view the /sys/kernel/debug/tracing/options/function-trace file. A value of 1 indicates that this is enabled, and 0 indicates that it has been disabled.
    ~]# cat /sys/kernel/debug/tracing/options/function-trace
    1
    
    By default, function and function_graph tracing is enabled. To turn this feature on or off, echo the appropriate value to the /sys/kernel/debug/tracing/options/function-trace file.
    ~]# echo 0 > /sys/kernel/debug/tracing/options/function-trace
    ~]# echo 1 > /sys/kernel/debug/tracing/options/function-trace

    Important

    When using the echo command, ensure you place a space character in between the value and the > character. At the shell prompt, using 0>, 1>, and 2> (without a space character) refers to standard input, standard output and standard error. Using them by mistake could result in unexpected trace output.
    The function-trace option is useful because tracing latencies with wakeup_rt, preemptirqsoff and so on automatically enables function tracing, which may exaggerate the overhead.
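The spacing rule in the Important note above can be checked safely on a scratch file instead of a real tracing file. This sketch uses a temporary directory created with mktemp; the variable names are illustrative only.

```shell
# Demonstrate the redirection pitfall on scratch files, not on a real tracing file.
scratch=$(mktemp -d)

echo 1 > "$scratch/with_space"      # writes the value "1"
echo 1> "$scratch/without_space"    # "1>" redirects standard output; echo gets no argument

with_space=$(cat "$scratch/with_space")
without_space=$(cat "$scratch/without_space")

printf 'with space:    [%s]\n' "$with_space"      # prints: with space:    [1]
printf 'without space: [%s]\n' "$without_space"   # prints: without space: []
```

Without the space, the file receives only an empty line, so the intended value never reaches it.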
  4. Adjust details and parameters of the tracers by changing the values in the various files in the /sys/kernel/debug/tracing/ directory. Some examples are:
    The irqsoff, preemptoff, preemptirqsoff, and wakeup tracers continuously monitor latencies. When they record a latency greater than the one recorded in tracing_max_latency, the trace of that latency is recorded, and tracing_max_latency is updated to the new maximum time. In this way, tracing_max_latency will always show the highest recorded latency since it was last reset.
    To reset the maximum latency, echo 0 into the tracing_max_latency file. To see only latencies greater than a set amount, echo that amount in microseconds into the same file instead of 0:
    ~]# echo 0 > /sys/kernel/debug/tracing/tracing_max_latency
    When the tracing threshold is set, it overrides the maximum latency setting. When a latency is recorded that is greater than the threshold, it will be recorded regardless of the maximum latency. When reviewing the trace file, only the last recorded latency is shown.
    To set the threshold, echo the number of microseconds above which latencies must be recorded:
    ~]# echo 200 > /sys/kernel/debug/tracing/tracing_thresh
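The two files described in this step can be wrapped in small helpers for use in test scripts. This is only a sketch: the function names and the TRACING variable are hypothetical, and the values are in microseconds as stated above.

```shell
TRACING=${TRACING:-/sys/kernel/debug/tracing}

# Reset the recorded maximum so the next run starts from zero.
reset_max_latency() {
    echo 0 > "$TRACING/tracing_max_latency"
}

# Record latencies above the given number of microseconds.
set_latency_threshold() {
    echo "$1" > "$TRACING/tracing_thresh"
}

# Print the highest latency recorded since the last reset, in microseconds.
show_max_latency() {
    cat "$TRACING/tracing_max_latency"
}

# Example (as root):
#   reset_max_latency
#   set_latency_threshold 200
#   show_max_latency
```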
  5. View the trace logs:
    ~]# cat /sys/kernel/debug/tracing/trace
  6. To store the trace logs, copy them to another file:
    ~]# cat /sys/kernel/debug/tracing/trace > /tmp/lat_trace_log
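Steps 2 through 6 can be strung together for a single measured run. The following is a rough sketch, not a prescribed workflow: latency_session, the TRACING variable, and the ./my_latency_test command are hypothetical names, and the trace is saved before the tracer is switched back to nop.

```shell
TRACING=${TRACING:-/sys/kernel/debug/tracing}

# Hypothetical wrapper: trace one command with one tracer and save the result.
# Usage: latency_session TRACER OUTPUT_FILE COMMAND [ARGS...]
latency_session() {
    tracer=$1
    out=$2
    shift 2
    echo "$tracer" > "$TRACING/current_tracer" || return 1   # step 2: select the tracer
    echo 0 > "$TRACING/tracing_max_latency"                  # step 4: reset the maximum
    "$@"                                                      # run the workload under test
    cat "$TRACING/trace" > "$out"                            # steps 5 and 6: save the trace
    echo nop > "$TRACING/current_tracer"                     # return to the default tracer
}

# Example (as root):
#   latency_session preemptoff /tmp/lat_trace_log ./my_latency_test
```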
  7. Function tracing can be filtered by altering the settings in the /sys/kernel/debug/tracing/set_ftrace_filter file. If no filters are specified in the file, all functions are traced. Use the cat command to view the current filters:
    ~]# cat /sys/kernel/debug/tracing/set_ftrace_filter
  8. To change the filters, echo the name of the function to be traced. The filter allows the use of a * wildcard at the beginning or end of a search term.
    The * wildcard can also be used at both the beginning and end of a word. For example: *irq* will select all functions that contain irq in the name. The wildcard cannot, however, be used inside a word.
    Encasing the search term and the wildcard character in double quotation marks ensures that the shell will not attempt to expand the search to the present working directory.
    Some examples of filters are:
    • Trace only the schedule function:
      ~]# echo schedule > /sys/kernel/debug/tracing/set_ftrace_filter
    • Trace all functions that end with lock:
      ~]# echo "*lock" > /sys/kernel/debug/tracing/set_ftrace_filter
    • Trace all functions that start with spin_:
      ~]# echo "spin_*" > /sys/kernel/debug/tracing/set_ftrace_filter
    • Trace all functions with cpu in the name:
      ~]# echo "*cpu*" > /sys/kernel/debug/tracing/set_ftrace_filter
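The reason for the double quotation marks in these examples can be seen in a scratch directory that happens to contain a matching file name. This is an illustrative sketch only; the file name spin_notes.txt is made up for the demonstration.

```shell
# Show why the patterns are quoted, using a scratch directory instead of a tracing file.
scratch=$(mktemp -d)
touch "$scratch/spin_notes.txt"      # a file whose name happens to match the pattern

# Each command runs in a subshell, so the caller's working directory is unchanged.
unquoted=$(cd "$scratch" && echo spin_*)      # the shell expands the glob before echo runs
quoted=$(cd "$scratch" && echo "spin_*")      # the pattern is passed through unchanged

printf 'unquoted: %s\n' "$unquoted"   # prints: unquoted: spin_notes.txt
printf 'quoted:   %s\n' "$quoted"     # prints: quoted:   spin_*
```

In the unquoted case the tracing file would receive a file name rather than the intended filter pattern.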

    Note

    If you use a single > with the echo command, it will overwrite any existing value in the file. If you wish to append the value to the file, use >> instead.
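The difference between > and >> matters when several filters are needed at once. The sketch below replaces the existing filters with the first pattern and appends the rest; set_filters and the TRACING variable are hypothetical names. On a real system, reading set_ftrace_filter back is expected to list the matching function names rather than the patterns themselves.

```shell
TRACING=${TRACING:-/sys/kernel/debug/tracing}

# Hypothetical helper: replace the filter list with the first pattern,
# then append any remaining patterns.
set_filters() {
    if [ "$#" -lt 1 ]; then
        echo "usage: set_filters PATTERN [PATTERN...]" >&2
        return 1
    fi
    echo "$1" > "$TRACING/set_ftrace_filter" || return 1            # > replaces existing filters
    shift
    for pattern in "$@"; do
        echo "$pattern" >> "$TRACING/set_ftrace_filter" || return 1  # >> appends to them
    done
}

# Example (as root): trace schedule plus every function that starts with spin_
#   set_filters schedule "spin_*"
```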