Chapter 19. Recording and analyzing performance profiles with perf

The perf tool allows you to record performance data and analyze it at a later time.

Prerequisites

  • You have the perf user space tool installed as described in Installing perf.

19.1. The purpose of perf record

The perf record command samples performance data and stores it in a file, perf.data, which can be read and visualized with other perf commands. perf.data is generated in the current directory and can be accessed at a later time, possibly on a different machine.

If you do not specify a command for perf record to run, it records until you stop it manually by pressing Ctrl+C. You can attach perf record to specific processes by passing the -p option followed by one or more process IDs. You can run perf record without root access; however, doing so samples performance data in the user space only. In the default mode, perf record uses CPU cycles as the sampling event and operates in per-thread mode with inherit mode enabled.
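
As a minimal sketch of attaching to running processes for a bounded time, the following helper combines the -p option with a sleep command so that the recording ends without pressing Ctrl+C. The function name record_pids and the PERF variable are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: sample already-running processes for a fixed time.
# PERF can be overridden (for example, PERF=echo) to preview the command.
PERF="${PERF:-perf}"

# Usage: record_pids SECONDS PID [PID...]
record_pids() {
    if [ "$#" -lt 2 ]; then
        echo "usage: record_pids SECONDS PID [PID...]" >&2
        return 2
    fi
    seconds=$1
    shift
    # perf record -p accepts a comma-separated list of process IDs.
    pids=$(printf '%s,' "$@")
    pids=${pids%,}
    # The trailing sleep command bounds the recording time.
    "$PERF" record -p "$pids" -- sleep "$seconds"
}
```

For example, record_pids 10 1234 5678 would sample the two processes for about ten seconds and write perf.data to the current directory.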

19.2. Recording a performance profile without root access

You can use perf record without root access to sample and record performance data in the user space only.

Prerequisites

  • You have the perf user space tool installed as described in Installing perf.

Procedure

  • Sample and record the performance data:

    $ perf record command

    Replace command with the command that you want to run while sampling data. If you do not specify a command, perf record samples data until you stop it manually by pressing Ctrl+C.
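
What an unprivileged user can sample is governed on Linux by the kernel.perf_event_paranoid sysctl. The following sketch reads that value and prints a short summary. The function name describe_paranoid is hypothetical, the level descriptions paraphrase the kernel sysctl documentation, and some distributions define additional levels that this sketch does not distinguish.

```shell
# Hypothetical helper: summarize a kernel.perf_event_paranoid level for
# users without root access (or the CAP_PERFMON capability).
describe_paranoid() {
    level=$1
    if [ "$level" -ge 2 ]; then
        echo "user-space profiling only"
    elif [ "$level" -ge 1 ]; then
        echo "user-space and kernel profiling, no CPU-wide events"
    elif [ "$level" -ge 0 ]; then
        echo "CPU-wide events allowed, no raw tracepoints"
    else
        echo "no restrictions"
    fi
}

# Report the level of the running system when the sysctl file is readable.
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    current=$(cat "$paranoid_file")
    echo "perf_event_paranoid=$current: $(describe_paranoid "$current")"
fi
```

A value of 2 or higher matches the behavior described in this section, where only user-space data is sampled.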

Additional resources

  • perf-record(1) man page

19.3. Recording a performance profile with root access

You can use perf record with root access to sample and record performance data in both the user space and the kernel space simultaneously.

Prerequisites

  • You have the perf user space tool installed as described in Installing perf.
  • You have root access.

Procedure

  • Sample and record the performance data:

    # perf record command

    Replace command with the command that you want to run while sampling data. If you do not specify a command, perf record samples data until you stop it manually by pressing Ctrl+C.

Additional resources

  • perf-record(1) man page

19.4. Recording a performance profile in per-CPU mode

You can use perf record in per-CPU mode to sample and record performance data in both the user space and the kernel space simultaneously across all threads on a monitored CPU. By default, per-CPU mode monitors all online CPUs.

Prerequisites

  • You have the perf user space tool installed as described in Installing perf.

Procedure

  • Sample and record the performance data:

    # perf record -a command

    Replace command with the command that you want to run while sampling data. If you do not specify a command, perf record samples data until you stop it manually by pressing Ctrl+C.
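
Per-CPU mode can also be narrowed to a subset of CPUs with the -C option described in perf-record(1). The following is a minimal sketch under that assumption. The function name record_cpus and the PERF variable are hypothetical.

```shell
# Hypothetical helper: record in per-CPU mode on selected CPUs only.
# PERF can be overridden (for example, PERF=echo) to preview the command.
PERF="${PERF:-perf}"

# Usage: record_cpus CPU_LIST SECONDS   (for example: record_cpus 0-1 5)
record_cpus() {
    cpu_list=$1
    seconds=$2
    # -a selects per-CPU mode; -C limits sampling to the listed CPUs,
    # given as a comma-separated list or a range such as 0-2.
    "$PERF" record -a -C "$cpu_list" -- sleep "$seconds"
}
```

Run the helper with root access, for example record_cpus 0-1 5, to sample CPUs 0 and 1 for about five seconds.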

Additional resources

  • perf-record(1) man page

19.5. Capturing call graph data with perf record

You can configure the perf record tool so that it records which function is calling other functions in the performance profile. This helps to identify a bottleneck if several processes are calling the same function.

Prerequisites

  • You have the perf user space tool installed as described in Installing perf.

Procedure

  • Sample and record performance data with the --call-graph option:

    $ perf record --call-graph method command
    • Replace command with the command that you want to run while sampling data. If you do not specify a command, perf record samples data until you stop it manually by pressing Ctrl+C.
    • Replace method with one of the following unwinding methods:

      fp
      Uses the frame pointer method. Depending on compiler optimization, such as with binaries built with the GCC option -fomit-frame-pointer, this may not be able to unwind the stack.
      dwarf
      Uses DWARF Call Frame Information to unwind the stack.
      lbr
      Uses the last branch record hardware on Intel processors.
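
The choice of unwinding method can be wrapped in a small helper that rejects anything other than the three methods listed above. This is only a sketch. The function name record_callgraph and the PERF variable are hypothetical.

```shell
# Hypothetical helper: record call graph data with a validated method.
# PERF can be overridden (for example, PERF=echo) to preview the command.
PERF="${PERF:-perf}"

# Usage: record_callgraph METHOD COMMAND [ARGS...]
record_callgraph() {
    method=$1
    shift
    case $method in
        fp|dwarf|lbr) ;;
        *)
            echo "unsupported unwinding method: $method" >&2
            return 2
            ;;
    esac
    "$PERF" record --call-graph "$method" -- "$@"
}
```

For the fp method, a locally developed application can be built with the GCC option -fno-omit-frame-pointer so that frame pointers are preserved for unwinding.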

Additional resources

  • perf-record(1) man page

19.6. Analyzing perf.data with perf report

You can use perf report to display and analyze a perf.data file.

Prerequisites

  • You have the perf user space tool installed as described in Installing perf.
  • There is a perf.data file in the current directory.
  • If the perf.data file was created with root access, you need to run perf report with root access too.

Procedure

  • Display the contents of the perf.data file for further analysis:

    # perf report

    This command displays output similar to the following:

    Samples: 2K of event 'cycles', Event count (approx.): 235462960
    Overhead  Command          Shared Object                     Symbol
       2.36%  kswapd0          [kernel.kallsyms]                 [k] page_vma_mapped_walk
       2.13%  sssd_kcm         libc-2.28.so                      [.] memset_avx2_erms
       2.13%  perf             [kernel.kallsyms]                 [k] smp_call_function_single
       1.53%  gnome-shell      libc-2.28.so                      [.] strcmp_avx2
       1.17%  gnome-shell      libglib-2.0.so.0.5600.4           [.] g_hash_table_lookup
       0.93%  Xorg             libc-2.28.so                      [.] memmove_avx_unaligned_erms
       0.89%  gnome-shell      libgobject-2.0.so.0.5600.4        [.] g_object_unref
       0.87%  kswapd0          [kernel.kallsyms]                 [k] page_referenced_one
       0.86%  gnome-shell      libc-2.28.so                      [.] memmove_avx_unaligned_erms
       0.83%  Xorg             [kernel.kallsyms]                 [k] alloc_vmap_area
       0.63%  gnome-shell      libglib-2.0.so.0.5600.4           [.] g_slice_alloc
       0.53%  gnome-shell      libgirepository-1.0.so.1.0.0      [.] g_base_info_unref
       0.53%  gnome-shell      ld-2.28.so                        [.] _dl_find_dso_for_object
       0.49%  kswapd0          [kernel.kallsyms]                 [k] vma_interval_tree_iter_next
       0.48%  gnome-shell      libpthread-2.28.so                [.] pthread_getspecific
       0.47%  gnome-shell      libgirepository-1.0.so.1.0.0      [.] 0x0000000000013b1d
       0.45%  gnome-shell      libglib-2.0.so.0.5600.4           [.] g_slice_free1
       0.45%  gnome-shell      libgobject-2.0.so.0.5600.4        [.] g_type_check_instance_is_fundamentally_a
       0.44%  gnome-shell      libc-2.28.so                      [.] malloc
       0.41%  swapper          [kernel.kallsyms]                 [k] apic_timer_interrupt
       0.40%  gnome-shell      ld-2.28.so                        [.] _dl_lookup_symbol_x
       0.39%  kswapd0          [kernel.kallsyms]                 [k] raw_callee_save___pv_queued_spin_unlock
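
When the interactive view is not convenient, for example in a script, the same table can be printed as plain text. The following sketch assumes the -i and --stdio options of perf report. The function name report_text and the PERF variable are hypothetical.

```shell
# Hypothetical helper: print a perf report as plain text.
# PERF can be overridden (for example, PERF=echo) to preview the command.
PERF="${PERF:-perf}"

# Usage: report_text [PERF_DATA_FILE]
report_text() {
    # Default to perf.data in the current directory, as perf report does.
    input=${1:-perf.data}
    "$PERF" report -i "$input" --stdio
}
```

For example, report_text perf.data | head -n 40 would show the start of the report without opening the interactive browser.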

Additional resources

  • perf-report(1) man page

19.7. Interpretation of perf report output

The table displayed by running the perf report command sorts the data into several columns:

The 'Overhead' column
Indicates what percentage of overall samples were collected in that particular function.
The 'Command' column
Tells you which process the samples were collected from.
The 'Shared Object' column
Displays the name of the ELF image where the samples come from (the name [kernel.kallsyms] is used when the samples come from the kernel).
The 'Symbol' column
Displays the function name or symbol.

By default, the functions are sorted in descending order of overhead, so those with the highest overhead are displayed first.
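
To see how the columns relate to each other, the Overhead values can be totaled per Command. The following sketch reads report rows as plain text on standard input. The function name overhead_by_command is hypothetical, and the sketch assumes the four default columns and command names without spaces.

```shell
# Hypothetical helper: total the Overhead column for each Command.
# Reads perf report rows on standard input.
overhead_by_command() {
    awk '
        # Data rows start with a percentage; headers and comments do not.
        $1 ~ /^[0-9.]+%$/ {
            sub(/%/, "", $1)
            total[$2] += $1
        }
        END {
            for (cmd in total)
                printf "%.2f%% %s\n", total[cmd], cmd
        }
    ' | LC_ALL=C sort -rn
}
```

Given the two kswapd0 rows with 2.36% and 0.49% from the example output, the helper would report a single kswapd0 line with their sum.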

19.8. Generating a perf.data file that is readable on a different device

You can use the perf tool to record performance data into a perf.data file to be analyzed on a different device.

Prerequisites

  • You have the perf user space tool installed as described in Installing perf.

Procedure

  1. Capture performance data you are interested in investigating further:

    # perf record -a --call-graph fp sleep seconds

    This example generates a perf.data file over the entire system for the number of seconds that you pass to the sleep command. Replace seconds with the length of time, in seconds, that you want to record. The example also captures call graph data using the frame pointer method.

  2. Generate an archive file containing debug symbols of the recorded data:

    # perf archive

Verification steps

  • Verify that the archive file has been generated in your current active directory:

    # ls perf.data*

    The output will display every file in your current directory that begins with perf.data. The archive file will be named either:

    perf.data.tar.gz

    or

    perf.data.tar.bz2
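
Because the archive name depends on the compression that was used, a script that transfers the archive can check for both names. The following is a minimal sketch. The function name find_perf_archive is hypothetical.

```shell
# Hypothetical helper: print the archive that perf archive created in DIR.
# Usage: find_perf_archive [DIR]
find_perf_archive() {
    dir=${1:-.}
    for name in perf.data.tar.bz2 perf.data.tar.gz; do
        if [ -f "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    echo "no perf.data archive found in $dir" >&2
    return 1
}
```

The helper returns a non-zero status when neither file exists, which a calling script can use to stop before copying anything.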

19.9. Analyzing a perf.data file that was created on a different device

You can use the perf tool to analyze a perf.data file that was generated on a different device.

Prerequisites

  • You have the perf user space tool installed as described in Installing perf.
  • A perf.data file and associated archive file generated on a different device are present on the current device being used.

Procedure

  1. Copy both the perf.data file and the archive file into your current active directory.
  2. Extract the archive file into ~/.debug:

    # mkdir -p ~/.debug
    # tar xf perf.data.tar.bz2 -C ~/.debug
    Note

    The archive file might also be named perf.data.tar.gz.

  3. Open the perf.data file for further analysis:

    # perf report
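
The extraction step can be sketched as a helper that works with either archive name. The function name unpack_perf_archive is hypothetical, and the sketch assumes a tar implementation, such as GNU tar, that detects the compression format when reading an archive.

```shell
# Hypothetical helper: unpack a perf archive into a debug directory.
# Usage: unpack_perf_archive ARCHIVE [DEBUG_DIR]
unpack_perf_archive() {
    archive=$1
    # Default to ~/.debug, the directory used in the procedure above.
    debug_dir=${2:-$HOME/.debug}
    if [ ! -f "$archive" ]; then
        echo "missing archive: $archive" >&2
        return 1
    fi
    # Create the target directory, then extract; tar identifies gzip or
    # bzip2 compression from the archive contents.
    mkdir -p "$debug_dir" && tar xf "$archive" -C "$debug_dir"
}
```

For example, unpack_perf_archive perf.data.tar.gz extracts into ~/.debug, after which perf report can be run in the directory that contains perf.data.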

19.10. Why perf displays some function names as raw function addresses

For kernel functions, perf uses the information from the /proc/kallsyms file to map the samples to their respective function names or symbols. For functions executed in the user space, however, you might see raw function addresses because the binary is stripped.

To display the function names or symbols in such a situation, install the debuginfo package of the executable or, if the executable is a locally developed application, compile the application with debugging information turned on (the -g option in GCC).

Note

It is not necessary to re-run the perf record command after installing the debuginfo associated with an executable. Simply re-run the perf report command.
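
To find out which binaries are affected, the report rows can be scanned for symbols that are shown as raw addresses. The following sketch reads plain-text report rows on standard input. The function name unresolved_objects is hypothetical, and the sketch assumes the four default columns.

```shell
# Hypothetical helper: list shared objects whose symbols are raw addresses.
# Reads perf report rows on standard input.
unresolved_objects() {
    awk '
        # In an unresolved row the last field is a hexadecimal address,
        # preceded by the [.] or [k] marker and the shared object name.
        $1 ~ /^[0-9.]+%$/ && $NF ~ /^0x[0-9a-f]+$/ {
            print $(NF - 2)
        }
    ' | LC_ALL=C sort -u
}
```

Each name that the helper prints is a candidate for installing the matching debuginfo package or for rebuilding with debugging information.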

19.11. Enabling debug and source repositories

A standard installation of Red Hat Enterprise Linux does not enable the debug and source repositories. These repositories contain information needed to debug the system components and measure their performance.

Procedure

  • Enable the source and debug information package channels. The $(uname -i) part is automatically replaced with a matching value for the architecture of your system:

    Architecture name       Value
    64-bit Intel and AMD    x86_64
    64-bit ARM              aarch64
    IBM POWER               ppc64le
    64-bit IBM Z            s390x

19.12. Getting debuginfo packages for an application or library using GDB

Debugging information is required to debug code. For code that is installed from a package, the GNU Debugger (GDB) automatically recognizes missing debug information, resolves the package name and provides concrete advice on how to get the package.

Prerequisites

  • The application or library you want to debug must be installed on the system.
  • GDB and the debuginfo-install tool must be installed on the system. For details, see Setting up to debug applications.
  • Repositories providing debuginfo and debugsource packages must be configured and enabled on the system. For details, see Enabling debug and source repositories.

Procedure

  1. Start GDB attached to the application or library you want to debug. GDB automatically recognizes missing debugging information and suggests a command to run.

    $ gdb -q /bin/ls
    Reading symbols from /bin/ls...Reading symbols from .gnu_debugdata for /usr/bin/ls...(no debugging symbols found)...done.
    (no debugging symbols found)...done.
    Missing separate debuginfos, use: dnf debuginfo-install coreutils-8.30-6.el8.x86_64
    (gdb)
  2. Exit GDB: type q and confirm with Enter.

    (gdb) q
  3. Run the command suggested by GDB to install the required debuginfo packages:

    # dnf debuginfo-install coreutils-8.30-6.el8.x86_64

    The dnf package management tool provides a summary of the changes, asks for confirmation, and, once you confirm, downloads and installs all the necessary files.

  4. In case GDB is not able to suggest the debuginfo package, follow the procedure described in Getting debuginfo packages for an application or library manually.
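
The command that GDB suggests can also be pulled out of its output with a small filter, which is convenient when checking several binaries. This is only a sketch. The function name suggested_debuginfo_command is hypothetical, and it assumes that the message has the same wording as in the example output above.

```shell
# Hypothetical helper: print the install command that GDB suggests.
# Reads GDB startup output on standard input.
suggested_debuginfo_command() {
    # Keep only the text that follows the fixed message prefix.
    sed -n 's/^Missing separate debuginfos, use: //p'
}
```

Assuming that GDB prints the same message in batch mode, the helper could be fed with gdb -q -batch /bin/ls 2>&1 | suggested_debuginfo_command. Review the printed command before running it with root access.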

Additional resources