Chapter 13. Monitoring application performance with perf

This section describes how to use the perf tool to monitor application performance.

13.1. Attaching perf record to a running process

You can attach perf record to a running process. This instructs perf record to sample and record performance data only in the specified processes.

Prerequisites

  • The perf user space tool is installed. For more information, see Installing perf.

Procedure

  • Attach perf record to a running process:

    $ perf record -p ID1,ID2 sleep seconds

    The previous example samples and records performance data of the processes with the process IDs ID1 and ID2 for the duration, in seconds, that you pass to the sleep command as seconds. You can also configure perf to record events in specific threads:

    $ perf record -t ID1,ID2 sleep seconds

    Note

    When you use the -t flag and specify thread IDs, perf disables inheritance by default. You can enable inheritance by adding the --inherit option.
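
As a minimal sketch of the procedure above, the following shell snippet joins several process IDs into the comma-separated list that the -p option expects and prints the resulting command. The function name build_attach_cmd and the process IDs 1234 and 5678 are hypothetical and used for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print a perf record command that attaches to the
# given process IDs for a fixed number of seconds.
# Usage: build_attach_cmd SECONDS PID [PID...]
build_attach_cmd() {
    seconds=$1
    shift
    # Join the remaining arguments with commas, as -p expects.
    pids=$(IFS=,; printf '%s' "$*")
    printf 'perf record -p %s sleep %s\n' "$pids" "$seconds"
}

# Example with made-up PIDs 1234 and 5678 and a 10-second window.
build_attach_cmd 10 1234 5678
```

The helper only prints the command, so you can review it before you run it yourself.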

13.2. Capturing call graph data with perf record

You can configure the perf record tool so that it records which functions call other functions in the performance profile. This helps to identify a bottleneck if several processes are calling the same function.

Prerequisites

  • The perf user space tool is installed. For more information, see Installing perf.

Procedure

  • Sample and record performance data with the --call-graph option:

    $ perf record --call-graph method command

    • Replace command with the command that you want to profile. If you do not specify a command, perf record samples data until you stop it manually by pressing Ctrl+C.
    • Replace method with one of the following unwinding methods:

      fp
      Uses the frame pointer method. Depending on compiler optimizations, for example in binaries built with the GCC option -fomit-frame-pointer, this method might not be able to unwind the stack.
      dwarf
      Uses DWARF Call Frame Information to unwind the stack.
      lbr
      Uses the last branch record hardware on Intel processors.
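
The choice of unwinding method can be sketched as a small shell helper that accepts only the three methods listed above and prints the matching perf record command. The function name build_callgraph_cmd, the ./my_app binary, and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a perf record command for the chosen
# unwinding method, rejecting anything other than fp, dwarf, or lbr.
# Usage: build_callgraph_cmd METHOD COMMAND [ARG...]
build_callgraph_cmd() {
    method=$1
    shift
    case $method in
        fp|dwarf|lbr) ;;
        *)
            echo "unknown unwinding method: $method" >&2
            return 1
            ;;
    esac
    printf 'perf record --call-graph %s %s\n' "$method" "$*"
}

# Example: profile the (hypothetical) ./my_app binary with DWARF unwinding.
build_callgraph_cmd dwarf ./my_app --input data.txt
```

Validating the method before building the command is a design choice of this sketch, so that a mistyped method name is reported before perf is invoked.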

Additional resources

  • The perf-record(1) man page.

13.3. Analyzing perf.data with perf report

You can use perf report to display and analyze a perf.data file.

Prerequisites

  • The perf user space tool is installed. For more information, see Installing perf.
  • There is a perf.data file in the current directory.
  • If the perf.data file was created with root access, you must also run perf report with root access.

Procedure

  • Display the contents of the perf.data file for further analysis:

    # perf report

    Example 13.1. Example output

    Samples: 2K of event 'cycles', Event count (approx.): 235462960
    Overhead  Command          Shared Object                     Symbol
       2.36%  kswapd0          [kernel.kallsyms]                 [k] page_vma_mapped_walk
       2.13%  sssd_kcm         libc-2.28.so                      [.] __memset_avx2_erms
       2.13%  perf             [kernel.kallsyms]                 [k] smp_call_function_single
       1.53%  gnome-shell      libc-2.28.so                      [.] __strcmp_avx2
       1.17%  gnome-shell      libglib-2.0.so.0.5600.4           [.] g_hash_table_lookup
       0.93%  Xorg             libc-2.28.so                      [.] __memmove_avx_unaligned_erms
       0.89%  gnome-shell      libgobject-2.0.so.0.5600.4        [.] g_object_unref
       0.87%  kswapd0          [kernel.kallsyms]                 [k] page_referenced_one
       0.86%  gnome-shell      libc-2.28.so                      [.] __memmove_avx_unaligned_erms
       0.83%  Xorg             [kernel.kallsyms]                 [k] alloc_vmap_area
       0.63%  gnome-shell      libglib-2.0.so.0.5600.4           [.] g_slice_alloc
       0.53%  gnome-shell      libgirepository-1.0.so.1.0.0      [.] g_base_info_unref
       0.53%  gnome-shell      ld-2.28.so                        [.] _dl_find_dso_for_object
       0.49%  kswapd0          [kernel.kallsyms]                 [k] vma_interval_tree_iter_next
       0.48%  gnome-shell      libpthread-2.28.so                [.] __pthread_getspecific
       0.47%  gnome-shell      libgirepository-1.0.so.1.0.0      [.] 0x0000000000013b1d
       0.45%  gnome-shell      libglib-2.0.so.0.5600.4           [.] g_slice_free1
       0.45%  gnome-shell      libgobject-2.0.so.0.5600.4        [.] g_type_check_instance_is_fundamentally_a
       0.44%  gnome-shell      libc-2.28.so                      [.] malloc
       0.41%  swapper          [kernel.kallsyms]                 [k] apic_timer_interrupt
       0.40%  gnome-shell      ld-2.28.so                        [.] _dl_lookup_symbol_x
       0.39%  kswapd0          [kernel.kallsyms]                 [k] __raw_callee_save___pv_queued_spin_unlock
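
For scripted analysis, perf report can also write its table to standard output with the --stdio option. The following sketch assumes rows shaped like the example output above, with the overhead percentage in the first column, and keeps only the rows at or above a threshold. The function name filter_overhead and the 1.5 threshold are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read perf report rows on standard input and print
# only those whose Overhead column is at or above the given percentage.
# Usage: perf report --stdio | filter_overhead 1.0
filter_overhead() {
    awk -v min="$1" '
        # Data rows start with a percentage such as "2.36%".
        $1 ~ /^[0-9.]+%$/ {
            pct = $1
            sub(/%$/, "", pct)
            if (pct + 0 >= min + 0) print
        }
    '
}

# Example using three rows copied from the sample output above.
filter_overhead 1.5 <<'EOF'
   2.36%  kswapd0          [kernel.kallsyms]                 [k] page_vma_mapped_walk
   1.53%  gnome-shell      libc-2.28.so                      [.] __strcmp_avx2
   0.93%  Xorg             libc-2.28.so                      [.] __memmove_avx_unaligned_erms
EOF
```

Header lines and any other rows that do not begin with a percentage are skipped, so the filter can be applied to the full report output.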

Additional resources

  • The perf-report(1) man page.