Chapter 28. Analyzing application performance

Perf is a performance analysis tool. It provides a simple command-line interface and abstracts CPU hardware differences in Linux performance measurements. Perf is based on the perf_events interface exported by the kernel.

One advantage of perf is that it is both kernel and architecture neutral. The analysis data can be reviewed without requiring a specific system configuration.

Prerequisites

  • The perf package must be installed on the system.
  • You have administrator privileges.

28.1. Collecting system-wide statistics

The perf record command collects performance samples, and with the -a option it collects them system-wide. It can be used on all processors.

Procedure

  • Collect system-wide performance statistics.

    # perf record -a
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.725 MB perf.data (~31655 samples) ]

    In this example, the -a option selects all CPUs, and the process was terminated with Ctrl+C after a few seconds. The results show that perf collected 0.725 MB of data and stored it in a newly created perf.data file.

Verification

  • Ensure that the results file was created.

    # ls
    perf.data
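
Instead of stopping the recording with Ctrl+C, the collection window can be bounded by giving perf record a short-lived command to run, in the same way that the perf stat examples in this chapter use sleep. The following is a minimal sketch; the function name record_system_wide and its defaults (10 seconds, perf.data) are assumptions made for illustration.

```shell
# Hypothetical helper: record system-wide samples for a fixed duration.
# Usage: record_system_wide [seconds] [output-file]
record_system_wide() {
    # -a samples all CPUs, -o names the output file, and the command
    # after "--" (sleep) determines how long the recording runs.
    perf record -a -o "${2:-perf.data}" -- sleep "${1:-10}"
}

# Example (as root): record for 5 seconds into /tmp/perf.data
#   record_system_wide 5 /tmp/perf.data
```

Naming the output file explicitly keeps a later recording from replacing an earlier perf.data in the same directory.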

28.2. Archiving performance analysis results

You can analyze perf results on another system by packaging them with the perf archive command. This might not be necessary if:

  • Dynamic Shared Objects (DSOs), such as binaries and libraries, are already present on the analysis system, for example in the ~/.debug/ cache.
  • Both systems have the same set of binaries.

Procedure

  1. Create an archive of the results from the perf command.

    # perf archive

  2. Copy the resulting perf.data.tar.bz2 file to the analysis system and extract it into the ~/.debug cache.

    # tar xvf perf.data.tar.bz2 -C ~/.debug
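
On the analysis system, the archive written by perf archive (perf.data.tar.bz2 by default) is unpacked into the ~/.debug cache so that perf report can resolve symbols. A small sketch of that step follows; the function name import_perf_archive is hypothetical, and it assumes a tar implementation that detects bzip2 compression automatically on extraction.

```shell
# Hypothetical helper: unpack a perf archive into the debug cache.
# Usage: import_perf_archive [archive] [cache-dir]
import_perf_archive() {
    # Create the cache directory first; tar -C fails if it is missing.
    mkdir -p "${2:-$HOME/.debug}" &&
        tar xf "${1:-perf.data.tar.bz2}" -C "${2:-$HOME/.debug}"
}

# Example: import_perf_archive ./perf.data.tar.bz2
```

The perf.data file itself is not part of the archive, so it has to be copied to the analysis system separately.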

28.3. Analyzing performance analysis results

You can investigate the data collected by perf record directly with the perf report command.

Procedure

  • Analyze the results directly from the perf.data file or from an archived tarball.

    # perf report

    The report is sorted by each entry's share of CPU usage, with the highest percentage first. It shows whether each sample occurred in the kernel or in the user space of the process.

    The report shows information about the module from which the sample was taken:

    • A kernel sample that did not take place in a kernel module is marked with the notation [kernel.kallsyms].
    • A kernel sample that took place in a kernel module is marked with the module name in brackets, for example [ext4].
    • For a process in user space, the results might show the shared library linked with the process.

    The report also denotes whether each sample occurred in kernel or user space:

    • The result [.] indicates user space.
    • The result [k] indicates kernel space.

    Finer-grained details are available for review, including data appropriate for experienced perf developers.
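
The [k] and [.] markers described above can also be tallied from the text form of the report, which perf report prints when run with the --stdio option. The following is a rough sketch only; the function name count_by_space is an assumption, and the exact column layout of the report can differ between perf versions.

```shell
# Hypothetical helper: count report entries by where the sample occurred.
# Reads "perf report --stdio" text on standard input.
count_by_space() {
    awk '
        /^#/      { next }   # skip the header comment lines
        /\[k\]/   { k++ }    # entry sampled in kernel space
        /\[\.\]/  { u++ }    # entry sampled in user space
        END       { printf "kernel=%d user=%d\n", k, u }
    '
}

# Example: perf report --stdio -i perf.data | count_by_space
```

The counts are numbers of report entries, not percentages of CPU time.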

28.4. Listing pre-defined events

The perf list command shows the pre-defined events that you can measure, including hardware events, software events, and tracepoints.

Procedure

  • List pre-defined hardware and software events:

    # perf list
    List of pre-defined events (to be used in -e):
      cpu-cycles OR cycles                               [Hardware event]
      stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
      stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
      instructions                                       [Hardware event]
      cache-references                                   [Hardware event]
      cache-misses                                       [Hardware event]
      branch-instructions OR branches                    [Hardware event]
      branch-misses                                      [Hardware event]
      bus-cycles                                         [Hardware event]
    
      cpu-clock                                          [Software event]
      task-clock                                         [Software event]
      page-faults OR faults                              [Software event]
      minor-faults                                       [Software event]
      major-faults                                       [Software event]
      context-switches OR cs                             [Software event]
      cpu-migrations OR migrations                       [Software event]
      alignment-faults                                   [Software event]
      emulation-faults                                   [Software event]
      ...[output truncated]...
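
Because each line of the listing ends with its event type in brackets, the output can be filtered by type. The sketch below assumes the line layout shown above, with the event name in the first column; the function name events_of_type is hypothetical.

```shell
# Hypothetical helper: print the names of events of one type.
# Reads "perf list" output on standard input.
# Usage: perf list | events_of_type "Software event"
events_of_type() {
    # Keep lines that carry the bracketed type tag; print the first name.
    awk -v tag="[$1]" 'index($0, tag) { print $1 }'
}
```

The perf list command also accepts a type argument itself, for example perf list hw or perf list sw, which restricts the listing before any filtering.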

28.5. Getting statistics about specified events

You can view specific events using the perf stat command.

Procedure

  1. View the number of context switches with the perf stat feature:

    # perf stat -e context-switches -a sleep 5
     Performance counter stats for 'sleep 5':
    
                15,619 context-switches
    
           5.002060064 seconds time elapsed

    The results show that 15,619 context switches took place in 5 seconds.

  2. Generate file system activity by running a script. The following example script creates one file per second:

    # for i in {1..100}; do touch /tmp/$i; sleep 1; done

  3. In another terminal, run the perf stat command:

    # perf stat -e ext4:ext4_request_inode -a sleep 5
     Performance counter stats for 'sleep 5':
    
                     5 ext4:ext4_request_inode
    
           5.002253620 seconds time elapsed

    The results show 5 inode requests in 5 seconds, one for each file that the script asked to create during that interval.
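
The counts in steps 1 and 3 are easier to compare across runs as a rate. The following sketch divides an event count by the elapsed time reported by perf stat; the function name event_rate is an assumption, and it expects the human-readable layout shown above with a comma as the thousands separator.

```shell
# Hypothetical helper: events per second from perf stat text output.
# perf stat prints its summary on standard error, hence the 2>&1.
# Usage: perf stat -e context-switches -a sleep 5 2>&1 | event_rate context-switches
event_rate() {
    awk -v ev="$1" '
        $2 == ev               { gsub(/,/, "", $1); n = $1 }  # strip thousands separators
        /seconds time elapsed/ { t = $1 }                     # elapsed wall-clock time
        END { if (t > 0) printf "%.1f\n", n / t }
    '
}
```

If the named event does not appear in the input, the function prints 0.0, so the event name passed to it must match the one given to perf stat -e.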

28.6. Additional resources

  • perf help COMMAND
  • perf(1) man page