Chapter 25. Performance Counters for Linux (PCL) Tools and perf
Performance Counters for Linux (PCL) is a kernel-based subsystem that provides a framework for collecting and analyzing performance data. Red Hat Enterprise Linux 7 includes this kernel subsystem to collect data and the user-space tool
perf to analyze the collected performance data. The PCL subsystem can be used to measure hardware events, including retired instructions and processor clock cycles. It can also measure software events, including major page faults and context switches. For example, PCL counters can compute Instructions Per Clock (IPC) from a process’s counts of instructions retired and processor clock cycles. A low IPC ratio indicates the code makes poor use of CPU. Other hardware events can also be used to diagnose poor CPU performance.
Performance counters can also be configured to record samples. The relative amount of samples can be used to identify which regions of code have the greatest impact on performance.
25.1. Perf Tool Commands
- perf stat
perfcommand provides overall statistics for common performance events, including instructions executed and clock cycles consumed. Options allow the selection of events other than the default measurement events.
- perf record
perfcommand records performance data into a file which can be later analyzed using
- perf report
perfcommand reads the performance data from a file and analyzes the recorded data.
- perf list
perfcommand lists the events available on a particular machine. These events will vary based on performance monitoring hardware and software configuration of the system.
perf help to obtain a complete list of
perf commands. To retrieve
man page information on each
perf command, use
perf help command.
25.2. Using Perf
To collect statistics on
make and its children, use the following command:
# perf stat -- make all
perf command collects a number of different hardware and software counters. It then prints the following information:
Performance counter stats for 'make all': 244011.782059 task-clock-msecs # 0.925 CPUs 53328 context-switches # 0.000 M/sec 515 CPU-migrations # 0.000 M/sec 1843121 page-faults # 0.008 M/sec 789702529782 cycles # 3236.330 M/sec 1050912611378 instructions # 1.331 IPC 275538938708 branches # 1129.203 M/sec 2888756216 branch-misses # 1.048 % 4343060367 cache-references # 17.799 M/sec 428257037 cache-misses # 1.755 M/sec 263.779192511 seconds time elapsed
perf tool can also record samples. For example, to record data on the
make command and its children, use:
# perf record -- make all
This prints out the file in which the samples are stored, along with the number of samples collected:
[ perf record: Woken up 42 times to write data ] [ perf record: Captured and wrote 9.753 MB perf.data (~426109 samples) ]
Performance Counters for Linux (PCL) Tools conflict with OProfile
Both OProfile and Performance Counters for Linux (PCL) use the same hardware Performance Monitoring Unit (PMU). If OProfile is running while attempting to use the PCL
perf command, an error message like the following occurs when starting OProfile:
Error: open_counter returned with 16 (Device or resource busy). /usr/bin/dmesg may provide additional information. Fatal: Not all events could be opened.
To use the
perf command, first shut down OProfile:
# opcontrol --deinit
You can then analyze
perf.data to determine the relative frequency of samples. The report output includes the command, object, and function for the samples. Use
perf report to output an analysis of
perf.data. For example, the following command produces a report of the executable that consumes the most time:
# perf report --sort=comm
The resulting output:
# Samples: 1083783860000 # # Overhead Command # ........ ............... # 48.19% xsltproc 44.48% pdfxmltex 6.01% make 0.95% perl 0.17% kernel-doc 0.05% xmllint 0.05% cc1 0.03% cp 0.01% xmlto 0.01% sh 0.01% docproc 0.01% ld 0.01% gcc 0.00% rm 0.00% sed 0.00% git-diff-files 0.00% bash 0.00% git-diff-index
The column on the left shows the relative amount of samples. This output shows that
make spends most of this time in
pdfxmltex. To reduce the time for
make to complete, focus on
pdfxmltex. To list functions executed by
# perf report -n --comm=xsltproc
comm: xsltproc # Samples: 472520675377 # # Overhead Samples Shared Object Symbol # ........ .......... ............................. ...... # 45.54%215179861044 libxml2.so.2.7.6 [.] xmlXPathCmpNodesExt 11.63%54959620202 libxml2.so.2.7.6 [.] xmlXPathNodeSetAdd__internal_alias 8.60%40634845107 libxml2.so.2.7.6 [.] xmlXPathCompOpEval 4.63%21864091080 libxml2.so.2.7.6 [.] xmlXPathReleaseObject 2.73%12919672281 libxml2.so.2.7.6 [.] xmlXPathNodeSetSort__internal_alias 2.60%12271959697 libxml2.so.2.7.6 [.] valuePop 2.41%11379910918 libxml2.so.2.7.6 [.] xmlXPathIsNaN__internal_alias 2.19%10340901937 libxml2.so.2.7.6 [.] valuePush__internal_alias