7.2. Monitoring and Diagnosing Performance Problems

Red Hat Enterprise Linux 7 provides a number of tools for monitoring system performance and diagnosing performance problems related to system memory. This section outlines the available tools and gives examples of how to use them to monitor and diagnose memory-related performance issues.

7.2.1. Monitoring Memory Usage with vmstat

The vmstat tool, provided by the procps-ng package, reports on your system's processes, memory, paging, block input/output, interrupts, and CPU activity. The first report gives averages of these events since the machine was last booted. When a sampling interval is specified, each subsequent report covers the period since the previous report.
The following command displays a table of various event counters and memory statistics:
$ vmstat -s
For further details of how to use vmstat, see Section A.8, “vmstat” or the man page:
$ man vmstat
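As an illustration of both reporting modes, the following sketch takes a few interval samples and then pulls a single counter out of the vmstat -s table. The vmstat_counter helper is a hypothetical name, not part of procps-ng, and it assumes that each line of vmstat -s output has the form "value [K] label".

```shell
# Print the value of one counter from `vmstat -s`-style text read on stdin.
# vmstat_counter is a hypothetical helper, not part of procps-ng.
# $1: the label as printed by vmstat -s, for example "free memory".
vmstat_counter() {
    awk -v label="$1" '
        {
            value = $1
            # Strip the leading number and the optional "K" unit,
            # which leaves only the label text.
            sub(/^[ \t]*[0-9]+[ \t]+(K[ \t]+)?/, "")
            if ($0 == label) { print value; exit }
        }'
}

if command -v vmstat >/dev/null 2>&1; then
    # Three reports, one second apart. The first line shows averages
    # since boot; the following lines cover each one-second interval.
    vmstat 1 3
    # Extract a single counter from the summary table.
    vmstat -s | vmstat_counter "free memory"
fi
```

Matching on the full label rather than a substring avoids confusing similar counters such as "free memory" and "free swap".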

7.2.2. Profiling Application Memory Usage with Valgrind

Valgrind is a framework that instruments user-space binaries. It ships with a number of tools that can be used to profile and analyze program performance. The valgrind tools outlined in this section can help you to detect memory errors such as uninitialized memory use and improper memory allocation or deallocation.
To use valgrind or any of its tools, install the valgrind package:
# yum install valgrind

7.2.2.1. Profiling Memory Usage with Memcheck

Memcheck is the default valgrind tool. It detects and reports on a number of memory errors that can be difficult to detect and diagnose, such as:
  • Memory access that should not occur
  • Undefined or uninitialized value use
  • Incorrectly freed heap memory
  • Pointer overlap
  • Memory leaks

Note

Memcheck can only report these errors; it cannot prevent them from occurring. If your program accesses memory in a way that would normally cause a segmentation fault, the segmentation fault still occurs. However, memcheck will log an error message immediately prior to the fault.
Because memcheck uses instrumentation, applications executed with memcheck run ten to thirty times slower than usual.
To run memcheck on an application, execute the following command:
# valgrind --tool=memcheck application
You can also use the following options to focus memcheck output on specific types of problem.
--leak-check
After the application finishes executing, memcheck searches for memory leaks. The default value is --leak-check=summary, which prints the number of memory leaks found. You can specify --leak-check=yes or --leak-check=full to output details of each individual leak. To disable, specify --leak-check=no.
--undef-value-errors
The default value is --undef-value-errors=yes, which reports errors when undefined values are used. You can also specify --undef-value-errors=no, which will disable this report and slightly speed up Memcheck.
--ignore-ranges
Specifies one or more ranges that memcheck should ignore when checking for memory addressability, for example, --ignore-ranges=0xPP-0xQQ,0xRR-0xSS.
For a full list of memcheck options, see the documentation included at /usr/share/doc/valgrind-version/valgrind_manual.pdf.
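The options above can be combined with a log file so that the leak summary can be checked by a script. The following is a minimal sketch under stated assumptions: ./myapp is a placeholder for your application, definitely_lost_bytes is a hypothetical helper, and the log is assumed to contain memcheck's usual "definitely lost: N bytes in M blocks" summary line. The --log-file option is a general valgrind option that writes the report to a file instead of standard error.

```shell
# Read a memcheck log on stdin and print the "definitely lost" byte count.
# definitely_lost_bytes is a hypothetical helper. It prints nothing when
# the log has no leak summary, for example when no leaks were found.
definitely_lost_bytes() {
    awk '/definitely lost:/ {
        for (i = 1; i <= NF; i++)
            if ($i == "lost:") {
                n = $(i + 1)
                gsub(/,/, "", n)   # memcheck prints large numbers as 1,024
                print n
                exit
            }
    }'
}

# ./myapp is a placeholder for the application under test.
if command -v valgrind >/dev/null 2>&1 && [ -x ./myapp ]; then
    valgrind --tool=memcheck --leak-check=full \
        --log-file=memcheck.log ./myapp
    definitely_lost_bytes < memcheck.log
fi
```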

7.2.2.2. Profiling Cache Usage with Cachegrind

Cachegrind simulates application interaction with a system's cache hierarchy and branch predictor. It tracks usage of the simulated first level instruction and data caches to detect poor code interaction with this level of cache. It also tracks the last level of cache (second or third level) in order to track memory access. As such, applications executed with cachegrind run twenty to one hundred times slower than usual.
Cachegrind gathers statistics for the duration of application execution and outputs a summary to the console. To run cachegrind on an application, execute the following command:
# valgrind --tool=cachegrind application
You can also use the following options to focus cachegrind output on a specific problem.
--I1
Specifies the size, associativity, and line size of the first level instruction cache, like so: --I1=size,associativity,line_size.
--D1
Specifies the size, associativity, and line size of the first level data cache, like so: --D1=size,associativity,line_size.
--LL
Specifies the size, associativity, and line size of the last level cache, like so: --LL=size,associativity,line_size.
--cache-sim
Enables or disables the collection of cache access and miss counts. This is enabled (--cache-sim=yes) by default. Disabling both this and --branch-sim leaves cachegrind with no information to collect.
--branch-sim
Enables or disables the collection of branch instruction and incorrect prediction counts. This is disabled (--branch-sim=no) by default because it slows cachegrind down; specify --branch-sim=yes to enable it. Disabling both this and --cache-sim leaves cachegrind with no information to collect.
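To make the size,associativity,line_size triple concrete, the following sketch builds the three cache options for a simulated hierarchy. The geometry values are illustrative assumptions (sizes and line sizes in bytes), not recommendations; cache_opt is a hypothetical helper, and ./myapp is a placeholder for your application.

```shell
# Format one cache-geometry option for cachegrind.
# cache_opt is a hypothetical helper, not part of valgrind.
# Usage: cache_opt LEVEL SIZE_BYTES ASSOCIATIVITY LINE_SIZE_BYTES
cache_opt() {
    local level=$1
    shift
    local v
    for v in "$@"; do
        case $v in
            ''|*[!0-9]*)
                echo "cache_opt: not a number: $v" >&2
                return 1 ;;
        esac
    done
    if [ "$#" -ne 3 ]; then
        echo "cache_opt: need size, associativity, and line size" >&2
        return 1
    fi
    printf '%s%s=%s,%s,%s\n' '--' "$level" "$1" "$2" "$3"
}

# Simulate 32 KiB 8-way L1 caches and an 8 MiB 16-way last-level cache,
# all with 64-byte lines. These values are examples only.
if command -v valgrind >/dev/null 2>&1 && [ -x ./myapp ]; then
    valgrind --tool=cachegrind \
        "$(cache_opt I1 32768 8 64)" \
        "$(cache_opt D1 32768 8 64)" \
        "$(cache_opt LL 8388608 16 64)" \
        ./myapp
fi
```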
Cachegrind writes detailed profiling information to a per-process cachegrind.out.pid file, where pid is the process identifier. This detailed information can be further processed by the companion cg_annotate tool, like so:
# cg_annotate cachegrind.out.pid
Cachegrind also provides the cg_diff tool, which makes it easier to compare program performance before and after a code change. To compare output files, execute the following command, replacing first with the initial profile output file, and second with the subsequent profile output file.
# cg_diff first second
The cg_diff tool writes the difference profile to standard output. Redirect it to a file to view it in greater detail with the cg_annotate tool.
For a full list of cachegrind options, see the documentation included at /usr/share/doc/valgrind-version/valgrind_manual.pdf.
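The before-and-after comparison can be scripted end to end. The following sketch assumes two builds of the same program and uses the --cachegrind-out-file option to give each profile a predictable name instead of the default cachegrind.out.pid. The compare_cache_profiles function name and the output file names are hypothetical.

```shell
# Profile two builds of a program and annotate the difference.
# compare_cache_profiles is a hypothetical helper, not part of valgrind.
# Usage: compare_cache_profiles OLD_BINARY NEW_BINARY [ARGS...]
compare_cache_profiles() {
    local old=$1 new=$2
    shift 2
    # Write each profile to a fixed file name so the runs are easy to find.
    valgrind --tool=cachegrind \
        --cachegrind-out-file=cachegrind.out.before "$old" "$@"
    valgrind --tool=cachegrind \
        --cachegrind-out-file=cachegrind.out.after "$new" "$@"
    # cg_diff prints the difference profile on standard output,
    # so redirect it to a file before annotating it.
    cg_diff cachegrind.out.before cachegrind.out.after \
        > cachegrind.out.diff || return
    cg_annotate cachegrind.out.diff
}

# Example call, with placeholder binaries:
#   compare_cache_profiles ./myapp-old ./myapp-new
```

Both runs should use the same input and the same simulation options, otherwise the difference reflects the changed workload rather than the code change.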

7.2.2.3. Profiling Heap and Stack Space with Massif

Massif measures the heap space used by a specified application and, when --stacks=yes is set, its stack space. For the heap, it measures both useful space and any additional space allocated for bookkeeping and alignment purposes. Massif helps you understand how you can reduce your application's memory use to increase execution speed and reduce the likelihood that your application will exhaust system swap space. Applications executed with massif run about twenty times slower than usual.
To run massif on an application, execute the following command:
# valgrind --tool=massif application
You can also use the following options to focus massif output on a specific problem.
--heap
Specifies whether massif profiles the heap. The default value is --heap=yes. Heap profiling can be disabled by setting this to --heap=no.
--heap-admin
Specifies the number of bytes per block to use for administration when heap profiling is enabled. The default value is 8 bytes.
--stacks
Specifies whether massif profiles the stack. The default value is --stacks=no, as stack profiling can greatly slow massif. Set this option to --stacks=yes to enable stack profiling. Note that massif assumes that the main stack starts with a size of zero in order to better indicate the changes in stack size that relate to the application being profiled.
--time-unit
Specifies the unit of time that massif uses when it records profiling data. The default value is i (instructions executed). You can also specify ms (milliseconds of real time) or B (bytes allocated or deallocated on the heap and stack). Measuring time in bytes allocated is useful for short-running applications and for testing purposes, as it is the most reproducible across different hardware.
Massif outputs profiling data to a massif.out.pid file, where pid is the process identifier of the specified application. The ms_print tool graphs this profiling data to show memory consumption over the execution of the application, as well as detailed information about the sites responsible for allocation at points of peak memory allocation. To graph the data from the massif.out.pid file, execute the following command:
# ms_print massif.out.pid
For a full list of Massif options, see the documentation included at /usr/share/doc/valgrind-version/valgrind_manual.pdf.
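As a worked example, the following sketch profiles heap and stack usage with a byte-based time unit, graphs the result, and prints the peak heap footprint. It rests on several assumptions: ./myapp is a placeholder for your application, peak_heap_bytes is a hypothetical helper, and the output file is assumed to record each snapshot with mem_heap_B= and mem_heap_extra_B= lines (useful and bookkeeping bytes respectively). The --massif-out-file option replaces the default massif.out.pid file name.

```shell
# Print the largest heap footprint (useful bytes plus bookkeeping bytes)
# found in any snapshot of a massif output file.
# peak_heap_bytes is a hypothetical helper, not part of valgrind.
# $1: path to a massif output file.
peak_heap_bytes() {
    awk -F= '
        $1 == "mem_heap_B"       { heap = $2 }
        $1 == "mem_heap_extra_B" {
            total = heap + $2
            if (total > peak) peak = total
        }
        END { print peak + 0 }' "$1"
}

# ./myapp is a placeholder for the application under test.
if command -v valgrind >/dev/null 2>&1 && [ -x ./myapp ]; then
    valgrind --tool=massif --stacks=yes --time-unit=B \
        --massif-out-file=massif.out.myapp ./myapp
    ms_print massif.out.myapp
    peak_heap_bytes massif.out.myapp
fi
```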