Red Hat Training

A Red Hat training course is available for RHEL 8

Chapter 25. Profiling memory accesses with perf mem

You can use the perf mem command to sample memory accesses on your system.

25.1. The purpose of perf mem

The mem subcommand of the perf tool enables the sampling of memory accesses (loads and stores). The perf mem command provides information about memory latency, types of memory accesses, functions causing cache hits and misses, and, by recording the data symbol, the memory locations where these hits and misses occur.

25.2. Sampling memory access with perf mem

This procedure describes how to use the perf mem command to sample memory accesses on your system. The command takes the same options as perf record and perf report as well as some options exclusive to the mem subcommand. The recorded data is stored in a perf.data file in the current directory for later analysis.

Prerequisites

  • You have the perf user space tool installed as described in Installing perf.

Procedure

  1. Sample the memory accesses:

    # perf mem record -a sleep seconds

    This example samples memory accesses across all CPUs for a period of seconds seconds as dictated by the sleep command. You can replace the sleep command for any command during which you want to sample memory access data. By default, perf mem samples both memory loads and stores. You can select only one memory operation by using the -t option and specifying either "load" or "store" between perf mem and record. For loads, information over the memory hierarchy level, TLB memory accesses, bus snoops, and memory locks is captured.

  2. Open the perf.data file for analysis:

    # perf mem report

    If you have used the example commands, the output is:

    Available samples
    35k cpu/mem-loads,ldlat=30/P
    54k cpu/mem-stores/P

    The cpu/mem-loads,ldlat=30/P line denotes data collected over memory loads and the cpu/mem-stores/P line denotes data collected over memory stores. Highlight the category of interest and press Enter to view the data:

    Samples: 35K of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 4067062
    Overhead       Samples  Local Weight  Memory access             Symbol                                                                 Shared Object                 Data Symbol                                                     Data Object                            Snoop         TLB access              Locked
       0.07%            29  98            L1 or L1 hit              [.] 0x000000000000a255                                                 libspeexdsp.so.1.5.0          [.] 0x00007f697a3cd0f0                                          anon                                   None          L1 or L2 hit            No
       0.06%            26  97            L1 or L1 hit              [.] 0x000000000000a255                                                 libspeexdsp.so.1.5.0          [.] 0x00007f697a3cd0f0                                          anon                                   None          L1 or L2 hit            No
       0.06%            25  96            L1 or L1 hit              [.] 0x000000000000a255                                                 libspeexdsp.so.1.5.0          [.] 0x00007f697a3cd0f0                                          anon                                   None          L1 or L2 hit            No
       0.06%             1  2325          Uncached or N/A hit       [k] pci_azx_readl                                                      [kernel.kallsyms]             [k] 0xffffb092c06e9084                                          [kernel.kallsyms]                      None          L1 or L2 hit            No
       0.06%             1  2247          Uncached or N/A hit       [k] pci_azx_readl                                                      [kernel.kallsyms]             [k] 0xffffb092c06e8164                                          [kernel.kallsyms]                      None          L1 or L2 hit            No
       0.05%             1  2166          L1 or L1 hit              [.] 0x00000000038140d6                                                 libxul.so                     [.] 0x00007ffd7b84b4a8                                          [stack]                                None          L1 or L2 hit            No
       0.05%             1  2117          Uncached or N/A hit       [k] check_for_unclaimed_mmio                                           [kernel.kallsyms]             [k] 0xffffb092c1842300                                          [kernel.kallsyms]                      None          L1 or L2 hit            No
       0.05%            22  95            L1 or L1 hit              [.] 0x000000000000a255                                                 libspeexdsp.so.1.5.0          [.] 0x00007f697a3cd0f0                                          anon                                   None          L1 or L2 hit            No
       0.05%             1  1898          L1 or L1 hit              [.] 0x0000000002a30e07                                                 libxul.so                     [.] 0x00007f610422e0e0                                          anon                                   None          L1 or L2 hit            No
       0.05%             1  1878          Uncached or N/A hit       [k] pci_azx_readl                                                      [kernel.kallsyms]             [k] 0xffffb092c06e8164                                          [kernel.kallsyms]                      None          L2 miss                 No
       0.04%            18  94            L1 or L1 hit              [.] 0x000000000000a255                                                 libspeexdsp.so.1.5.0          [.] 0x00007f697a3cd0f0                                          anon                                   None          L1 or L2 hit            No
       0.04%             1  1593          Local RAM or RAM hit      [.] 0x00000000026f907d                                                 libxul.so                     [.] 0x00007f3336d50a80                                          anon                                   Hit           L2 miss                 No
       0.03%             1  1399          L1 or L1 hit              [.] 0x00000000037cb5f1                                                 libxul.so                     [.] 0x00007fbe81ef5d78                                          libxul.so                              None          L1 or L2 hit            No
       0.03%             1  1229          LFB or LFB hit            [.] 0x0000000002962aad                                                 libxul.so                     [.] 0x00007fb6f1be2b28                                          anon                                   None          L2 miss                 No
       0.03%             1  1202          LFB or LFB hit            [.] __pthread_mutex_lock                                               libpthread-2.29.so            [.] 0x00007fb75583ef20                                          anon                                   None          L1 or L2 hit            No
       0.03%             1  1193          Uncached or N/A hit       [k] pci_azx_readl                                                      [kernel.kallsyms]             [k] 0xffffb092c06e9164                                          [kernel.kallsyms]                      None          L2 miss                 No
       0.03%             1  1191          L1 or L1 hit              [k] azx_get_delay_from_lpib                                            [kernel.kallsyms]             [k] 0xffffb092ca7efcf0                                          [kernel.kallsyms]                      None          L1 or L2 hit            No

    Alternatively, you can sort your results to investigate different aspects of interest when displaying the data. For example, to sort data over memory loads by type of memory accesses occurring during the sampling period in descending order of overhead they account for:

    # perf mem -t load report --sort=mem

    For example, the output can be:

    Samples: 35K of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 40670
    Overhead       Samples  Memory access
      31.53%          9725  LFB or LFB hit
      29.70%         12201  L1 or L1 hit
      23.03%          9725  L3 or L3 hit
      12.91%          2316  Local RAM or RAM hit
       2.37%           743  L2 or L2 hit
       0.34%             9  Uncached or N/A hit
       0.10%            69  I/O or N/A hit
       0.02%           825  L3 miss

Additional resources

  • perf-mem(1) man page.

25.3. Interpretation of perf mem report output

The table displayed by running the perf mem report command without any modifiers sorts the data into several columns:

The 'Overhead' column
Indicates percentage of overall samples collected in that particular function.
The 'Samples' column
Displays the number of samples accounted for by that row.
The 'Local Weight' column
Displays the access latency in processor core cycles.
The 'Memory Access' column
Displays the type of memory access that occurred.
The 'Symbol' column
Displays the function name or symbol.
The 'Shared Object' column
Displays the name of the ELF image where the samples come from (the name [kernel.kallsyms] is used when the samples come from the kernel).
The 'Data Symbol' column
Displays the address of the memory location that row was targeting.
Important

Oftentimes, due to dynamic allocation of memory or stack memory being accessed, the 'Data Symbol' column will display a raw address.

The "Snoop" column
Displays bus transactions.
The 'TLB Access' column
Displays TLB memory accesses.
The 'Locked' column
Indicates if a function was or was not memory locked.

In default mode, the functions are sorted in descending order with those with the highest overhead displayed first.