Red Hat Training

A Red Hat training course is available for RHEL 8

Chapter 23. Detecting false sharing

False sharing occurs when a processor core on a Symmetric Multi Processing (SMP) system modifies data items on the same cache line that is in use by other processors to access other data items that are not being shared between the processors.

This initial modification requires that the other processors using the cache line invalidate their copy and request an updated one despite the processors not needing, or even necessarily having access to, an updated version of the modified data item.

You can use the perf c2c command to detect false sharing.

23.1. The purpose of perf c2c

The c2c subcommand of the perf tool enables Shared Data Cache-to-Cache (C2C) analysis. You can use the perf c2c command to inspect cache-line contention to detect both true and false sharing.

Cache-line contention occurs when a processor core on a Symmetric Multi Processing (SMP) system modifies data items on the same cache line that is in use by other processors. All other processors using this cache-line must then invalidate their copy and request an updated one. This can lead to degraded performance.

The perf c2c command provides the following information:

  • Cache lines where contention has been detected
  • Processes reading and writing the data
  • Instructions causing the contention
  • The Non-Uniform Memory Access (NUMA) nodes involved in the contention

23.2. Detecting cache-line contention with perf c2c

Use the perf c2c command to detect cache-line contention in a system.

The perf c2c command supports the same options as perf record as well as some options exclusive to the c2c subcommand. The recorded data is stored in a perf.data file in the current directory for later analysis.

Prerequisites

  • The perf user space tool is installed. For more information, see installing perf.

Procedure

  • Use perf c2c to detect cache-line contention:

    # perf c2c record -a sleep seconds

    This example samples and records cache-line contention data across all CPU’s for a period of seconds as dictated by the sleep command. You can replace the sleep command with any command you want to collect cache-line contention data over.

Additional resources

  • perf-c2c(1) man page

23.3. Visualizing a perf.data file recorded with perf c2c record

This procedure describes how to visualize the perf.data file, which is recorded using the perf c2c command.

Prerequisites

Procedure

  1. Open the perf.data file for further analysis:

    # perf c2c report --stdio

    This command visualizes the perf.data file into several graphs within the terminal:

    =================================================
               Trace Event Information
    =================================================
     Total records                     :     329219
     Locked Load/Store Operations      :      14654
     Load Operations                   :      69679
     Loads - uncacheable               :          0
     Loads - IO                        :          0
     Loads - Miss                      :       3972
     Loads - no mapping                :          0
     Load Fill Buffer Hit              :      11958
     Load L1D hit                      :      17235
     Load L2D hit                      :         21
     Load LLC hit                      :      14219
     Load Local HITM                   :       3402
     Load Remote HITM                  :      12757
     Load Remote HIT                   :       5295
     Load Local DRAM                   :        976
     Load Remote DRAM                  :       3246
     Load MESI State Exclusive         :       4222
     Load MESI State Shared            :          0
     Load LLC Misses                   :      22274
     LLC Misses to Local DRAM          :        4.4%
     LLC Misses to Remote DRAM         :       14.6%
     LLC Misses to Remote cache (HIT)  :       23.8%
     LLC Misses to Remote cache (HITM) :       57.3%
     Store Operations                  :     259539
     Store - uncacheable               :          0
     Store - no mapping                :         11
     Store L1D Hit                     :     256696
     Store L1D Miss                    :       2832
     No Page Map Rejects               :       2376
     Unable to parse data source       :          1
    
    =================================================
       Global Shared Cache Line Event Information
    =================================================
     Total Shared Cache Lines          :         55
     Load HITs on shared lines         :      55454
     Fill Buffer Hits on shared lines  :      10635
     L1D hits on shared lines          :      16415
     L2D hits on shared lines          :          0
     LLC hits on shared lines          :       8501
     Locked Access on shared lines     :      14351
     Store HITs on shared lines        :     109953
     Store L1D hits on shared lines    :     109449
     Total Merged records              :     126112
    
    =================================================
                     c2c details
    =================================================
     Events                            : cpu/mem-loads,ldlat=30/P
    	                                    : cpu/mem-stores/P
     Cachelines sort on                : Remote HITMs
     Cacheline data groupping          : offset,pid,iaddr
    
    =================================================
    	   Shared Data Cache Line Table
    =================================================
    #
    #                              Total      Rmt  ----- LLC Load Hitm -----  ---- Store Reference ----  --- Load Dram ----      LLC    Total  ----- Core Load Hit -----  -- LLC Load Hit --
    # Index           Cacheline  records     Hitm    Total      Lcl      Rmt    Total    L1Hit   L1Miss       Lcl       Rmt  Ld Miss    Loads       FB       L1       L2       Llc       Rmt
    # .....  ..................  .......  .......  .......  .......  .......  .......  .......  .......  ........  ........  .......  .......  .......  .......  .......  ........  ........
    #
          0            0x602180   149904   77.09%    12103     2269     9834   109504   109036      468       727      2657    13747    40400     5355    16154        0      2875       529
          1            0x602100    12128   22.20%     3951     1119     2832        0        0        0        65       200     3749    12128     5096      108        0      2056       652
          2  0xffff883ffb6a7e80      260    0.09%       15        3       12      161      161        0         1         1       15       99       25       50        0         6         1
          3  0xffffffff81aec000      157    0.07%        9        0        9        1        0        1         0         7       20      156       50       59        0        27         4
          4  0xffffffff81e3f540      179    0.06%        9        1        8      117       97       20         0        10       25       62       11        1        0        24         7
    
    =================================================
          Shared Cache Line Distribution Pareto
    =================================================
    #
    #        ----- HITM -----  -- Store Refs --        Data address                               ---------- cycles ----------       cpu                                     Shared
    #   Num      Rmt      Lcl   L1 Hit  L1 Miss              Offset      Pid        Code address  rmt hitm  lcl hitm      load       cnt               Symbol                Object                  Source:Line  Node{cpu list}
    # .....  .......  .......  .......  .......  ..................  .......  ..................  ........  ........  ........  ........  ...................  ....................  ...........................  ....
    #
      -------------------------------------------------------------
          0     9834     2269   109036      468            0x602180
      -------------------------------------------------------------
              65.51%   55.88%   75.20%    0.00%                 0x0    14604            0x400b4f     27161     26039     26017         9  [.] read_write_func  no_false_sharing.exe  false_sharing_example.c:144   0{0-1,4}  1{24-25,120}  2{48,54}  3{169}
    	   0.41%    0.35%    0.00%    0.00%                 0x0    14604            0x400b56     18088     12601     26671         9  [.] read_write_func  no_false_sharing.exe  false_sharing_example.c:145   0{0-1,4}  1{24-25,120}  2{48,54}  3{169}
    	   0.00%    0.00%   24.80%  100.00%                 0x0    14604            0x400b61         0         0         0         9  [.] read_write_func  no_false_sharing.exe  false_sharing_example.c:145   0{0-1,4}  1{24-25,120}  2{48,54}  3{169}
    	   7.50%    9.92%    0.00%    0.00%                0x20    14604            0x400ba7      2470      1729      1897         2  [.] read_write_func  no_false_sharing.exe  false_sharing_example.c:154   1{122}  2{144}
    	  17.61%   20.89%    0.00%    0.00%                0x28    14604            0x400bc1      2294      1575      1649         2  [.] read_write_func  no_false_sharing.exe  false_sharing_example.c:158   2{53}  3{170}
    	   8.97%   12.96%    0.00%    0.00%                0x30    14604            0x400bdb      2325      1897      1828         2  [.] read_write_func  no_false_sharing.exe  false_sharing_example.c:162   0{96}  3{171}
    
      -------------------------------------------------------------
          1     2832     1119        0        0            0x602100
      -------------------------------------------------------------
    	  29.13%   36.19%    0.00%    0.00%                0x20    14604            0x400bb3      1964      1230      1788         2  [.] read_write_func  no_false_sharing.exe  false_sharing_example.c:155   1{122}  2{144}
    	  43.68%   34.41%    0.00%    0.00%                0x28    14604            0x400bcd      2274      1566      1793         2  [.] read_write_func  no_false_sharing.exe  false_sharing_example.c:159   2{53}  3{170}
    	  27.19%   29.40%    0.00%    0.00%                0x30    14604            0x400be7      2045      1247      2011         2  [.] read_write_func  no_false_sharing.exe  false_sharing_example.c:163   0{96}  3{171}

23.4. Interpretation of perf c2c report output

This section describes how to interpret the output of the perf c2c report command.

The visualization displayed by running the perf c2c report --stdio command sorts the data into several tables:

Trace Events Information
This table provides a high level summary of all the load and store samples, which are collected by the perf c2c record command.
Global Shared Cache Line Event Information
This table provides statistics over the shared cache lines.
c2c Details
This table provides information about what events were sampled and how the perf c2c report data is organized within the visualization.
Shared Data Cache Line Table
This table provides a one line summary for the hottest cache lines where false sharing is detected and is sorted in descending order by the amount of remote Hitm detected per cache line by default.
Shared Cache Line Distribution Pareto

This tables provides a variety of information about each cache line experiencing contention:

  • The cache lines are numbered in the NUM column, starting at 0.
  • The virtual address of each cache line is contained in the Data address Offset column and followed subsequently by the offset into the cache line where different accesses occurred.
  • The Pid column contains the process ID.
  • The Code Address column contains the instruction pointer code address.
  • The columns under the cycles label show average load latencies.
  • The cpu cnt column displays how many different CPUs samples came from (essentially, how many different CPUs were waiting for the data indexed at that given location).
  • The Symbol column displays the function name or symbol.
  • The Shared Object column displays the name of the ELF image where the samples come from (the name [kernel.kallsyms] is used when the samples come from the kernel).
  • The Source:Line column displays the source file and line number.
  • The Node{cpu list} column displays which specific CPUs samples came from for each node.

23.5. Detecting false sharing with perf c2c

This procedure describes how to detect false sharing using the perf c2c command.

Prerequisites

Procedure

  1. Open the perf.data file for further analysis:

    # perf c2c report --stdio

    This opens the perf.data file in the terminal.

  2. In the "Trace Event Information" table, locate the row containing the values for LLC Misses to Remote Cache (HITM):

    The percentage in the value column of the LLC Misses to Remote Cache (HITM) row represents the percentage of LLC misses that were occurring across NUMA nodes in modified cache-lines and is a key indicator false sharing has occurred.

    =================================================
                Trace Event Information
    =================================================
      Total records                     :     329219
      Locked Load/Store Operations      :      14654
      Load Operations                   :      69679
      Loads - uncacheable               :          0
      Loads - IO                        :          0
      Loads - Miss                      :       3972
      Loads - no mapping                :          0
      Load Fill Buffer Hit              :      11958
      Load L1D hit                      :      17235
      Load L2D hit                      :         21
      Load LLC hit                      :      14219
      Load Local HITM                   :       3402
      Load Remote HITM                  :      12757
      Load Remote HIT                   :       5295
      Load Local DRAM                   :        976
      Load Remote DRAM                  :       3246
      Load MESI State Exclusive         :       4222
      Load MESI State Shared            :          0
      Load LLC Misses                   :      22274
      LLC Misses to Local DRAM          :        4.4%
      LLC Misses to Remote DRAM         :       14.6%
      LLC Misses to Remote cache (HIT)  :       23.8%
      LLC Misses to Remote cache (HITM) : 57.3%
      Store Operations                  :     259539
      Store - uncacheable               :          0
      Store - no mapping                :         11
      Store L1D Hit                     :     256696
      Store L1D Miss                    :       2832
      No Page Map Rejects               :       2376
      Unable to parse data source       :          1
  3. Inspect the Rmt column of the LLC Load Hitm field of the the Shared Data Cache Line Table:

      =================================================
                 Shared Data Cache Line Table
      =================================================
      #
      #                              Total      Rmt  ----- LLC Load Hitm -----  ---- Store Reference ----  --- Load Dram ----      LLC    Total  ----- Core Load Hit -----  -- LLC Load Hit --
      # Index           Cacheline  records     Hitm    Total      Lcl      Rmt    Total    L1Hit   L1Miss       Lcl       Rmt  Ld Miss    Loads       FB       L1       L2       Llc       Rmt
      # .....  ..................  .......  .......  .......  .......  .......  .......  .......  .......  ........  ........  .......  .......  .......  .......  .......  ........  ........
      #
            0            0x602180   149904   77.09%    12103     2269     9834   109504   109036      468       727      2657    13747    40400     5355    16154        0      2875       529
            1            0x602100    12128   22.20%     3951     1119     2832        0        0        0        65       200     3749    12128     5096      108        0      2056       652
            2  0xffff883ffb6a7e80      260    0.09%       15        3       12      161      161        0         1         1       15       99       25       50        0         6         1
            3  0xffffffff81aec000      157    0.07%        9        0        9        1        0        1         0         7       20      156       50       59        0        27         4
            4  0xffffffff81e3f540      179    0.06%        9        1        8      117       97       20         0        10       25       62       11        1        0        24         7

    This table is sorted in descending order by the amount of remote Hitm detected per cache line. A high number in the Rmt column of the LLC Load Hitm section indicates false sharing and requires further inspection of the cache line on which it occurred to debug the false sharing activity.