Memory allocation latency in RHEL

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux
  • ProLiant BL660c Gen8 (not hardware dependent)
  • Symptoms are likely more pronounced on NUMA systems

Issue

  • Allocations of large chunks of memory usually complete quickly, in under 1 second. Sporadically, the time required to satisfy an allocation jumps to more than 90 times that.
  • Allocations are usually met in under 0.8 seconds; during the issue they spike to as long as roughly 90 seconds.
  • During the issue there are no symptoms of explicit memory pressure (no swapping; reclaim remains highly efficient).
  • This latency usually occurs while a large amount of memory (30GB or more) is still shown as free in /proc/meminfo

Resolution

  • Disable Transparent Huge Pages

  • In runtime:

# echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
# echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
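Note that the sysfs writes above take effect immediately but do not survive a reboot. A sketch of one common way to make the change persistent is to repeat the writes from /etc/rc.local, falling back to whichever sysfs path exists (the redhat_transparent_hugepage directory is specific to RHEL 6; other releases use transparent_hugepage):

```shell
# Disable THP at boot; try both sysfs path variants.
for dir in /sys/kernel/mm/redhat_transparent_hugepage /sys/kernel/mm/transparent_hugepage; do
    if [ -d "$dir" ]; then
        echo never > "$dir/enabled"
        echo never > "$dir/defrag"
    fi
done
```

Alternatively, the kernel boot parameter transparent_hugepage=never disables THP from the start of boot.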

Root Cause

As memory utilization rises, the chance of fragmentation increases (available memory is not physically contiguous). On a NUMA system this is more pronounced, as free memory is divided between the nodes and memory management is more complicated. With the default Transparent Hugepage settings enabled, the kernel attempts to satisfy allocations with THP, which requires contiguous 2MB segments of memory. When contiguous segments are not available, the kernel attempts to defragment (compact) memory on each node in order to allocate THP. This introduces overhead which greatly increases allocation times once memory is sufficiently fragmented.
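To make the "contiguous 2MB segments" requirement concrete: with 4KiB base pages, each THP must come from a single physically contiguous order-9 block of the buddy allocator, since 2MiB / 4KiB = 512 = 2^9 pages. The arithmetic:

```shell
# A 2 MiB THP = 512 contiguous 4 KiB base pages = one order-9 buddy block.
page=4096
thp=$((2 * 1024 * 1024))
pages=$((thp / page))
order=0; n=$pages
while [ "$n" -gt 1 ]; do n=$((n / 2)); order=$((order + 1)); done
echo "pages per THP: $pages, buddy order: $order"
# -> pages per THP: 512, buddy order: 9
```

/proc/buddyinfo shows, per node, how many free blocks of each order remain; when the order-9 (10th) column approaches zero, THP allocations must wait for compaction.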

Diagnostic Steps

In this case a test program which repeatedly consumed memory in 1GB chunks was used to reproduce the symptoms for diagnostic purposes. During the reproducer run, numastat -m and numastat -n were collected at 10 second intervals.
- In this instance each of the four NUMA nodes has a total of 130GB of memory.
- We can see the following trend: after some time running, allocation times jump suddenly from under 1 second to 18 and then 91 seconds.
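The collection described above can be sketched as a simple sampling loop (a sketch only; the numastat calls are commented out so it also runs on non-NUMA hosts, and the real run sampled every 10 seconds rather than every 1):

```shell
# Sample NUMA memory layout and the kernel's THP counter at a fixed interval.
for i in 1 2 3; do
    date
    grep nr_anon_transparent_hugepages /proc/vmstat 2>/dev/null \
        || echo "nr_anon_transparent_hugepages unavailable"
    # numastat -m
    # numastat -n
    sleep 1   # 10 in the actual run
done
```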

Allocated 1GB in 0.273802 seconds - 1 of 150GB completed @ Tue Aug 30 16:22:15 2016
Allocated 1GB in 0.269167 seconds - 2 of 150GB completed @ Tue Aug 30 16:22:15 2016
Allocated 1GB in 0.296457 seconds - 3 of 150GB completed @ Tue Aug 30 16:22:16 2016
Allocated 1GB in 0.295495 seconds - 4 of 150GB completed @ Tue Aug 30 16:22:16 2016
.....
Allocated 1GB in 0.532781 seconds - 97 of 150GB completed @ Tue Aug 30 16:23:00 2016
Allocated 1GB in 0.594629 seconds - 98 of 150GB completed @ Tue Aug 30 16:23:01 2016
Allocated 1GB in 0.624580 seconds - 99 of 150GB completed @ Tue Aug 30 16:23:01 2016
Allocated 1GB in 18.709271 seconds - 100 of 150GB completed @ Tue Aug 30 16:23:20 2016
Allocated 1GB in 91.496636 seconds - 101 of 150GB completed @ Tue Aug 30 16:24:52 2016
  • Total free memory is roughly 30GB when the slowdown occurs; the lowest NUMA node shows only about 4.5GB free
                          Node 0          Node 1          Node 2          Node 3           Total
MemFree                 51398.35         5330.07        27345.72        49570.01       133644.15 @ 16:22:10
MemFree                 51391.96         5304.66         8750.24        49116.57       114563.43
MemFree                 51386.55         5306.64         8671.13        28538.54        93902.86
MemFree                 48630.79         5299.53         8589.34        10270.43        72790.09
MemFree                  28437.08         5303.61         8510.51        10271.92        52523.11 @ 16:22:51 <-- <1 sec
MemFree                  8401.12         5303.48         8430.48        10267.62        32402.70 @ 16:23:01
MemFree                  8141.76         4617.78         8076.02        10261.34        31096.89
MemFree                  8122.81         4556.40         8049.53        10268.28        30997.03 @ 16:23:21 <-- 18 sec 
MemFree                  8084.59         4561.52         7992.60        10272.86        30911.57
MemFree                  8074.67         4534.86         7960.04        10272.52        30842.08
MemFree                  8051.23         4534.80         7954.95        10272.55        30813.52  
MemFree                  7772.16         4527.84         7953.45        10265.32        30518.77
MemFree                  7774.00         4517.62         7900.20        10270.86        30462.68
MemFree                  7773.64         4462.41         7853.45        10269.21        30358.71
MemFree                  7636.59         4342.98         7801.96        10266.70        30048.23
MemFree                  7560.95         4281.33         7774.72        10266.32        29883.32
MemFree                  7502.08         4308.79         7792.46        10267.94        29871.27 @ 16:24:52  <-- 91 sec 
  • We note the test program's allocations are met with Transparent Hugepages (a large increase in the THP count during the run). The rate of THP creation slows dramatically during the issue, from roughly 1.97 GiB/s to 0.0124 GiB/s.
nr_anon_transparent_hugepages 47385 Tue Aug 30 16:22:10 <-- Start
nr_anon_transparent_hugepages 57016
nr_anon_transparent_hugepages 67167
nr_anon_transparent_hugepages 77772
nr_anon_transparent_hugepages 87805 Tue Aug 30 16:22:51 <--  78.9453 GiB increase in THP
nr_anon_transparent_hugepages 97825 
nr_anon_transparent_hugepages 98440
nr_anon_transparent_hugepages 98489 Tue Aug 30 16:23:21 <-- 1.2968 GiB increase in THP
nr_anon_transparent_hugepages 98543
nr_anon_transparent_hugepages 98577
nr_anon_transparent_hugepages 98592
nr_anon_transparent_hugepages 98730
nr_anon_transparent_hugepages 98772
nr_anon_transparent_hugepages 98806 
nr_anon_transparent_hugepages 98932
nr_anon_transparent_hugepages 99017
nr_anon_transparent_hugepages 99061 Tue Aug 30 16:24:52 <-- 1.1171 GiB increase in THP
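Each nr_anon_transparent_hugepages unit is one 2MiB huge page, so the counter deltas above convert directly to memory and rates (delta * 2MiB):

```shell
# THP created in the fast phase (16:22:10 -> 16:22:51, ~40 s of samples)
fast_delta=$(( 87805 - 47385 ))
echo "fast: $(( fast_delta * 2 )) MiB"   # 80840 MiB = 78.9453 GiB, ~1.97 GiB/s
# THP created in the slow phase (16:23:21 -> 16:24:52, ~90 s)
slow_delta=$(( 99061 - 98489 ))
echo "slow: $(( slow_delta * 2 )) MiB"   # 1144 MiB = 1.1171 GiB, ~0.0124 GiB/s
```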

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
