Memory allocation latency in RHEL
Environment
- Red Hat Enterprise Linux
- ProLiant BL660c Gen8 (not hardware dependent)
- Symptoms are likely more pronounced on NUMA systems
Issue
- Allocations of large chunks of memory usually complete quickly, in under 1 second.
- Sporadically, allocation latency jumps by a factor of more than 90: from under 0.8 seconds to as high as 90 seconds.
- During the issue there are no symptoms of explicit memory pressure (no swapping; reclaim remains highly efficient).
- This latency usually occurs while a large amount of memory (30GB or more) is shown as free in /proc/meminfo
Resolution
- Disable Transparent Huge Pages
- At runtime:
# echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
# echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
- Persistently, add transparent_hugepage=never to the kernel command line in /boot/grub/grub.conf
- Additional information on Transparent Hugepages and their configuration
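The runtime change can be verified by reading the sysfs control file back: the kernel shows the active value in brackets. A minimal sketch, assuming the RHEL 6 sysfs path used above (the thp_state helper name is our own):

```shell
# thp_state FILE: print the active THP setting from a sysfs control file such
# as /sys/kernel/mm/redhat_transparent_hugepage/enabled. The kernel marks the
# active value in brackets, e.g. "always madvise [never]".
thp_state() {
    # extract the bracketed value and strip the brackets
    grep -o '\[[a-z]*\]' "$1" | tr -d '[]'
}
# e.g.: thp_state /sys/kernel/mm/redhat_transparent_hugepage/enabled
# should print "never" once the change above has been applied
```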
Root Cause
As memory utilization rises, the chance of fragmentation increases (available memory is not physically contiguous). On a NUMA system this is more pronounced, as free memory is divided between the nodes and memory management is more complex. With the default Transparent Hugepage settings, the kernel attempts to satisfy allocations with THP, which requires contiguous 2MB segments of memory. When contiguous segments are not available, the kernel attempts to defragment (compact) memory on each node in order to allocate THP. This compaction introduces overhead which greatly increases allocation times once memory is sufficiently fragmented.
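This kind of fragmentation can be observed in /proc/buddyinfo, which lists the free block count per zone for each order; with 4KiB pages, an order-9 block is the 2MB contiguous chunk THP needs. A minimal sketch (the order9_free helper name is our own; the file format is the standard kernel one):

```shell
# order9_free FILE: print the free order-9 (2MB with 4KiB pages) block count
# per zone from a buddyinfo-format file, normally /proc/buddyinfo.
order9_free() {
    # each line reads: "Node N, zone NAME" followed by 11 counts for
    # orders 0-10, so the order-9 count is the next-to-last field
    awk '{ sub(",", "", $2); printf "node%s zone=%s order9_free=%s\n", $2, $4, $(NF - 1) }' "$1"
}
# e.g.: order9_free /proc/buddyinfo
```

Low or zero order-9 counts while MemFree is still large are the signature of the fragmentation described above.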
Diagnostic Steps
In this case a test program which repeatedly consumed 1GB chunks of memory was used to reproduce the symptoms for diagnostic purposes. During the reproducer run, numastat -m and numastat -n were collected at 10 second intervals.
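A collection loop along these lines can be used to gather the samples; the collect_stats helper and its arguments are our own illustration, as the article only states that the data was collected every 10 seconds:

```shell
# collect_stats CMD INTERVAL COUNT: run CMD at INTERVAL-second intervals,
# COUNT times, prefixing each sample with a timestamp.
collect_stats() {
    cmd=$1; interval=$2; count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        date '+%a %b %d %T %Y'
        $cmd
        i=$((i + 1))
        # sleep only between samples, not after the last one
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
# During the reproducer run, something like:
#   collect_stats "numastat -m" 10 120 > numastat_m.log &
#   collect_stats "numastat -n" 10 120 > numastat_n.log &
```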
- In this instance each of the four NUMA nodes has a total of 130GB of memory.
- We can see the following trend: after some time running, allocation times jump suddenly from under 1 second to 18 and then 91 seconds.
Allocated 1GB in 0.273802 seconds - 1 of 150GB completed @ Tue Aug 30 16:22:15 2016
Allocated 1GB in 0.269167 seconds - 2 of 150GB completed @ Tue Aug 30 16:22:15 2016
Allocated 1GB in 0.296457 seconds - 3 of 150GB completed @ Tue Aug 30 16:22:16 2016
Allocated 1GB in 0.295495 seconds - 4 of 150GB completed @ Tue Aug 30 16:22:16 2016
.....
Allocated 1GB in 0.532781 seconds - 97 of 150GB completed @ Tue Aug 30 16:23:00 2016
Allocated 1GB in 0.594629 seconds - 98 of 150GB completed @ Tue Aug 30 16:23:01 2016
Allocated 1GB in 0.624580 seconds - 99 of 150GB completed @ Tue Aug 30 16:23:01 2016
Allocated 1GB in 18.709271 seconds - 100 of 150GB completed @ Tue Aug 30 16:23:20 2016
Allocated 1GB in 91.496636 seconds - 101 of 150GB completed @ Tue Aug 30 16:24:52 2016
- Total free memory is 30GB when the slowdown occurs; the lowest NUMA node shows 4.5GB free.
Node 0 Node 1 Node 2 Node 3 Total
MemFree 51398.35 5330.07 27345.72 49570.01 133644.15 @ 16:22:10
MemFree 51391.96 5304.66 8750.24 49116.57 114563.43
MemFree 51386.55 5306.64 8671.13 28538.54 93902.86
MemFree 48630.79 5299.53 8589.34 10270.43 72790.09
MemFree 28437.08 5303.61 8510.51 10271.92 52523.11 @ 16:22:51 <-- <1 sec
MemFree 8401.12 5303.48 8430.48 10267.62 32402.70 @ 16:23:01
MemFree 8141.76 4617.78 8076.02 10261.34 31096.89
MemFree 8122.81 4556.40 8049.53 10268.28 30997.03 @ 16:23:21 <-- 18 sec
MemFree 8084.59 4561.52 7992.60 10272.86 30911.57
MemFree 8074.67 4534.86 7960.04 10272.52 30842.08
MemFree 8051.23 4534.80 7954.95 10272.55 30813.52
MemFree 7772.16 4527.84 7953.45 10265.32 30518.77
MemFree 7774.00 4517.62 7900.20 10270.86 30462.68
MemFree 7773.64 4462.41 7853.45 10269.21 30358.71
MemFree 7636.59 4342.98 7801.96 10266.70 30048.23
MemFree 7560.95 4281.33 7774.72 10266.32 29883.32
MemFree 7502.08 4308.79 7792.46 10267.94 29871.27 @ 16:24:52 <-- 91 sec
- We note the test program's allocations are satisfied with Transparent Hugepages (large increase in the THP count during the run). The rate of THP creation slows significantly during the issue, from 1.9736GB/s to 0.0124GB/s.
nr_anon_transparent_hugepages 47385 Tue Aug 30 16:22:10 <-- Start
nr_anon_transparent_hugepages 57016
nr_anon_transparent_hugepages 67167
nr_anon_transparent_hugepages 77772
nr_anon_transparent_hugepages 87805 Tue Aug 30 16:22:51 <-- 78.9453 GiB increase in THP
nr_anon_transparent_hugepages 97825
nr_anon_transparent_hugepages 98440
nr_anon_transparent_hugepages 98489 Tue Aug 30 16:23:21 <-- 1.2968 GiB increase in THP
nr_anon_transparent_hugepages 98543
nr_anon_transparent_hugepages 98577
nr_anon_transparent_hugepages 98592
nr_anon_transparent_hugepages 98730
nr_anon_transparent_hugepages 98772
nr_anon_transparent_hugepages 98806
nr_anon_transparent_hugepages 98932
nr_anon_transparent_hugepages 99017
nr_anon_transparent_hugepages 99061 Tue Aug 30 16:24:52 <-- 1.1171 GiB increase in THP
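The GiB figures above follow from the counter deltas, since each anonymous THP is 2MiB on x86_64. A minimal sketch of the conversion (the thp_gib helper name is our own):

```shell
# thp_gib DELTA: convert a change in nr_anon_transparent_hugepages into GiB,
# given that each anonymous THP is 2MiB (2/1024 GiB) on x86_64.
thp_gib() {
    awk -v d="$1" 'BEGIN { printf "%.4f\n", d * 2 / 1024 }'
}
# first interval above: thp_gib $((87805 - 47385))   # prints 78.9453
```

Dividing that first-interval figure by the 40 seconds elapsed gives the 1.9736GB/s creation rate quoted above.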
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.