Examining Huge Pages or Transparent Huge Pages Performance

William Cohen published on 2014-03-26T20:35:16+00:00, last updated 2014-04-11T19:06:40+00:00

All modern processors use page-based mechanisms to translate a user-space process's virtual addresses into physical addresses for RAM. The pages are commonly 4KB in size, and the processor can hold a limited number of virtual-to-physical address mappings in the Translation Lookaside Buffer (TLB). The number of TLB entries ranges from tens to hundreds of mappings, which limits a processor to a few megabytes of memory it can address without changing the TLB entries. When a virtual-to-physical address mapping is not in the TLB, the processor must perform an expensive page-table walk to generate the new mapping.
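To get a feel for the scale (the entry count below is an assumed, illustrative figure, not a measurement of any particular CPU), a TLB with 512 entries mapping 4KB pages covers only 2MB of memory at a time:

$ echo "$((512 * 4)) KB"
2048 KB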

To increase the amount of memory the processor can address without performing expensive TLB updates, many processors allow larger page sizes to be used. On x86_64 processors, huge pages are 2MB, which is 512 times larger than regular 4KB pages. In ideal situations, huge pages can decrease the overhead of TLB updates (misses). However, huge-page use can increase memory pressure, add latency for minor page faults, and add overhead when splitting huge pages or coalescing normal-sized pages into huge pages.

There are two mechanisms available for huge pages in Linux: HugePages and Transparent Huge Pages (THP). The original HugePages mechanism requires explicit configuration. The newer THP mechanism automatically uses larger pages for dynamically allocated memory in Red Hat Enterprise Linux 6.
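A quick way to see whether THP is enabled system-wide is to read the transparent_hugepage setting from sysfs. Treat this as a sketch rather than a universal recipe: the exact path varies by kernel, and Red Hat Enterprise Linux 6 kernels expose it as /sys/kernel/mm/redhat_transparent_hugepage/enabled rather than the upstream path shown below.

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

The value in brackets is the active setting.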

To determine whether the newer THP or the older HugePages mechanism is being used, look at the output of /proc/meminfo as shown below:

$ cat /proc/meminfo|grep Huge
AnonHugePages:   3049472 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

The AnonHugePages entry lists the amount of memory that the newer THP mechanism currently has in use. For this machine, that is 3049472 kB: 1489 huge pages, each 2048 kB in size.
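That page count can be computed directly from the two /proc/meminfo fields; the following one-liner is a minimal sketch that divides AnonHugePages by Hugepagesize (output shown for the values above):

$ awk '/^AnonHugePages/ {a=$2} /^Hugepagesize/ {h=$2} END {print a / h}' /proc/meminfo
1489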

In this case, there are zero pages in the pool of the older HugePages mechanism, as shown by the HugePages_Total of 0. The HugePages_Free entry shows how many pages are still available for allocation, which will be less than or equal to HugePages_Total. The number of HugePages in use can be computed as HugePages_Total - HugePages_Free. For more information about the configuration of HugePages, see Tuning and Optimizing Red Hat Enterprise Linux for Oracle 9i and 10g Databases.
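That arithmetic can also be scripted; the sketch below reads both fields from /proc/meminfo and prints the difference (with the output above it simply prints 0):

$ awk '/^HugePages_Total/ {total=$2} /^HugePages_Free/ {free=$2} END {print total - free}' /proc/meminfo
0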

Determining Whether Page-Fault Latency is Due to Use of Huge Pages

Huge-page use can reduce the number and overall cost of the TLB updates required to access large regions of memory, but it increases the cost and latency of other operations. When a user-space application is given a range of addresses for a memory allocation, the assignment of a physical page is deferred until the first time the page is accessed. To prevent information leakage from the previous user of the page, the kernel writes zeros over the entire page. For a 4096-byte page, this is a relatively short operation and will only take a couple of microseconds. The x86 huge pages are 2MB in size, 512 times larger than the normal page, so the operation might take hundreds of microseconds and impact latency-sensitive code. Below is a simple SystemTap command-line script that shows which applications have huge pages zeroed out and how long those operations take. It runs until Ctrl-C is pressed.

stap  -e 'global huge_clear probe kernel.function("clear_huge_page").return {
  huge_clear [execname(), pid()] <<< (gettimeofday_us() - @entry(gettimeofday_us()))}'

When the script exits, it prints a list of executable names and process IDs, sorted from the process with the most huge-page clears to the one with the least. The @count is the number of times that process encountered a huge-page clear operation. Following that are timing statistics, displayed in microseconds of wall-clock time: @min and @max are the minimum and maximum times to clear a page, respectively, and @sum is the total wall-clock time spent clearing pages.
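The output looks roughly like the lines below. The process names and numbers are purely illustrative, not measurements; SystemTap also prints an @avg field with the average time per clear:

huge_clear["qemu-kvm",2417] @count=35 @min=118 @max=1327 @sum=15244 @avg=435
huge_clear["mysqld",1892] @count=4 @min=131 @max=462 @sum=1088 @avg=272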

Originally posted at http://developerblog.redhat.com/2014/03/10/examining-huge-pages-or-transparent-huge-pages-performance/

About The Author

William Cohen

William Cohen has been a developer of performance tools at Red Hat for over a decade and has worked on a number of the performance tools in Red Hat Enterprise Linux and Fedora, such as OProfile, PAPI, SystemTap, and Dyninst.