9.2. Large Page Memory

The default memory page size in most operating systems is 4 kilobytes (kb). For a 32-bit operating system the maximum amount of memory is 4 GB, which equates to 1,048,576 ((1024*1024*1024*4)/4096) memory pages. A 64-bit operating system can address 18 Exabytes of memory in theory which equates to a huge number of memory pages. The overhead of managing such a large number of memory pages is significant, regardless of the operating system. The largest heap size used for tests covered in this book was 20 GB, which equates to 5,242,880 memory pages, a five-fold increase over 4 GB.
Large memory pages are pages of memory which are significantly larger than 4 kb, usually 2 Mb. In some instances it's configurable, from 2MB to 256MB. For the systems used in the tests for this book, the page size is 2MB. With 2MB pages, the number of pages for a 20GB heap memory drops from 5,242,880 to 10,240! A 99.8% reduction in the number of memory pages means significantly less management overhead for the underlying operating system.
Large memory pages are locked in memory, and cannot be swapped to disk like regular memory pages which has both advantages and disadvantages. The advantage is that if the heap is using large page memory it can not be paged or swapped to disk so it's always readily available. For Linux the disadvantage is that for applications to use it they have to attach to it using the correct flag for the shmget() system call, also they need to have the proper security permissions for the memlock() system call. For any application that does not have the ability to use large page memory, the server will look and behave as if the large page memory does not exist, which could be a major problem. Care must be taken when configuring large page memory, depending on what else is running on your server besides the JVM.
To enable large page memory, add the following option to the command-line used to start the platform:
-XX:+UseLargePages
This option applies to OpenJDK, and the Oracle proprietary HotSpot-based JVM but there are similar options for IBM's and Oracle's JRockit JVMs. Refer to their documentation for further details. It's also necessary to change the following parameters:
kernel.shmmax = n
Where n is equal to the number of bytes of the maximum shared memory segment allowed on the system. You should set it at least to the size of the largest heap size you want to use for the JVM, or alternatively you can set it to the total amount of memory in the system.
vm.nr_hugepages = n
Where n is equal to the number of large pages. You will need to look up the large page size in /proc/meminfo.
vm.huge_tlb_shm_group = gid
Where gid is a shared group id for the users you want to have access to the large pages.
The next step is adding the following in /etc/security/limits.conf:
  <username>     soft     memlock     n
  <username>     hard     memlock     n
Where <username> is the runtime user of the JVM and n is the number of pages from vm.nr_hugepages * the page size in KB from /proc/meminfo.
Instead of setting n to a specific value, this can be set to unlimited, which reduces maintenance.
Enter the command sysctl -p and these settings will be persistent. To confirm that this had taken effect, check that in the statistics available via /proc/meminfo, HugePages_Total is greater than 0. If HugePages_Total is either zero (0) or less than the value you configured, there are one of two things that could be wrong:
  • the specified number of memory pages was greater than was available;
  • there were not enough contiguous memory pages available.
When large page memory is allocated by the operating system, it must be in contiguous space. While the operating system is running, memory pages get fragmented. If the request failed because of this it may be necessary to reboot, so that the allocation of memory pages occurs before applications are started.
With the release of Red Hat Enterprise Linux 6, a new operating system capability called transparent huge pages (huge pages are the same as large pages) is available. This feature gives the operating system the ability to combine standard memory pages and make them large pages dynamically and without configuration. It enables the option of using large page memory for any application, even if it's not directly supported. Consider using this option since it reduces the configuration effort at a relatively small performance cost. Consult the Red Hat Enterprise Linux 6 documentation for specific configuration instructions.
The graph below shows the performance difference of the standard workload used in the 32-bit vs. 64-bit JVM comparison section.
JVM Throughput - 32-bit versus 64-bit

Figure 9.3. JVM Throughput - 32-bit versus 64-bit

The peak line that was created with the 16GB heap is included to illustrate the difference. All heap sizes, even down to 4GB were substantially faster than the best without large page memory. The peak was actually the 18GB heap size run, which had 6.58% higher throughput than the 4GB result. This result was also 17.48% more throughput than the 16GB test run without large page memory. In these results it's evident that using large page memory is worthwhile, even for rather small heap sizes.
JVM Throughput - comparison of with and without large pages enabled

Figure 9.4. JVM Throughput - comparison of with and without large pages enabled

This graph compares two runs, with and without large page memory, using the EJB 3 OLTP application that has been referenced throughout this book. The results are similar to what we saw in the Java only workload. When executing this test, using the same 12GB heap size but without large page memory, the result was just under 3,900 transactions per second (3,899.02 to be exact). With large page memory enabled the result was over 4,100 transactions per second (4.143.25 to be exact). This represents a 6.26% throughput improvement which is not as large as the Java-only workload but this is to be expected, as this is a more complex workload, with a fairly large percentage of the time not even in the JVM itself. It's still significant however, because it equates to 244+ transactions per second more or 14,000+ extra transactions per minute. In real terms this means more work (processing) can be done in the same amount of time.