How do I troubleshoot Minor Long GC Pauses ?
Environment
- Red Hat JBoss Enterprise Application Platform (EAP) 6.1.1
- VMware
- Rhel Virtual box
Issue
2015-05-18T02:30:34.235+1000: 317248.797: [GC2015-05-18T02:30:34.236+1000: 317248.797: [ParNew: 224625K->2047K(306688K), 7.7793540 secs] 2975927K->2896668K(4160256K), 7.7803710 secs] [Times: user=23.07 sys=0.01, real=7.78 secs]
2015-05-18T02:30:42.677+1000: 317257.238: [GC2015-05-18T02:30:42.678+1000: 317257.239: [ParNew: 254844K->2224K(306688K), 9.7654650 secs] 3149464K->3074110K(4160256K), 9.7663640 secs] [Times: user=28.32 sys=0.18, real=9.76 secs]
..........
2015-05-15T04:00:37.728+1000: 63452.289: [GC2015-05-15T04:00:37.728+1000: 63452.289: [ParNew: 272498K->31856K(306688K), 7.8617690 secs] 3553329K->3479932K(4160256K), 7.8626360 secs] [Times: user=21.13 sys=0.08, real=7.87 secs]
2015-05-15T04:00:45.614+1000: 63460.175: [GC [1 CMS-initial-mark: 3448076K(3853568K)] 3541769K(4160256K), 0.1105370 secs] [Times: user=0.02 sys=0.00, real=0.11 secs]
2015-05-15T04:00:45.726+1000: 63460.287: [CMS-concurrent-mark-start]
2015-05-15T04:00:45.991+1000: 63460.553: [GC2015-05-15T04:00:45.992+1000: 63460.553: [ParNew (promotion failed): 304496K->306688K(306688K), 0.6682420 secs]2015-05-15T04:00:46.660+1000: 63461.221: [CMS2015-05-15T04:00:50.761+1000: 63465.322: [CMS-concurrent-mark: 4.347/5.035 secs] [Times: user=1.17 sys=0.09, real=5.04 secs]
(concurrent mode failure): 3494551K->619035K(3853568K), 14.1227260 secs] 3752572K->619035K(4160256K), [CMS Perm : 213966K->213923K(356576K)], 14.7918010 secs] [Times: user=2.49 sys=0.17, real=14.79 secs]
2015-05-15T04:01:02.063+1000: 63476.624: [GC2015-05-15T04:01:02.063+1000: 63476.624: [ParNew: 272640K->34048K(306688K), 0.0398180 secs] 891675K->716467K(4160256K), 0.0406330 secs] [Times: user=0.14 sys=0.00, real=0.04 secs]
Resolution
-
Increase amount of memory available to VMWare VM
-
Consider using pre-allocated HugePages for Java memory allocations, as HugePages cannot be swapped
This requires memory to be pre-allocated and the Linux system rebooted to ensure HugePage memory is allocated as one contiguous block:
How do I enable large page support on Linux?
We recommend leaving at least 1 GiB overhead for the operating system to function. eg: If you're giving 4 GiB to the JVM, then the system needs at least 5 GiB memory. Depending on workload, more may be required.
Then add "-XX:+UseLargePages" to the EAP JVM options. This is done in "bin/standalone.conf" if in standalone mode, or in "domain.xml/host.xml" if domain mode.
- Install "tuned" and run the "virtual-guest" performance profile.
Diagnostic Steps
JVM Configuration Options
Check for JVM options
-server
-Xms4096m
-Xmx4096m
-XX:MaxPermSize=512m
-XX:+UseConcMarkSweepGC
-XX:MaxGCPauseMillis=50
-XX:GCTimeRatio=10
-XX:+CMSClassUnloadingEnabled
-XX:+UseTLAB
-XX:TLABSize=256K
-XX:-UseGCOverheadLimit
-XX:-TraceClassUnloading
-XX:+HeapDumpOnOutOfMemoryError
-verbose:gc
-Xloggc:/app/jboss6/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Djava.net.preferIPv4Stack=true
-Dorg.jboss.resolver.warning=true
-Dsun.rmi.dgc.client.gcInterval=1200000
-Dsun.rmi.dgc.server.gcInterval=1200000
-Dsun.lang.ClassLoader.allowArraySyntax=true"
-Djboss.modules.system.pkgs=$JBOSS_MODULES_SYSTEM_PKGS
-Djava.awt.headless=true
-Duser.timezone=Australia/Melbourne
-Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false
-Djboss.server.default.config=standalone-full-ha.xml
- "-XX:CMSInitiatingOccupancyFraction" is not specified ( where this is actually used to determine when to start CMS collections , which defaults to 92% if not specified )
- -XX:MaxGCPauseMillis=50 is quite low . Setting its value too small will cause the system to spend an excessive amount of time doing garbage collection as well .
Test below options- CMSInitiatingOccupancyFraction (option to a value lower than the 92% default value. This is the tenured generation threshold that triggers a concurrent collection)
- UseCMSInitiatingOccupancyOnly
eg.
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=85
So when CMS collector run is based on the promotion rate ,this will help calculate an estimate of how long it will take to fill the Old generation , ie the objects promoted from the young generation.
Setting a lower value would mean more work and low throughput ,and setting it to higher value would mean (setting late) old gen might be exhausted.
Recommend ruling any issues around JVM configurations. If the issue still persist investigate Kernel side of things.
Kernel
-
Is this hosted onto VM box
-
Gathering system baseline resource usage for IO performance issues
Configure it to collect every 5 seconds, and archive every hour. -
Install "tuned" daemon and setting the "virtual-guest" profile, this will set the CPU scheduler to something better suited for running as a virtual machine.
How do I use 'tuned' to apply tuning profiles to my machines?
Things to check
- At the slow time, capture the top CPU-using processs is Java.
- Check if CPU core is pinned on iowait.
- Is I/O being done to the swap LVM Logical Volume.
- Check if Java is causing the system to swap fairly heavily.
- Collect sosreport , check memory usage
For example below Memory usage in the "sosreport" (below usecase )indicates almost all memory is allocated to Java:
$ xsos -m .
MEMORY
Stats graphed as percent of MemTotal:
MemUsed ▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊.. 96.8%
Buffers .................................................. 0.1%
Cached ▊................................................. 2.8%
HugePages .................................................. 0.0%
RAM:
4.7 GiB total ram
4.6 GiB (97%) used
4.4 GiB (94%) used excluding Buffers/Cached
HugePages:
No ram pre-allocated to HugePages
$ xsos -p .
PS CHECK
Top users of CPU & MEM:
USER %CPU %MEM RSS
jboss 5.5% 86.7% 4.10 GiB
root 4.2% 3.8% 0.26 GiB
Top MEM-using processes:
USER PID %CPU %MEM VSZ-MiB RSS-MiB TTY STAT START TIME COMMAND
jboss 22643 5.5 86.7 8512 4200 ? Sl May18 61:52 /app/j2sdk/bin/java -D[Standalone] -XX:+UseCompressedOops
root 16836 0.1 1.8 2380 91 ? Sl Apr13 71:31 /app/j2sdk/bin/java -Djava.endorsed.dirs=/app/rhq-agent/lib/endorsed -Djava.library.path=/app/rhq-agent/li
and evidently that isn't enough.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
Welcome! Check out the Getting Started with Red Hat page for quick tours and guides for common tasks.
