Java application experiences periodic high latency and long processing times due to NUMA page reclaim on RHEL
Issue
- JBoss server periodically consuming high CPU and experiencing pauses.
- About 1 in 100 garbage collections takes an excessive amount of system time.
- The Java-based web application periodically (approximately 5 times out of 100) shows slow response times.
- Application response time is under 100 ms 95% of the time; for the remaining 5%, a response may take up to 100 seconds.
- Unresponsiveness is seen across several processes (JBoss, Oracle, etc.), and the slowness appears to be system-wide.
- Periodically, processes such as 'uname', 'grep', and 'perl' take an exceptionally long time to execute, and all of them appear to use an exceptional amount of system time.
- Oracle responds to JBoss calls in less than 1 s 90% of the time, but occasionally takes 30-40 s and may exceed the 60 s query timeout, resulting in Oracle error ORA-01013.
Environment
- Red Hat Enterprise Linux 5.4
- kernel 2.6.18-164.11.1.el5.x86_64
- CPU / memory
  - 24 CPUs total, 6 cores
  - 16 GB RAM, 8 GB swap
  - 2-node NUMA system, with 8 GB RAM on each NUMA node
- JBoss (running in its own JVM), jbossas, jboss-messaging
  - JBoss interfaces with Oracle via local TCP (port 1521)
- Web application (running in its own JVM)
  - JSF-based web application (TCP / HTTP 1.1) using RichFaces and a4j components
- Oracle version: 11gR1 (11.1.0.7)
  - Running with AMM (Automatic Memory Management), which cannot be used together with HugePages
- Veritas VCS, VxVM, VxDMP
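The title attributes these stalls to NUMA page reclaim on a 2-node system. The following Python 3 script is a minimal diagnostic sketch, not a documented fix. It counts the NUMA nodes exposed under sysfs and decodes the `vm.zone_reclaim_mode` sysctl, following the bit meanings in the kernel's `Documentation/sysctl/vm.txt`. The helper names are hypothetical. The script also assumes Python 3.7 or later is installed on the host, which may not be the case on an older RHEL release.

```python
"""Hypothetical diagnostic: is this host NUMA, and how is zone reclaim set?

Bit meanings of vm.zone_reclaim_mode (per the kernel's sysctl/vm.txt):
  1 = zone reclaim on
  2 = zone reclaim writes dirty pages out
  4 = zone reclaim swaps pages
"""
from __future__ import annotations

import glob
import os

ZONE_RECLAIM_BITS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value: int) -> list[str]:
    """Return the names of the bits that are set in a zone_reclaim_mode value."""
    return [name for bit, name in ZONE_RECLAIM_BITS.items() if value & bit]


def numa_node_count(sys_root: str = "/sys/devices/system/node") -> int:
    """Count nodeN directories under sysfs; returns 0 if the path is absent."""
    return len(glob.glob(os.path.join(sys_root, "node[0-9]*")))


def read_zone_reclaim_mode(path: str = "/proc/sys/vm/zone_reclaim_mode") -> int | None:
    """Read the current sysctl value, or None if it cannot be read or parsed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print(f"NUMA nodes: {numa_node_count()}")
    mode = read_zone_reclaim_mode()
    if mode is None:
        print("zone_reclaim_mode: unavailable")
    else:
        flags = decode_zone_reclaim_mode(mode) or ["off"]
        print(f"zone_reclaim_mode = {mode}: {', '.join(flags)}")
```

A nonzero mode means the kernel tries to reclaim pages on the local node before it allocates memory from another node. A value of 0 disables zone reclaim, so allocations can be satisfied from other nodes instead.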