`oom-killer` problems on RHEL 5.5 32bit (w/ PAE kernel)

Latest response

First, I know that RHEL 5.5 is kinda ancient. We're running an application that the vendor requires 32-bit RHEL 5 for. The app's recommended memory minimums are 16GB (Oracle with a Java app-stack). The deployed system has 48GB of RAM installed.

 

Unfortunately, every few hours, the system *thinks* that it's running out of memory. This triggers oom-killer, which, in turn, kills Oracle, kills the app-stack, kills monitoring tools, kills our AD authentication connector ...pretty much kills everything.

 

I've set up sadc to collect every 2 minutes (previously it was set to 10-minute probes). At the time oom-killer gets triggered, sadc reports that memory utilization is only at about 10-20% and that swap is pretty much unused. CPU, during these events, spikes to about a 4 load-average (which, on a dual quad-core system, is nothing). The only real indication of something odd is that, in the sadc probe taken as oom-killer starts going nuts, the system's context-switching goes up by more than 10x.
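For reference, this is roughly how I'm pulling the numbers back out of the sadc data (standard sysstat sar on RHEL 5; the time window below is just an example bracketing one of the events):
sar -w -s 03:50:00 -e 04:10:00    # context switches per second (cswch/s)
sar -r -s 03:50:00 -e 04:10:00    # memory and swap utilization
sar -q -s 03:50:00 -e 04:10:00    # run queue and load averages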

 

This cswitch spike is the only point of consistency. There's no time or date pattern to the events. There's nothing in the system or audit logs indicating some process has gone rogue and is triggering oom-killer through memory starvation. Hell, even when oom-killer kicks off and cites "low memory", it's claiming low memory while indicating that MAX MEM/SWAP and FREE MEM/SWAP are just about equal.

 

Anyone seen this behavior before?

Responses

 


Red Hat dropped the "hugemem" (4/4G split model) kernel after RHEL4.  The 4/4GiB model drastically relieved the pressure on the constrained LOWMEM (in the 3/1G model) required to support paging in 36-bit PAE (above 4GiB).

 

Unfortunately, as I understand it, upstream never supported it.  It also caused a major performance hit since the 4/4G model for kernel and userspace pages required many extra TLB flushes.  I know with "hugemem" the 32-bit JRE used to take up to a 30% performance hit (at least that was my experience in the FSI realm).  At the same time, we used to see all sorts of issues with the 36-bit PAE kernel beyond 8GiB until a number of patches were made late in RHEL4.  After that 16GiB was usable with the "smp" kernel.

 

Even with these patches in late RHEL4, and also in RHEL5, to the enduring 3/1G model, it's still not viable once you cross 16GiB.  It starts eating so much LOWMEM that many processes will just start to exhibit exactly what you describe.  I'm surprised you've had any success with a system beyond 32GiB.  If you need more than 16GiB, you need to be running x86-64 as of RHEL5.  Period.

 

IIRC, mapping the >4GiB pages for 36-bit PAE in the Linux/x86 kernel requires 16MiB per 1GiB.  That's 512MiB of kernel memory -- half of the kernel's LOWMEM in the 3/1G model, IIRC -- for 32GiB.  I could be rusty on this information (it's been almost 5 years).  Remember that paging structures, the kernel itself, select memory-mapped I/O, etc. must all fit in this constrained area under the 36-bit PAE model.
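If it helps anyone sanity-check their own boxes, the back-of-the-envelope math (assuming that 16MiB-per-GiB figure is in the right ballpark) comes out like this:
for gib in 16 32 48; do
    echo "${gib} GiB RAM -> ~$((gib * 16)) MiB of LOWMEM just for page mappings"
done
At 48GiB that works out to roughly 768MiB, which is most of the ~896MiB of LOWMEM available under the 3/1G split, before the kernel itself, memory-mapped I/O, etc. take their share.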

 

x86-64 had proliferated widely by 2007, so I don't blame Red Hat for dropping the 4/4G option as of RHEL5.  The x86-64 platform introduces a new "flat" 48-bit addressing and 52-bit PAE model (making it compatible with segmented 48-bit pointers, 32-bit flat addressing and 36-bit PAE), without all of the legacy paging into LOWMEM.  Again, if you need more than 16GiB, you need to be using x86-64 for RHEL5.

 

If you must stay on RHEL5 x86, then consider physically removing memory, or artificially limiting it with the highmem=/mem= kernel boot parameters.  Try 32GiB, then drop to 16GiB if required.  I really don't recommend more than 16GiB, and Red Hat's own link above spells that out too.
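For what it's worth, a sketch of what the mem= route looks like in /boot/grub/grub.conf (the kernel version, root= and initrd lines below are just placeholders from a stock RHEL5 PAE install and will differ on your box; mem=32G is the only part being added):
title Red Hat Enterprise Linux Server (2.6.18-194.8.1.el5PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.8.1.el5PAE ro root=/dev/VolGroup00/LogVol00 mem=32G
        initrd /initrd-2.6.18-194.8.1.el5PAE.img
Drop that to mem=16G if the LOWMEM pressure doesn't let up.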

 

Another option is to talk to your ISV.  They should at least support EL x86-64, even if you have to load a 32-bit JRE/JDK and other 32-bit libraries (as they cannot use libraries written for the native 52-bit PAE model in x86-64).  Red Hat ships a number of 32-bit libraries that are compatible with 32-bit and 36-bit PAE software in EL x86-64.  For those that Red Hat does not, your ISV should provide them.

You're honestly in uncharted territory with RHEL5+ on 32-bit beyond 16GiB (let alone 32GiB).  Come to think of it, I don't know of any x86 OS that supported more than 32GiB, other than Red Hat EL3/4 with the split 4/4G model.  Even Microsoft stopped at 32GiB (although their docs can be conflicting: some 2003 R2 Datacenter docs state 32GiB, others state 64GiB, and still others clarify that the larger figure is only for x64) because of the sheer paging overhead below 4GiB (using a 29-bit/512MiB paging model IIRC).

Given your reply, what I may do is try those tunables (limiting to 32GiB) first, then see if I can get the hardware team (the host is several states away from me) to pull the memory back down to 32GiB. Dunno why they built it to 48GiB: I'd requested 32GiB, and 32GiB had worked in the other four environments without issue.

 

As to the ISV: this was software that was purchased prior to my tenure and handed to me in late 2009 to get working within our environment. It uses a packaged Oracle database. Unfortunately, that Oracle database does a kernel check to ensure that you're running in 32-bit mode (the hardware it's running on, like all of our x86 hardware, is actually 64-bit multi-core Xeon processors). Worse, even though it's using an Oracle back-end, in the 2009 release, use of external Oracle data stores was strictly prohibited.

 

At any rate, when I'd done our initial engineering exercises to ready it for deployment, I was stuck on a VM with 4GB of allocated memory. The installer's config-checker red-flagged that memory but allowed me to proceed (noting that I *really* should have at least 16GiB of memory, with 32GiB preferred). At 4GiB it ran horribly slowly, but it was enough to do documentation/certification against (it was a lab system, so performance wasn't really a concern at the time).

 

It wasn't till we put it into production that I noticed the "use the PAE kernel" messages in the logs, while trying to figure out why it was still performing horribly and only using 4GiB of the installed 32GiB. I'm a recentish convert from "big UNIX" systems to RedHat. I'd always sorta taken PAE support as a given, since it's part and parcel of Solaris x86 (RIP) and even Windows 2003 Enterprise editions. At any rate, I grabbed the PAE RPM from RHN and it fixed the problem on the systems already in production. I made a note in the docs to use the PAE kernel and moved on to other projects.

 

Later, one of our remote sites wanted to do an implementation to tie into our global infrastructure. They (mostly) followed my docs ...with the exception of building on a 48GiB rather than a 32GiB system. I hadn't specifically tested with that much memory (limited lab resources), so I didn't anticipate any problems - most systems I'd dealt with up to that point didn't choke on "too much" memory (at worst, they just politely ignored it). Oh well, live and learn. :P

 

At any rate, we're generally a 64-bit RHEL 5.6 environment, soon moving to 6.1 for new builds. Subsequent to all the various issues, the vendor has released a new version of the application that both better accommodates modern hardware and allows one to leverage an external Oracle database. However, moving to it would first require another 6+ month engineering and approval cycle to get it certified for our farms, and I'm still tied up with another messy product. So, for now, I'm kind of stuck making this beast work "as is".

 

Thanks for the pointers: I'll see if I can implement them either this week or next.

 

Anyway, thanks again.

Yeah, PAE is a PC thing.  It's a long story.  All i686-compatible processors support 36-bit PAE (64GiB; P3 and later actually support 36-bit PSE, but that's another story).  Unfortunately, you still have the constraints of LOWMEM and 32-bit flat (4GiB) memory allocation.

 

Depending on the kernel memory model, you get either kernel-versus-user memory constraints (the 3/1G split) or a paging performance hit as the kernel gets its own 4G separate from the 4G of user pages (the 4/4G split).  This is because, again, x86 is still limited to 32-bit flat memory allocation.  Red Hat removed the 4/4G model as of RHEL5, as I mentioned.

 

Most everyone today is running x86-64, which is 52-bit PAE (4PiB) and 48-bit flat (256TiB), making it compatible with both 36-bit PAE (64GiB) and 48-bit segmented pointers normalized to 32-bit flat (4GiB).  Of course, pointers aren't compatible across memory models, hence the 32-bit v. 64-bit library incompatibilities.

 

Again, x86-64 kernels can run both types of programs and libraries.  The problem is when you have a legacy x86 program but not the required x86 libraries.  Red Hat does not bundle all x86 libraries in x86-64, for various reasons.  So it falls to your ISV to supply them, so that their product works on both x86 and x86-64.  There can be other issues as well.

 

Don't know what to tell you other than to lean on your ISV and work with them.

You might be starved of low memory.

Look at /proc/meminfo or free -l during periods of heavy activity and see if you are running low on "low memory."
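For example, something like:
grep -i '^low' /proc/meminfo    # LowTotal / LowFree on a PAE/highmem kernel
free -lm                        # same data with the low/high split broken out, in MiB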

If so, you should adjust /proc/sys/vm/lower_zone_protection. 100-250 is reasonable.

To set this option on boot, add the following to /etc/sysctl.conf:
vm.lower_zone_protection = 100
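On a kernel that actually exposes the key, you can also flip it on the fly to test before making it permanent:
sysctl -w vm.lower_zone_protection=100
sysctl -p    # or re-read /etc/sysctl.conf after editing it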

Here is an article on Memory Management in the Linux 2.6 kernel.
http://www.ibm.com/developerworks/library/l-mem26/

Linus wrote a nice post on lower_zone_protection on LKML a while back, but I can't seem to find it.

BTW: to answer your question -- we saw this commonly enough in our environment that we set vm.lower_zone_protection to 100 out of the box (and have raised it further in circumstances where there was additional memory pressure).

Phil -- I didn't even know about this tunable.  Thanx for pointing it out.  All I know is that LOWMEM can be constrained in 36-bit PAE (x86).  I know that with the PAE model, the further you go beyond 4GiB, the more LOWMEM it consumes.

 

The one question I have is whether you were running with large memory like the OP (48GiB), or were you at 16GiB (or lower)?

 

I have not looked at this since the RHEL4 days, when we benchmarked "hugemem" v. "smp" and looked at how much LOWMEM was being consumed on 8GiB, 16GiB and 32GiB systems.  In a nutshell, it always seemed that I should use the "hugemem" kernel with anything over 8GiB as a rule of thumb, and absolutely beyond 16GiB (even after some of the PAE patches in 2.6.1x and later).

You bet. We have both appliance and VM installs, EL4/5 based, with a myriad of memory footprints. We notice it primarily in large messaging clusters with more than 16GB of RAM that are I/O-constrained (running kernels 2.6.9 and 2.6.18).

Would it be better to use the sysctl approach or just declare uppermem in GRUB?

We've used the sysctl approach on a very large number of shipped appliances and internally hosted boxes without incident.

OK, from reading around about that sysctl setting, all it does is make low-memory page reclamation more aggressive (and thereby reduce the likelihood of exhausting the lower memory regions)?

 

What I've gotten from the whole thread is that the PAE memory-mapping structures are chewing up a bunch of the low-memory pool, putting the system at greater and greater risk of an out-of-memory condition as installed memory grows (32GB seems workable; 48GB seems to push it past that risk threshold). It feels like upping the page-reclamation rate just pushes the problem further off (sorta like playing the system-optimization bottleneck game). It feels like the better solution is to either physically remove the extra 16GB of memory or fool the OS into thinking that memory isn't there (by using GRUB options to constrain it).

 

Not trying to look a gift horse in the mouth, here. Just trying to fully understand the problem and the relative merits and risks of a given approach. In either case, if the proposed fixes make the problem go away, I'll simply try to have the offending/extra memory physically removed. I just have an academic curiosity on "best" methods, even if they ultimately are made moot by other changes.

This is one of those things that you really need to talk to your ISV about and/or test yourself.  If the application was bundled with EL, or commonly deployed, we may be able to provide more input.

 

But other than explaining generically why you're having issues, and offering several plausible solutions to mitigate your risks, it's really up to you which will work better for your ISV's product and the systems running it.

Finally got remote access to the system a few minutes ago. With RHEL 5.5 (2.6.18-194.8.1.el5PAE) installed, neither the sysctl key nor the GRUB method is available. Has this been deprecated in favor of another setting, or am I just stuck with having to remove the offending additional memory?

 

By "available", I mean that sysctl tells me it's an invalid key and there's no corresponding /proc/sys/vm/* entry, and if I pass uppermem via GRUB, the full 48GiB of memory still shows up in /proc/meminfo and other query methods.
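In case it helps, this is roughly what I ran to check (nothing turns up on 2.6.18-194.8.1.el5PAE):
sysctl -a | grep -i lower_zone      # no match
ls /proc/sys/vm/ | grep -i lower    # nothing there either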

Kernel tunables are always best set in /etc/sysctl.conf, managed by enterprise CM.  Most can be tuned dynamically as well, unlike kernel boot parameters.

With the x86 architecture, the first 16MB-896MB of physical memory is known as "low memory" (ZONE_NORMAL), which is permanently mapped into kernel space. Many kernel resources must live in the low memory zone; in fact, many kernel operations can only take place in this zone. This means that the low memory area is the most performance-critical zone. For example, if you run many resource-intensive applications/programs and/or use a large amount of physical memory, then "low memory" can run low, since more kernel structures must be allocated in this area. Under heavy I/O workloads the kernel may become starved for LowMem even though there is an abundance of available HighMem. As the kernel tries to keep as much data in cache as possible, this can lead to oom-killers or complete system hangs (a couple of commands for watching the per-zone numbers are below, after the source link).

 

In 64-bit systems all the memory is allocated in ZONE_NORMAL. So lowmem starvation will not affect  64-bit systems. Moving to 64-bit would be a permanent fix for lowmem starvation.

 

Source: https://access.redhat.com/kb/docs/DOC-52977
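If you want to watch the per-zone numbers directly (as mentioned above), /proc/buddyinfo and /proc/zoneinfo split free memory out by zone (DMA / Normal / HighMem), which makes LowMem starvation easier to spot than the aggregate figures from free:
cat /proc/buddyinfo                     # free pages per zone, grouped by allocation order
grep -B1 'pages free' /proc/zoneinfo    # free page count per zone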

That's pretty much the long-term intention. Unfortunately, the particular release of the vendor's software that we're on (which was released/bought around the time of the RHEL 4-to-5 transition, which probably accounts for the app's weird memory recommendations) requires 32-bit hosts. Fortunately, the next version of the software is all 64-bit (and, even if the app itself were 32-bit-limited, the new version allows using an external DB server rather than being tied to their poorly pre-packaged 32-bit Oracle install).

 

Needless to say, I'm quite looking forward to getting off this version of the software.

At this point I think the IHV/ISV has some answering to do on the SLA.

 

With 48GiB of RAM in a "static" EL5 32-bit appliance (or so it sounds, since you cannot modify it), they clearly did not sell you a supported configuration for EL5.

The original compatibility matrix was a bit muddy. It had columns for "OS", "CPU Architecture", "RAM" and "Disk Space". The RAM column said "12GB min" (with a note about PAE being required), but the OS column listed RHEL 4, RHEL 5 and SLES 10. Given that most of the vendor's reps I came into contact with didn't have a lot of non-Windows knowledge, it's entirely possible that their compatibility process was simply to make sure the RHEL 4 packages continued to install on RHEL 5 and didn't actually go beyond that.

 

At any rate, our standard hardware order starts at 32GB (for HP G6-class servers and blades; considerably higher for G7s). The particular site where the problematic system was installed did their own bulk hardware buy and had ordered a 48GB standard load. The bulk order was in support of a virtualization effort, and the site had less rack space (so they were purchasing systems that support higher consolidation ratios to make up for the lack of space). The system at the offending site has since been brought back to what was standard for the other deployments.

 

Have you tried yanking out 32GiB yet, and dropping back to 16GiB?

 

Or is another solution working for you to keep from exhausting LOWMEM?

Traced the events back to something weird in the application's pre-packaged RMAN exports setup. Disabled the RMAN exports and the exhaustion problems went away. Going to try to push through getting the next release of the software approved for production. This year's version of the software (the release we were on EOL'ed earlier this year) is capable of running on 64-bit RHEL 6.